Workflow: Ansible
Overview
This workflow describes configuring a simple HPC environment, consisting of:
Shared NFS directories for users, data and applications
SLURM queuing system for workload processing and management
Flight Env for managing configuration and applications available in the environment
Prerequisites
This document presumes the following situation:
The cluster has a gateway node (for running various servers)
The cluster has multiple compute nodes (for executing jobs)
DNS is correctly configured to allow hostname connections between the nodes
Firewall connections between the gateway and compute nodes are open to allow various services to communicate (e.g. queuing system, NFS, etc.)
SSH keys are correctly configured to allow the gateway to log in to nodes (as root)
There is sufficient storage space on the gateway and compute nodes (for applications and data, recommended 16GB+)
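The prerequisites above can be spot-checked from the gateway before continuing. The following is a minimal sketch rather than part of the OpenFlight tooling: the function names are hypothetical, the node names in the usage comment are taken from the example hosts file used later in this workflow, and the 16GB threshold is the recommended figure from the list above.

```shell
# Hypothetical helpers for spot-checking the prerequisites from the gateway.

# Succeeds if the hostname resolves through the system resolver (DNS, /etc/hosts).
resolves() {
    getent hosts "$1" > /dev/null
}

# Succeeds if root can log in over SSH using keys only.
# BatchMode makes ssh fail instead of prompting; ConnectTimeout avoids long hangs.
root_ssh_ok() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true
}

# Prints the free space, in whole GiB, of the filesystem holding the given path.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeeds if the given path has at least the given number of GiB free.
has_space() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

# Example use, run on the gateway (node names are assumptions):
#   for node in node01 node02; do
#       resolves "$node"    || echo "$node: does not resolve"
#       root_ssh_ok "$node" || echo "$node: root SSH login failed"
#   done
#   has_space / 16 || echo "gateway: less than 16GB free on /"
```

The checks only read state, so they can be repeated safely whenever a node is added.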
Configure Environment
Install Ansible (>v2.8.0):
$ yum install -y epel-release
$ yum install -y ansible
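To confirm that the installed release satisfies the version requirement, the version reported by ansible --version can be compared against 2.8.0. This is a minimal sketch, assuming GNU sort with the -V (version sort) option is available; version_gt and first_version are hypothetical helper names, not part of Ansible.

```shell
# Hypothetical helper: succeeds if version $1 is strictly newer than version $2.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Hypothetical helper: reads text on stdin and prints the first dotted
# version number it finds (for example 2.9.27).
first_version() {
    grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Example use:
#   installed=$(ansible --version | first_version)
#   version_gt "$installed" 2.8.0 || echo "Ansible $installed is too old"
```

Version sort is used because a plain string comparison would rank 2.10.0 below 2.8.0.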
Create hosts file:
$ cat << EOF > /etc/ansible/hosts
[gateway]
gateway1
[compute]
node01
node02
EOF
Set up playbook:
$ yum install -y git
$ git clone https://github.com/openflighthpc/openflight-ansible-playbook
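Before going further it can help to confirm that the hosts file contains the groups and hosts that were intended. The sketch below uses a hypothetical helper, group_hosts, which only understands a simple inventory like the example above (no host variables, ranges or child groups). The commented commands at the end use Ansible's own --list-hosts option and built-in ping module, and need Ansible and working SSH access to run.

```shell
# Hypothetical helper: prints the hosts listed under one [group] of a simple
# INI-style inventory. Usage: group_hosts <inventory-file> <group-name>
group_hosts() {
    awk -v group="[$2]" '
        # A section header starts: remember whether it is the wanted group.
        /^\[/ { in_group = ($0 == group); next }
        # Non-empty line inside the wanted group: print the host name.
        in_group && NF { print $1 }
    ' "$1"
}

# Example use on the gateway:
#   group_hosts /etc/ansible/hosts compute
#   ansible all --list-hosts     # how Ansible itself reads the inventory
#   ansible all -m ping          # checks that every host can be reached
```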
Warning
It is highly recommended to inspect all roles and edit them to suit your requirements or, alternatively, to write your own roles. These roles are provided “as is”, and no guarantee is made that they will function properly in environments that differ from the example environment used in this documentation.
Run playbook:
$ cd openflight-ansible-playbook
$ ansible-playbook openflight.yml
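Given the warning about untested environments, it may be worth running the non-destructive options of ansible-playbook before the real run. The following is a sketch under stated assumptions: run_playbook_safely is a hypothetical wrapper, and the ANSIBLE_PLAYBOOK variable is an invention of this sketch for pointing at a different binary. The --syntax-check, --list-hosts and --list-tasks options are standard ansible-playbook options that do not change the target hosts.

```shell
# Hypothetical wrapper: runs the read-only checks first, then the real run,
# and stops at the first step that fails.
# ANSIBLE_PLAYBOOK (an assumption of this sketch) overrides the command used;
# it defaults to ansible-playbook on the PATH.
run_playbook_safely() {
    local playbook="$1"
    local cmd="${ANSIBLE_PLAYBOOK:-ansible-playbook}"

    # Parse the playbook without executing anything.
    "$cmd" "$playbook" --syntax-check || return 1
    # Show which inventory hosts each play would target.
    "$cmd" "$playbook" --list-hosts || return 1
    # Show the tasks that would be run.
    "$cmd" "$playbook" --list-tasks || return 1
    # The actual run.
    "$cmd" "$playbook"
}

# Example use, from inside the cloned repository:
#   run_playbook_safely openflight.yml
```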
Note
The playbook may hang while trying to verify the SSH fingerprints of the hosts if none of them have been logged into before from the Ansible host. It is recommended to establish a trusted SSH connection to every system before running the playbook.
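One way to establish that trust without logging in to each host by hand is to record the host keys in advance with ssh-keyscan. This is a minimal sketch, not the method prescribed by the playbook: trust_hosts is a hypothetical helper, the KNOWN_HOSTS variable is an invention of this sketch, and the node names in the usage comment come from the example hosts file. Note that ssh-keyscan accepts whatever key each host presents, so it should only be used on a network that is already trusted, or the recorded fingerprints should be verified afterwards.

```shell
# Hypothetical helper: records the SSH host keys of the given hosts so that
# later connections are not interrupted by a fingerprint prompt.
# KNOWN_HOSTS (an assumption of this sketch) selects the file to update;
# it defaults to the current user's known_hosts file.
trust_hosts() {
    local known_hosts="${KNOWN_HOSTS:-$HOME/.ssh/known_hosts}"
    local host
    mkdir -p "$(dirname "$known_hosts")"
    for host in "$@"; do
        # Skip hosts that already have an entry (ssh-keygen -F looks one up).
        if ssh-keygen -F "$host" -f "$known_hosts" > /dev/null 2>&1; then
            continue
        fi
        # -H hashes the host name in the output; -T sets the timeout in seconds.
        ssh-keyscan -H -T 5 "$host" >> "$known_hosts" 2> /dev/null
    done
}

# Example use on the gateway, as the user that runs ansible-playbook:
#   trust_hosts gateway1 node01 node02
```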