Workflow: Ansible

Overview

This workflow describes configuring a simple HPC environment, consisting of:

  • Shared NFS directories for users, data and applications

  • SLURM queuing system for workload processing and management

  • Flight Env for managing configuration and applications available in the environment

Prerequisites

This document presumes the following situation:

  • The cluster has a gateway node (for running various servers)

  • The cluster has multiple compute nodes (for executing jobs)

  • DNS is correctly configured to allow hostname connections between the nodes

  • Firewall ports between the gateway and compute nodes are open to allow the various services (e.g. the queuing system and NFS) to communicate

  • SSH keys are correctly configured to allow the gateway to log in to the nodes (as root)

  • There is sufficient storage space on the gateway and compute nodes (for applications and data, recommended 16GB+)
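
The SSH key prerequisite above can be satisfied from the gateway with a sketch like the following (the `node01`/`node02` hostnames are placeholders matching the example inventory used below):

```shell
# Generate a key pair on the gateway if one does not already exist
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519

# Install the public key on each compute node
# (prompts for the root password once per node)
for host in node01 node02; do
    ssh-copy-id root@"$host"
done
```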

Configure Environment

  • Install ansible (>v2.8.0):

    $ yum install -y epel-release
    $ yum install -y ansible
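
The EPEL package should satisfy the version requirement, but it is worth confirming before continuing:

```shell
# Print the installed version; it should report 2.8.0 or later
ansible --version
```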
    
  • Create hosts file:

    $ cat << EOF > /etc/ansible/hosts
    [gateway]
    gateway1
    
    [compute]
    node01
    node02
    EOF
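
Once the inventory is in place, connectivity can be verified with Ansible's built-in `ping` module before running any playbook:

```shell
# Each host should report SUCCESS if SSH access and Python are working
ansible all -m ping
```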
    
  • Setup playbook:

    $ yum install -y git
    $ git clone https://github.com/openflighthpc/openflight-ansible-playbook
    

Warning

It is highly recommended to inspect all roles and edit them to your requirements or, alternatively, write your own roles. These roles are provided “as is” and no guarantee is made that they will function properly in environments different from the example environment used in this documentation.
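
As a starting point for that inspection, ansible-playbook can list the tasks a play would apply without executing anything:

```shell
cd openflight-ansible-playbook

# Show every task the playbook would execute, grouped by play
ansible-playbook openflight.yml --list-tasks
```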

  • Run playbook:

    $ cd openflight-ansible-playbook
    $ ansible-playbook openflight.yml
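
For a cautious first run, the standard `--check` and `--limit` options allow a dry run against a single node, though be aware that some tasks may be skipped or behave differently in check mode:

```shell
# Dry-run the playbook against one compute node only
ansible-playbook openflight.yml --check --limit node01
```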
    

Note

The playbook may hang while trying to verify the SSH host fingerprints if none of the hosts have been logged into before from the Ansible host. It is recommended to establish a trusted SSH connection to all systems first.
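
One way to avoid that hang is to pre-populate the known hosts file on the Ansible host, for example with ssh-keyscan (hostnames taken from the example inventory above):

```shell
# Gather host keys for every node so Ansible's SSH
# connections are not interrupted by fingerprint prompts
for host in gateway1 node01 node02; do
    ssh-keyscan "$host" >> ~/.ssh/known_hosts
done
```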