Manage Network Attached Storage

The RCC Cluster can integrate with Lustre and NFS file systems, and you can add and remove network attached storage on the fly. This documentation walks you through using the cluster-services command line interface to manage network attached storage.

Managing file systems with Terraform

The RCC Terraform deployment module allows you to deploy a cluster with Filestore and/or Lustre. If you have not provisioned your cluster yet, or you are comfortable re-deploying your current cluster, you can provision network attached storage resources at deployment time.

Mounting a new file system on an existing cluster

Initialize cluster-services

If this is your first time running the cluster-services command on your RCC cluster, you need to first initialize this utility.

To initialize cluster-services,

  1. Log on to your cluster’s controller instance

  2. Become the root user by running sudo su

  3. Run cluster-services init. On Debian and Ubuntu RCC systems, you will need to reference the full path to cluster-services, e.g. /apps/cls/bin/cluster-services init
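
Putting these steps together, a complete initialization session looks like the following:

$ sudo su
$ cluster-services init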

After initializing cluster-services, you can log out of the controller.

Add a new mount

The workflow for using cluster-services is as follows:

  1. Create a cluster configuration file

  2. Edit the cluster configuration file

  3. Preview changes to your system

  4. Apply changes to your system
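
At the command line, these steps map onto the following sequence, each covered in detail in the sections below (nano here stands in for the text editor of your choice):

$ cluster-services list all > config.yaml                          # 1. Create
$ nano config.yaml                                                 # 2. Edit
$ cluster-services update mounts --config=config.yaml --preview    # 3. Preview
$ cluster-services update mounts --config=config.yaml              # 4. Apply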

To use the cluster-services CLI, you will need to be a root user. For the steps given in this section, make sure that you are logged in to your cluster's controller instance.

Create a cluster configuration file

To create a cluster configuration file, you will use the cluster-services list all command.

$ sudo su
$ cluster-services list all > config.yaml

On Debian and Ubuntu RCC systems, you will need to reference the full path to cluster-services, e.g. /apps/cls/bin/cluster-services list all > config.yaml

Edit the cluster configuration file

Now that you have a cluster configuration file, you will edit it to modify the network_storage settings. The network_storage dictionary key sets which file systems are mounted to your login node and to compute nodes when they are created. Open config.yaml in a text editor and search for the network_storage key. Keep in mind that there is a cluster-wide network_storage as well as a partitions.machines.network_storage for each machine block; you want to work with the cluster-wide network_storage.
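
For orientation, the relevant parts of config.yaml are laid out roughly as follows (all other keys omitted, and the partition and machine names are placeholders):

network_storage: []      # cluster-wide mounts -- edit this list
partitions:
- name: partition-1
  machines:
  - name: partition-1-compute
    network_storage: []  # per-machine mounts -- leave these unchanged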

Edit the network_storage entry to set the following information:

  • network_storage.fs_type - The file system type. It can be set to either nfs or lustre.

  • network_storage.local_mount - The path on your cluster where you want the file system to mount.

  • network_storage.mount_options - The mount options for the file system. See the mount documentation for more details.

  • network_storage.remote_mount - The path on the remote file system that is exported for mounting.

  • network_storage.server_ip - The resolvable IP address or hostname for the file server. For Lustre, this is the IP address of the Lustre MDS server.

An example network_storage definition is given below.

network_storage:
- fs_type: nfs
  local_mount: /mnt/nas
  mount_options: rw,hard,intr
  remote_mount: /mnt/nas
  server_ip: 10.1.0.12
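
For a Lustre file system, the entry takes the same shape. A hypothetical example is shown below, assuming a Lustre file system exported as /lfs by an MDS reachable at 10.2.0.5 (both values are placeholders for your own deployment):

network_storage:
- fs_type: lustre
  local_mount: /mnt/lustre
  mount_options: defaults
  remote_mount: /lfs
  server_ip: 10.2.0.5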

Save your changes to config.yaml and exit the text editor.

Preview changes to your system

Before making changes to your system, we recommend previewing the planned changes to your cluster. To update the mounted file systems on your cluster, use the cluster-services update mounts command with the --preview flag, providing the modified cluster configuration file as the source for the update.

$ cluster-services update mounts --config=config.yaml --preview
  + network_storage[0] = {'fs_type': 'nfs', 'local_mount': '/mnt/nas', 'mount_options': 'rw,hard,intr', 'remote_mount': '/mnt/nas', 'server_ip': '10.1.0.12'}

Verify that the settings you have provided are as you intended before applying any changes.

Apply changes to your system

Once you have confirmed the settings for your network storage, you can apply the changes.

$ cluster-services update mounts --config=config.yaml
  + network_storage[0] = {'fs_type': 'nfs', 'local_mount': '/mnt/nas', 'mount_options': 'rw,hard,intr', 'remote_mount': '/mnt/nas', 'server_ip': '10.1.0.12'}

To verify that the network storage has been mounted as expected, you can run df -h to view all mounted file systems on your cluster's login node. Similarly, we recommend submitting a Slurm job step to verify that the network storage is mounted on your compute nodes as well.

$ srun -n1 df -h
Filesystem                            Size  Used Avail Use% Mounted on
udev                                  3.7G     0  3.7G   0% /dev
tmpfs                                 748M  8.4M  740M   2% /run
/dev/sda1                              99G   32G   63G  34% /
tmpfs                                 3.7G     0  3.7G   0% /dev/shm
tmpfs                                 5.0M     0  5.0M   0% /run/lock
tmpfs                                 3.7G     0  3.7G   0% /sys/fs/cgroup
/dev/sda15                            124M  5.7M  119M   5% /boot/efi
demo-controller:/etc/munge             99G   32G   63G  34% /etc/munge
demo-controller:/home                  99G   32G   63G  34% /home
demo-controller:/apps                  99G   32G   63G  34% /apps
demo-controller:/usr/local/etc/slurm   99G   32G   63G  34% /usr/local/etc/slurm
10.1.0.12:/mnt/nas                    500G    3G  497G   1% /mnt/nas
tmpfs                                 748M     0  748M   0% /run/user/1001