RCC Architecture

Overview

The Research Computing Cloud (RCC) Slurm Cluster is designed to replicate traditional on-premises HPC resources. It consists of a Slurm controller node, login nodes, compute nodes, and networking resources. Optionally, you can add NFS and Lustre file systems to the cluster to increase storage capacity and file I/O performance.

Architecture Components

Controller

The Controller instance hosts the Slurm controller daemon (slurmctld) and, by default, the Slurm database daemon (slurmdbd) and its database. It also serves the /home, /apps, /etc/munge, and /usr/local/slurm directories over NFS. In addition to hosting these resources, the controller is responsible for creating and deleting compute nodes to match workload demands.
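
Because /home, /apps, /etc/munge, and /usr/local/slurm are served over NFS by default, they appear as shared mount points on the login and compute nodes. The short Python sketch below, which assumes the default export list above, simply reports whether each of these paths is a separate mount on the node where it runs; the list may differ if your deployment customizes its file systems.

  # Report whether the controller's default NFS exports are mounted on this node.
  # The path list reflects the defaults described above (an assumption if your
  # deployment has been customized).
  import os

  SHARED_DIRS = ["/home", "/apps", "/etc/munge", "/usr/local/slurm"]

  for path in SHARED_DIRS:
      status = "mounted" if os.path.ismount(path) else "not a separate mount"
      print(f"{path}: {status}")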

Login

Login nodes are the primary access point to the cluster for developers and researchers (“users”). Because login nodes are shared amongst users, they are intended for lightweight activities such as text editing, code compiling, and job submission. Large data transfers, software builds, and compute-intensive workloads should be scheduled to run on compute nodes.
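
For example, rather than running a build or long computation directly on a login node, a user would submit it to the scheduler so that it runs on a compute node. The sketch below does this from Python by calling sbatch; the partition name and batch script are hypothetical placeholders, so substitute your cluster's partition names and your own script.

  # Submit work to a compute node from a login node instead of running it locally.
  # "compute" and "build_and_run.sh" are hypothetical placeholders.
  import subprocess

  result = subprocess.run(
      ["sbatch", "--partition=compute", "build_and_run.sh"],
      capture_output=True,
      text=True,
      check=True,
  )
  print(result.stdout.strip())  # e.g. "Submitted batch job 1234"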

For Continuous Integration / Continuous Benchmarking (CI/CB) deployments, login nodes are typically not needed. See Fluid-Run Documentation for more details.

Compute

Compute nodes execute scheduled workloads. They are defined by specifying compute partitions, each consisting of a group of machine blocks. Each machine block is a homogeneous group of Google Compute Engine instances defined by the following attributes (see the configuration sketch after this list):

  • name - The prefix for all instances in this machine block.

  • machine_type - The Google Compute Engine machine type.

  • max_node_count - The maximum number of compute instances in this machine block.

  • zone - The Google Cloud zone to deploy this machine block's instances to. If regional_capacity=True, instances may be deployed to any zone within the corresponding region.

  • image - The VM image to use for machines in this block. By default, this is set to the image used by the controller and login nodes. Custom images are often used to deploy specific applications to the cluster. See RCC-Apps for details on creating and deploying custom VM images to the RCC.

  • image_hyperthreads - Boolean flag to indicate if hyperthreading is enabled (True) or not (False).

  • compute_disk_type - The boot disk type.

  • compute_disk_size_gb - The size of the boot disk in GB.

  • compute_labels - Any labels to apply to compute nodes when deployed.

  • cpu_platform - The minimum CPU platform to request for compute nodes.

  • gpu_type - The type of GPU to attach to compute nodes. GPUs are only available in select zones.

  • gpu_count - The number of GPUs to attach to each instance.

  • gvnic - Boolean to enable (True) or disable (False) Google Virtual NIC. GVNIC is used to increase peak network bandwidth.

  • preemptible_bursting - Boolean to enable preemptible instances. Jobs should be capable of recovering from preemption. The RCC comes with Distributed MultiThreaded CheckPointing (DMTCP; https://docs.nersc.gov/development/checkpoint-restart/dmtcp/) to support application recovery, even for MPI applications.

  • vpc_subnet - The VPC subnetwork to deploy compute nodes to. If not specified, the subnetwork used to host the controller and login nodes is used.

  • exclusive - Boolean to enable exclusive job scheduling (when True, at most one job runs per node).

  • enable_placement - Boolean to enable placement policy for compute node scheduling.

  • regional_capacity - Boolean to enable a spread placement policy. When set to False, a compact placement policy (https://cloud.google.com/compute/docs/instances/define-instance-placement#compact) is used. enable_placement must also be set to True.

  • regional_policy - A previously created regional placement policy.

  • static_node_count - The number of static nodes in this machine block.
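
As a concrete illustration, the sketch below collects these attributes for a single hypothetical machine block. It is written as a Python dictionary purely to show the attribute names and typical value types; all values are placeholders, and the exact syntax depends on how your RCC deployment is configured (for example, through its Terraform input variables).

  # A hypothetical machine block, expressed as a Python dictionary for
  # illustration only. Values are placeholders, not recommendations.
  gpu_block = {
      "name": "v100",                    # prefix for all instances in this block
      "machine_type": "n1-standard-8",   # Google Compute Engine machine type
      "max_node_count": 10,              # maximum number of compute instances
      "zone": "us-central1-a",
      "image": None,                     # None -> use the controller/login image
      "image_hyperthreads": True,
      "compute_disk_type": "pd-standard",
      "compute_disk_size_gb": 50,
      "compute_labels": {"team": "research"},
      "cpu_platform": None,
      "gpu_type": "nvidia-tesla-v100",
      "gpu_count": 1,
      "gvnic": False,
      "preemptible_bursting": False,
      "vpc_subnet": None,                # None -> controller/login subnetwork
      "exclusive": True,                 # at most one job per node
      "enable_placement": False,
      "regional_capacity": False,
      "regional_policy": None,
      "static_node_count": 0,
  }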