Deploy with Terraform

The Research Computing Cluster (RCC) can be deployed with Terraform infrastructure as code or through the Google Cloud Marketplace. Four different operating systems are available for the RCC, and all are available on the Google Cloud Marketplace.

All of the solutions have the same configuration options available when deploying with Terraform. This guide walks you through configuring an RCC deployment using the rcc-tf module, following the examples in the Research Computing Cluster repository.

Tutorial

Getting Started

First, decide which RCC solution you want to deploy. The RCC-CentOS, RCC-Debian, RCC-Ubuntu, and RCC-Rocky solutions differ primarily in the operating system. Each of these solutions comes with a suite of compilers (GCC, Intel OneAPI Compilers, and AOMP), OpenMPI, and Singularity.

Additionally, Fluid Numerics offers application-specific solutions, including RCC-WRF and RCC-CFD. Currently, RCC-WRF comes with the Weather Research and Forecasting Model (WRF), built with the Intel OneAPI Compilers and OpenMPI and optimized for the cascadelake (c2) target architecture. The RCC-CFD solution comes with OpenFOAM, Paraview, and Gmsh in a set of target-architecture-optimized images for zen3 (c2d) and cascadelake (c2) architectures.

Be aware of the following limitations:

  • Ubuntu, Debian, and Rocky Linux clusters do not have the Lustre client installed. If you plan on using a Lustre file system, you will need to use the CentOS solution.

  • Debian and Rocky Linux clusters do not support ROCm, since AMD only supports ROCm on Ubuntu and CentOS.

To learn about the pricing, licensing, and features of the RCC solutions, start with the Google Cloud Marketplace page for the solution you are interested in.

We recommend that you log into Google Cloud Shell, since Cloud Shell provides necessary authentication and command line tools, including Terraform, git, and the gcloud SDK. If you plan to use your own system, you will need to install and initialize the gcloud SDK and Terraform.
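If you are working from your own machine instead, a minimal setup might look like the following, assuming you have already installed the gcloud CLI and Terraform with your package manager:

gcloud init
gcloud auth application-default login
terraform -version

The second command provides Application Default Credentials, which Terraform's Google provider uses to authenticate against your project.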

Next, clone the research-computing-cluster repository
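For example, assuming the repository is hosted under the FluidNumerics organization on GitHub:

git clone https://github.com/FluidNumerics/research-computing-cluster.git
cd research-computing-cluster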

The research-computing-cluster repository provides example deployments for each supported operating system under the tf/ subdirectory.

Create the Terraform plan

Once you have chosen which operating system you want to use, navigate to the appropriate directory under tf/, e.g.:

cd ~/research-computing-cluster/tf/rcc-centos

Each example comes with a Makefile system that lets you customize your deployment and generates a tfvars file to help you get started quickly.

Set the following environment variables:

  • RCC_NAME - The name of your cluster. This name is used to prefix the names of resources in your cluster. For example, if RCC_NAME="rcc", your controller and login node will be named rcc-controller and rcc-login-1 respectively.

  • RCC_PROJECT - This is your Google Cloud project ID. You can obtain your project ID by running gcloud config get-value project.

  • RCC_ZONE - The Google Cloud zone where you want to deploy your cluster. Keep in mind that compute partitions can be placed in multiple zones during or after deployment; this will be covered in the next section of this tutorial.

  • RCC_MACHINE_TYPE - The machine type to use for your first compute partition. In the next section of this tutorial, we’ll cover how to add more partitions before deployment. For the RCC-CFD solution, you do not need to set the RCC_MACHINE_TYPE variable.

  • RCC_MAX_NODE - The maximum number of nodes to support in the first compute partition.

In the example below, we’ve configured a cluster named rcc to be deployed in us-west1-b with up to 10 c2-standard-8 compute nodes in the first partition.

export RCC_NAME="rcc"
export RCC_PROJECT="YOUR_GOOGLE_PROJECT_ID"
export RCC_ZONE="us-west1-b"
export RCC_MACHINE_TYPE="c2-standard-8"
export RCC_MAX_NODE=10

Once you’ve set the environment variables, you can create the basic.tfvars file and generate a terraform plan.

make plan

In addition to creating the basic.tfvars file, this step creates terraform.tfplan which lists the resources that will be created when you are ready.
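If you want to review the plan before proceeding, you can render it in human-readable form with Terraform's show command (assuming make plan writes terraform.tfplan to the current directory, as described above):

terraform show terraform.tfplan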

(Optional) Customize your deployment

The basic plan created in the previous step deploys a cluster with the following configuration:

  • Controller - n1-standard-4 machine with 250 GB PD-Standard disk

  • Login - n1-standard-4 machine with 100 GB PD-Standard disk

  • Compute - Single compute partition (no GPUs) using the machine type and maximum node count requested.

If this is sufficient for your needs, you can move on to the next step. If you need to customize the deployment, open basic.tfvars in a text editor and adjust the deployment values to suit your needs, as sketched below.
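For example, resizing the controller and login nodes might look like the following. The variable names here are assumptions based on common Slurm-on-GCP deployments, so confirm them against the generated basic.tfvars before using them:

# Hypothetical variable names; verify against the comments in basic.tfvars
controller_machine_type = "n2-standard-8"
controller_disk_size_gb = 500
login_machine_type      = "n1-standard-8"
login_disk_size_gb      = 100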

Customize Partitions

You can modify the partitions object in basic.tfvars to add multiple partitions, configure multi-region deployments, or add GPUs to compute nodes. We recommend duplicating the first partition as a template (lines 40-62 of basic.tfvars) to give you a good starting point for adding other partitions, as in the sketch below.
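As a rough sketch, an added GPU partition entry might look like the following. The field names mirror those commonly used in Slurm-on-GCP partition definitions and are assumptions here; copy the exact field names from the first partition in your basic.tfvars:

partitions = [
  # ... existing partition from basic.tfvars ...
  {
    # Illustrative entry; field names are assumptions
    name           = "gpu"
    machine_type   = "n1-standard-8"
    max_node_count = 4
    zone           = "us-west1-b"
    gpu_count      = 1
    gpu_type       = "nvidia-tesla-t4"
  }
]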

Add Filestore NFS

The rcc-tf module comes with an easy-to-use configuration for creating and attaching a Filestore instance to your cluster. To add a Filestore instance, set create_filestore = true and configure the filestore object to meet your needs.

create_filestore = true
filestore = {
  name        = "filestore"
  zone        = null
  tier        = "PREMIUM"
  capacity_gb = 2048
  fs_name     = "nfs"
  network     = null
}

The example above creates a premium-tier Filestore instance with 2 TB of capacity. Setting zone = null and network = null allows the rcc-tf module to set the zone and network to match those used for your controller and login node instances.

The mount point for Filestore on your cluster is automatically set to /mnt/filestore.
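Once the cluster is up, you can verify the mount from the login node, for example:

df -h /mnt/filestore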

Add Lustre File System

The rcc-tf module also comes with an easy-to-use configuration for creating and attaching a Lustre file system to your cluster. To add a Lustre file system, set create_lustre = true and configure the lustre object to meet your needs.

We recommend using the provided Lustre settings and increasing oss_node_count to scale file system capacity and performance.
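As a sketch, the Lustre configuration follows the same pattern as the Filestore example above. Apart from create_lustre and oss_node_count, which are named in this guide, the field names below are assumptions; check them against the lustre object definition in the rcc-tf module:

create_lustre = true
lustre = {
  name           = "lustre"  # hypothetical field, mirroring the filestore object
  zone           = null      # hypothetical: null to match the controller/login zone
  oss_node_count = 4         # increase to scale capacity and performance
}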