Quick start

Set up a cluster

This quickstart guide will help you set up a small three-node cluster on Lambda.

  1. Start by logging in to ai.
ai login

This command logs you in to ai so that you can create a cluster and run a job.

  2. Create a provider in ai with the lambda-instances provider type.
# Create a lambda.provider.yaml file
# Currently we support only lambda-instances as a provider_type.
cat << 'EOF' > lambda.provider.yaml
name: lambda
provider_type: lambda-instances
credentials: lambda.creds.json
EOF
 
# Create a lambda.creds.json file
cat << 'EOF' > lambda.creds.json
{
    "token": "[TOKEN_HERE]"
}
EOF
 
ai provider create lambda.provider.yaml

In the code above, replace [TOKEN_HERE] with your token from Lambda.

This command will configure your Lambda provider in the ai server.
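Optionally, you can generate both files from a script instead of pasting the token into the file by hand. The following is a minimal sketch, not part of the ai tooling: the write_lambda_provider_files function and the LAMBDA_TOKEN environment variable are hypothetical names chosen for this example. It names the provider lambda, which is the provider name that lambda.cluster.yaml refers to in the next step, and it creates lambda.creds.json with owner-only permissions because the file holds a secret.

```shell
# Hypothetical helper (not part of the ai CLI): writes lambda.provider.yaml
# and lambda.creds.json, reading the token from the LAMBDA_TOKEN
# environment variable (an assumed name).
write_lambda_provider_files() {
    if [ -z "${LAMBDA_TOKEN:-}" ]; then
        echo "LAMBDA_TOKEN is not set" >&2
        return 1
    fi

    # Provider named "lambda" to match "provider: lambda" in lambda.cluster.yaml.
    cat > lambda.provider.yaml << 'EOF'
name: lambda
provider_type: lambda-instances
credentials: lambda.creds.json
EOF

    # Create the credentials file with owner-only permissions first,
    # then write the token into it. Assumes the token contains no
    # double quotes or backslashes, which would need JSON escaping.
    : > lambda.creds.json
    chmod 600 lambda.creds.json
    printf '{\n    "token": "%s"\n}\n' "$LAMBDA_TOKEN" > lambda.creds.json
}
```

After defining the function, set LAMBDA_TOKEN, run write_lambda_provider_files, and then run ai provider create lambda.provider.yaml as shown above.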

  3. Create a Lambda cluster that uses the provider you just created.
 
# Create the lambda.cluster.yaml file.
# Replace key.pem (node_credentials) with the path to the SSH private key you use to access the nodes,
# and replace each XXX.XXX.XXX.XXX with a node's IP address (one entry per node).
cat << 'EOF' > lambda.cluster.yaml
name: lambda-cluster
provider: lambda
nodes:
  - XXX.XXX.XXX.XXX
  - XXX.XXX.XXX.XXX
node_credentials: key.pem
EOF
 
# Set up the cluster
ai cluster create lambda.cluster.yaml

In the file above, replace each XXX.XXX.XXX.XXX placeholder with the IP address of one of your Lambda instances, adding one entry per node, and replace key.pem with the path to the SSH private key you use to access these nodes.

This command creates the cluster and prepares its nodes to run jobs. The cluster might take a while to become reachable.
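If you already have the node IPs, you can generate lambda.cluster.yaml instead of editing the placeholders by hand. This minimal sketch writes the same keys as the file above (name, provider, nodes, node_credentials); the write_cluster_file function and its argument order are hypothetical. It also restricts the key file to owner-only permissions, since ssh clients refuse to use a private key that other users can read.

```shell
# Hypothetical helper: write_cluster_file KEY_FILE IP [IP ...]
# Writes lambda.cluster.yaml with one "nodes" entry per IP address.
write_cluster_file() {
    if [ "$#" -lt 2 ]; then
        echo "usage: write_cluster_file KEY_FILE IP [IP ...]" >&2
        return 1
    fi
    key_file=$1
    shift

    # ssh clients refuse private keys that other users can read.
    chmod 600 "$key_file" || return 1

    {
        echo "name: lambda-cluster"
        echo "provider: lambda"
        echo "nodes:"
        for ip in "$@"; do
            echo "  - $ip"
        done
        echo "node_credentials: $key_file"
    } > lambda.cluster.yaml
}
```

For example, write_cluster_file key.pem 203.0.113.10 203.0.113.11 203.0.113.12 writes a three-node file (203.0.113.x are documentation example addresses; use your own instances' IPs). Then run ai cluster create lambda.cluster.yaml as before.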

  4. Wait until the Lambda cluster is reachable.
ai cluster list

This command lists all of your clusters with their statuses. The new cluster is listed as REACHABLE once it is ready to run jobs.
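Rather than rerunning ai cluster list by hand, you can poll it in a loop. This is a sketch under an assumption the guide does not confirm: that ai cluster list prints each cluster on one line containing both its name and its status. The wait_for_cluster function, its default interval (10 seconds), and its attempt limit (60) are hypothetical choices. grep -w makes the status match a whole word, so a longer status word that merely contains REACHABLE would not count.

```shell
# Hypothetical helper: wait_for_cluster NAME [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Polls `ai cluster list` until a line mentions NAME together with the
# whole word REACHABLE. Assumes one cluster per output line.
wait_for_cluster() {
    name=$1
    interval=${2:-10}
    max_attempts=${3:-60}
    attempt=1

    while [ "$attempt" -le "$max_attempts" ]; do
        if ai cluster list | grep -F -- "$name" | grep -q -w REACHABLE; then
            echo "$name is REACHABLE"
            return 0
        fi
        echo "Waiting for $name (attempt $attempt/$max_attempts)..." >&2
        attempt=$((attempt + 1))
        sleep "$interval"
    done

    echo "$name did not become REACHABLE in time" >&2
    return 1
}
```

For example, wait_for_cluster lambda-cluster returns once the cluster is listed as REACHABLE, or fails after about ten minutes with the defaults.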

Run a job

  1. Create an empty repository at codedepot.ai.

  2. Clone the prepared Git AI demo repository from codedepot.ai and push it to your new repository.

git clone git@codedepot.ai:demo-cpu-cluster/demo-mnist.git
cd demo-mnist
git remote remove origin
git remote add origin [link-to-your-repository]
git push -u origin main

These commands clone the demo repository and push a copy of it to your account so that you can run jobs on it. Replace [link-to-your-repository] with the URL of the empty repository you created in step 1.
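Git can also replace the remote URL in one step with git remote set-url, which has the same effect as the remove/add pair above. A minimal sketch as a function, to run inside the demo-mnist folder; the point_to_new_origin name is hypothetical:

```shell
# Hypothetical helper: point_to_new_origin NEW_URL
# Run inside the cloned demo-mnist folder. Equivalent to the
# `git remote remove origin` + `git remote add origin` pair above.
point_to_new_origin() {
    new_url=$1
    git remote set-url origin "$new_url" || return 1
    # Print the new URL so you can confirm it before pushing.
    git remote get-url origin
}
```

For example, run point_to_new_origin [link-to-your-repository] and then git push -u origin main.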

  3. Run a job on lambda-cluster, the cluster you created earlier.
ai job run lambda-cluster

The command will return the name of the job run.
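To reuse the job name in later commands, you can capture it in a shell variable. This sketch assumes the job name is the last line that ai job run prints; the guide does not specify the exact output format, so check it against your own output. The run_job function and the JOB_NAME variable are hypothetical.

```shell
# Hypothetical helper: run_job CLUSTER
# Runs `ai job run CLUSTER` and prints only the job name, assuming the
# name is the last line of the command's output.
run_job() {
    output=$(ai job run "$1") || return 1
    printf '%s\n' "$output" | tail -n 1
}
```

For example, after JOB_NAME=$(run_job lambda-cluster), you can use "$JOB_NAME" in place of [job-name] in the commands below.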

  4. Fetch the job logs

The ai server records everything your job writes to stdout and stderr. To retrieve it, run the following command from the repository folder:

ai job log [job-name]

This command prints the log generated by the job. Replace [job-name] with the job name returned by ai job run.
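To keep a copy of a job's log, you can redirect the output to a file. This is a sketch that assumes ai job log writes the log to stdout; the save_job_log function and the logs/ directory are hypothetical. Run it from the repository folder, like ai job log itself.

```shell
# Hypothetical helper: save_job_log JOB_NAME
# Writes the output of `ai job log JOB_NAME` to logs/JOB_NAME.log.
save_job_log() {
    job_name=$1
    mkdir -p logs || return 1
    if ai job log "$job_name" > "logs/$job_name.log"; then
        echo "Saved logs/$job_name.log"
    else
        echo "Could not fetch the log for $job_name" >&2
        return 1
    fi
}
```

For example, save_job_log [job-name] writes the log to logs/[job-name].log.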

  5. Check the job status

You can also check the job status with the following command:

ai job list

This command will return a list of jobs with their statuses.
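To check a single job instead of scanning the whole list, you can filter the output by name. This sketch assumes ai job list prints one job per line with the job's name on it; the job_status function is hypothetical.

```shell
# Hypothetical helper: job_status JOB_NAME
# Prints the line(s) of `ai job list` that mention JOB_NAME, assuming one
# job per line. Returns non-zero (grep's status) if no line matches.
job_status() {
    ai job list | grep -F -- "$1"
}
```

For example, job_status [job-name] prints only that job's line.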

View Job Metadata in the Web App

In the web app, you can open the repository you created and browse all the experiments you have run with the ai command, along with the logs and metadata those runs generated.