Quick start
Set up a cluster
This quickstart guide will help you set up a small three-node cluster on Lambda.
- Start by logging in to ai.
ai login
This command will log you into AI, allowing you to create a cluster and run a job.
- Create a provider in ai with the lambda-instances provider type.
# Create a lambda.provider.yaml file
# Currently we support only lambda-instances as a provider_type.
cat << 'EOF' > lambda.provider.yaml
name: lambda-cluster
provider_type: lambda-instances
credentials: lambda.creds.json
EOF
# Create a lambda.creds.json file
cat << 'EOF' > lambda.creds.json
{
"token": "[TOKEN_HERE]"
}
EOF
ai provider create lambda.provider.yaml
In the code above, replace [TOKEN_HERE] with your token from Lambda.
This command will configure your Lambda provider in the ai server.
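If you would rather not paste the token into a file by hand, the credentials file can be generated from a shell variable. The following is a minimal sketch: the function name write_lambda_creds and the variable LAMBDA_API_TOKEN are hypothetical and are not part of the ai tool, and the token is assumed to contain no quotes or backslashes.

```shell
# Hypothetical helper (not part of ai): writes the credentials file in the
# same JSON layout as the example above.
# Usage: write_lambda_creds TOKEN [OUT_FILE]
write_lambda_creds() {
  token="$1"
  out="${2:-lambda.creds.json}"
  if [ -z "$token" ]; then
    echo "error: empty token" >&2
    return 1
  fi
  # umask in a subshell makes a newly created file owner-only.
  ( umask 077 && printf '{\n"token": "%s"\n}\n' "$token" > "$out" ) || return 1
  # chmod also tightens permissions if the file already existed.
  chmod 600 "$out"
}
```

For example, with the token exported as LAMBDA_API_TOKEN (an assumed variable name), run write_lambda_creds "$LAMBDA_API_TOKEN" before ai provider create lambda.provider.yaml.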
- Create a Lambda cluster, using the provider you just created.
# Create lambda.cluster.yaml file
# Note you need to replace the node_credentials with the file path to the key you use to access the nodes,
# and replace the XXX.XXX.XXX.XXX with the nodes' IPs
cat << 'EOF' > lambda.cluster.yaml
name: lambda-cluster
provider: lambda
nodes:
- XXX.XXX.XXX.XXX
- XXX.XXX.XXX.XXX
node_credentials: key.pem
EOF
# Set up the cluster
ai cluster create lambda.cluster.yaml
In the command above, replace the nodes' IPs with the IPs of your actual nodes and key.pem with the SSH key file you use to access these nodes.
This command will create the cluster and set up the nodes to run jobs on them. The cluster might take a while to become reachable.
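When the node IPs come from a script or an inventory list, the cluster file can be generated instead of edited by hand. This is a sketch only: write_cluster_yaml is a hypothetical helper, and the name and provider values are copied unchanged from the example above.

```shell
# Hypothetical helper (not part of ai): writes a cluster file in the layout
# shown above, with one "- IP" entry per node.
# Usage: write_cluster_yaml OUT_FILE KEY_FILE IP [IP...]
write_cluster_yaml() {
  if [ "$#" -lt 3 ]; then
    echo "usage: write_cluster_yaml OUT_FILE KEY_FILE IP [IP...]" >&2
    return 1
  fi
  out="$1"
  key="$2"
  shift 2
  {
    echo "name: lambda-cluster"
    echo "provider: lambda"
    echo "nodes:"
    for ip in "$@"; do
      echo "- $ip"
    done
    echo "node_credentials: $key"
  } > "$out"
}
```

Calling write_cluster_yaml lambda.cluster.yaml key.pem followed by your node IPs produces a file that can be passed to ai cluster create.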
- Wait until the Lambda cluster is reachable.
ai cluster list
This command lists the status of each of your clusters. The new cluster is shown as REACHABLE when it is ready to run jobs.
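Rather than rerunning the command by hand, the wait can be scripted. The sketch below polls ai cluster list until the cluster is reported as REACHABLE. It assumes that the cluster name and its status appear on the same line of output, which this guide does not specify; wait_for_cluster is a hypothetical helper.

```shell
# Hypothetical helper: polls `ai cluster list` until the named cluster
# shows the REACHABLE status, or gives up after a number of attempts.
# Assumption: name and status are printed on the same output line.
# Usage: wait_for_cluster NAME [ATTEMPTS] [DELAY_SECONDS]
wait_for_cluster() {
  name="$1"
  attempts="${2:-30}"   # maximum number of polls
  delay="${3:-10}"      # seconds to sleep between polls
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -F treats the name literally; -w matches REACHABLE only as a whole
    # word, so a status such as UNREACHABLE (if one exists) does not match.
    if ai cluster list | grep -F -- "$name" | grep -qw "REACHABLE"; then
      echo "$name is reachable"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $name" >&2
  return 1
}
```

For the cluster created above, wait_for_cluster lambda-cluster checks every 10 seconds for up to 30 attempts.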
Run a job
- Create an empty repository at codedepot.ai.
- Clone a prepared Git AI repository from codedepot.ai.
git clone git@codedepot.ai:demo-cpu-cluster/demo-mnist.git
cd demo-mnist
git remote remove origin
git remote add origin [link-to-your-repository]
git push -u origin main
The commands above clone a demo repository and push a copy to your account so that you can run jobs on it. Replace [link-to-your-repository] with the URL of the empty repository you created.
- Run a job on the recently created lambda-cluster.
ai job run lambda-cluster
The command will return the name of the job run.
- Fetch the job logs
The ai server records all the stdout and stderr of your job. You can retrieve them by running the following command from the repository folder.
ai job log [job-name]
This command will print the log generated by the job. Replace [job-name] with the job name returned by ai job run.
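To keep a local copy of the log, the output can be redirected to a file. A minimal sketch, assuming the job name is safe to use as a file name; save_job_log is a hypothetical helper and must be run from the repository folder, like ai job log itself.

```shell
# Hypothetical helper: saves the output of `ai job log` to <job-name>.log
# in the given directory (current directory by default).
# Usage: save_job_log JOB_NAME [OUT_DIR]
save_job_log() {
  job="$1"
  dir="${2:-.}"
  if [ -z "$job" ]; then
    echo "usage: save_job_log JOB_NAME [OUT_DIR]" >&2
    return 1
  fi
  # Fail if the log could not be fetched or the file could not be written.
  ai job log "$job" > "$dir/$job.log" || return 1
  echo "$dir/$job.log"
}
```

The function prints the path of the file it wrote, so the result can be passed directly to a pager or to grep.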
- Check the job status
You can also check the job status with the following command:
ai job list
This command will return a list of jobs with their statuses.
View Job Metadata in the Web App
You can always access the repository you created on the web app and observe all the experiments you have run with the ai command. You can also view the logs and metadata generated by this command.