Quick start

Set up a cluster

This section walks you through setting up a small three-node cluster on Lambda Labs.

# Log in to Depot: the CLI requests a cloud endpoint, and you log in through that endpoint's login page
depot login
 
# Create the lambda.cluster.yaml file.
# Replace the cluster secret token and each node's IP and type with your
# own values.
cat << 'EOF' > lambda.cluster.yaml
name: lambda-kubernetes
implementation: git-ai.clouds.lambda_kubernetes
cluster_secret:
  name: lambda-token
  value: ${LAMBDA_TOKEN}
node-types:
  - name: h100-8x
    vram: 80
    gpu-count: 8
  - name: h100-1x
    vram: 80
    gpu-count: 1
 
values:
  nodes:
    - ip: ${NODE_1_IP}
      type: ${NODE_1_TYPE}
    - ip: ${NODE_2_IP}
      type: ${NODE_2_TYPE}
    - ip: ${NODE_3_IP}
      type: ${NODE_3_TYPE}
EOF
 
# Set up the cluster
depot setup lambda.cluster.yaml
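Because the heredoc above uses a quoted 'EOF' delimiter, the shell writes the ${...} placeholders into lambda.cluster.yaml verbatim. If you replace them by hand, as the comment suggests, a quick check before running depot setup can catch any you missed. The helper below is a hypothetical sketch, not part of the depot CLI; it assumes every ${NAME} string left in the file is an unreplaced placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the depot CLI): fail if a config file
# still contains ${VAR}-style placeholders that should have been replaced.
check_placeholders() {
  local file="$1"
  # A missing or unreadable file is an error, not a pass.
  if [[ ! -r "$file" ]]; then
    echo "error: cannot read $file" >&2
    return 2
  fi
  # -n prints each offending line with its line number.
  if grep -n '\${[A-Za-z_][A-Za-z0-9_]*}' "$file"; then
    echo "error: unreplaced placeholders in $file" >&2
    return 1
  fi
  return 0
}
```

For example, `check_placeholders lambda.cluster.yaml && depot setup lambda.cluster.yaml` only runs the setup once no placeholders remain.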

Fine-tuning Meditron

This section shows how to fine-tune Meditron, using our models as a starting point.

# Clone the repository
git clone git@codedepot.ai:epfllm/megatron-llm
cd megatron-llm
# Link the code repository to the model and data repositories
git ai input-repo add meditron git@codedepot.ai:epfllm/meditron.git
git ai input-repo add starcoder-data git@codedepot.ai:epfllm/starcoder-data.git
 
# Create a .depot.yaml file to run the command on the Lambda cluster
cat << 'EOF' > .depot.yaml
name: exp-003
deps: ["/bin/bash", "deps.sh"]
torchrun:
  command: ["megatron/experiment.py"]
volumes:
  - input-repo:
      name: dataset
    shared: true
  - mounted-volume:
      name: results
      path: /results
    shared: false
gpuSpec:
  minGpusPerNode: 4
  minVramPerGPU: 40
  totalGpuCount: 8
clusterSpec:
  cluster: lambda-kubernetes
EOF
git add .depot.yaml
git commit -m "Adding .depot.yaml file"
git push
 
# Start job on the cluster
depot run
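To see how the gpuSpec in .depot.yaml might map onto the node-types declared in lambda.cluster.yaml, here is a small bash sketch. It assumes that minGpusPerNode and minVramPerGPU filter which node types are eligible and that totalGpuCount is covered by whole nodes; these matching rules are an assumption for illustration, not depot's documented scheduling logic, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: which node types could satisfy the gpuSpec?
# The matching rules are ASSUMED, not taken from depot's documentation.

MIN_GPUS_PER_NODE=4   # gpuSpec.minGpusPerNode
MIN_VRAM_PER_GPU=40   # gpuSpec.minVramPerGPU (GB)
TOTAL_GPU_COUNT=8     # gpuSpec.totalGpuCount

# Succeeds (exit 0) if a node type with this GPU count and per-GPU VRAM qualifies.
node_fits() {
  local gpu_count="$1" vram="$2"
  (( gpu_count >= MIN_GPUS_PER_NODE && vram >= MIN_VRAM_PER_GPU ))
}

# Prints how many nodes of one type cover TOTAL_GPU_COUNT, assuming every
# GPU on each node is used (integer ceiling division).
nodes_needed() {
  local gpu_count="$1"
  echo $(( (TOTAL_GPU_COUNT + gpu_count - 1) / gpu_count ))
}

# Node types as declared under node-types in lambda.cluster.yaml:
# "<name> <gpu-count> <vram>"
for spec in "h100-8x 8 80" "h100-1x 1 80"; do
  read -r name gpus vram <<< "$spec"
  if node_fits "$gpus" "$vram"; then
    echo "$name: fits, $(nodes_needed "$gpus") node(s) needed"
  else
    echo "$name: does not meet gpuSpec"
  fi
done
```

Under these assumptions only h100-8x qualifies, and a single such node covers all 8 requested GPUs; the h100-1x type is excluded by minGpusPerNode.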