GKE LLM Fine-Tuning | K8s Summit 2024 GCP Workshop
Preparation
- Fine-tune Open Source LLMs on GKE | CE093
- Learning Objectives
- Prepare your environment with a GKE cluster in standard mode.
- Set up an autoscaling L4 GPU nodepool.
- Run a Kubernetes Job to download Llama 2 7b and fine-tune it using L4 GPUs.
- Use GCS for efficient storage of models.
Step 1: Create the GKE Cluster
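The workshop's exact commands are not reproduced in this copy. A minimal sketch of creating a standard-mode cluster and an autoscaling L4 GPU node pool is shown below; the cluster name llm-finetune-cluster, the region, machine types, and node counts are illustrative assumptions, not the workshop's canonical values:

```bash
export PROJECT_ID=$(gcloud config get-value project)

# Standard-mode cluster with Workload Identity and the Cloud Storage
# FUSE CSI driver enabled (both are used later to mount GCS buckets).
gcloud container clusters create llm-finetune-cluster \
    --project=${PROJECT_ID} \
    --region=us-central1 \
    --release-channel=stable \
    --workload-pool=${PROJECT_ID}.svc.id.goog \
    --addons=GcsFuseCsiDriver \
    --num-nodes=1

# Autoscaling node pool of G2 machines, each with two NVIDIA L4 GPUs.
gcloud container node-pools create l4-gpu-pool \
    --cluster=llm-finetune-cluster \
    --region=us-central1 \
    --machine-type=g2-standard-24 \
    --accelerator=type=nvidia-l4,count=2,gpu-driver-version=latest \
    --enable-autoscaling \
    --min-nodes=0 \
    --max-nodes=3
```

Scaling the GPU pool from zero keeps costs down: GPU nodes only exist while a workload is actually running.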
Step 2: Validate Cluster Readiness
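The validation commands are also missing from this copy. One plausible readiness check, assuming the cluster and node-pool names from the Step 1 sketch, could be:

```bash
# Point kubectl at the new cluster.
gcloud container clusters get-credentials llm-finetune-cluster \
    --region=us-central1

# Nodes should report Ready; the GPU pool may show no nodes until a
# workload triggers the autoscaler, since it scales from zero.
kubectl get nodes

# Once a GPU node exists, confirm the L4 GPUs are allocatable.
kubectl describe nodes -l cloud.google.com/gke-accelerator=nvidia-l4 \
    | grep -A 6 "Allocatable:"
```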
Get access to the model
Sign the license consent agreement
- Request access to Meta Llama models by submitting the request access form at https://ai.meta.com/resources/models-and-libraries/llama-downloads/
- Accept the model terms.
- Log in to Hugging Face and find the Llama-2-7b-chat-hf model: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
- Review and agree to the terms and conditions of the LLAMA 2 COMMUNITY LICENSE AGREEMENT.
Generate an access token
- Click Your Profile > Settings > Access Tokens.
- Select New Token.
- Specify a Name of your choice and a Role of at least Read.
- Select Generate a token.
- Copy the generated token to your clipboard.
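A common next step, not spelled out above, is to store the token in a Kubernetes Secret so the fine-tuning Job can authenticate to Hugging Face and pull the gated model. The secret and key names here (hf-secret, hf_api_token) are illustrative assumptions:

```bash
# Replace the placeholder with the token copied above.
export HF_TOKEN=<your-hugging-face-token>

# Store it where the fine-tuning Job can read it.
kubectl create secret generic hf-secret \
    --from-literal=hf_api_token=${HF_TOKEN}
```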
Prepare your environment
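The environment-preparation commands are missing from this copy as well. A minimal sketch, assuming a bucket named after your project to hold the base and fine-tuned models:

```bash
export PROJECT_ID=$(gcloud config get-value project)
export BUCKET_NAME=${PROJECT_ID}-llama2-models   # assumed bucket name

# Bucket that will hold the base model and the fine-tuned output.
gcloud storage buckets create gs://${BUCKET_NAME} --location=us-central1
```

A real setup would also grant the Job's Kubernetes service account access to this bucket (for example, roles/storage.objectUser bound through Workload Identity Federation for GKE); those IAM steps are omitted from this sketch.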
Run a Kubernetes Job to Fine-tune Llama 2 7b
Fine-tuning requires a base model and a dataset. For this workshop, the dell-research-harvard/AmericanStories
dataset is used to fine-tune the Llama 2 7b base model. GCS stores the base model, and GKE with GCSFuse transparently saves the fine-tuned model back to GCS. This is a cost-efficient way to store and serve the model: you pay only for the storage the model actually uses.
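The workshop's actual Job manifest is not reproduced here. The sketch below shows the general shape of such a Job, using the hf-secret and bucket from the earlier steps, a hypothetical training image and script (finetune.py), and the GCSFuse annotation and CSI volume that make the bucket appear as a local filesystem:

```bash
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: llama2-finetune
spec:
  backoffLimit: 2
  template:
    metadata:
      annotations:
        gke-gcsfuse/volumes: "true"   # inject the Cloud Storage FUSE sidecar
    spec:
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
      containers:
      - name: finetune
        # Hypothetical image and entrypoint; substitute the workshop's own.
        image: us-docker.pkg.dev/${PROJECT_ID}/llm/finetune:latest
        command: ["python", "finetune.py"]
        env:
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-secret
              key: hf_api_token
        resources:
          limits:
            nvidia.com/gpu: 2   # matches the two L4s on g2-standard-24
        volumeMounts:
        - name: model-store
          mountPath: /models    # training script reads/writes models here
      volumes:
      - name: model-store
        csi:
          driver: gcsfuse.csi.storage.gke.io
          volumeAttributes:
            bucketName: ${BUCKET_NAME}
EOF
```

Anything the training script writes under /models lands directly in the bucket, so when the Job completes the fine-tuned weights are already persisted, which is what makes the pay-only-for-storage model work.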