GKE LLM Fine-Tuning | K8s Summit 2024 GCP Workshop

Preparation


gcloud auth list
# Credentialed Accounts

# ACTIVE: *
# ACCOUNT: [email protected]

gcloud config list project
# [core]
# project = qwiklabs-asl-02-cd54d6e804de

# Your active configuration is: [cloudshell-23006]
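Later commands rely on `GOOGLE_CLOUD_PROJECT` being set. Cloud Shell pre-sets it; outside Cloud Shell it can be derived from the active gcloud configuration (a minimal sketch, assuming gcloud is already authenticated):

```shell
# GOOGLE_CLOUD_PROJECT is pre-set in Cloud Shell; elsewhere, derive it
# from the active gcloud configuration shown above.
export GOOGLE_CLOUD_PROJECT=$(gcloud config get-value project)
echo "Active project: ${GOOGLE_CLOUD_PROJECT}"
```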

Step 1: Create the GKE Cluster

export CLUSTER_NAME="ml-gke"
export REGION="us-central1"
export BUCKET_NAME=${GOOGLE_CLOUD_PROJECT}-llama-l4
export SERVICE_ACCOUNT="l4-lab@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com"

gcloud container clusters create $CLUSTER_NAME \
  --enable-image-streaming \
  --addons=HttpLoadBalancing \
  --machine-type=e2-standard-2 \
  --shielded-secure-boot \
  --shielded-integrity-monitoring \
  --region=${REGION} \
  --num-nodes=1 \
  --enable-ip-alias \
  --release-channel=rapid \
  --node-locations=${REGION}-a \
  --workload-pool=${GOOGLE_CLOUD_PROJECT}.svc.id.goog \
  --addons GcsFuseCsiDriver

# Note: The Kubelet readonly port (10255) is now deprecated. Please update your workloads to use the recommended alternatives. See https://cloud.google.com/kubernetes-engine/docs/how-to/disable-kubelet-readonly-port for ways to check usage and for migration instructions.
# Creating cluster ml-gke in us-central1... Cluster is being health-checked (Kubernetes Control Plane is healthy)...done.                           
# Created [https://container.googleapis.com/v1/projects/qwiklabs-asl-02-cd54d6e804de/zones/us-central1/clusters/ml-gke].
# To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-central1/ml-gke?project=qwiklabs-asl-02-cd54d6e804de
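Before running kubectl against the new cluster, credentials must be fetched. A sketch, assuming the same `CLUSTER_NAME` and `REGION` exported above:

```shell
# Configure kubectl to talk to the newly created regional cluster.
gcloud container clusters get-credentials "${CLUSTER_NAME}" --region "${REGION}"
# This writes a context named gke_<project>_<region>_<cluster> into ~/.kube/config.
```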

Step 2: Validate Cluster Readiness

gcloud container clusters list

# NAME: ml-gke
# LOCATION: us-central1
# MASTER_VERSION: 1.31.1-gke.1146000
# MASTER_IP: 34.67.21.231
# MACHINE_TYPE: e2-standard-2
# NODE_VERSION: 1.31.1-gke.1146000
# NUM_NODES: 1
# STATUS: RUNNING
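Beyond the cluster listing, node readiness can be confirmed with kubectl (a sketch; the single e2-standard-2 node created above should report Ready):

```shell
# Confirm the default node pool registered and is schedulable.
kubectl get nodes -o wide

# Count Ready nodes; expect 1 for the default pool created above.
kubectl get nodes --no-headers | awk '$2 == "Ready"' | wc -l
```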

Get access to the model

Generate a Hugging Face access token

  • Click Your Profile > Settings > Access Tokens.
  • Select New Token.
  • Specify a Name of your choice and a Role of at least Read.
  • Select Generate a token.
  • Copy the generated token to your clipboard.
# Token name: 2024-10-23-k8s-summit-gcp-1
# Token value: 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

Prepare your environment

# Set the default environment variables:
export HF_TOKEN=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

# Create the Node Pool
gcloud container node-pools create gpupool --cluster ml-gke \
  --accelerator type=nvidia-l4,count=8,gpu-driver-version=latest \
  --machine-type g2-standard-96 \
  --ephemeral-storage-local-ssd=count=8 \
  --enable-autoscaling --enable-image-streaming \
  --num-nodes=0 --min-nodes=0 --max-nodes=3 \
  --shielded-secure-boot \
  --shielded-integrity-monitoring \
  --node-locations ${REGION}-a,${REGION}-b --region ${REGION} 

# Note: Machines with GPUs have certain limitations which may affect your workflow. Learn more at https://cloud.google.com/kubernetes-engine/docs/how-to/gpus
# Note: Starting in GKE 1.30.1-gke.115600, if you don't specify a driver version, GKE installs the default GPU driver for your node's GKE version.
# Creating node pool gpupool...done.                                                                                                                
# Created [https://container.googleapis.com/v1/projects/qwiklabs-asl-02-cd54d6e804de/zones/us-central1/clusters/ml-gke/nodePools/gpupool].

# NAME: gpupool
# MACHINE_TYPE: g2-standard-96
# DISK_SIZE_GB: 100
# NODE_VERSION: 1.31.1-gke.1146000

# Create a Kubernetes Secret that contains the Hugging Face token
kubectl create secret generic hf-secret \
  --from-literal=hf_api_token=${HF_TOKEN} \
  --dry-run=client -o yaml | kubectl apply -f -
# secret/hf-secret created
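To confirm the secret holds the expected value, the token can be read back out (a sketch; Kubernetes stores secret data base64-encoded, so it must be decoded):

```shell
# Read the token back from the secret and base64-decode it.
kubectl get secret hf-secret -o jsonpath='{.data.hf_api_token}' | base64 --decode
```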

Run a Kubernetes Job to Fine-Tune Llama 2 7B

Fine-tuning requires a base model and a dataset. For this post, the dell-research-harvard/AmericanStories dataset is used to fine-tune the Llama 2 7B base model. Google Cloud Storage (GCS) stores the base model, and GKE with GCS FUSE transparently saves the fine-tuned model back to GCS. This is a cost-efficient way to store and serve the model: you pay only for the storage the model actually uses.
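A minimal sketch of what such a Job manifest could look like, assuming a hypothetical training image and the hf-secret and lab bucket created earlier (the real lab manifest may differ; the gke-gcsfuse annotation and the CSI volume are what wire the bucket into the Pod):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: finetune-llama
spec:
  backoffLimit: 2
  template:
    metadata:
      annotations:
        gke-gcsfuse/volumes: "true"   # enables the GCS FUSE sidecar on this Pod
    spec:
      restartPolicy: Never
      containers:
      - name: finetuner
        image: us-docker.pkg.dev/example/finetune:latest   # hypothetical image
        resources:
          limits:
            nvidia.com/gpu: "8"       # matches the 8 x L4 node pool above
        env:
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-secret
              key: hf_api_token
        volumeMounts:
        - name: model-output
          mountPath: /output
      volumes:
      - name: model-output
        csi:
          driver: gcsfuse.csi.storage.gke.io
          volumeAttributes:
            bucketName: BUCKET_NAME   # substitute the bucket created for the lab
```

With this wiring, anything the training container writes under /output lands in the bucket, which is how the fine-tuned weights end up in GCS without the container knowing about object storage at all.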

[Read more]

Distributed Load Testing Using GKE and Locust

This is the sample code for the Distributed load testing using Google Kubernetes Engine tutorial.

License

This code is Apache 2.0 licensed and more information can be found in LICENSE. For information on licenses for third party software and libraries, refer to the docker-image/licenses directory.

Guestbook with Cloud Code

The Guestbook sample demonstrates how to deploy a Kubernetes application with a front end service and a back end service using the Cloud Code IDE extension.

For details on how to use this sample as a template in Cloud Code, read the documentation for Cloud Code for VS Code or IntelliJ.

Table of Contents

  • What’s in this sample
  • Kubernetes architecture

[Kubernetes Architecture Diagram]

[Read more]

Guestbook
Your Name
Message
{% if messages %}
  {% for m in messages %}
    {{ m.author }}
    {{ format_duration(m.date) }}
    {{ m.message }}
  {% endfor %}
{% else %}
{% endif %}