# GKE Chatbot | K8s Summit 2024 GCP Workshop

## Preparation

```bash
CLUSTER_NAME=my-cluster
CLUSTER_LOCATION=us-central1

gcloud container clusters create-auto ${CLUSTER_NAME} \
  --location=${CLUSTER_LOCATION}

# kubeconfig entry generated for my-cluster.
# NAME: my-cluster
# LOCATION: us-central1
# MASTER_VERSION: 1.30.5-gke.1014001
# MASTER_IP: 34.45.80.180
# MACHINE_TYPE: e2-small
# NODE_VERSION: 1.30.5-gke.1014001
# NUM_NODES: 3
# STATUS: RUNNING

kubectl cluster-info
# Kubernetes control plane is running at https://34.45.80.180
# GLBCDefaultBackend is running at https://34.45.80.180/api/v1/namespaces/kube-system/services/default-http-backend:http/proxy
# KubeDNS is running at https://34.45.80.180/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
# Metrics-server is running at https://34.45.80.180/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy

# To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
```

## Task 2: Deploy Ollama

```bash
kubectl apply -f ollama.yaml
# Warning: autopilot-default-resources-mutator:Autopilot updated StatefulSet default/ollama: defaulted unspecified 'nvidia.com/gpu' resource for containers [ollama] (see http://g.co/gke/autopilot-defaults).
# statefulset.apps/ollama created
# service/ollama created

kubectl get pods -w
# Wait until the ollama-0 Pod reaches Running status, which means Ollama has been deployed successfully.

kubectl get nodes
# NAME                                        STATUS   ROLES    AGE     VERSION
# gk3-my-cluster-nap-1l47y8kj-033ca61f-rgvm   Ready    <none>   2m54s   v1.30.5-gke.1014001
# gk3-my-cluster-nap-nwlb0qz1-ea6bcb2c-b44d   Ready    <none>   2m9s    v1.30.5-gke.1014001
```
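The `ollama.yaml` manifest itself isn't reproduced in this excerpt. Purely as a sketch (the image tag, storage size, and label names are assumptions, not the workshop's actual file), a minimal StatefulSet plus Service for Ollama on Autopilot with an L4 GPU could be shaped like this:

```yaml
# Hypothetical minimal ollama.yaml -- the workshop's actual manifest may differ.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ollama
spec:
  serviceName: ollama
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4  # assumed GPU type
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        # No explicit nvidia.com/gpu request here: Autopilot defaults it,
        # which matches the mutator warning in the apply output above.
        volumeMounts:
        - name: ollama-data
          mountPath: /root/.ollama  # model cache survives Pod restarts
  volumeClaimTemplates:
  - metadata:
      name: ollama-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 30Gi
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434
```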

## Task 3: Interact with Gemma 2 via the Ollama CLI

```bash
# Install the Ollama CLI in Cloud Shell:
curl -fsSL https://ollama.com/install.sh | sh

# Forward the Ollama Service's port 11434 to port 11434 in Cloud Shell:
kubectl port-forward svc/ollama 11434:11434

# In a new terminal tab, point OLLAMA_HOST at the forwarded API endpoint,
# then run the Gemma 2 model:
OLLAMA_HOST=http://localhost:11434 ollama run gemma2

>>> 你好!

# 你好!👋  有什么我可以帮你的吗?😊
# (Prompt: "Hello!" -- reply: "Hello! 👋 How can I help you? 😊")
```
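Beyond the CLI, the same port-forward exposes Ollama's REST API, which is what Open WebUI will talk to in the next task. A quick sanity check with curl (the prompt text here is arbitrary):

```bash
# Generate a completion through the forwarded port.
# /api/generate is Ollama's standard generation endpoint.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```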

## Task 4: Deploy Open WebUI for a Friendlier Chatbot Interface

Apply openwebui.yaml to deploy the web UI.
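The full manifest sits behind the read-more link below. As a loose sketch only (the image tag, Service type, and ports are assumptions rather than the workshop's actual file), a minimal Deployment plus Service wiring Open WebUI to the in-cluster Ollama endpoint might look like:

```yaml
# Hypothetical minimal openwebui.yaml -- the workshop's actual manifest may differ.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
      - name: open-webui
        image: ghcr.io/open-webui/open-webui:main
        ports:
        - containerPort: 8080
        env:
        # Point Open WebUI at the Ollama Service deployed in Task 2.
        - name: OLLAMA_BASE_URL
          value: http://ollama:11434
---
apiVersion: v1
kind: Service
metadata:
  name: open-webui
spec:
  type: LoadBalancer  # assumed; an Ingress would also work
  selector:
    app: open-webui
  ports:
  - port: 80
    targetPort: 8080
```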

[Read more]

# GKE LLM Fine-Tuning | K8s Summit 2024 GCP Workshop

## Preparation

```bash
gcloud auth list
# Credentialed Accounts

# ACTIVE: *
# ACCOUNT: [email protected]

gcloud config list project
# [core]
# project = qwiklabs-asl-02-cd54d6e804de

# Your active configuration is: [cloudshell-23006]
```

## Step 1: Create the GKE Cluster

```bash
export CLUSTER_NAME="ml-gke"
export REGION="us-central1"
export BUCKET_NAME=${GOOGLE_CLOUD_PROJECT}-llama-l4
export SERVICE_ACCOUNT="l4-lab@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com"

gcloud container clusters create $CLUSTER_NAME \
  --enable-image-streaming \
  --addons=HttpLoadBalancing \
  --machine-type=e2-standard-2 \
  --shielded-secure-boot \
  --shielded-integrity-monitoring \
  --region=${REGION} \
  --num-nodes=1 \
  --enable-ip-alias \
  --release-channel=rapid \
  --node-locations=${REGION}-a \
  --workload-pool=${GOOGLE_CLOUD_PROJECT}.svc.id.goog \
  --addons GcsFuseCsiDriver

# Note: The Kubelet readonly port (10255) is now deprecated. Please update your workloads to use the recommended alternatives. See https://cloud.google.com/kubernetes-engine/docs/how-to/disable-kubelet-readonly-port for ways to check usage and for migration instructions.
# Creating cluster ml-gke in us-central1... Cluster is being health-checked (Kubernetes Control Plane is healthy)...done.
# Created [https://container.googleapis.com/v1/projects/qwiklabs-asl-02-cd54d6e804de/zones/us-central1/clusters/ml-gke].
# To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-central1/ml-gke?project=qwiklabs-asl-02-cd54d6e804de
```
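Note that `BUCKET_NAME` and `SERVICE_ACCOUNT` are exported above but never used in this excerpt; presumably the full lab creates the bucket and grants access in a step behind the read-more link. A plausible sketch of that missing step (the role choice is an assumption):

```bash
# Hypothetical step -- create the GCS bucket that will hold the base and
# fine-tuned model artifacts (the full lab may do this differently).
gcloud storage buckets create gs://${BUCKET_NAME} --location=${REGION}

# Let the Workload Identity service account read and write the bucket.
gcloud storage buckets add-iam-policy-binding gs://${BUCKET_NAME} \
  --member="serviceAccount:${SERVICE_ACCOUNT}" \
  --role="roles/storage.objectAdmin"
```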

## Step 2: Validate Cluster Readiness

```bash
gcloud container clusters list
# NAME: ml-gke
# LOCATION: us-central1
# MASTER_VERSION: 1.31.1-gke.1146000
# MASTER_IP: 34.67.21.231
# MACHINE_TYPE: e2-standard-2
# NODE_VERSION: 1.31.1-gke.1146000
# NUM_NODES: 1
# STATUS: RUNNING
```

## Get access to the model

### Generate an access token

- Click Your Profile > Settings > Access Tokens.
- Select New Token.
- Specify a Name of your choice and a Role of at least Read.
- Select Generate a token.
- Copy the generated token to your clipboard.
The token name and value recorded for this lab:

```text
2024-10-23-k8s-summit-gcp-1
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
```
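Once the token is exported as `HF_TOKEN` (done in the next section), a quick way to confirm it is valid, though not part of the original lab, is Hugging Face's whoami endpoint:

```bash
# Sanity-check the token: returns your account details if it is valid.
curl -s -H "Authorization: Bearer ${HF_TOKEN}" \
  https://huggingface.co/api/whoami-v2
```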

## Prepare your environment

```bash
# Set the default environment variables:
export HF_TOKEN=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

# Create the Node Pool
gcloud container node-pools create gpupool --cluster ml-gke \
  --accelerator type=nvidia-l4,count=8,gpu-driver-version=latest \
  --machine-type g2-standard-96 \
  --ephemeral-storage-local-ssd=count=8 \
  --enable-autoscaling --enable-image-streaming \
  --num-nodes=0 --min-nodes=0 --max-nodes=3 \
  --shielded-secure-boot \
  --shielded-integrity-monitoring \
  --node-locations ${REGION}-a,${REGION}-b --region ${REGION}

# Note: Machines with GPUs have certain limitations which may affect your workflow. Learn more at https://cloud.google.com/kubernetes-engine/docs/how-to/gpus
# Note: Starting in GKE 1.30.1-gke.115600, if you don't specify a driver version, GKE installs the default GPU driver for your node's GKE version.
# Creating node pool gpupool...done.
# Created [https://container.googleapis.com/v1/projects/qwiklabs-asl-02-cd54d6e804de/zones/us-central1/clusters/ml-gke/nodePools/gpupool].

# NAME: gpupool
# MACHINE_TYPE: g2-standard-96
# DISK_SIZE_GB: 100
# NODE_VERSION: 1.31.1-gke.1146000

# Create a Kubernetes Secret that contains the Hugging Face token
kubectl create secret generic hf-secret \
  --from-literal=hf_api_token=${HF_TOKEN} \
  --dry-run=client -o yaml | kubectl apply -f -
# secret/hf-secret created
```

## Run a Kubernetes Job to Fine-Tune Llama 2 7B

Fine-tuning requires a base model and a dataset. Here, the dell-research-harvard/AmericanStories dataset is used to fine-tune the Llama 2 7B base model. GCS stores the base model, and GKE with GCS FUSE transparently writes the fine-tuned model back to GCS. This is a cost-efficient way to store and serve the model: you pay only for the storage the model actually uses.
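The Job manifest itself sits behind the read-more link. As a rough sketch only (the image, mount path, Job name, and service account are assumptions, not the workshop's actual file), a GCS FUSE-backed fine-tuning Job could be shaped like this:

```yaml
# Hypothetical fine-tuning Job -- the workshop's actual manifest may differ.
apiVersion: batch/v1
kind: Job
metadata:
  name: finetune-llama-2-7b
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        gke-gcsfuse/volumes: "true"  # enable the GCS FUSE CSI sidecar
    spec:
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
      serviceAccountName: default  # assumed; a KSA bound to l4-lab via Workload Identity
      containers:
      - name: finetuner
        image: us-docker.pkg.dev/example/repo/finetune:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: "8"  # all eight L4s on one g2-standard-96 node
        env:
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-secret  # created in the previous step
              key: hf_api_token
        volumeMounts:
        - name: model-bucket
          mountPath: /model-data  # fine-tuned weights land here
      volumes:
      - name: model-bucket
        csi:
          driver: gcsfuse.csi.storage.gke.io
          volumeAttributes:
            bucketName: qwiklabs-asl-02-cd54d6e804de-llama-l4  # ${BUCKET_NAME}
```

Because the bucket is mounted as an ordinary filesystem, the training script can write checkpoints to /model-data with no GCS-specific code.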

[Read more]

# Distributed Load Testing Using GKE and Locust

This is the sample code for the Distributed load testing using Google Kubernetes Engine tutorial.

## License

This code is licensed under Apache 2.0; see LICENSE for more information. For licenses covering third-party software and libraries, refer to the docker-image/licenses directory.

# Guestbook with Cloud Code

The Guestbook sample demonstrates how to deploy a Kubernetes application with a front-end service and a back-end service using the Cloud Code IDE extension.

For details on how to use this sample as a template in Cloud Code, read the documentation for Cloud Code for VS Code or IntelliJ.

## Table of Contents

- What's in this sample
- Kubernetes architecture

[Kubernetes architecture diagram]
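Since the excerpt stops before the sample's manifests, here is a loose sketch of the usual front-end/back-end wiring such a sample uses; all names, images, and ports are invented for illustration. The front end reaches the back end via an in-cluster Service DNS name, and only the front end gets an external LoadBalancer:

```yaml
# Hypothetical two-tier layout for a guestbook-style app -- the Cloud Code
# sample's actual manifests may differ.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: guestbook-backend
spec:
  replicas: 1
  selector:
    matchLabels: {app: guestbook-backend}
  template:
    metadata:
      labels: {app: guestbook-backend}
    spec:
      containers:
      - name: backend
        image: example/guestbook-backend:latest  # placeholder image
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: guestbook-backend  # internal only: ClusterIP by default
spec:
  selector: {app: guestbook-backend}
  ports:
  - port: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: guestbook-frontend
spec:
  replicas: 1
  selector:
    matchLabels: {app: guestbook-frontend}
  template:
    metadata:
      labels: {app: guestbook-frontend}
    spec:
      containers:
      - name: frontend
        image: example/guestbook-frontend:latest  # placeholder image
        env:
        # The front end reaches the back end via in-cluster DNS.
        - name: BACKEND_URI
          value: http://guestbook-backend:8080
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: guestbook-frontend
spec:
  type: LoadBalancer  # only the front end is exposed externally
  selector: {app: guestbook-frontend}
  ports:
  - port: 80
    targetPort: 8080
```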

[Read more]
