GKE LLM Fine-Tuning | K8s Summit 2024 GCP Workshop
Preparation
- Fine-tune Open Source LLMs on GKE | CE093
- Learning Objectives
- Prepare your environment with a GKE cluster in standard mode.
- Set up an autoscaling L4 GPU nodepool.
- Run a Kubernetes Job to download Llama 2 7b and fine-tune it using L4 GPUs.
- Use GCS for efficient storage of models.
Step 1: Create the GKE Cluster
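The workshop's exact commands are not reproduced in this copy. A minimal sketch of creating a standard-mode cluster and an autoscaling L4 GPU node pool is shown below; the cluster name llm-finetune-cluster, the region, machine types, and node counts are illustrative assumptions, not the workshop's canonical values:

```bash
export PROJECT_ID=$(gcloud config get-value project)

# Standard-mode cluster with Workload Identity and the Cloud Storage
# FUSE CSI driver enabled (both are used later to mount GCS buckets).
gcloud container clusters create llm-finetune-cluster \
    --project=${PROJECT_ID} \
    --region=us-central1 \
    --release-channel=stable \
    --workload-pool=${PROJECT_ID}.svc.id.goog \
    --addons=GcsFuseCsiDriver \
    --num-nodes=1

# Autoscaling node pool of G2 machines, each with two NVIDIA L4 GPUs.
gcloud container node-pools create l4-gpu-pool \
    --cluster=llm-finetune-cluster \
    --region=us-central1 \
    --machine-type=g2-standard-24 \
    --accelerator=type=nvidia-l4,count=2,gpu-driver-version=latest \
    --enable-autoscaling \
    --min-nodes=0 \
    --max-nodes=3
```

Scaling the GPU pool from zero keeps costs down: GPU nodes only exist while a workload is actually running.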
Step 2: Validate Cluster Readiness
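The validation commands are also missing from this copy. One plausible readiness check, assuming the cluster and node-pool names from the Step 1 sketch, could be:

```bash
# Point kubectl at the new cluster.
gcloud container clusters get-credentials llm-finetune-cluster \
    --region=us-central1

# Nodes should report Ready; the GPU pool may show no nodes until a
# workload triggers the autoscaler, since it scales from zero.
kubectl get nodes

# Once a GPU node exists, confirm the L4 GPUs are allocatable.
kubectl describe nodes -l cloud.google.com/gke-accelerator=nvidia-l4 \
    | grep -A 6 "Allocatable:"
```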
Get access to the model
Sign the license consent agreement
- Request access to Meta Llama models by submitting the request access form at https://ai.meta.com/resources/models-and-libraries/llama-downloads/
- Accept the model terms.
- Log in to Hugging Face and find the Llama-2-7b-chat-hf model: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
- Review and agree to the terms and conditions of the LLAMA 2 COMMUNITY LICENSE AGREEMENT.
Generate an access token
- Click Your Profile > Settings > Access Tokens.
- Select New Token.
- Specify a Name of your choice and a Role of at least Read.
- Select Generate a token.
- Copy the generated token to your clipboard.
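A common next step, not spelled out above, is to store the token in a Kubernetes Secret so the fine-tuning Job can authenticate to Hugging Face and pull the gated model. The secret and key names here (hf-secret, hf_api_token) are illustrative assumptions:

```bash
# Replace the placeholder with the token copied above.
export HF_TOKEN=<your-hugging-face-token>

# Store it where the fine-tuning Job can read it.
kubectl create secret generic hf-secret \
    --from-literal=hf_api_token=${HF_TOKEN}
```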
Prepare your environment
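The environment-preparation commands are missing from this copy as well. A minimal sketch, assuming a bucket named after your project to hold the base and fine-tuned models:

```bash
export PROJECT_ID=$(gcloud config get-value project)
export BUCKET_NAME=${PROJECT_ID}-llama2-models   # assumed bucket name

# Bucket that will hold the base model and the fine-tuned output.
gcloud storage buckets create gs://${BUCKET_NAME} --location=us-central1
```

A real setup would also grant the Job's Kubernetes service account access to this bucket (for example, roles/storage.objectUser bound through Workload Identity Federation for GKE); those IAM steps are omitted from this sketch.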
Run a Kubernetes Job to Fine-tune Llama 2 7b
Fine-tuning requires a base model and a dataset. For this workshop, the dell-research-harvard/AmericanStories
dataset is used to fine-tune the Llama 2 7b base model. GCS stores the base model, and GKE with GCSFuse transparently saves the fine-tuned model back to GCS. This is a cost-efficient way to store and serve the model: you pay only for the storage the model actually uses.
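The workshop's actual Job manifest is not reproduced here. The sketch below shows the general shape of such a Job, using the hf-secret and bucket from the earlier steps, a hypothetical training image and script (finetune.py), and the GCSFuse annotation and CSI volume that make the bucket appear as a local filesystem:

```bash
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: llama2-finetune
spec:
  backoffLimit: 2
  template:
    metadata:
      annotations:
        gke-gcsfuse/volumes: "true"   # inject the Cloud Storage FUSE sidecar
    spec:
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
      containers:
      - name: finetune
        # Hypothetical image and entrypoint; substitute the workshop's own.
        image: us-docker.pkg.dev/${PROJECT_ID}/llm/finetune:latest
        command: ["python", "finetune.py"]
        env:
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-secret
              key: hf_api_token
        resources:
          limits:
            nvidia.com/gpu: 2   # matches the two L4s on g2-standard-24
        volumeMounts:
        - name: model-store
          mountPath: /models    # training script reads/writes models here
      volumes:
      - name: model-store
        csi:
          driver: gcsfuse.csi.storage.gke.io
          volumeAttributes:
            bucketName: ${BUCKET_NAME}
EOF
```

Anything the training script writes under /models lands directly in the bucket, so when the Job completes the fine-tuned weights are already persisted, which is what makes the pay-only-for-storage model work.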