Deploying DeepSeek-R1 on Kubernetes for Efficient AI Inference

2025-09-05 22:09:02

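This article shows how to deploy a DeepSeek model on AWS using Amazon EKS Auto Mode, the managed Kubernetes offering from Amazon Web Services. Amazon EKS Auto Mode provides greater flexibility and scalability while removing the need to manage the Kubernetes control plane, compute, storage, and networking components, which greatly simplifies deployment.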

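Why deploy DeepSeek on Amazon EKS?

- Open source and openness: DeepSeek is released as open source, so more companies and developers can access and use its advanced language models, which helps move AI forward.
- Enhanced reasoning: DeepSeek-R1 uses Chain of Thought (CoT) reasoning, which lets the model break a complex problem into several manageable steps and improves its ability on tasks such as math problems and logical reasoning.
- Simplified hosting on Amazon EKS: hosting DeepSeek with Amazon EKS Auto Mode means there is no underlying Kubernetes infrastructure to manage, so you can focus on deploying and using the model.

Deploying DeepSeek-R1 on Amazon EKS Auto Mode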

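This tutorial uses DeepSeek-R1-Distill-Llama-8B, a lightweight distilled model from DeepSeek. Compared with the full DeepSeek-R1 model (671B parameters), it needs fewer compute resources such as GPUs, at the cost of somewhat lower performance. To deploy the full DeepSeek-R1 model instead, replace the distilled model name in the vLLM configuration.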

Install the prerequisites

This tutorial uses AWS CloudShell to simplify the installation.
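Before installing anything, it can help to see which of the command-line tools used in this tutorial are already present in the shell. The following is a minimal sketch: `check_tool` is a hypothetical helper name, and the tool list is simply collected from the commands that appear in this article.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Tools used by the commands in this tutorial.
for tool in kubectl terraform jq docker aws git openssl; do
  # '|| true' keeps the loop going even when a tool is missing.
  check_tool "$tool" || true
done
```

Anything reported as missing needs to be installed before the later steps; the commands in this section cover kubectl and Terraform.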

    # Install kubectl
    curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
    sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

    # Install Terraform
    sudo yum install -y yum-utils
    sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/AmazonLinux/hashicorp.repo
    sudo yum -y install terraform

Create an Amazon EKS Auto Mode cluster with Terraform

Next, use Terraform to create the cloud infrastructure, which includes:

- A VPC (Virtual Private Cloud)
- An ECR (Elastic Container Registry) image repository
- An EKS cluster with Auto Mode enabled

    # Clone the GitHub repo with the manifests
    git clone -b v0.1 https://github.com/aws-samples/deepseek-using-vllm-on-eks
    cd deepseek-using-vllm-on-eks

    # Apply the Terraform configuration
    terraform init
    terraform apply -auto-approve

    # After Terraform finishes, configure kubectl with the new EKS cluster
    $(terraform output configure_kubectl | jq -r)

Create an EKS Auto Mode NodePool

To give the model GPU support, create a custom NodePool.

    # Create a custom NodePool with GPU support
    kubectl apply -f manifests/gpu-nodepool.yaml

    # Check if the NodePool is in 'Ready' state
    kubectl get nodepool/gpu-nodepool

Deploy the DeepSeek model

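We deploy the DeepSeek-R1-Distill-Llama-8B model with vLLM. To simplify the process, a sed command is provided that sets the model name and parameters in the deployment manifest.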
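Before editing the manifest in place, a dry run on a throwaway file shows what the substitution does. This is only a sketch: the sample line written to the temporary file is hypothetical and is not copied from the repository's manifest, which may look different.

```shell
# Create a throwaway file containing the placeholder.
# Assumption: this sample line is made up for illustration.
tmp=$(mktemp)
printf '%s\n' 'args: ["vllm serve __MODEL_NAME_AND_PARAMETERS__"]' > "$tmp"

# Same sed expression as in the tutorial, but without -i, so the result
# is written to stdout and the file itself is left unchanged.
preview=$(sed "s|__MODEL_NAME_AND_PARAMETERS__|deepseek-ai/DeepSeek-R1-Distill-Llama-8B --max_model 2048|g" "$tmp")
echo "$preview"

rm -f "$tmp"
```

The expression uses `|` as the sed delimiter because the model ID contains a `/`, which would otherwise have to be escaped. Running the same expression without `-i` against manifests/deepseek-deployment-gpu.yaml gives a preview of the real change, and swapping in a different model ID is a matter of changing the replacement text.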

    # Use the sed command to replace the placeholder with the model name and configuration parameters
    sed -i "s|__MODEL_NAME_AND_PARAMETERS__|deepseek-ai/DeepSeek-R1-Distill-Llama-8B --max_model 2048|g" manifests/deepseek-deployment-gpu.yaml

    # Deploy the DeepSeek model on Kubernetes
    kubectl apply -f manifests/deepseek-deployment-gpu.yaml

    # Check the pods in the 'deepseek' namespace
    kubectl get po -n deepseek

After deployment, the Pod may stay in the Pending state for a short while, because EKS Auto Mode has to provision an EC2 instance and install the required GPU drivers. If the Pod stays Pending for a long time, check the GPU instance quota of your AWS account and make sure the required GPU instance can be launched. Details are in the Amazon EC2 instance quotas documentation. Note: these quotas are based on vCPUs, not on the number of instances, so size any quota increase request by vCPU count.

    # Wait for the pod to reach the 'Running' state
    watch -n 1 kubectl get po -n deepseek

    # Verify that a new Node has been created
    kubectl get nodes -l owner=data-engineer

    # Check the logs to confirm that vLLM has started
    kubectl logs deployment.apps/deepseek-deployment -n deepseek

Once the deployment is ready, the log shows the message: Application startup complete
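Instead of re-reading the log by hand, the check for the startup message can be polled in a loop. The sketch below assumes the deployment and namespace names used in this tutorial; `wait_until` and `vllm_ready` are hypothetical helper names, and the timeout counts only the time spent sleeping, so it is approximate.

```shell
# Hypothetical helper: run a command every INTERVAL seconds until it succeeds
# or roughly TIMEOUT seconds have passed. Returns 0 on success, 1 on timeout.
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  while ! "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  return 0
}

# Readiness check for this tutorial: does the vLLM log contain the startup message?
vllm_ready() {
  kubectl logs deployment.apps/deepseek-deployment -n deepseek 2>/dev/null \
    | grep -q "Application startup complete"
}

# Example (needs kubectl configured for the cluster):
# wait up to about 15 minutes, polling every 10 seconds.
# wait_until 900 10 vllm_ready && echo "vLLM is ready"
```

Keeping the polling loop separate from the check means the same `wait_until` can wrap any other command that signals readiness through its exit status.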

Interact with the DeepSeek LLM

Next, create a local proxy and use a curl request to interact with the model.
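The request body for the chat completions endpoint is a small JSON document with the model name and a list of messages. When prompts are typed by hand, quoting mistakes are easy to make, so a small function can build the body. This is a sketch only: `chat_payload` is a hypothetical helper, and it assumes a single-line prompt without control characters.

```shell
# Hypothetical helper: print a chat completions request body
# containing a single user message.
chat_payload() {
  model=$1
  prompt=$2
  # Escape backslashes and double quotes so the prompt stays valid
  # inside a JSON string. Assumption: the prompt is a single line.
  escaped=$(printf '%s' "$prompt" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' "$model" "$escaped"
}

chat_payload "deepseek-ai/DeepSeek-R1-Distill-Llama-8B" "What is Kubernetes?"

# Example with the port-forward from this tutorial running:
# curl -X POST "http://localhost:8080/v1/chat/completions" \
#   -H "Content-Type: application/json" \
#   --data "$(chat_payload "deepseek-ai/DeepSeek-R1-Distill-Llama-8B" "What is Kubernetes?")"
```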

    # Set up a proxy to forward the service port to your local terminal
    kubectl port-forward svc/deepseek-svc -n deepseek 8080:80 > port-forward.log 2>&1 &

    # Send a curl request to the model
    curl -X POST "http://localhost:8080/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
        "messages": [
          {
            "role": "user",
            "content": "What is Kubernetes?"
          }
        ]
      }'

Response time varies with the computational complexity of the request and is typically a few seconds. You can monitor the model's execution through the deepseek-deployment logs.

Build a chatbot UI for the model

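Although the model on EKS can be called directly through its API, a chatbot UI makes it more convenient for users to interact with the model. The source code for the UI is provided in the GitHub repository.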
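One detail of the UI deployment commands is worth knowing: the login password is generated with `openssl rand -base64 12 | tr -dc A-Za-z0-9 | head -c 16`. Twelve random bytes encode to exactly 16 base64 characters, and `tr` then drops any `+` or `/`, so the resulting password can be shorter than 16 characters. That is still a usable password. If a fixed length matters, one possible variant is sketched below; `gen_password` is a hypothetical helper and assumes /dev/urandom, base64, tr, and head are available.

```shell
# Hypothetical helper: print an alphanumeric password of LEN characters (default 16).
gen_password() {
  len=${1:-16}
  # Draw far more random bytes than needed, so that enough characters remain
  # after the non-alphanumeric base64 symbols and newlines are removed.
  head -c $((len * 4)) /dev/urandom | base64 | tr -dc 'A-Za-z0-9' | head -c "$len"
}

pw=$(gen_password 16)
echo "length: ${#pw}"
```

Oversampling by a factor of four is an arbitrary but generous margin: only 2 of the 64 base64 symbols are filtered out, so running short of characters is practically ruled out.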

    # Retrieve the ECR repository URI created by Terraform
    export ECR_REPO=$(terraform output ecr_repository_uri | jq -r)

    # Build the container image for the Chatbot UI
    docker build -t $ECR_REPO:0.1 chatbot-ui/application/.

    # Login to ECR and push the image
    aws ecr get-login-password | docker login --username AWS --password-stdin $ECR_REPO
    docker push $ECR_REPO:0.1

    # Update the deployment manifest to use the image
    sed -i "s#__IMAGE_DEEPSEEK_CHATBOT__#$ECR_REPO:0.1#g" chatbot-ui/manifests/deployment.yaml

    # Generate a random password for the Chatbot UI login
    sed -i "s|__PASSWORD__|$(openssl rand -base64 12 | tr -dc A-Za-z0-9 | head -c 16)|" chatbot-ui/manifests/deployment.yaml

    # Deploy the UI and create the ingress class required for load balancers
    kubectl apply -f chatbot-ui/manifests/ingress-class.yaml
    kubectl apply -f chatbot-ui/manifests/deployment.yaml

    # Get the URL for the load balancer to access the application
    echo http://$(kubectl get ingress/deepseek-chatbot-ingress -n deepseek -o json | jq -r '.status.loadBalancer.ingress[0].hostname')

Wait a few seconds for the load balancer to finish provisioning. To access the chatbot UI, enter the username and password stored in a Kubernetes Secret.

    echo -e "Username=$(kubectl get secret deepseek-chatbot-secrets -n deepseek -o jsonpath='{.data.admin-username}' | base64 --decode)\nPassword=$(kubectl get secret deepseek-chatbot-secrets -n deepseek -o jsonpath='{.data.admin-password}' | base64 --decode)"

After logging in, you will see the chatbot UI, where you can interact with the model directly.

Summary

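With this tutorial you can deploy the DeepSeek-R1 model on Amazon EKS efficiently, using its flexible scaling options and fine-grained control over compute resources to keep performance high while optimizing cost. The solution builds on native Kubernetes features combined with Amazon EKS Auto Mode, giving a highly customizable deployment that can be tuned to specific operational needs and budgets.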

To explore other deployment options, such as deploying with Neuron or with open source Karpenter, see the GitHub repository deepseek-using-vllm-on-eks for more code examples.
