dstack

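vLLM can be run on a cloud-based GPU machine with dstack, an open-source framework for running LLMs on any cloud. This tutorial assumes that you have already configured credentials, a gateway, and GPU quotas in your cloud environment.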

To install the dstack client, run:

pip install "dstack[all]"
dstack server

Next, to configure your dstack project, run:

mkdir -p vllm-dstack
cd vllm-dstack
dstack init

Next, to provision a VM instance with an LLM of your choice (NousResearch/Llama-2-7b-chat-hf in this example), create the following serve.dstack.yml file for the dstack Service:

Config

type: service

python: "3.11"
env:
    - MODEL=NousResearch/Llama-2-7b-chat-hf
port: 8000
resources:
    gpu: 24GB
commands:
    - pip install vllm
    - vllm serve $MODEL --port 8000
model:
    format: openai
    type: chat
    name: NousResearch/Llama-2-7b-chat-hf
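
To serve a different model, the same configuration can be generated instead of edited by hand. The following is a minimal sketch using only the Python standard library; the helper names `render_service_config` and `write_service_config` are hypothetical and not part of dstack or vLLM, and the default `gpu` value of 24GB is simply the one used above and may need to be raised for larger models.

```python
from pathlib import Path

# Template mirroring the serve.dstack.yml shown above.
# Only the model name, GPU memory, and port are parameterized.
SERVICE_TEMPLATE = """\
type: service

python: "3.11"
env:
    - MODEL={model}
port: {port}
resources:
    gpu: {gpu}
commands:
    - pip install vllm
    - vllm serve $MODEL --port {port}
model:
    format: openai
    type: chat
    name: {model}
"""


def render_service_config(model: str, gpu: str = "24GB", port: int = 8000) -> str:
    """Return the text of a serve.dstack.yml for the given model (hypothetical helper)."""
    return SERVICE_TEMPLATE.format(model=model, gpu=gpu, port=port)


def write_service_config(directory: str, model: str, **kwargs) -> Path:
    """Write serve.dstack.yml into `directory` and return its path (hypothetical helper)."""
    path = Path(directory) / "serve.dstack.yml"
    path.write_text(render_service_config(model, **kwargs), encoding="utf-8")
    return path
```

Calling `write_service_config("vllm-dstack", "NousResearch/Llama-2-7b-chat-hf")` would reproduce the file above inside the project directory created earlier.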

Then, run the following CLI command for provisioning:

Command

$ dstack run . -f serve.dstack.yml

⠸ Getting run plan...
Configuration  serve.dstack.yml
Project        deep-diver-main
User           deep-diver
Min resources  2..xCPU, 8GB.., 1xGPU (24GB)
Max price      -
Max duration   -
Spot policy    auto
Retry policy   no

#  BACKEND  REGION       INSTANCE       RESOURCES                               SPOT  PRICE
1  gcp   us-central1  g2-standard-4  4xCPU, 16GB, 1xL4 (24GB), 100GB (disk)  yes   $0.223804
2  gcp   us-east1     g2-standard-4  4xCPU, 16GB, 1xL4 (24GB), 100GB (disk)  yes   $0.223804
3  gcp   us-west1     g2-standard-4  4xCPU, 16GB, 1xL4 (24GB), 100GB (disk)  yes   $0.223804
    ...
Shown 3 of 193 offers, $5.876 max

Continue? [y/n]: y
⠙ Submitting run...
⠏ Launching spicy-treefrog-1 (pulling)
spicy-treefrog-1 provisioning completed (running)
Service is published at ...

After provisioning, you can interact with the model using the OpenAI SDK:

Code

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.<gateway domain>",
    api_key="<YOUR-DSTACK-SERVER-ACCESS-TOKEN>"
)

completion = client.chat.completions.create(
    model="NousResearch/Llama-2-7b-chat-hf",
    messages=[
        {
            "role": "user",
            "content": "Compose a poem that explains the concept of recursion in programming.",
        }
    ]
)

print(completion.choices[0].message.content)
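
Where the OpenAI SDK is not installed, the same request can be sketched with the standard library alone. This assumes the gateway accepts an OpenAI-style chat completion POST at `<base_url>/chat/completions` with a bearer token, which is how the OpenAI SDK addresses the `base_url` given above. The function names `build_chat_request` and `send_chat_request` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url: str, token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.

    Assumption: the endpoint lives at <base_url>/chat/completions.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

To use it, pass the gateway URL and dstack access token from the example above to `build_chat_request`, then hand the result to `send_chat_request`. Keeping construction and sending separate makes it possible to inspect the URL, headers, and payload without a running service.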

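Note

dstack automatically handles authentication on the gateway using dstack's tokens. If you don't want to configure a gateway, you can provision a dstack Task instead of a Service; a Task is intended for development purposes only. For more hands-on material on serving vLLM with dstack, check out this repository.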