Cerebrium

vLLM can be run on cloud-based GPU machines with Cerebrium, a serverless AI infrastructure platform that makes it easier for companies to build and deploy AI-based applications.

To install the Cerebrium client, run:

pip install cerebrium
cerebrium login

Next, create your Cerebrium project by running:

cerebrium init vllm-project

Next, to install the required packages, add the following to your cerebrium.toml:

[cerebrium.deployment]
docker_base_image_url = "nvidia/cuda:12.1.1-runtime-ubuntu22.04"

[cerebrium.dependencies.pip]
vllm = "latest"

Next, add the code that handles inference for the LLM of your choice (mistralai/Mistral-7B-Instruct-v0.1 in this example). Add the following to your main.py:

Code
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")

def run(prompts: list[str], temperature: float = 0.8, top_p: float = 0.95):

    sampling_params = SamplingParams(temperature=temperature, top_p=top_p)
    outputs = llm.generate(prompts, sampling_params)

    # Collect each prompt together with its first generated completion.
    results = []
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        results.append({"prompt": prompt, "generated_text": generated_text})

    return {"results": results}
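The result-shaping loop in run() can be checked without a GPU or a model download. The following is a minimal sketch, not part of the original example: `format_outputs` is a hypothetical helper that repeats the loop from main.py, and the `SimpleNamespace` objects are stand-ins exposing only the two attributes that run() reads (`prompt` and `outputs[0].text`).

```python
from types import SimpleNamespace


def format_outputs(outputs) -> dict:
    """Hypothetical helper: pair each prompt with its first completion, as run() does."""
    results = []
    for output in outputs:
        results.append(
            {"prompt": output.prompt, "generated_text": output.outputs[0].text}
        )
    return {"results": results}


# Stand-in objects (an assumption for illustration), not real vLLM outputs.
fake_outputs = [
    SimpleNamespace(
        prompt="The capital of France is",
        outputs=[SimpleNamespace(text=" Paris.\n")],
    ),
]

formatted = format_outputs(fake_outputs)
```

With this split, run() would only need to call `llm.generate(...)` and pass the result to the helper, which keeps the formatting logic testable on a machine without a GPU.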

Then run the following to deploy it to the cloud:

cerebrium deploy

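If the deployment succeeds, you will be given a curl command that you can use to call the endpoint for inference. Remember to end the URL with the name of the function you are calling (/run in this example).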

Command
curl -X POST https://api.cortex.cerebrium.ai/v4/p-xxxxxx/vllm/run \
-H 'Content-Type: application/json' \
-H 'Authorization: <JWT TOKEN>' \
--data '{
"prompts": [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is"
]
}'
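
The same request can also be built from Python. The following is a minimal sketch using only the standard library. `build_run_request` is a hypothetical helper name, and the URL and token are the placeholders from the curl command above, so substitute the values returned by your own deployment. The request is only constructed here; the lines that send it are left commented out because they need a live endpoint.

```python
import json
import urllib.request


def build_run_request(
    endpoint_url: str, token: str, prompts: list[str]
) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command makes."""
    body = json.dumps({"prompts": prompts}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": token,
        },
    )


# Placeholder values copied from the curl example; replace them with your own.
request = build_run_request(
    "https://api.cortex.cerebrium.ai/v4/p-xxxxxx/vllm/run",
    "<JWT TOKEN>",
    ["Hello, my name is", "The capital of France is"],
)

# To send it (requires a deployed endpoint and a valid token):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```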

You should receive a response like this:

Response
{
    "run_id": "52911756-3066-9ae8-bcc9-d9129d1bd262",
    "result": {
        "result": [
            {
                "prompt": "Hello, my name is",
                "generated_text": " Sarah, and I'm a teacher. I teach elementary school students. One of"
            },
            {
                "prompt": "The president of the United States is",
                "generated_text": " elected every four years. This is a democratic system.\n\n5. What"
            },
            {
                "prompt": "The capital of France is",
                "generated_text": " Paris.\n"
            },
            {
                "prompt": "The future of AI is",
                "generated_text": " bright, but it's important to approach it with a balanced and nuanced perspective."
            }
        ]
    },
    "run_time_ms": 152.53663063049316
}
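
To use the response in a program, the completions can be pulled out of the nested payload. The following is a minimal sketch, assuming the key names shown in the sample response (`result` nested inside `result`). `extract_completions` is a hypothetical helper, and the input is an abbreviated copy of the sample. If your deployment returns different keys (main.py returns its list under `results`), adjust the lookup accordingly.

```python
def extract_completions(response: dict) -> list[str]:
    """Join each prompt with its generated text, following the sample response layout."""
    items = response["result"]["result"]
    return [item["prompt"] + item["generated_text"] for item in items]


# Abbreviated copy of the sample response, written as a Python dict.
sample_response = {
    "run_id": "52911756-3066-9ae8-bcc9-d9129d1bd262",
    "result": {
        "result": [
            {"prompt": "The capital of France is", "generated_text": " Paris.\n"},
        ]
    },
    "run_time_ms": 152.53663063049316,
}

completions = extract_completions(sample_response)
```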

You now have an autoscaling endpoint where you pay only for the compute you use!