NVIDIA Triton

The Triton Inference Server project hosts a tutorial that demonstrates how to quickly deploy a simple facebook/opt-125m model using vLLM. For more details, see Deploying a vLLM model in Triton.
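Once a model has been deployed this way, it can be queried over HTTP. The following is a minimal client sketch, not part of the tutorial itself. It assumes the server is reachable at `localhost:8000`, that the model is registered under the name `vllm_model`, and that Triton's `generate` endpoint accepts a `text_input` field and returns a `text_output` field. All of these are assumptions that should be checked against the actual deployment.

```python
import json
import urllib.request


def build_generate_request(prompt, model_name="vllm_model",
                           host="localhost:8000", temperature=0.0):
    """Build the URL and JSON body for a Triton generate request.

    model_name and host are assumed defaults; adjust them to match
    the model repository and port used by your server.
    """
    url = f"http://{host}/v2/models/{model_name}/generate"
    payload = {
        "text_input": prompt,
        # Sampling parameters forwarded to vLLM (assumed field names).
        "parameters": {"stream": False, "temperature": temperature},
    }
    return url, json.dumps(payload).encode("utf-8")


def generate(prompt, **kwargs):
    """Send the prompt to the server and return the decoded JSON reply."""
    url, body = build_generate_request(prompt, **kwargs)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `generate("What is Triton Inference Server?")` should return a dictionary whose `text_output` entry holds the completion, provided the assumptions above hold. Building the request is kept separate from sending it so that the payload can be inspected without a live server.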