
Structured Outputs

vLLM supports generating structured outputs using xgrammar or guidance as the backend. This document shows you some examples of the different options that are available to generate structured outputs.

Warning

If you are still using the following deprecated API fields, which were removed in v0.12.0, please update your code to use structured_outputs as demonstrated in the rest of this document (a migration sketch follows the list below).

  • guided_json -> {"structured_outputs": {"json": ...}} or StructuredOutputsParams(json=...)
  • guided_regex -> {"structured_outputs": {"regex": ...}} or StructuredOutputsParams(regex=...)
  • guided_choice -> {"structured_outputs": {"choice": ...}} or StructuredOutputsParams(choice=...)
  • guided_grammar -> {"structured_outputs": {"grammar": ...}} or StructuredOutputsParams(grammar=...)
  • guided_whitespace_pattern -> {"structured_outputs": {"whitespace_pattern": ...}} or StructuredOutputsParams(whitespace_pattern=...)
  • structural_tag -> {"structured_outputs": {"structural_tag": ...}} or StructuredOutputsParams(structural_tag=...)
  • guided_decoding_backend -> remove this field from your request
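
For example, a request that used the removed guided_choice field maps onto the new structured_outputs body as follows; this is a minimal sketch mirroring the choice example later in this document:

Code
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="-")
model = client.models.list().data[0].id

completion = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}],
    # Before (removed in v0.12.0): extra_body={"guided_choice": ["positive", "negative"]}
    extra_body={"structured_outputs": {"choice": ["positive", "negative"]}},
)
print(completion.choices[0].message.content)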

Online Serving (OpenAI API)

You can generate structured outputs using OpenAI's Completions and Chat API.

The following parameters are supported, which must be added as extra parameters:

  • choice: the output will be exactly one of the choices.
  • regex: the output will follow the regex pattern.
  • json: the output will follow the JSON schema.
  • grammar: the output will follow the context-free grammar.
  • structural_tag: follow a JSON schema within a set of specified tags in the generated text.

You can see the complete list of supported parameters on the OpenAI-Compatible Server page.

Structured outputs are supported by default in the OpenAI-Compatible Server. You can choose which backend to use by setting the --structured-outputs-config.backend flag on vllm serve. The default backend is auto, which will try to choose an appropriate backend based on the details of the request. You can also choose a specific backend, along with some options. The full set of options is available in the vllm serve --help text.
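
For example, to pin a specific backend when starting the server (the model name here is only a placeholder):

vllm serve Qwen/Qwen2.5-1.5B-Instruct --structured-outputs-config.backend xgrammar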

Now let's look at an example for each of these cases, starting with choice, as it is the simplest:

Code
from openai import OpenAI
client = OpenAI(
    base_url="https://:8000/v1",
    api_key="-",
)
model = client.models.list().data[0].id

completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
    ],
    extra_body={"structured_outputs": {"choice": ["positive", "negative"]}},
)
print(completion.choices[0].message.content)

The next example shows how to use regex. The supported regex syntax depends on the structured outputs backend: for example, xgrammar, guidance, and outlines use Rust-style regex, while lm-format-enforcer uses Python's re module. The aim here is to generate an email address, given a simple regex template:

Code
completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Generate an example email address for Alan Turing, who works in Enigma. End in .com and new line. Example result: [email protected]\n",
        }
    ],
    extra_body={"structured_outputs": {"regex": r"\w+@\w+\.com\n"}, "stop": ["\n"]},
)
print(completion.choices[0].message.content)

One of the most relevant features of structured text generation is the option to generate valid JSON with predefined fields and formats. For this we can use the json parameter in two different ways: by passing a JSON Schema directly, or by defining a Pydantic model and extracting its JSON Schema from it (which is usually the easier option).

The next example shows how to use the response_format parameter with a Pydantic model:

Code
from pydantic import BaseModel
from enum import Enum

class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"

class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType

json_schema = CarDescription.model_json_schema()

completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "car-description",
            "schema": CarDescription.model_json_schema()
        },
    },
)
print(completion.choices[0].message.content)
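
The schema can also be passed directly through the json field in extra_body, rather than via response_format. A minimal sketch, reusing json_schema from the example above:

Code
completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
        }
    ],
    # Pass the raw JSON Schema directly via the json field
    extra_body={"structured_outputs": {"json": json_schema}},
)
print(completion.choices[0].message.content)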

Tip

While not strictly necessary, it is usually better to indicate in the prompt the JSON schema and how the fields should be populated. This can improve the results notably in most cases.
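
One simple way to do this is to serialize the schema and embed it in the prompt itself. A small illustrative sketch, reusing json_schema from above:

Code
import json

# Embed the schema in the prompt so the model knows the expected fields
schema_str = json.dumps(json_schema, indent=2)
messages = [
    {
        "role": "user",
        "content": (
            "Generate a JSON with the brand, model and car_type of the most "
            f"iconic car from the 90's. Answer using this JSON schema:\n{schema_str}"
        ),
    }
]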

Finally, there is the grammar option, which is probably the hardest to use but is really powerful. It allows us to define complete languages, such as SQL queries, by using a context-free EBNF grammar. For example, we can use it to define a specific format of simplified SQL queries:

Code
simplified_sql_grammar = """
    root ::= select_statement

    select_statement ::= "SELECT " column " from " table " where " condition

    column ::= "col_1 " | "col_2 "

    table ::= "table_1 " | "table_2 "

    condition ::= column "= " number

    number ::= "1 " | "2 "
"""

completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Generate an SQL query to show the 'username' and 'email' from the 'users' table.",
        }
    ],
    extra_body={"structured_outputs": {"grammar": simplified_sql_grammar}},
)
print(completion.choices[0].message.content)

See also: full example

Reasoning Outputs

You can also use structured outputs with reasoning models. Start the server with a reasoning parser, for example:

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --reasoning-parser deepseek_r1

Note that you can combine reasoning with any of the structured outputs features provided. The following example uses it with a JSON schema:

Code
from pydantic import BaseModel


class People(BaseModel):
    name: str
    age: int


completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the name and age of one random person.",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "people",
            "schema": People.model_json_schema()
        }
    },
)
print("reasoning: ", completion.choices[0].message.reasoning)
print("content: ", completion.choices[0].message.content)

See also: full example

Experimental Automatic Parsing (OpenAI API)

This section covers OpenAI's beta wrapper over the client.chat.completions.create() method, which provides richer integration with Python-specific types.

At the time of writing (openai==1.54.4), this is a "beta" feature in the OpenAI client library. The code reference can be found here.

For the following examples, vLLM was set up using vllm serve meta-llama/Llama-3.1-8B-Instruct.

Here is a simple example demonstrating how to get structured output using Pydantic models:

Code
from pydantic import BaseModel
from openai import OpenAI

class Info(BaseModel):
    name: str
    age: int

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
model = client.models.list().data[0].id
completion = client.beta.chat.completions.parse(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Cameron, I'm 28. What's my name and age?"},
    ],
    response_format=Info,
)

message = completion.choices[0].message
print(message)
assert message.parsed
print("Name:", message.parsed.name)
print("Age:", message.parsed.age)

Output

ParsedChatCompletionMessage[Info](content='{"name": "Cameron", "age": 28}', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], parsed=Info(name='Cameron', age=28))
Name: Cameron
Age: 28

Here is a more complex example using nested Pydantic models to handle a step-by-step math solution:

Code
from pydantic import BaseModel

class Step(BaseModel):
    explanation: str
    output: str

class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

completion = client.beta.chat.completions.parse(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful expert math tutor."},
        {"role": "user", "content": "Solve 8x + 31 = 2."},
    ],
    response_format=MathResponse,
)

message = completion.choices[0].message
print(message)
assert message.parsed
for i, step in enumerate(message.parsed.steps):
    print(f"Step #{i}:", step)
print("Answer:", message.parsed.final_answer)

Output

ParsedChatCompletionMessage[MathResponse](content='{ "steps": [{ "explanation": "First, let\'s isolate the term with the variable \'x\'. To do this, we\'ll subtract 31 from both sides of the equation.", "output": "8x + 31 - 31 = 2 - 31"}, { "explanation": "By subtracting 31 from both sides, we simplify the equation to 8x = -29.", "output": "8x = -29"}, { "explanation": "Next, let\'s isolate \'x\' by dividing both sides of the equation by 8.", "output": "8x / 8 = -29 / 8"}], "final_answer": "x = -29/8" }', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], parsed=MathResponse(steps=[Step(explanation="First, let's isolate the term with the variable 'x'. To do this, we'll subtract 31 from both sides of the equation.", output='8x + 31 - 31 = 2 - 31'), Step(explanation='By subtracting 31 from both sides, we simplify the equation to 8x = -29.', output='8x = -29'), Step(explanation="Next, let's isolate 'x' by dividing both sides of the equation by 8.", output='8x / 8 = -29 / 8')], final_answer='x = -29/8'))
Step #0: explanation="First, let's isolate the term with the variable 'x'. To do this, we'll subtract 31 from both sides of the equation." output='8x + 31 - 31 = 2 - 31'
Step #1: explanation='By subtracting 31 from both sides, we simplify the equation to 8x = -29.' output='8x = -29'
Step #2: explanation="Next, let's isolate 'x' by dividing both sides of the equation by 8." output='8x / 8 = -29 / 8'
Answer: x = -29/8

An example of using structural_tag can be found here: examples/online_serving/structured_outputs
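
As a rough illustration of what such a request looks like, here is a hedged sketch based on the xgrammar-style structural tag format; the tag names and schema below are made up for illustration, and the linked example is the authoritative reference:

Code
import json

# Constrain JSON only inside <function=...> ... </function> tags,
# leaving the surrounding text unconstrained (illustrative format).
structural_tag = {
    "type": "structural_tag",
    "structures": [
        {
            "begin": "<function=get_weather>",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
            "end": "</function>",
        }
    ],
    "triggers": ["<function="],
}

completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Call get_weather for Paris using a <function=...> tag.",
        }
    ],
    extra_body={"structured_outputs": {"structural_tag": json.dumps(structural_tag)}},
)
print(completion.choices[0].message.content)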

Offline Inference

Offline inference supports the same types of structured outputs. To use them, configure structured outputs with the StructuredOutputsParams class inside SamplingParams. The main options available in StructuredOutputsParams are:

  • json
  • regex
  • choice
  • grammar
  • structural_tag

These parameters are used in the same way as in the online serving examples above. An example using the choice parameter is shown below:

Code
from vllm import LLM, SamplingParams
from vllm.sampling_params import StructuredOutputsParams

llm = LLM(model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

structured_outputs_params = StructuredOutputsParams(choice=["Positive", "Negative"])
sampling_params = SamplingParams(structured_outputs=structured_outputs_params)
outputs = llm.generate(
    prompts="Classify this sentiment: vLLM is wonderful!",
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
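
The other options work the same way offline. For instance, here is a minimal sketch enforcing a JSON schema with StructuredOutputsParams(json=...); the Pydantic model and prompt are illustrative:

Code
from pydantic import BaseModel
from vllm import LLM, SamplingParams
from vllm.sampling_params import StructuredOutputsParams


class Person(BaseModel):
    name: str
    age: int


llm = LLM(model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

# Enforce the Pydantic model's JSON schema on the generated text
structured_outputs_params = StructuredOutputsParams(json=Person.model_json_schema())
sampling_params = SamplingParams(structured_outputs=structured_outputs_params)
outputs = llm.generate(
    prompts="Generate a JSON with the name and age of one random person.",
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)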

See also: full example