Structured Outputs¶
vLLM supports the generation of structured outputs using xgrammar or guidance as backends. This document shows you some examples of the different options that are available to generate structured outputs.
Online Serving (OpenAI API)¶
You can generate structured outputs using OpenAI's Completions and Chat APIs.
The following parameters are supported, which must be added as extra parameters:

- `guided_choice`: the output will be exactly one of the choices.
- `guided_regex`: the output will follow the regex pattern.
- `guided_json`: the output will follow the JSON schema.
- `guided_grammar`: the output will follow the context-free grammar.
- `structural_tag`: follow a JSON schema within a specified set of tags within the generated text.
You can see the complete list of supported parameters on the OpenAI-Compatible Server page.

Structured outputs are supported by default in the OpenAI-compatible server. You can choose which backend to use by setting the `--guided-decoding-backend` flag for `vllm serve`. The default backend is `auto`, which will try to choose an appropriate backend based on the details of the request. You may also choose a specific backend, along with some options. The full set of options is available in the `vllm serve --help` text.

Now let's look at each example, starting with the simplest one: `guided_choice`.
Code
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="-",
)

model = client.models.list().data[0].id

completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
    ],
    extra_body={"guided_choice": ["positive", "negative"]},
)
print(completion.choices[0].message.content)
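Because `guided_choice` constrains decoding to the listed options, a client can rely on exact string membership rather than fuzzy matching. A minimal sketch of that post-condition check, where the sample value stands in for a real server response:

```python
# With guided_choice, the server is constrained to emit exactly one of the
# allowed strings, so a strict membership check is safe on the client side.
choices = ["positive", "negative"]

def check_choice(output: str) -> str:
    # Raise if the output is not literally one of the allowed choices.
    if output not in choices:
        raise ValueError(f"unexpected output: {output!r}")
    return output

# Illustrative stand-in for completion.choices[0].message.content:
print(check_choice("positive"))
```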
The next example shows how to use the `guided_regex` parameter. The idea is to generate an email address, given a simple regex template:
Code
completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Generate an example email address for Alan Turing, who works in Enigma. End in .com and new line. Example result: [email protected]\n",
        }
    ],
    extra_body={"guided_regex": r"\w+@\w+\.com\n", "stop": ["\n"]},
)
print(completion.choices[0].message.content)
One of the most relevant features of structured text generation is the option to generate valid JSON with predefined fields and formats. For this we can use the `guided_json` parameter in two different ways:

- Using a JSON Schema directly
- Defining a Pydantic model and then extracting its JSON Schema from it (which is normally an easier option).

The next example shows how to use the `guided_json` parameter with a Pydantic model:
Code
from pydantic import BaseModel
from enum import Enum

class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"

class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType

json_schema = CarDescription.model_json_schema()

completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
        }
    ],
    extra_body={"guided_json": json_schema},
)
print(completion.choices[0].message.content)
Tip
While not strictly necessary, normally it is better to indicate in the prompt the JSON schema and how the fields should be populated. This can improve the results notably in most cases.
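Whichever way the schema is supplied, the response arrives as plain JSON text. Below is a sketch of a client-side sanity check using only the standard library; the sample response is illustrative, and the field names and enum values are written out by hand to mirror the Pydantic model above:

```python
import json

# Required fields and allowed enum values, mirroring the CarDescription model.
required = {"brand", "model", "car_type"}
car_types = {"sedan", "SUV", "Truck", "Coupe"}

# Illustrative stand-in for completion.choices[0].message.content:
sample = '{"brand": "Toyota", "model": "Supra", "car_type": "Coupe"}'

data = json.loads(sample)
assert required <= data.keys()        # all required fields present
assert data["car_type"] in car_types  # enum constraint honored
print(data["brand"], data["model"], data["car_type"])
```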
Finally we have the `guided_grammar` option, which is probably the most difficult to use, but it's really powerful. It allows us to define complete languages, such as SQL queries. It works by using a context-free EBNF grammar. As an example, we can use it to define a specific format of simplified SQL queries:
Code
simplified_sql_grammar = """
root ::= select_statement
select_statement ::= "SELECT " column " from " table " where " condition
column ::= "col_1 " | "col_2 "
table ::= "table_1 " | "table_2 "
condition ::= column "= " number
number ::= "1 " | "2 "
"""
completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Generate an SQL query to show the 'username' and 'email' from the 'users' table.",
        }
    ],
    extra_body={"guided_grammar": simplified_sql_grammar},
)
print(completion.choices[0].message.content)
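Since every rule in this toy grammar is a finite choice, the entire language it defines can be enumerated, which is a handy way to sanity-check a grammar before sending it. A sketch (note that the trailing spaces baked into the terminals above produce double spaces; a production grammar would usually handle whitespace more carefully):

```python
from itertools import product

# Alternatives from the grammar rules, including their trailing spaces.
columns = ["col_1 ", "col_2 "]
tables = ["table_1 ", "table_2 "]
numbers = ["1 ", "2 "]

# Expand: select_statement ::= "SELECT " column " from " table " where " condition
# with condition ::= column "= " number.
language = [
    "SELECT " + col + " from " + tab + " where " + cond_col + "= " + num
    for col, tab, cond_col, num in product(columns, tables, columns, numbers)
]

# 2 columns x 2 tables x 2 condition columns x 2 numbers = 16 possible outputs.
assert len(language) == 16
print(language[0])
```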
See also: full example
Reasoning Outputs¶
You can also use structured outputs with reasoning models. Note that you can combine reasoning with any of the structured outputs features provided. The following example uses it with the JSON schema feature:
Code
from pydantic import BaseModel

class People(BaseModel):
    name: str
    age: int

completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the name and age of one random person.",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "people",
            "schema": People.model_json_schema(),
        },
    },
)
print("reasoning_content: ", completion.choices[0].message.reasoning_content)
print("content: ", completion.choices[0].message.content)
See also: full example
Experimental Automatic Parsing (OpenAI API)¶
This section covers the OpenAI beta wrapper over the `client.chat.completions.create()` method that provides richer integrations with Python-specific types.

At the time of writing (`openai==1.54.4`), this is a "beta" feature in the OpenAI client library. Code reference can be found here.

For the following examples, vLLM was set up using `vllm serve meta-llama/Llama-3.1-8B-Instruct`.

Here is a simple example demonstrating how to get structured output using Pydantic models:
Code
from pydantic import BaseModel
from openai import OpenAI

class Info(BaseModel):
    name: str
    age: int

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
model = client.models.list().data[0].id

completion = client.beta.chat.completions.parse(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Cameron, I'm 28. What's my name and age?"},
    ],
    response_format=Info,
)

message = completion.choices[0].message
print(message)
assert message.parsed
print("Name:", message.parsed.name)
print("Age:", message.parsed.age)
Output
ParsedChatCompletionMessage[Info](content='{"name": "Cameron", "age": 28}', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], parsed=Info(name='Cameron', age=28))
Name: Cameron
Age: 28
Here is a more complex example using nested Pydantic models to handle a step-by-step math solution:
Code
from pydantic import BaseModel

class Step(BaseModel):
    explanation: str
    output: str

class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

completion = client.beta.chat.completions.parse(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful expert math tutor."},
        {"role": "user", "content": "Solve 8x + 31 = 2."},
    ],
    response_format=MathResponse,
)

message = completion.choices[0].message
print(message)
assert message.parsed
for i, step in enumerate(message.parsed.steps):
    print(f"Step #{i}:", step)
print("Answer:", message.parsed.final_answer)
Output
ParsedChatCompletionMessage[MathResponse](content='{ "steps": [{ "explanation": "First, let\'s isolate the term with the variable \'x\'. To do this, we\'ll subtract 31 from both sides of the equation.", "output": "8x + 31 - 31 = 2 - 31"}, { "explanation": "By subtracting 31 from both sides, we simplify the equation to 8x = -29.", "output": "8x = -29"}, { "explanation": "Next, let\'s isolate \'x\' by dividing both sides of the equation by 8.", "output": "8x / 8 = -29 / 8"}], "final_answer": "x = -29/8" }', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], parsed=MathResponse(steps=[Step(explanation="First, let's isolate the term with the variable 'x'. To do this, we'll subtract 31 from both sides of the equation.", output='8x + 31 - 31 = 2 - 31'), Step(explanation='By subtracting 31 from both sides, we simplify the equation to 8x = -29.', output='8x = -29'), Step(explanation="Next, let's isolate 'x' by dividing both sides of the equation by 8.", output='8x / 8 = -29 / 8')], final_answer='x = -29/8'))
Step #0: explanation="First, let's isolate the term with the variable 'x'. To do this, we'll subtract 31 from both sides of the equation." output='8x + 31 - 31 = 2 - 31'
Step #1: explanation='By subtracting 31 from both sides, we simplify the equation to 8x = -29.' output='8x = -29'
Step #2: explanation="Next, let's isolate 'x' by dividing both sides of the equation by 8." output='8x / 8 = -29 / 8'
Answer: x = -29/8
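The `parsed` attribute is essentially the `content` JSON deserialized into the `response_format` model. A rough standard-library equivalent of that last step, using an abbreviated stand-in for the content shown above:

```python
import json

# Abbreviated stand-in for completion.choices[0].message.content:
content = (
    '{"steps": ['
    '{"explanation": "Subtract 31 from both sides.", "output": "8x = -29"},'
    '{"explanation": "Divide both sides by 8.", "output": "x = -29/8"}],'
    ' "final_answer": "x = -29/8"}'
)

# What .parsed amounts to, minus Pydantic's type validation.
data = json.loads(content)
for i, step in enumerate(data["steps"]):
    print(f"Step #{i}:", step["output"])
print("Answer:", data["final_answer"])
```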
An example of using `structural_tag` can be found here: examples/online_serving/structured_outputs
Offline Inference¶
Offline inference allows for the same types of structured outputs. To use it, we'll need to configure guided decoding in the `SamplingParams` class, using `GuidedDecodingParams`. The main available options inside `GuidedDecodingParams` are:

- json
- regex
- choice
- grammar
- structural_tag

These parameters can be used in the same way as the parameters from the online serving examples above. One example of the usage of the `choice` parameter is shown below:
Code
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

guided_decoding_params = GuidedDecodingParams(choice=["Positive", "Negative"])
sampling_params = SamplingParams(guided_decoding=guided_decoding_params)
outputs = llm.generate(
    prompts="Classify this sentiment: vLLM is wonderful!",
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
See also: full example