來源 examples/offline_inference/openai。
使用 OpenAI 批次檔案格式的離線推理#
重要提示
這是關於使用 OpenAI 批次檔案格式執行批次推理的指南,**不是** 完整的批次 (REST) API。
檔案格式#
OpenAI 批次檔案格式由新行上的一系列 JSON 物件組成。
每一行代表一個單獨的請求。 有關更多詳細資訊,請參閱 OpenAI 包參考。
注意
我們目前支援 /v1/chat/completions
、/v1/embeddings
和 /v1/score
端點(補全功能即將推出)。
先決條件#
示例 1:使用本地檔案執行#
步驟 1:建立您的批次檔案#
要繼續學習此示例,您可以下載示例批次檔案,或在您的工作目錄中建立您自己的批次檔案。
wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai/openai_example_batch.jsonl
一旦您建立了批次檔案,它應該看起來像這樣
$ cat offline_inference/openai/openai_example_batch.jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
步驟 2:執行批次處理#
批次執行工具旨在從命令列使用。
您可以使用以下命令執行批次處理,這將將其結果寫入名為 results.jsonl
的檔案
python -m vllm.entrypoints.openai.run_batch -i offline_inference/openai/openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
步驟 3:檢查您的結果#
您現在應該在 results.jsonl
中獲得您的結果。 您可以透過執行 cat results.jsonl
來檢查您的結果
$ cat results.jsonl
{"id":"vllm-383d1c59835645aeb2e07d004d62a826","custom_id":"request-1","response":{"id":"cmpl-61c020e54b964d5a98fa7527bfcdd378","object":"chat.completion","created":1715633336,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! It's great to meet you! I'm here to help with any questions or tasks you may have. What's on your mind today?"},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":25,"total_tokens":56,"completion_tokens":31}},"error":null}
{"id":"vllm-42e3d09b14b04568afa3f1797751a267","custom_id":"request-2","response":{"id":"cmpl-f44d049f6b3a42d4b2d7850bb1e31bcc","object":"chat.completion","created":1715633336,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"*silence*"},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":27,"total_tokens":32,"completion_tokens":5}},"error":null}
示例 2:使用遠端檔案#
批次執行器支援透過 http/https 訪問的遠端輸入和輸出 URL。
例如,要針對位於 https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai/openai_example_batch.jsonl
的示例輸入檔案執行,您可以執行
python -m vllm.entrypoints.openai.run_batch -i https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai/openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
示例 3:與 AWS S3 整合#
為了與雲 blob 儲存整合,我們建議使用預簽名 URL。
[在此處瞭解更多關於 S3 預簽名 URL 的資訊]
其他先決條件#
awscli
包(執行pip install awscli
)來配置您的憑據並互動式地使用 s3。boto3
python 包(執行pip install boto3
)來生成預簽名 URL。
步驟 1:上傳您的輸入指令碼#
要繼續學習此示例,您可以下載示例批次檔案,或在您的工作目錄中建立您自己的批次檔案。
wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai/openai_example_batch.jsonl
一旦您建立了批次檔案,它應該看起來像這樣
$ cat offline_inference/openai/openai_example_batch.jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
現在將您的批次檔案上傳到您的 S3 儲存桶。
aws s3 cp offline_inference/openai/openai_example_batch.jsonl s3://MY_BUCKET/MY_INPUT_FILE.jsonl
步驟 2:生成您的預簽名 URL#
預簽名 URL 只能透過 SDK 生成。 您可以執行以下 python 指令碼來生成您的預簽名 URL。 請務必將 MY_BUCKET
、MY_INPUT_FILE.jsonl
和 MY_OUTPUT_FILE.jsonl
佔位符替換為您的儲存桶和檔名。
(該指令碼改編自 awsdocs/aws-doc-sdk-examples)
import boto3
from botocore.exceptions import ClientError
def generate_presigned_url(s3_client, client_method, method_parameters, expires_in):
"""
Generate a presigned Amazon S3 URL that can be used to perform an action.
:param s3_client: A Boto3 Amazon S3 client.
:param client_method: The name of the client method that the URL performs.
:param method_parameters: The parameters of the specified client method.
:param expires_in: The number of seconds the presigned URL is valid for.
:return: The presigned URL.
"""
try:
url = s3_client.generate_presigned_url(
ClientMethod=client_method, Params=method_parameters, ExpiresIn=expires_in
)
except ClientError:
raise
return url
s3_client = boto3.client("s3")
input_url = generate_presigned_url(
s3_client, "get_object", {"Bucket": "MY_BUCKET", "Key": "MY_INPUT_FILE.jsonl"}, 3600
)
output_url = generate_presigned_url(
s3_client, "put_object", {"Bucket": "MY_BUCKET", "Key": "MY_OUTPUT_FILE.jsonl"}, 3600
)
print(f"{input_url=}")
print(f"{output_url=}")
此指令碼應輸出
input_url='https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_INPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091'
output_url='https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_OUTPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091'
步驟 3:使用您的預簽名 URL 執行批次執行器#
您現在可以使用在上一步中生成的 URL 執行批次執行器。
python -m vllm.entrypoints.openai.run_batch \
-i "https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_INPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091" \
-o "https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_OUTPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091" \
--model --model meta-llama/Meta-Llama-3-8B-Instruct
步驟 4:檢視您的結果#
您的結果現在位於 S3 上。 您可以在終端中執行以下命令來檢視它們
aws s3 cp s3://MY_BUCKET/MY_OUTPUT_FILE.jsonl -
示例 4:使用嵌入端點#
其他先決條件#
確保您正在使用
vllm >= 0.5.5
。
步驟 1:建立您的批次檔案#
將嵌入請求新增到您的批次檔案。 以下是一個示例
{"custom_id": "request-1", "method": "POST", "url": "/v1/embeddings", "body": {"model": "intfloat/e5-mistral-7b-instruct", "input": "You are a helpful assistant."}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/embeddings", "body": {"model": "intfloat/e5-mistral-7b-instruct", "input": "You are an unhelpful assistant."}}
您甚至可以在批次檔案中混合聊天完成和嵌入請求,只要您使用的模型同時支援聊天完成和嵌入(請注意,所有請求都必須使用相同的模型)。
步驟 2:執行批次處理#
您可以使用與前面示例中相同的命令來執行批次處理。
步驟 3:檢查您的結果#
您可以透過執行 cat results.jsonl
來檢查您的結果
$ cat results.jsonl
{"id":"vllm-db0f71f7dec244e6bce530e0b4ef908b","custom_id":"request-1","response":{"status_code":200,"request_id":"vllm-batch-3580bf4d4ae54d52b67eee266a6eab20","body":{"id":"embd-33ac2efa7996430184461f2e38529746","object":"list","created":444647,"model":"intfloat/e5-mistral-7b-instruct","data":[{"index":0,"object":"embedding","embedding":[0.016204833984375,0.0092010498046875,0.0018358230590820312,-0.0028228759765625,0.001422882080078125,-0.0031147003173828125,...]}],"usage":{"prompt_tokens":8,"total_tokens":8,"completion_tokens":0}}},"error":null}
...
示例 5:使用評分端點#
其他先決條件#
確保您正在使用
vllm >= 0.7.0
。
步驟 1:建立您的批次檔案#
將評分請求新增到您的批次檔案。 以下是一個示例
{"custom_id": "request-1", "method": "POST", "url": "/v1/score", "body": {"model": "BAAI/bge-reranker-v2-m3", "text_1": "What is the capital of France?", "text_2": ["The capital of Brazil is Brasilia.", "The capital of France is Paris."]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/score", "body": {"model": "BAAI/bge-reranker-v2-m3", "text_1": "What is the capital of France?", "text_2": ["The capital of Brazil is Brasilia.", "The capital of France is Paris."]}}
您可以在批次檔案中混合聊天完成、嵌入和評分請求,只要您使用的模型都支援它們(請注意,所有請求都必須使用相同的模型)。
步驟 2:執行批次處理#
您可以使用與前面示例中相同的命令來執行批次處理。
步驟 3:檢查您的結果#
您可以透過執行 cat results.jsonl
來檢查您的結果
$ cat results.jsonl
{"id":"vllm-f87c5c4539184f618e555744a2965987","custom_id":"request-1","response":{"status_code":200,"request_id":"vllm-batch-806ab64512e44071b37d3f7ccd291413","body":{"id":"score-4ee45236897b4d29907d49b01298cdb1","object":"list","created":1737847944,"model":"BAAI/bge-reranker-v2-m3","data":[{"index":0,"object":"score","score":0.0010900497436523438},{"index":1,"object":"score","score":1.0}],"usage":{"prompt_tokens":37,"total_tokens":37,"completion_tokens":0,"prompt_tokens_details":null}}},"error":null}
{"id":"vllm-41990c51a26d4fac8419077f12871099","custom_id":"request-2","response":{"status_code":200,"request_id":"vllm-batch-73ce66379026482699f81974e14e1e99","body":{"id":"score-13f2ffe6ba40460fbf9f7f00ad667d75","object":"list","created":1737847944,"model":"BAAI/bge-reranker-v2-m3","data":[{"index":0,"object":"score","score":0.001094818115234375},{"index":1,"object":"score","score":1.0}],"usage":{"prompt_tokens":37,"total_tokens":37,"completion_tokens":0,"prompt_tokens_details":null}}},"error":null}
示例材料#
openai_example_batch.jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}