# Chat Completions

> OpenAI-compatible chat completions endpoint. Works with any OpenAI SDK or compatible client.

## Request

```bash theme={null}
POST https://modelslab.com/api/v7/llm/chat/completions
```

Pass your API key as a Bearer token in the `Authorization` header.

```bash theme={null}
curl -X POST https://modelslab.com/api/v7/llm/chat/completions \
  -H "Authorization: Bearer $MODELSLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 1000,
    "temperature": 0.7
  }'
```
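Where curl is not available, the same request can be sketched with Python's standard library. This is a minimal sketch rather than an official client: `build_request` and `send` are hypothetical helper names, and the key is assumed to live in the `MODELSLAB_API_KEY` environment variable, as in the curl example.

```python
import json
import os
import urllib.request

API_URL = "https://modelslab.com/api/v7/llm/chat/completions"


def build_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request with Bearer authentication."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(body: dict) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    # Assumption: the key is stored in the MODELSLAB_API_KEY environment variable.
    request = build_request(body, os.environ["MODELSLAB_API_KEY"])
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `send({"model": "Qwen/Qwen2.5-VL-72B-Instruct-together", "messages": [{"role": "user", "content": "What is the capital of France?"}]})` would issue the same request as the curl command above.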

## Body

```json theme={null}
{
  "model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "max_tokens": 1000,
  "temperature": 0.7,
  "top_p": 1,
  "stream": false,
  "presence_penalty": 0,
  "frequency_penalty": 0
}
```
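Checking a body on the client before sending it can catch mistakes early. The sketch below mirrors the limits listed in the OpenAPI schema on this page (`messages` required, `max_tokens` at least 1, and the numeric ranges for the sampling parameters). `validate_body` is a hypothetical helper, and the server may enforce these limits differently.

```python
# Ranges copied from the OpenAPI schema on this page.
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "presence_penalty": (-2, 2),
    "frequency_penalty": (-2, 2),
}
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_body(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    if "messages" not in body:
        problems.append("messages is required")
    for message in body.get("messages", []):
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"unsupported role: {message.get('role')!r}")
    max_tokens = body.get("max_tokens")
    if max_tokens is not None and max_tokens < 1:
        problems.append("max_tokens must be at least 1")
    for name, (low, high) in RANGES.items():
        value = body.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")
    return problems
```

An empty list means the body passed every check in this sketch; it does not guarantee that the API will accept the request.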

## Response

```json theme={null}
{
  "id": "chat-abc123",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```
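Reading the reply out of this structure can be sketched as follows. `first_reply` and `total_tokens` are hypothetical helper names, and the truncation check assumes that a `finish_reason` of `length` carries its usual OpenAI meaning (generation stopped at the token limit).

```python
def first_reply(response: dict) -> tuple[str, bool]:
    """Return the first choice's text and whether it looks truncated."""
    choice = response["choices"][0]
    # Assumption: "length" means the output stopped at the token limit.
    truncated = choice["finish_reason"] == "length"
    return choice["message"]["content"], truncated


def total_tokens(response: dict) -> int:
    """Return the total token count reported in the usage block."""
    return response["usage"]["total_tokens"]


# Trimmed copy of the example response above.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The capital of France is Paris."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33},
}

text, truncated = first_reply(sample)
print(text, truncated, total_tokens(sample))
```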

## Streaming

Set `"stream": true` to receive Server-Sent Events (SSE) as tokens are generated:

```bash theme={null}
curl -X POST https://modelslab.com/api/v7/llm/chat/completions \
  -H "Authorization: Bearer $MODELSLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
    "messages": [{"role": "user", "content": "Write a haiku"}],
    "stream": true
  }'
```

Each SSE event carries a `chat.completion.chunk` object in its `data` field. The stream ends with a `data: [DONE]` sentinel:

```text theme={null}
data: {"id":"chat-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Silent"},"finish_reason":null}]}
data: {"id":"chat-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" snow"},"finish_reason":null}]}
data: [DONE]
```
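Without an SDK, the stream can be consumed by reading it line by line and decoding each `data:` payload. The sketch below shows only the parsing step, applied to the example events above. `iter_content` is a hypothetical helper, and it assumes each event fits on a single `data:` line, as in the example.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by a sequence of SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample_lines = [
    'data: {"id":"chat-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Silent"},"finish_reason":null}]}',
    'data: {"id":"chat-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" snow"},"finish_reason":null}]}',
    "data: [DONE]",
]

print("".join(iter_content(sample_lines)))  # prints "Silent snow"
```

Any iterable of decoded text lines works as input, so the same function can be fed from an HTTP response read line by line.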

## OpenAI SDK

This endpoint works with the official OpenAI SDKs. Set `base_url` (`baseURL` in Node.js) to the ModelsLab LLM endpoint and `api_key` (`apiKey`) to your ModelsLab API key:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_MODELSLAB_API_KEY",
        base_url="https://modelslab.com/api/v7/llm",
    )

    # Non-streaming
    response = client.chat.completions.create(
        model="Qwen/Qwen2.5-VL-72B-Instruct-together",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing in simple terms"},
        ],
        max_tokens=1000,
    )
    print(response.choices[0].message.content)

    # Streaming
    stream = client.chat.completions.create(
        model="Qwen/Qwen2.5-VL-72B-Instruct-together",
        messages=[{"role": "user", "content": "Write a story"}],
        stream=True,
    )
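    for chunk in stream:
        # Skip chunks that carry no choices or no text (e.g. role-only or final chunks).
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)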
    ```
  </Tab>

  <Tab title="Node.js">
    ```javascript theme={null}
    import OpenAI from 'openai';

    const client = new OpenAI({
      apiKey: 'YOUR_MODELSLAB_API_KEY',
      baseURL: 'https://modelslab.com/api/v7/llm',
    });

    const response = await client.chat.completions.create({
      model: 'Qwen/Qwen2.5-VL-72B-Instruct-together',
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Hello!' },
      ],
    });

    console.log(response.choices[0].message.content);
    ```
  </Tab>
</Tabs>


## OpenAPI

````yaml POST /chat/completions
openapi: 3.1.0
info:
  title: ModelsLab LLM API
  description: >-
    Unified LLM API with OpenAI and Anthropic SDK compatibility. Access 200+
    language models through a single API.
  version: 7.0.0
servers: []
security: []
paths:
  /chat/completions:
    post:
      summary: Chat Completions
      description: >-
        OpenAI-compatible chat completions endpoint. Create chat conversations
        and receive model responses.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ChatCompletionsRequest'
      responses:
        '200':
          description: Chat completion response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ChatCompletionsResponse'
        '400':
          description: Bad request - invalid parameters
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
        '401':
          description: Unauthorized - invalid or missing API key
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
      security:
        - bearerAuth: []
      servers:
        - url: https://modelslab.com/api/v7/llm
components:
  schemas:
    ChatCompletionsRequest:
      type: object
      required:
        - messages
      properties:
        model:
          type: string
          description: >-
            Model ID to use for the completion (e.g.
            'Qwen/Qwen2.5-VL-72B-Instruct-together')
        messages:
          type: array
          description: Array of chat messages
          items:
            $ref: '#/components/schemas/ChatMessage'
        max_tokens:
          type: integer
          default: 1000
          minimum: 1
          description: Maximum number of tokens to generate
        temperature:
          type: number
          minimum: 0
          maximum: 2
          default: 1
          description: Sampling temperature (0-2). Higher values make output more random.
        top_p:
          type: number
          minimum: 0
          maximum: 1
          default: 1
          description: Nucleus sampling parameter
        stream:
          type: boolean
          default: false
          description: Whether to stream partial results as Server-Sent Events
        presence_penalty:
          type: number
          minimum: -2
          maximum: 2
          default: 0
          description: Penalize new tokens based on whether they appear in the text so far
        frequency_penalty:
          type: number
          minimum: -2
          maximum: 2
          default: 0
          description: Penalize new tokens based on their frequency in the text so far
    ChatCompletionsResponse:
      type: object
      properties:
        id:
          type: string
          description: Unique completion ID
        object:
          type: string
          enum:
            - chat.completion
        created:
          type: integer
          description: Unix timestamp
        model:
          type: string
          description: Model used
        choices:
          type: array
          items:
            $ref: '#/components/schemas/ChatChoice'
        usage:
          $ref: '#/components/schemas/OpenAIUsage'
    Error:
      type: object
      properties:
        error:
          type: object
          properties:
            message:
              type: string
            type:
              type: string
            code:
              type: string
    ChatMessage:
      type: object
      required:
        - role
        - content
      properties:
        role:
          type: string
          enum:
            - system
            - user
            - assistant
          description: Role of the message sender
        content:
          type: string
          description: Content of the message
    ChatChoice:
      type: object
      properties:
        index:
          type: integer
        message:
          type: object
          properties:
            role:
              type: string
              enum:
                - assistant
            content:
              type: string
        finish_reason:
          type: string
          enum:
            - stop
            - length
            - content_filter
    OpenAIUsage:
      type: object
      properties:
        prompt_tokens:
          type: integer
        completion_tokens:
          type: integer
        total_tokens:
          type: integer
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: Bearer token authentication using ModelsLab API key

````
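The `Error` schema above suggests that `400` and `401` responses carry an `error` object with `message`, `type`, and `code` fields. A minimal sketch for turning such a body into a readable message follows. `describe_error` is a hypothetical helper, and the sample values are invented for illustration, not taken from real API output.

```python
import json


def describe_error(status: int, body: str) -> str:
    """Summarize an error response that follows the Error schema."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        parsed = None
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        # Body is not JSON or does not match the documented shape.
        return f"HTTP {status}: unrecognized error body"
    message = error.get("message", "no message")
    return f"HTTP {status}: {message} (type={error.get('type')}, code={error.get('code')})"


# Invented sample body shaped like the Error schema.
sample_body = '{"error": {"message": "Invalid API key", "type": "authentication_error", "code": "401"}}'
print(describe_error(401, sample_body))
```

Falling back to a generic message keeps the caller working when a proxy or gateway returns a non-JSON body.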