
# Speech To Text

> Speech-to-Text converts audio into a written transcript in multiple languages.

## Request

Make a `POST` request to the endpoint below and pass the required parameters in the request body.

```curl curl theme={null}
curl --request POST 'https://modelslab.com/api/v6/voice/speech_to_text' \
  --header 'Content-Type: application/json'
```

## Body

```json json theme={null}
{
  "key": "your_api_key",
  "init_audio": "https://assets.modelslab.ai/generations/9ab0c784-65ec-41b3-a646-99dfe16b053b.mp3",
  "language": "en",
  "timestamp_level": null,
  "webhook": null,
  "track_id": null
}
```
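The request above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_request` and `transcribe` are hypothetical, and the 60-second timeout is an arbitrary choice. The endpoint, field names, and allowed `timestamp_level` values are taken from this page.

```python
import json
import urllib.request

ENDPOINT = "https://modelslab.com/api/v6/voice/speech_to_text"
# Allowed values for timestamp_level, as listed in the OpenAPI schema on this page.
TIMESTAMP_LEVELS = {None, "word", "sentence"}


def build_request(api_key, audio_url, language,
                  timestamp_level=None, webhook=None, track_id=None):
    """Build (but do not send) the POST request. Hypothetical helper."""
    if timestamp_level not in TIMESTAMP_LEVELS:
        raise ValueError(f"unsupported timestamp_level: {timestamp_level!r}")
    payload = {
        "key": api_key,
        "init_audio": audio_url,
        "language": language,
        "timestamp_level": timestamp_level,
        "webhook": webhook,
        "track_id": track_id,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(api_key, audio_url, language, **options):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_request(api_key, audio_url, language, **options)
    # The timeout value is an assumption, not something this page specifies.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `transcribe("your_api_key", audio_url, "en", timestamp_level="sentence")` would send the body shown above with sentence-level timestamps requested. Splitting request construction from sending keeps the payload easy to inspect before any network call is made.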

<Warning>
  **Timestamp Level Accuracy:** Sentence-level timestamps are generally reliable. Word-level timestamps may be less accurate.
</Warning>

### Languages Supported

<Note>
  Whisper supports the languages listed below, but transcription accuracy can vary between them because of factors such as limited training data, script complexity, and regional dialects.
</Note>

```
"Afrikaans": "af",
"Arabic": "ar",
"Belarusian": "be",
"Bengali": "bn",
"Bulgarian": "bg",
"Chinese": "zh",
"Czech": "cs",
"Danish": "da",
"Dutch": "nl",
"English": "en",
"Finnish": "fi",
"French": "fr",
"German": "de",
"Greek": "el",
"Hebrew": "he",
"Hindi": "hi",
"Hungarian": "hu",
"Indonesian": "id",
"Italian": "it",
"Japanese": "ja",
"Kannada": "kn",
"Korean": "ko",
"Malayalam": "ml",
"Marathi": "mr",
"Nepali": "ne",
"Panjabi": "pa",
"Persian": "fa",
"Polish": "pl",
"Portuguese": "pt",
"Romanian": "ro",
"Russian": "ru",
"Serbian": "sr",
"Spanish": "es",
"Swedish": "sv",
"Tagalog": "tl",
"Tamil": "ta",
"Telugu": "te",
"Thai": "th",
"Turkish": "tr",
"Ukrainian": "uk",
"Urdu": "ur",
"Vietnamese": "vi",
"Welsh": "cy"
```
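A client can check the `language` value against the codes listed above before sending a request. The sketch below is one way to do that. The `normalize_language` helper is hypothetical, and lowercasing and trimming the input is a client-side convenience, not documented API behavior.

```python
# Language codes copied from the list above.
SUPPORTED_CODES = frozenset(
    "af ar be bn bg zh cs da nl en fi fr de el he hi hu id it ja kn ko "
    "ml mr ne pa fa pl pt ro ru sr es sv tl ta te th tr uk ur vi cy".split()
)


def normalize_language(code):
    """Return a cleaned-up language code, or raise ValueError if unsupported."""
    cleaned = code.strip().lower()
    if cleaned not in SUPPORTED_CODES:
        raise ValueError(f"unsupported language code: {code!r}")
    return cleaned
```

Failing early on the client avoids spending a request on a language code the service does not list.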


## OpenAPI

````yaml POST /voice/speech_to_text
openapi: 3.1.0
info:
  title: ModelsLab Voice API
  description: >-
    A comprehensive API for AI-driven voice and audio generation including
    text-to-speech, voice cloning, music generation, and audio processing
    capabilities
  license:
    name: MIT
  version: 6.0.0
servers:
  - url: https://modelslab.com/api/v6
security: []
paths:
  /voice/speech_to_text:
    post:
      summary: Convert speech to text
      description: Transcribes audio files to text
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SpeechToTextRequest'
      responses:
        '200':
          description: Speech to text response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/VoiceResponse'
        '400':
          description: Bad request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
components:
  schemas:
    SpeechToTextRequest:
      type: object
      required:
        - key
        - init_audio
        - language
      properties:
        key:
          type: string
          description: API key required to authorize the request
        init_audio:
          type: string
          format: uri
          description: >-
            URL of audio file to transcribe. Supported: WAV, MP3, FLAC, OPUS (5
            seconds - 1 hour)
        language:
          type: string
          description: Language code in ISO 639-1 format (e.g. 'en', 'es', 'fr')
        timestamp_level:
          type:
            - string
            - 'null'
          enum:
            - word
            - sentence
            - null
          description: >-
            Level of detail for timestamps in the transcription. Sentence-level
            timestamps are generally reliable; word-level timestamps may be
            less accurate.
        webhook:
          type: string
          format: uri
          description: URL to receive POST notification upon completion
        track_id:
          type: integer
          description: ID for webhook identification
    VoiceResponse:
      type: object
      properties:
        status:
          type: string
          enum:
            - success
            - processing
            - error
          description: Status of the voice generation
        generationTime:
          type: number
          description: Time taken to generate the audio in seconds
        id:
          type: integer
          description: Unique identifier for the voice generation
        output:
          type: array
          items:
            type: string
            format: uri
          description: Array of generated audio URLs
        proxy_links:
          type: array
          items:
            type: string
            format: uri
          description: Array of proxy audio URLs
        future_links:
          type: array
          items:
            type: string
            format: uri
          description: Array of future audio URLs for queued requests
        links:
          type: array
          items:
            type: string
            format: uri
          description: Array of audio URLs (voice cover response)
        meta:
          type: object
          description: Metadata about the audio generation including all parameters used
        eta:
          type: integer
          description: Estimated time for completion in seconds (processing status)
        message:
          type: string
          description: Status message or additional information
        tip:
          type: string
          description: Additional information or tips for the user
        fetch_result:
          type: string
          format: uri
          description: URL to fetch the result when processing
        audio_time:
          type: number
          description: Duration of the generated audio in seconds
    Error:
      type: object
      required:
        - status
        - message
      properties:
        status:
          type: string
          enum:
            - error
        message:
          type: string
          description: Error message description

````
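The `VoiceResponse` schema defines three `status` values, so a client typically branches on them. The sketch below shows one possible way to do so. The `interpret_response` helper and its return shape are hypothetical. How the `fetch_result` URL should be called is not described in this section, so the sketch only reports that polling is needed and does not perform it.

```python
def interpret_response(response):
    """Decide what to do with a decoded VoiceResponse dict. Hypothetical helper."""
    status = response.get("status")
    if status == "success":
        # The result URLs are in the "output" array.
        return ("done", response.get("output", []))
    if status == "processing":
        # Queued request: the caller should check fetch_result later.
        return ("poll", {
            "fetch_result": response.get("fetch_result"),
            "eta": response.get("eta"),
        })
    if status == "error":
        raise RuntimeError(response.get("message", "unknown error"))
    raise ValueError(f"unexpected status: {status!r}")
```

For a `processing` response, waiting roughly `eta` seconds before checking `fetch_result` is a reasonable starting point, though the right interval depends on the audio length.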