Audio Generation

TanStack AI's generateAudio() activity produces audio content — music, soundscapes, or sound effects — from a text prompt. It's distinct from Text-to-Speech, which is optimized for spoken-word synthesis.

Overview

Audio generation is handled by audio adapters that follow the same tree-shakeable architecture as other adapters in TanStack AI.

Currently supported:

Google Gemini: Lyria 3 Pro and Lyria 3 Clip music generation
fal.ai: MiniMax Music, DiffRhythm, Google Lyria 2, Stable Audio 2.5, MMAudio, ElevenLabs sound effects, Thinksound, and more

Basic Usage

Google Lyria (Music)

Google's Lyria models generate full-length songs with vocals and instrumentation. lyria-3-pro-preview handles multi-verse compositions, while lyria-3-clip-preview produces 30-second clips.

typescript

import { generateAudio } from '@tanstack/ai'
import { geminiAudio } from '@tanstack/ai-gemini'

const result = await generateAudio({
  adapter: geminiAudio('lyria-3-pro-preview'),
  prompt: 'Uplifting indie pop with layered vocals and jangly guitars',
})

console.log(result.audio.b64Json) // Base64-encoded audio bytes (Gemini)
console.log(result.audio.contentType) // e.g. "audio/mpeg"

import { generateAudio } from '@tanstack/ai'
import { geminiAudio } from '@tanstack/ai-gemini'

const result = await generateAudio({
  adapter: geminiAudio('lyria-3-pro-preview'),
  prompt: 'Uplifting indie pop with layered vocals and jangly guitars',
})

console.log(result.audio.b64Json) // Base64-encoded audio bytes (Gemini)
console.log(result.audio.contentType) // e.g. "audio/mpeg"

fal.ai

fal.ai gives access to a broad catalogue of music, SFX, and general audio models through a single falAudio adapter.

Music Generation (MiniMax Music 2.6)

MiniMax's latest music model creates full compositions — vocals, backing music, and arrangements — from a single prompt.

typescript

import { generateAudio } from '@tanstack/ai'
import { falAudio } from '@tanstack/ai-fal'

const result = await generateAudio({
  adapter: falAudio('fal-ai/minimax-music/v2.6'),
  prompt: 'City Pop, 80s retro, groovy synth bass, warm female vocal, 104 BPM',
})

console.log(result.audio.url) // URL to the generated audio file
console.log(result.audio.contentType) // e.g. "audio/wav"

import { generateAudio } from '@tanstack/ai'
import { falAudio } from '@tanstack/ai-fal'

const result = await generateAudio({
  adapter: falAudio('fal-ai/minimax-music/v2.6'),
  prompt: 'City Pop, 80s retro, groovy synth bass, warm female vocal, 104 BPM',
})

console.log(result.audio.url) // URL to the generated audio file
console.log(result.audio.contentType) // e.g. "audio/wav"

Music with Explicit Lyrics (DiffRhythm)

typescript

const result = await generateAudio({
  adapter: falAudio('fal-ai/diffrhythm'),
  prompt: 'An upbeat electronic track with synths',
  modelOptions: {
    lyrics: '[verse]\nHello world\n[chorus]\nLa la la',
  },
})

const result = await generateAudio({
  adapter: falAudio('fal-ai/diffrhythm'),
  prompt: 'An upbeat electronic track with synths',
  modelOptions: {
    lyrics: '[verse]\nHello world\n[chorus]\nLa la la',
  },
})

Sound Effects

typescript

const result = await generateAudio({
  adapter: falAudio('fal-ai/elevenlabs/sound-effects/v2'),
  prompt: 'Thunderclap followed by heavy rain',
  duration: 5,
})

const result = await generateAudio({
  adapter: falAudio('fal-ai/elevenlabs/sound-effects/v2'),
  prompt: 'Thunderclap followed by heavy rain',
  duration: 5,
})

MiniMax Music v2 (lyrics_prompt)

Earlier MiniMax variants use a lyrics_prompt field for lyric guidance.

typescript

const result = await generateAudio({
  adapter: falAudio('fal-ai/minimax-music/v2'),
  prompt: 'A dreamy pop ballad in the style of the 80s',
  modelOptions: {
    lyrics_prompt: '[instrumental]',
  },
})

const result = await generateAudio({
  adapter: falAudio('fal-ai/minimax-music/v2'),
  prompt: 'A dreamy pop ballad in the style of the 80s',
  modelOptions: {
    lyrics_prompt: '[instrumental]',
  },
})

If a request doesn't return the audio you expected — a model silently truncates, a provider rejects a prompt, or the response shape looks off — pass debug: true to see every chunk the provider SDK emits. See Debug Logging.

Options

Option	Type	Description
adapter	AudioAdapter	The adapter created via falAudio() (required)
prompt	string	Text description of the audio to generate (required)
duration	number	Desired duration in seconds (model-dependent)
modelOptions	object	Provider-specific options (fully typed when the model ID is passed as a string literal)
debug	DebugOption	Enable per-category debug logging (true, false, or a DebugConfig — see Debug Logging)

Result Shape

typescript

interface AudioGenerationResult {
  id: string
  model: string
  audio: {
    url?: string
    b64Json?: string
    contentType?: string
    duration?: number
  }
  // Canonical TokenUsage (same shape as chat), present when the provider
  // reports it (e.g. Gemini Lyria via generateContent). Usage-billed providers
  // (fal) instead surface `usage.unitsBilled` — the real billed quantity read
  // from fal's `x-fal-billable-units` result header. Multiply by the endpoint's
  // unit price (fal pricing API) for the exact cost.
  usage?: TokenUsage
}

interface AudioGenerationResult {
  id: string
  model: string
  audio: {
    url?: string
    b64Json?: string
    contentType?: string
    duration?: number
  }
  // Canonical TokenUsage (same shape as chat), present when the provider
  // reports it (e.g. Gemini Lyria via generateContent). Usage-billed providers
  // (fal) instead surface `usage.unitsBilled` — the real billed quantity read
  // from fal's `x-fal-billable-units` result header. Multiply by the endpoint's
  // unit price (fal pricing API) for the exact cost.
  usage?: TokenUsage
}

Gemini returns base64-encoded bytes in result.audio.b64Json. The fal adapter returns a URL in result.audio.url — if you need raw bytes, fetch() the URL yourself:

typescript

const bytes = new Uint8Array(
  await (await fetch(result.audio.url!)).arrayBuffer()
)

const bytes = new Uint8Array(
  await (await fetch(result.audio.url!)).arrayBuffer()
)

Client Hook (useGenerateAudio)

For client-side usage, framework integrations expose a useGenerateAudio hook (or createGenerateAudio in Svelte) that wraps the same generation flow. It mirrors the API of useGenerateSpeech, useGenerateImage, and other media hooks — see Generation Hooks for the full shape.

Server (streaming SSE route)

typescript

// routes/api/generate/audio.ts
import { generateAudio, toServerSentEventsResponse } from '@tanstack/ai'
import { falAudio } from '@tanstack/ai-fal'

export async function POST(req: Request) {
  const { prompt, duration } = await req.json()

  return toServerSentEventsResponse(
    generateAudio({
      adapter: falAudio('fal-ai/diffrhythm'),
      prompt,
      duration,
      stream: true,
    }),
  )
}

// routes/api/generate/audio.ts
import { generateAudio, toServerSentEventsResponse } from '@tanstack/ai'
import { falAudio } from '@tanstack/ai-fal'

export async function POST(req: Request) {
  const { prompt, duration } = await req.json()

  return toServerSentEventsResponse(
    generateAudio({
      adapter: falAudio('fal-ai/diffrhythm'),
      prompt,
      duration,
      stream: true,
    }),
  )
}

Client (React)

tsx

import { useGenerateAudio } from '@tanstack/ai-react'
import { fetchServerSentEvents } from '@tanstack/ai-client'

function AudioGenerator() {
  const { generate, result, isLoading, error, reset } = useGenerateAudio({
    connection: fetchServerSentEvents('/api/generate/audio'),
  })

  return (
    <div>
      <button
        onClick={() =>
          generate({ prompt: 'An upbeat electronic track', duration: 10 })
        }
        disabled={isLoading}
      >
        {isLoading ? 'Generating...' : 'Generate'}
      </button>
      {error && <p>Error: {error.message}</p>}
      {result?.audio.url && <audio src={result.audio.url} controls />}
      {result && <button onClick={reset}>Clear</button>}
    </div>
  )
}

import { useGenerateAudio } from '@tanstack/ai-react'
import { fetchServerSentEvents } from '@tanstack/ai-client'

function AudioGenerator() {
  const { generate, result, isLoading, error, reset } = useGenerateAudio({
    connection: fetchServerSentEvents('/api/generate/audio'),
  })

  return (
    <div>
      <button
        onClick={() =>
          generate({ prompt: 'An upbeat electronic track', duration: 10 })
        }
        disabled={isLoading}
      >
        {isLoading ? 'Generating...' : 'Generate'}
      </button>
      {error && <p>Error: {error.message}</p>}
      {result?.audio.url && <audio src={result.audio.url} controls />}
      {result && <button onClick={reset}>Clear</button>}
    </div>
  )
}

Use the fetcher option instead of connection when calling a TanStack Start server function directly.

Differences vs Text-to-Speech

	generateAudio()	generateSpeech()
Purpose	Music, soundscapes, SFX	Spoken-word TTS
Result	result.audio.url or result.audio.b64Json	Base64 in result.audio
Primary input	prompt	text
Voice/speed controls	No	Yes (voice, speed)

Use generateSpeech() when you want a spoken voice, and generateAudio() when you want non-speech audio.

Environment Variables

Each provider reads its own API key from the environment by default:

GOOGLE_API_KEY=your-google-api-key
FAL_KEY=your-fal-api-key

GOOGLE_API_KEY=your-google-api-key
FAL_KEY=your-fal-api-key

Or pass it explicitly to the adapter:

typescript

import { createGeminiAudio } from '@tanstack/ai-gemini'
import { falAudio } from '@tanstack/ai-fal'

createGeminiAudio('lyria-3-pro-preview', 'your-key')
falAudio('fal-ai/diffrhythm', { apiKey: 'your-key' })

import { createGeminiAudio } from '@tanstack/ai-gemini'
import { falAudio } from '@tanstack/ai-fal'

createGeminiAudio('lyria-3-pro-preview', 'your-key')
falAudio('fal-ai/diffrhythm', { apiKey: 'your-key' })

Audio Generation

Audio Generation#

Overview#

Basic Usage#

Google Lyria (Music)#

fal.ai#

Music Generation (MiniMax Music 2.6)#

Music with Explicit Lyrics (DiffRhythm)#

Sound Effects#

MiniMax Music v2 (lyrics_prompt)#

Options#

Result Shape#

Client Hook (useGenerateAudio)#

Server (streaming SSE route)#

Client (React)#

Differences vs Text-to-Speech#

Environment Variables#

Audio Generation

Overview

Basic Usage

Google Lyria (Music)

fal.ai

Music Generation (MiniMax Music 2.6)

Music with Explicit Lyrics (DiffRhythm)

Sound Effects

MiniMax Music v2 (lyrics_prompt)

Options

Result Shape

Client Hook (useGenerateAudio)

Server (streaming SSE route)

Client (React)

Differences vs Text-to-Speech

Environment Variables