Audio Generation

TanStack AI's generateAudio() activity produces audio content — music, soundscapes, or sound effects — from a text prompt. It's distinct from Text-to-Speech, which is optimized for spoken-word synthesis.

Overview

Audio generation is handled by audio adapters that follow the same tree-shakeable architecture as other adapters in TanStack AI.

Currently supported:

  • Google Gemini: Lyria 3 Pro and Lyria 3 Clip music generation
  • fal.ai: MiniMax Music, DiffRhythm, Google Lyria 2, Stable Audio 2.5, MMAudio, ElevenLabs sound effects, Thinksound, and more

Basic Usage

Google Lyria (Music)

Google's Lyria models generate full-length songs with vocals and instrumentation. lyria-3-pro-preview handles multi-verse compositions, while lyria-3-clip-preview produces 30-second clips.

typescript
import { generateAudio } from '@tanstack/ai'
import { geminiAudio } from '@tanstack/ai-gemini'

const result = await generateAudio({
  adapter: geminiAudio('lyria-3-pro-preview'),
  prompt: 'Uplifting indie pop with layered vocals and jangly guitars',
})

console.log(result.audio.b64Json) // base64-encoded audio; Gemini returns inline data rather than a URL
console.log(result.audio.contentType) // e.g. "audio/mpeg"

fal.ai

fal.ai gives access to a broad catalogue of music, SFX, and general audio models through a single falAudio adapter.

Music Generation (MiniMax Music 2.6)

MiniMax's latest music model creates full compositions — vocals, backing music, and arrangements — from a single prompt.

typescript
import { generateAudio } from '@tanstack/ai'
import { falAudio } from '@tanstack/ai-fal'

const result = await generateAudio({
  adapter: falAudio('fal-ai/minimax-music/v2.6'),
  prompt: 'City Pop, 80s retro, groovy synth bass, warm female vocal, 104 BPM',
})

console.log(result.audio.url) // URL to the generated audio file
console.log(result.audio.contentType) // e.g. "audio/wav"

Music with Explicit Lyrics (DiffRhythm)

typescript
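import { generateAudio } from '@tanstack/ai'
import { falAudio } from '@tanstack/ai-fal'

const result = await generateAudio({
  adapter: falAudio('fal-ai/diffrhythm'),
  prompt: 'An upbeat electronic track with synths',
  // Lyrics are a provider-specific field, so they go in modelOptions
  modelOptions: {
    lyrics: '[verse]\nHello world\n[chorus]\nLa la la',
  },
})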

Sound Effects

typescript
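import { generateAudio } from '@tanstack/ai'
import { falAudio } from '@tanstack/ai-fal'

const result = await generateAudio({
  adapter: falAudio('fal-ai/elevenlabs/sound-effects/v2'),
  prompt: 'Thunderclap followed by heavy rain',
  duration: 5, // desired length in seconds
})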

MiniMax Music v2 (lyrics_prompt)

Earlier MiniMax variants use a lyrics_prompt field for lyric guidance.

typescript
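import { generateAudio } from '@tanstack/ai'
import { falAudio } from '@tanstack/ai-fal'

const result = await generateAudio({
  adapter: falAudio('fal-ai/minimax-music/v2'),
  prompt: 'A dreamy pop ballad in the style of the 80s',
  // This variant takes lyric guidance via lyrics_prompt
  modelOptions: {
    lyrics_prompt: '[instrumental]',
  },
})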

If a request doesn't return the audio you expected — a model silently truncates, a provider rejects a prompt, or the response shape looks off — pass debug: true to see every chunk the provider SDK emits. See Debug Logging.

Options

| Option | Type | Description |
| --- | --- | --- |
| adapter | AudioAdapter | The adapter created via geminiAudio() or falAudio() (required) |
| prompt | string | Text description of the audio to generate (required) |
| duration | number | Desired duration in seconds (model-dependent) |
| modelOptions | object | Provider-specific options (fully typed when the model ID is passed as a string literal) |
| debug | DebugOption | Enable per-category debug logging (true, false, or a DebugConfig; see Debug Logging) |
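
The note that modelOptions is "fully typed when the model ID is passed as a string literal" can be pictured with a small, self-contained sketch. The ModelOptionsByModel map and the buildRequest helper below are hypothetical and are not TanStack AI's real types; they only illustrate how a string-literal model ID can select the matching options type.

```typescript
// Hypothetical map from model ID to its option shape (illustration only;
// the real adapters ship their own type definitions).
interface ModelOptionsByModel {
  'fal-ai/diffrhythm': { lyrics?: string }
  'fal-ai/minimax-music/v2': { lyrics_prompt?: string }
}

type KnownModel = keyof ModelOptionsByModel

interface AudioRequest<M extends KnownModel> {
  model: M
  prompt: string
  modelOptions?: ModelOptionsByModel[M]
}

// Because `model` is inferred as a literal type, `modelOptions` is checked
// against the entry for exactly that model.
function buildRequest<M extends KnownModel>(
  model: M,
  prompt: string,
  modelOptions?: ModelOptionsByModel[M],
): AudioRequest<M> {
  return { model, prompt, modelOptions }
}

const request = buildRequest('fal-ai/diffrhythm', 'An upbeat electronic track', {
  lyrics: '[verse]\nHello world',
})

// buildRequest('fal-ai/diffrhythm', 'x', { lyrics_prompt: '...' })
// would fail to compile: `lyrics_prompt` is not an option of that model.
```

If the model ID arrives as a plain string variable instead of a literal, a scheme like this has nothing to narrow on, which is presumably why the table calls out the string-literal case.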

Result Shape

typescript
interface AudioGenerationResult {
  id: string
  model: string
  audio: {
    url?: string
    b64Json?: string
    contentType?: string
    duration?: number
  }
  usage?: { inputTokens?: number; outputTokens?: number; totalTokens?: number }
}

Gemini returns base64-encoded bytes in result.audio.b64Json. The fal adapter returns a URL in result.audio.url — if you need raw bytes, fetch() the URL yourself:

typescript
const bytes = new Uint8Array(
  await (await fetch(result.audio.url!)).arrayBuffer()
)
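
Because the two adapters populate different fields, a small helper can hide the difference from calling code. The following is a minimal sketch, assuming a runtime with global fetch and atob (modern browsers, Node 18+). GeneratedAudio, base64ToBytes, and audioToBytes are illustrative names and not part of TanStack AI.

```typescript
// Local copy of the `audio` field from the result shape.
interface GeneratedAudio {
  url?: string
  b64Json?: string
  contentType?: string
  duration?: number
}

// Decode a base64 string into raw bytes using the global `atob`.
function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }
  return bytes
}

// Hypothetical helper: prefer inline base64 data, otherwise download the URL.
async function audioToBytes(audio: GeneratedAudio): Promise<Uint8Array> {
  if (audio.b64Json) return base64ToBytes(audio.b64Json)
  if (audio.url) {
    const response = await fetch(audio.url)
    if (!response.ok) {
      throw new Error(`Download failed with status ${response.status}`)
    }
    return new Uint8Array(await response.arrayBuffer())
  }
  throw new Error('Result contains neither b64Json nor url')
}
```

Calling audioToBytes(result.audio) then works the same way whichever adapter produced the result.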

Client Hook (useGenerateAudio)

For client-side usage, framework integrations expose a useGenerateAudio hook (or createGenerateAudio in Svelte) that wraps the same generation flow. It mirrors the API of useGenerateSpeech, useGenerateImage, and other media hooks — see Generation Hooks for the full shape.

Server (streaming SSE route)

typescript
// routes/api/generate/audio.ts
import { generateAudio, toServerSentEventsResponse } from '@tanstack/ai'
import { falAudio } from '@tanstack/ai-fal'

export async function POST(req: Request) {
  const { prompt, duration } = await req.json()

  return toServerSentEventsResponse(
    generateAudio({
      adapter: falAudio('fal-ai/diffrhythm'),
      prompt,
      duration,
      stream: true,
    }),
  )
}
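
The route above passes whatever req.json() returns straight to generateAudio(). A production route would usually validate the body first. Here is a minimal sketch of such a check; parseAudioRequest and AudioRequestBody are hypothetical names, and the rules (non-empty prompt, positive duration) are assumptions rather than limits documented by any provider.

```typescript
interface AudioRequestBody {
  prompt: string
  duration?: number
}

// Hypothetical validator: returns a typed body or throws with a readable message.
function parseAudioRequest(body: unknown): AudioRequestBody {
  if (typeof body !== 'object' || body === null) {
    throw new Error('Request body must be a JSON object')
  }
  const { prompt, duration } = body as Record<string, unknown>
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    throw new Error('`prompt` must be a non-empty string')
  }
  // `duration` is optional, so only validate it when it is present.
  if (duration === undefined) return { prompt }
  if (typeof duration !== 'number' || !Number.isFinite(duration) || duration <= 0) {
    throw new Error('`duration` must be a positive number of seconds')
  }
  return { prompt, duration }
}
```

In the handler this would replace the bare destructuring, for example const { prompt, duration } = parseAudioRequest(await req.json()), with a 400 response returned when it throws.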

Client (React)

tsx
import { useGenerateAudio } from '@tanstack/ai-react'
import { fetchServerSentEvents } from '@tanstack/ai-client'

function AudioGenerator() {
  const { generate, result, isLoading, error, reset } = useGenerateAudio({
    connection: fetchServerSentEvents('/api/generate/audio'),
  })

  return (
    <div>
      <button
        onClick={() =>
          generate({ prompt: 'An upbeat electronic track', duration: 10 })
        }
        disabled={isLoading}
      >
        {isLoading ? 'Generating...' : 'Generate'}
      </button>
      {error && <p>Error: {error.message}</p>}
      {result?.audio.url && <audio src={result.audio.url} controls />}
      {result && <button onClick={reset}>Clear</button>}
    </div>
  )
}
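
The component above renders a player only when result.audio.url is set, which suits the fal adapter used in the server route. If the route is switched to an adapter that returns inline base64 (the Result Shape section notes that Gemini does), one option is to build a data URL instead. This is a sketch under that assumption; toAudioSrc and PlayableAudio are illustrative names, and the audio/mpeg fallback is a guess for cases where contentType is missing.

```typescript
interface PlayableAudio {
  url?: string
  b64Json?: string
  contentType?: string
}

// Hypothetical helper: pick a value usable as the `src` of an <audio> element.
function toAudioSrc(audio: PlayableAudio): string | undefined {
  if (audio.url) return audio.url
  if (audio.b64Json) {
    // Assumed fallback MIME type when the adapter omits contentType.
    const mime = audio.contentType ?? 'audio/mpeg'
    return `data:${mime};base64,${audio.b64Json}`
  }
  return undefined
}
```

In the JSX, the condition would then test toAudioSrc(result.audio) rather than result.audio.url directly.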

Use the fetcher option instead of connection when calling a TanStack Start server function directly.

Differences vs Text-to-Speech

| | generateAudio() | generateSpeech() |
| --- | --- | --- |
| Purpose | Music, soundscapes, SFX | Spoken-word TTS |
| Result | result.audio.url or result.audio.b64Json | Base64 in result.audio |
| Primary input | prompt | text |
| Voice/speed controls | No | Yes (voice, speed) |

Use generateSpeech() when you want a spoken voice, and generateAudio() when you want non-speech audio.

Environment Variables

Each provider reads its own API key from the environment by default:

sh
GOOGLE_API_KEY=your-google-api-key
FAL_KEY=your-fal-api-key

Or pass it explicitly to the adapter:

typescript
geminiAudio('lyria-3-pro-preview', { apiKey: 'your-key' })
falAudio('fal-ai/diffrhythm', { apiKey: 'your-key' })
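
To fail fast at startup instead of on the first request, an application can resolve the key itself before constructing the adapter. The sketch below is hypothetical and does not describe how the adapters resolve keys internally; in particular, letting an explicit key take precedence over the environment variable is an assumption.

```typescript
// Hypothetical resolver; not the adapters' actual implementation.
function resolveApiKey(
  explicitKey: string | undefined,
  envVarName: string,
  env: Record<string, string | undefined>,
): string {
  // Assumed precedence: explicit key first, then the environment variable.
  const key = explicitKey ?? env[envVarName]
  if (!key) {
    throw new Error(`Missing API key: pass apiKey or set ${envVarName}`)
  }
  return key
}
```

In Node, resolveApiKey(undefined, 'FAL_KEY', process.env) returns the key or throws a descriptive error, and the result can be passed as the apiKey option shown above.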