Stream AI Responses in Real-Time with AWS Lambda and Vercel AI SDK
Ever waited 30 seconds for an AI response? That spinning loader kills user engagement. Traditional APIs weren't built for AI workloads where responses can take forever to generate.
The Vercel AI SDK plus AWS Lambda response streaming fixes this. Instead of waiting, users see content appear as it's written - first words show up in under 500ms.
This works with any LLM provider (Bedrock, OpenAI, Anthropic) and keeps memory usage flat no matter how long the response gets. Here's how to build it.
Overview
The setup is built around AWS Lambda Function URLs - direct HTTPS endpoints for your functions. API Gateway won't work here because it buffers responses instead of streaming them. Function URLs launched in 2022, and Lambda added response streaming support for them in 2023, which is what makes this pattern possible.
Users now expect real-time AI responses. Whether it's generating code, writing content, or analyzing data, people want to see the AI "thinking" rather than staring at a blank screen. This turns boring request/response into something that feels alive.
Implementation
Lambda has built-in streaming support with awslambda.streamifyResponse. The Vercel AI SDK's streamText().toDataStream() handles the client wire format:
import { bedrock } from "@ai-sdk/amazon-bedrock";
import { streamText } from "ai";

export const handler = awslambda.streamifyResponse(
  async (event, responseStream, context) => {
    // Function URL payloads can arrive base64-encoded, so decode defensively
    const raw = event.isBase64Encoded
      ? Buffer.from(event.body ?? "", "base64").toString("utf-8")
      : event.body ?? "{}";
    const body = JSON.parse(raw);

    // Kick off the model call and get a ReadableStream in the AI SDK data stream format
    const stream = streamText({
      model: bedrock("anthropic.claude-3-haiku-20240307-v1:0"),
      messages: body?.messages ?? [],
    }).toDataStream();

    // Forward each chunk to the client as soon as the model emits it
    for await (const chunk of stream) {
      responseStream.write(chunk);
    }
    responseStream.end();
  },
);
That's it. Chunks go straight to the client as they're generated - no buffering, constant memory use, real streaming. The Vercel AI SDK handles all the connection plumbing on the frontend.
Lambda Function URLs work with awslambda.streamifyResponse out of the box; the only setting that matters is putting the URL into RESPONSE_STREAM invoke mode, which the CDK stack below takes care of. No WebSockets or hand-rolled server-sent events needed.
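To sanity-check the endpoint before wiring up a frontend, you can read the stream with plain fetch. This is a minimal sketch: the URL is a placeholder for your deployed Function URL, and the raw output is the AI SDK's data stream wire format that useChat normally parses for you.
// stream-check.ts - quick manual test of the Function URL (placeholder URL, Node 18+, ES module)
const res = await fetch("https://your-lambda-url.lambda-url.us-east-1.on.aws/", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ messages: [{ role: "user", content: "Hello!" }] }),
});

// Read the response body chunk by chunk; output appears as soon as the Lambda writes it
const reader = res.body!.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value, { stream: true }));
}
If text trickles out word by word instead of arriving in one block at the end, streaming is working.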
TypeScript Support
The awslambda global isn't part of the standard Lambda type definitions, so you'll need a small declaration file:
// types/aws-lambda-runtime.d.ts
declare global {
  namespace awslambda {
    interface ResponseStream {
      write(chunk: string | Buffer | Uint8Array): void;
      end(): void;
    }

    function streamifyResponse<TEvent = unknown, TResult = void>(
      handler: (
        event: TEvent,
        responseStream: ResponseStream,
        context: import('aws-lambda').Context
      ) => Promise<TResult> | TResult
    ): (event: TEvent, context: import('aws-lambda').Context) => Promise<TResult>;
  }
}

export {};
Infrastructure as Code
AWS CDK handles the deployment and gets all the permissions right:
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as nodejs from 'aws-cdk-lib/aws-lambda-nodejs';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

export class StreamingAIStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Lambda function for streaming AI responses
    const streamingFunction = new nodejs.NodejsFunction(this, 'StreamingAIFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      entry: 'lambda/index.ts',
      timeout: cdk.Duration.minutes(15), // Lambda's maximum timeout, useful for long generations
      memorySize: 1024,
    });

    // IAM permissions for Bedrock model invocation (including the streaming variant)
    streamingFunction.addToRolePolicy(
      new iam.PolicyStatement({
        effect: iam.Effect.ALLOW,
        actions: [
          'bedrock:InvokeModel',
          'bedrock:InvokeModelWithResponseStream',
        ],
        resources: [
          `arn:aws:bedrock:${this.region}::foundation-model/anthropic.claude-3-haiku-20240307-v1:0`,
        ],
      })
    );

    // Function URL with streaming enabled - without RESPONSE_STREAM the output gets buffered
    const functionUrl = streamingFunction.addFunctionUrl({
      authType: lambda.FunctionUrlAuthType.NONE,
      invokeMode: lambda.InvokeMode.RESPONSE_STREAM,
      cors: {
        allowCredentials: false,
        allowedHeaders: ['*'],
        allowedMethods: [lambda.HttpMethod.ALL],
        allowedOrigins: ['*'],
        maxAge: cdk.Duration.hours(1),
      },
    });

    // Outputs
    new cdk.CfnOutput(this, 'FunctionUrl', {
      value: functionUrl.url,
      description: 'Lambda Function URL for direct access',
    });
  }
}
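A minimal CDK app entry point is all that's left to deploy the stack; the file paths below are assumptions about your project layout, so adjust them to match:
// bin/app.ts - assumed entry point; adjust the import path to wherever StreamingAIStack lives
import * as cdk from 'aws-cdk-lib';
import { StreamingAIStack } from '../lib/streaming-ai-stack';

const app = new cdk.App();
new StreamingAIStack(app, 'StreamingAIStack');
Run cdk deploy and the FunctionUrl output gives you the endpoint to plug into the frontend.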
Frontend Integration
The Vercel AI SDK works with all major frameworks. Here's the React version:
import { useChat } from 'ai/react';

function ChatComponent() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: 'https://your-lambda-url.lambda-url.region.on.aws/',
  });

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>{m.role}: {m.content}</div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}
Advanced Use Cases
This gets more interesting with advanced patterns:
RAG (Retrieval-Augmented Generation): Stream search results and AI responses at the same time. Users see the documents being found while the AI writes its answer.
Tool Usage: When your AI calls APIs or databases, users see each tool being used and results coming back live - there's a sketch of this after the list.
Multi-Step Reasoning: For complex questions, stream each step of the AI's thinking process as it happens.
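As a rough sketch of the tool-usage pattern, the AI SDK's tools and maxSteps options plug straight into the same handler shown earlier; the getWeather tool here is made up for illustration and its execute function just returns a stub:
import { bedrock } from "@ai-sdk/amazon-bedrock";
import { streamText, tool } from "ai";
import { z } from "zod";

const messages = [{ role: "user" as const, content: "What's the weather in Berlin?" }];

const stream = streamText({
  model: bedrock("anthropic.claude-3-haiku-20240307-v1:0"),
  messages,
  tools: {
    // Hypothetical tool - swap in a real API or database call
    getWeather: tool({
      description: "Get the current weather for a city",
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, tempC: 21 }), // stub result
    }),
  },
  maxSteps: 3, // allow the model to call the tool and then continue with the result
}).toDataStream();
The data stream carries tool calls and tool results as their own parts, so useChat can render them live alongside the generated text.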
Conclusion
The Vercel AI SDK and AWS Lambda streaming work really well together. Setup is minimal, it works with any LLM or frontend framework, and gives you proper real-time responses. Add CloudFront if you need global distribution. This handles everything from basic chat to complex RAG and tool workflows.