Stream AI Responses in Real-Time with AWS Lambda and Vercel AI SDK

Ever waited 30 seconds for an AI response? That spinning loader kills user engagement. Traditional request/response APIs weren't built for AI workloads, where a full completion can take 30 seconds or more to generate.

The Vercel AI SDK plus AWS Lambda response streaming fixes this. Instead of waiting, users see content appear as it's written - first words show up in under 500ms.

This works with any LLM provider (Bedrock, OpenAI, Anthropic) and keeps memory usage flat no matter how long the response gets. Here's how to build it.

Overview

The setup is built around AWS Lambda Function URLs - direct HTTPS endpoints for your functions. You can't use API Gateway here because it buffers the entire response before returning it, so nothing streams. Function URLs launched in 2022, and Lambda added response streaming support for them in 2023.

Users now expect real-time AI responses. Whether it's generating code, writing content, or analyzing data, people want to see the AI "thinking" rather than staring at a blank screen. This turns boring request/response into something that feels alive.

Implementation

Lambda has built-in streaming support with awslambda.streamifyResponse. The Vercel AI SDK's streamText().toDataStream() converts the model output into the data stream format that the SDK's frontend hooks understand:

import { bedrock } from "@ai-sdk/amazon-bedrock";
import { streamText } from "ai";

export const handler = awslambda.streamifyResponse(
  async (event, responseStream, context) => {
    // Function URL requests deliver the POST body as a string.
    const body = JSON.parse(event.body ?? "{}");

    // streamText kicks off the Bedrock call; toDataStream() converts the
    // output into the AI SDK's data stream protocol for the frontend.
    const stream = streamText({
      model: bedrock("anthropic.claude-3-haiku-20240307-v1:0"),
      messages: body?.messages ?? [],
    }).toDataStream();

    // Forward each chunk to the client as soon as it arrives.
    for await (const chunk of stream) {
      responseStream.write(chunk);
    }

    responseStream.end();
  },
);

That's it. Chunks go straight to the client as they're generated: no buffering, constant memory use, real streaming. On the frontend, the Vercel AI SDK handles parsing the stream and updating the UI.

Lambda Function URLs work with awslambda.streamifyResponse as long as the URL's invoke mode is set to RESPONSE_STREAM (shown in the CDK stack below). You don't have to mess with WebSockets or server-sent events.
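To see the streaming behavior without any frontend framework, you can hit the Function URL directly. Here's a minimal sketch - the endpoint is a placeholder for your deployed URL, and keep in mind the raw chunks arrive in the AI SDK's data stream format, not plain text:

// stream-demo.ts - minimal consumer of the Function URL (Node 18+, global fetch)
const FUNCTION_URL = "https://your-lambda-url.lambda-url.us-east-1.on.aws/";

async function streamCompletion(prompt: string) {
  const response = await fetch(FUNCTION_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages: [{ role: "user", content: prompt }] }),
  });

  if (!response.ok || !response.body) {
    throw new Error(`Request failed: ${response.status}`);
  }

  // Read chunks as Lambda writes them - no buffering in between.
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value, { stream: true }));
  }
}

streamCompletion("Explain Lambda response streaming in one sentence.").catch(console.error);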

TypeScript Support

The awslambda global isn't in the standard Lambda types, so you'll need these:

// types/aws-lambda-runtime.d.ts
declare global {
  namespace awslambda {
    interface ResponseStream {
      write(chunk: string | Buffer): void;
      end(): void;
    }

    function streamifyResponse<TEvent = unknown, TResult = void>(
      handler: (
        event: TEvent,
        responseStream: ResponseStream,
        context: import('aws-lambda').Context
      ) => Promise<TResult> | TResult
    ): (event: TEvent, context: import('aws-lambda').Context) => Promise<TResult>;
  }
}

export {};
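
With this declaration file included in your TypeScript compilation (for example by keeping it under a types/ directory your tsconfig already picks up), the handler above type-checks without any @ts-ignore comments.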

Infrastructure as Code

AWS CDK handles the deployment and gets all the permissions right:

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as nodejs from 'aws-cdk-lib/aws-lambda-nodejs';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

export class StreamingAIStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Lambda function for streaming AI responses
    const streamingFunction = new nodejs.NodejsFunction(this, 'StreamingAIFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      entry: 'lambda/index.ts',
      timeout: cdk.Duration.minutes(15), // Maximum for streaming
      memorySize: 1024,
    });

    // Add IAM permissions for Bedrock
    streamingFunction.addToRolePolicy(
      new iam.PolicyStatement({
        effect: iam.Effect.ALLOW,
        actions: [
          'bedrock:InvokeModel',
          'bedrock:InvokeModelWithResponseStream',
        ],
        resources: [
          `arn:aws:bedrock:${this.region}::foundation-model/anthropic.claude-3-haiku-20240307-v1:0`,
        ],
      })
    );

    // Create Function URL with response streaming enabled
    const functionUrl = streamingFunction.addFunctionUrl({
      authType: lambda.FunctionUrlAuthType.NONE,
      invokeMode: lambda.InvokeMode.RESPONSE_STREAM, // required for streaming responses
      cors: {
        allowCredentials: false,
        allowedHeaders: ['*'],
        allowedMethods: [lambda.HttpMethod.ALL],
        allowedOrigins: ['*'],
        maxAge: cdk.Duration.hours(1),
      },
    });

    // Outputs
    new cdk.CfnOutput(this, 'FunctionUrl', {
      value: functionUrl.url,
      description: 'Lambda Function URL for direct access',
    });
  }
}
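
If you need global distribution (more on this in the conclusion), CloudFront can sit in front of the Function URL. Here's a minimal sketch, assuming a recent aws-cdk-lib version that ships FunctionUrlOrigin; the cache and origin-request policies are one reasonable choice, not the only one. It drops into the StreamingAIStack constructor after functionUrl is created:

import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';

// Inside the StreamingAIStack constructor, after functionUrl is created:
const distribution = new cloudfront.Distribution(this, 'StreamingDistribution', {
  defaultBehavior: {
    origin: new origins.FunctionUrlOrigin(functionUrl),
    allowedMethods: cloudfront.AllowedMethods.ALLOW_ALL,
    // Never cache streamed responses; forward everything except the Host header.
    cachePolicy: cloudfront.CachePolicy.CACHING_DISABLED,
    originRequestPolicy: cloudfront.OriginRequestPolicy.ALL_VIEWER_EXCEPT_HOST_HEADER,
  },
});

new cdk.CfnOutput(this, 'DistributionUrl', {
  value: `https://${distribution.distributionDomainName}`,
  description: 'CloudFront URL for global access',
});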

Frontend Integration

The Vercel AI SDK works with all major frameworks. Here's the React version:

import { useChat } from 'ai/react';

function ChatComponent() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: 'https://your-lambda-url.lambda-url.region.on.aws/',
  });

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>{m.role}: {m.content}</div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}

Advanced Use Cases

This gets more interesting with advanced patterns:

RAG (Retrieval-Augmented Generation): Stream search results and AI responses at the same time. Users see the documents being found while the AI writes its answer.

Tool Usage: When your AI calls APIs or databases, users see each tool being used and results coming back live (a sketch of this pattern follows below).

Multi-Step Reasoning: For complex questions, stream each step of the AI's thinking process as it happens.
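
As a sketch of the tool-usage pattern, the same Lambda handler can register tools via the AI SDK's tool() helper. The lookupOrder tool and its schema are hypothetical stand-ins for your own API or database call, and this assumes an AI SDK version that supports tools and maxSteps with streamText:

import { bedrock } from "@ai-sdk/amazon-bedrock";
import { streamText, tool } from "ai";
import { z } from "zod";

export const handler = awslambda.streamifyResponse(
  async (event, responseStream) => {
    const body = JSON.parse(event.body ?? "{}");

    const stream = streamText({
      model: bedrock("anthropic.claude-3-haiku-20240307-v1:0"),
      messages: body?.messages ?? [],
      tools: {
        // Hypothetical tool - swap in your own API or database call.
        lookupOrder: tool({
          description: "Look up the status of an order by its ID",
          parameters: z.object({ orderId: z.string() }),
          execute: async ({ orderId }) => ({ orderId, status: "shipped" }),
        }),
      },
      maxSteps: 3, // let the model call the tool, then answer with the result
    }).toDataStream();

    // Tool calls and tool results are emitted as data stream parts,
    // so the frontend can render them as they happen.
    for await (const chunk of stream) {
      responseStream.write(chunk);
    }
    responseStream.end();
  },
);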

Conclusion

The Vercel AI SDK and AWS Lambda response streaming work well together: setup is minimal, the pattern works with any LLM provider or frontend framework, and users get genuinely real-time responses. Add CloudFront if you need global distribution. The same setup handles everything from basic chat to complex RAG and tool workflows.
