Lab 8: Serverless Deployment with AWS Lambda
Convert your Flask MLOps service to serverless architecture using AWS Lambda and API Gateway for cost-effective, auto-scaling deployment.
What You'll Do: Convert Flask MLOps service to AWS Lambda, set up API Gateway, deploy serverless functions, and compare performance with EC2 containerized deployment
Lab Collaborators:
- Edward Lampoh - Software Developer & Collaborator
- Oluwafemi Adebayo, PhD - Academic Professor & Collaborator
Before starting Lab 8, ensure you have:
- Flask MLOps service running on EC2 from Lab 7
- Docker image built from Lab 5
- Next.js deployed to Vercel from Lab 7
- AWS account with active credentials
- Familiarity with AWS Console
🔍 Quick Test
```shell
# Verify EC2 deployment is working
curl http://YOUR_EC2_PUBLIC_IP:5001/health
# Should return healthy status
```

Part A: Understanding Serverless Architecture
Learn what serverless means and why it's revolutionary for modern applications
Serverless doesn't mean "no servers" - it means you don't manage servers. AWS handles all infrastructure, you just upload your code.
💡 Simple Analogy:
Think of serverless like electricity:
Traditional Servers (EC2) = Owning a generator
- You maintain it
- It runs 24/7 even when not needed
- Fixed costs whether you use it or not
Serverless (Lambda) = Using the power grid
- No maintenance
- Only use (and pay for) what you need
- Scales automatically
- Pay per millisecond of use
EC2 (Lab 7 Architecture)
- ❌ Server runs 24/7 (even when idle)
- ❌ You manage OS updates, security patches
- ❌ Manual scaling configuration
- ❌ Fixed capacity (can't handle sudden traffic spikes well)
- ✅ Full control over environment
- ✅ Consistent performance (no cold starts)
Lambda (Lab 8 Architecture)
- ✅ Only runs when needed (triggered by requests)
- ✅ AWS manages all infrastructure
- ✅ Automatic scaling (handles 1 or 1 million requests)
- ✅ Pay only for execution time
- ❌ Cold starts (first request may be slower)
- ❌ 15-minute execution limit
- ❌ Limited control over environment
Use EC2 when:
- Consistent, predictable traffic 24/7
- Long-running processes (>15 minutes)
- Need specific OS configurations
- Complex stateful applications
Use Lambda when:
- Sporadic or unpredictable traffic
- Event-driven workloads (API requests, file uploads)
- Quick processing tasks (<15 minutes)
- Want to minimize operational overhead
- Cost optimization is important
What We're Building:
Users (Internet)
↓
Next.js App (Vercel)
↓
API Gateway (AWS)
↓
Lambda Function (Flask MLOps Logic)
↓
Neon PostgreSQL (Serverless Database)

Service Breakdown:
- API Gateway: HTTP endpoint that triggers Lambda
- Lambda Function: Flask MLOps code packaged as serverless function
- Neon Database: Already serverless, perfect match!
What is Lambda?
- Run code without provisioning servers
- Supports Python, Node.js, Java, Go, and more
- Pay per millisecond of execution time
- Automatically scales from 0 to thousands of concurrent executions
Lambda Free Tier (always free, not limited to 12 months):
- 1 million free requests/month
- 400,000 GB-seconds of compute/month
- For our MLOps service, this is more than enough!
💡 Example Cost Calculation:
- 1,000 requests/day = 30,000/month (well under 1 million)
- Average execution: 200ms, 512MB memory = 0.1 GB-seconds per request
- Total: 3,000 GB-seconds/month (well under 400,000 limit)
- Cost: $0.00 (within free tier)
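The arithmetic above can be sanity-checked with a few lines of Python (per-request and per-GB-second prices are the current published Lambda rates; verify against the AWS pricing page):

```python
# Reproduce the lab's cost estimate: 1,000 requests/day at 200 ms / 512 MB each
requests = 1000 * 30                      # 30,000 requests per month
gb_seconds = requests * 0.2 * 0.5         # 0.2 s x 0.5 GB = 0.1 GB-s per request

# Only usage beyond the free tier is billed
req_cost = max(0, requests - 1_000_000) * 0.20 / 1_000_000
compute_cost = max(0, gb_seconds - 400_000) * 0.0000166667

print(gb_seconds, req_cost + compute_cost)  # 3000.0 0.0
```

Both numbers stay comfortably inside the free tier, so the monthly bill is $0.00.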
Part B: Prepare Flask Code for Lambda
Adapt your Flask application to run as a serverless function
Traditional Flask (EC2):

```python
# app.py runs as a long-lived server
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5001)
```

Lambda Flask:

```python
# Lambda calls a handler function for each request
def lambda_handler(event, context):
    # Process the HTTP request
    return response
```

We'll use serverless-wsgi to bridge Flask and Lambda!

Create a new file in your local mlops-service/ directory:
File: mlops-service/lambda_function.py
"""
AWS Lambda handler for Flask MLOps Service
Converts Flask WSGI app to Lambda-compatible function
"""
import serverless_wsgi
from app import app
def lambda_handler(event, context):
"""
AWS Lambda handler function
Args:
event: API Gateway request event
context: Lambda execution context
Returns:
API Gateway response format
"""
return serverless_wsgi.handle_request(app, event, context)What this does:
- Imports your existing Flask app
- Uses
serverless-wsgito convert WSGI (Flask) to Lambda format - AWS calls
lambda_handler()for each request
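To make the translation concrete, here is a hypothetical stand-in handler (stdlib only, no Flask or serverless-wsgi) that answers a simulated API Gateway HTTP API (v2) event in the response shape Lambda expects — the `rawPath` field and the `statusCode`/`headers`/`body` response format are what serverless-wsgi handles for you:

```python
# Illustrative only: a bare-bones handler showing the event in / response out shapes
import json

def lambda_handler(event, context):
    path = event.get("rawPath", "/")  # HTTP API (v2) events carry the path here
    if path == "/health":
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"status": "healthy"}),
        }
    return {"statusCode": 404, "body": json.dumps({"error": "not found"})}

# Simulate the event API Gateway would deliver
event = {"rawPath": "/health", "requestContext": {"http": {"method": "GET"}}}
print(lambda_handler(event, None)["statusCode"])  # 200
```

serverless-wsgi does exactly this translation for every Flask route, so you never write this plumbing by hand.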
Your mlops-service/requirements.txt needs a new dependency:
Add this line to your existing requirements.txt:
```
# Existing dependencies
flask==3.0.0
flask-cors==4.0.0
prometheus-client==0.19.0
psycopg2-binary==2.9.9
python-dotenv==1.0.0

# Add for Lambda deployment
serverless-wsgi==3.0.3
```

What is serverless-wsgi?
- Bridges Flask (WSGI) applications to AWS Lambda
- Handles request/response conversion
- Industry-standard for Flask on Lambda
Before deploying to AWS, test the handler works:
```shell
# Install new dependency
cd mlops-service
pip install serverless-wsgi==3.0.3

# Test import works
python -c "from lambda_function import lambda_handler; print('Handler imported successfully!')"
```

Part C: Package Lambda Function
Create a deployment package with all dependencies
Lambda needs a ZIP file containing:
- Your application code (app.py, lambda_function.py)
- All Python dependencies (Flask, prometheus-client, etc.)
- Must be structured correctly for Lambda to find the handler
On your local machine, create the deployment package:
Mac/Linux:
```shell
# Navigate to mlops-service directory
cd mlops-service

# Create a clean directory for the package
mkdir -p lambda-package
cd lambda-package

# Install dependencies into this directory
pip install -r ../requirements.txt -t .

# Copy application files
cp ../app.py .
cp ../lambda_function.py .

# Create ZIP file
zip -r ../lambda-deployment.zip .

# Go back to mlops-service directory
cd ..

# Verify ZIP was created
ls -lh lambda-deployment.zip
```

Windows (PowerShell):
```powershell
# Navigate to mlops-service directory
cd mlops-service

# Create a clean directory for the package
New-Item -ItemType Directory -Force -Path lambda-package
cd lambda-package

# Install dependencies into this directory
pip install -r ..\requirements.txt -t .

# Copy application files
Copy-Item ..\app.py .
Copy-Item ..\lambda_function.py .

# Create ZIP file (requires PowerShell 5.0+)
Compress-Archive -Path .\* -DestinationPath ..\lambda-deployment.zip -Force

# Go back to mlops-service directory
cd ..

# Verify ZIP was created
Get-Item lambda-deployment.zip
```

You should now have lambda-deployment.zip in your mlops-service/ directory!

If your package is too large, use Docker (from Lab 5):
Lambda also supports Docker images! You can deploy your existing Docker image directly:
```shell
# Tag your image for AWS ECR (Elastic Container Registry)
docker tag mlops-service:latest <your-ecr-repo-url>/mlops-service:lambda

# Push to ECR (requires AWS CLI configured)
docker push <your-ecr-repo-url>/mlops-service:lambda
```

We'll stick with ZIP for simplicity in this lab.
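Whichever route you take, it's worth sanity-checking that lambda_function.py sits at the archive root before uploading — a nested folder is the most common cause of "module not found" errors. A minimal check (the `handler_at_root` helper is illustrative, demoed here on an in-memory ZIP so the snippet runs anywhere; pass the bytes of your real lambda-deployment.zip instead):

```python
# Verify the handler is at the ZIP root, not nested inside a folder
import io
import zipfile

def handler_at_root(zip_bytes):
    """Return True if lambda_function.py sits at the archive root."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return "lambda_function.py" in zf.namelist()

# Demo with a tiny in-memory ZIP mimicking the deployment package layout
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("lambda_function.py", "# handler")
    zf.writestr("app.py", "# flask app")
print(handler_at_root(buf.getvalue()))  # True
```

For the real package, read the file with `open("lambda-deployment.zip", "rb").read()` and pass the bytes in.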
Part D: Create Lambda Function in AWS
Deploy your function to AWS Lambda
Access AWS Lambda:
- Sign in to AWS Console
- Search for "Lambda" in the search bar
- Click "Lambda" to open the Lambda console
- Make sure you're in the same region as Lab 7 (e.g., us-east-1)
Create a new function:
- Click "Create function" button
- Select "Author from scratch"
- Configure:
  - Function name: mlops-service-lambda
  - Runtime: Python 3.11
  - Architecture: x86_64
  - Permissions: Create a new role with basic Lambda permissions
- Click "Create function"
Upload your ZIP file:
- In the function page, scroll to "Code source" section
- Click "Upload from" dropdown
- Select ".zip file"
- Click "Upload"
- Select your lambda-deployment.zip file
- Click "Save"

Wait for upload to complete (may take 30-60 seconds for large files). You should then see your files (app.py, lambda_function.py) in the code editor.

Set the Lambda handler:
- Scroll to "Runtime settings" section
- Click "Edit"
- Set Handler to: lambda_function.lambda_handler
- Click "Save"
What this does: Tells Lambda to call lambda_handler() function from lambda_function.py
Add your environment variables:
- Click "Configuration" tab
- Click "Environment variables" in left sidebar
- Click "Edit"
- Add variables (click "Add environment variable" for each):

```
DATABASE_URL=your_neon_database_url_here
FLASK_ENV=production
FLASK_DEBUG=False
SERVICE_PORT=5001
PROMETHEUS_PORT=8001
ENVIRONMENT=production
```

- Click "Save"

Use the same values as your local .env file!

Adjust function settings:
- Stay in "Configuration" tab
- Click "General configuration" in left sidebar
- Click "Edit"
- Set:
- Memory: 512 MB (enough for Flask + Prometheus)
- Timeout: 30 seconds (longer than default 3 seconds)
- Click "Save"
💡 Why these values?
- 512 MB: Sufficient for Flask app and dependencies
- 30 seconds: Enough time for database queries and metric processing
Part E: Set Up API Gateway
Create HTTP endpoint to trigger your Lambda function
API Gateway creates a public HTTP endpoint that triggers your Lambda function.
Flow:
User Request (https://your-api.execute-api.us-east-1.amazonaws.com/prod/health)
↓
API Gateway (receives HTTP request)
↓
Lambda Function (processes request)
↓
API Gateway (returns HTTP response)
↓
User (receives response)

From Lambda console:
- In your Lambda function page, click "Add trigger"
- Select "API Gateway"
- Configure:
- API type: HTTP API (simpler than REST API)
- Security: Open (we'll add security later if needed)
- Click "Add"
Copy your API endpoint:
- In the "Triggers" section, click on the API Gateway trigger
- You'll see an "API endpoint" URL like:
  https://abc123xyz.execute-api.us-east-1.amazonaws.com/default/mlops-service-lambda

Copy this URL - this is your new MLOps service endpoint!
Test the health endpoint:
```shell
# Replace with your actual API Gateway URL
curl https://YOUR_API_GATEWAY_URL
# Expected: Health check JSON response
```

Test the /health endpoint specifically:

```shell
curl https://YOUR_API_GATEWAY_URL/health
```

Expected Response:
```json
{
  "status": "healthy",
  "service": "mlops-service-prometheus",
  "timestamp": "2024-01-15T10:30:00.000000",
  "monitoring": "prometheus",
  "metrics_endpoint": "/metrics",
  "environment": "production"
}
```

Test Prometheus metrics:

```shell
curl https://YOUR_API_GATEWAY_URL/metrics
```

You should see Prometheus metrics output:
```
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="11"...} 1.0
...
```

Part F: Connect Vercel to Lambda
Update your Next.js app to use the new serverless endpoint
Switch from EC2 to Lambda:
- Go to vercel.com
- Click on your project
- Go to Settings → Environment Variables
- Find MLOPS_SERVICE_URL
- Click "Edit"
- Update to your API Gateway URL: https://YOUR_API_GATEWAY_URL
- Click "Save"

Don't include /health or any path - just the base URL!

Redeploy to use the new environment variable:
- Go to Deployments tab
- Click on the latest deployment
- Click "Redeploy" button
- Wait for deployment to complete
Test the complete serverless flow:
- Visit your Vercel URL (e.g., https://your-app.vercel.app)
- Create a new business or use an existing one
- Open the chat interface
- Send a message to the AI
- Check if metrics are tracked
Verify metrics on Lambda:
```shell
# Check metrics endpoint
curl https://YOUR_API_GATEWAY_URL/metrics | grep ai_requests_total
# Should show incremented counter
```

✅ Success Indicators:
- AI chat responds on Vercel
- Metrics endpoint shows updated counters
- No errors in browser console
- Lambda executes successfully
Check Lambda CloudWatch logs:
- Go to Lambda console
- Click on your function
- Click "Monitor" tab
- Click "View CloudWatch logs"
- Click on the latest log stream
You should see:
- Incoming requests
- Prometheus metrics updates
- Any errors or warnings
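Lambda forwards anything your code writes to stdout/stderr — including output from Python's logging module — to CloudWatch. An optional sketch (`log_request` is a hypothetical helper, not part of the lab code) that emits one structured JSON line per request, which makes CloudWatch log searches much easier:

```python
# One structured log line per request; Lambda ships this to CloudWatch automatically
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("mlops")

def log_request(event):
    """Log the request path and method as a single JSON line."""
    line = json.dumps({
        "path": event.get("rawPath", "/"),
        "method": event.get("requestContext", {}).get("http", {}).get("method", "GET"),
    })
    logger.info(line)
    return line

log_request({"rawPath": "/health", "requestContext": {"http": {"method": "GET"}}})
```

You could call a helper like this at the top of lambda_handler() before passing the event to Flask.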
Part G: Performance Comparison
Compare EC2 vs Lambda performance and costs
Test EC2 response time:
```shell
# Time 10 requests to EC2
for i in {1..10}; do
  time curl -s http://YOUR_EC2_IP:5001/health > /dev/null
done
```

Test Lambda response time:
```shell
# Time 10 requests to Lambda
for i in {1..10}; do
  time curl -s https://YOUR_API_GATEWAY_URL/health > /dev/null
done
```

💡 Expected Results:
- EC2 First Request: ~50-100ms (consistent)
- EC2 Subsequent: ~50-100ms (consistent)
- Lambda First Request: ~500-2000ms (cold start)
- Lambda Subsequent: ~50-150ms (warm)
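If you prefer collecting these numbers programmatically rather than eyeballing `time curl` output, a small Python helper can do it. The callable below is a stand-in computation; swap in a real HTTP call (e.g., `urllib.request.urlopen` against https://YOUR_API_GATEWAY_URL/health) to measure the actual endpoint — the first sample should then show the cold start clearly:

```python
# Time repeated calls and report first-vs-median latency
import statistics
import time

def time_calls(fn, n=10):
    """Return a list of n per-call latencies in milliseconds."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return samples

lat = time_calls(lambda: sum(range(1000)), n=10)
print(f"first: {lat[0]:.2f} ms, median: {statistics.median(lat):.2f} ms")
```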
What is a Cold Start?
When Lambda hasn't been used for ~5-15 minutes, AWS pauses the function. The next request must:
- Start a new execution environment
- Load your code
- Initialize Python and libraries
- Then process the request
Warm Starts:
After the first request, Lambda keeps the environment "warm" for ~15 minutes. Subsequent requests are fast (~50-150ms).
💡 Mitigation Strategies:
- Scheduled pings: Keep function warm with CloudWatch Events
- Provisioned concurrency: AWS keeps environments ready (costs extra)
- Accept the trade-off: For low-traffic apps, occasional cold starts are acceptable
EC2 Cost (t2.micro)
- Free Tier: 750 hours/month for 12 months
- After Free Tier: ~$8-10/month (24/7 operation)
- Fixed cost regardless of usage
Lambda Cost
- Free Tier: 1M requests + 400,000 GB-seconds/month (forever)
- After Free Tier: $0.20 per 1M requests + $0.0000166667 per GB-second
- Variable cost based on actual usage
Example Scenario (1,000 requests/day):
EC2:
- Monthly cost: $0 (free tier) or $8-10 (after free tier)
- Runs 24/7 even if no requests
Lambda:
- Requests: 30,000/month × $0.20/1M = $0.006
- Compute: 3,000 GB-seconds × $0.0000166667 = $0.05
- Total: ~$0.06/month (well within free tier = $0)
- Only runs when triggered
💡 Winner for low-traffic apps: Lambda (much cheaper)
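A rough back-of-envelope break-even point follows from the same numbers — this sketch ignores free tiers and assumes the lab's request profile (200 ms at 512 MB) and an ~$8/month EC2 instance, so treat the result as an order-of-magnitude estimate, not a quote:

```python
# Where does pay-per-use Lambda start costing more than a fixed ~$8/month EC2 box?
EC2_MONTHLY = 8.0
PER_REQUEST = 0.20 / 1_000_000 + (0.2 * 0.5) * 0.0000166667  # request fee + compute

breakeven = EC2_MONTHLY / PER_REQUEST
print(f"Lambda stays cheaper below ~{breakeven:,.0f} requests/month")
```

With these assumptions the crossover lands in the millions of requests per month, which is why Lambda wins so decisively for low-traffic apps.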
Use EC2 when:
- High, consistent traffic (thousands of requests/hour)
- Long-running processes
- Need predictable latency (no cold starts)
- Complex state management
Use Lambda when:
- Low to moderate traffic
- Unpredictable traffic patterns
- Cost optimization is priority
- Cold starts are acceptable
- Event-driven architecture
For our AI receptionist:
- During development/testing: Lambda (cheaper)
- High-traffic production: EC2 (better performance)
- Low-traffic production: Lambda (cost-effective)
Part H: Clean Up and Resource Management
Manage your AWS resources to optimize costs
You now have TWO deployments:
- EC2 instance with Docker (Lab 7)
- Lambda function (Lab 8)
Decision Time:
Option 1: Keep Both (Recommended for Learning)
- Compare performance in real-world usage
- Learn the trade-offs firsthand
- Total cost: Still in free tier!
Option 2: Stop EC2, Use Lambda Only
- Save EC2 hours for other projects
- Simpler to manage one deployment
- Cheaper after free tier expires
Option 3: Stop Lambda, Use EC2 Only
- More consistent performance
- Better for high-traffic scenarios
If you want to pause EC2 to save hours:
```
# Via AWS Console:
# 1. Go to EC2 → Instances
# 2. Select mlops-service-production
# 3. Instance state → Stop instance
# 4. Confirm

# This STOPS the instance (can restart later)
# Does NOT delete it
```

To restart later:
```
# Instance state → Start instance
# Get new public IP (changes after stop/start)
# Update Vercel MLOPS_SERVICE_URL if switching back
```

If you want to remove Lambda:
- Go to Lambda console
- Select your function
- Actions → Delete
- Type "delete" to confirm
Check Lambda usage:
- Lambda console → Functions
- Click "Monitor" tab
- View invocations, duration, errors
Check EC2 usage:
- EC2 console → Instances
- Check instance hours used
Check free tier usage:
- AWS Console → Billing
- Free Tier → View usage
- See Lambda requests and EC2 hours remaining
Part I: Cold Start Optimization (Optional)
Reduce Lambda cold start times
Keep Lambda warm with CloudWatch Events:
Create EventBridge Rule:
- Go to AWS Console → EventBridge
- Create rule → Schedule
- Configure:
  - Name: mlops-lambda-warmer
  - Schedule: Rate expression: rate(5 minutes)
- Target: Lambda function → mlops-service-lambda
- Create
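One detail worth handling: warmer pings shouldn't run your Flask app or pollute your metrics. A hedged sketch, assuming the EventBridge rule sends its default scheduled-event payload (which carries "source": "aws.events"), that short-circuits those pings:

```python
# Skip Flask processing for EventBridge warmer pings
def lambda_handler(event, context):
    if event.get("source") == "aws.events":
        # Scheduled warmer ping: return immediately, keeping the environment warm
        return {"statusCode": 200, "body": "warmed"}
    # Normal path: hand the API Gateway event to Flask, as in Part B
    import serverless_wsgi  # imported lazily so warm pings stay cheap
    from app import app
    return serverless_wsgi.handle_request(app, event, context)

print(lambda_handler({"source": "aws.events"}, None)["body"])  # warmed
```

If you customize the rule's input, check your payload for a field that distinguishes pings from real requests.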
AWS can keep environments ready 24/7:
- Lambda console → Your function
- Configuration → Provisioned concurrency
- Set desired number of ready environments
Reduce cold start time by optimizing imports:
```python
# BAD: Import everything at module level
import heavy_library  # Loaded during cold start

def lambda_handler(event, context):
    heavy_library.do_something()

# GOOD: Import only when needed
def lambda_handler(event, context):
    import heavy_library  # Loaded only when called
    heavy_library.do_something()
```

For our app, the difference is minimal, but good to know!
Lambda function returns 502 Bad Gateway:
Check handler configuration:
- Ensure handler is lambda_function.lambda_handler
- Verify lambda_function.py is in the ZIP root
Lambda times out:
Increase timeout:
- Configuration → General configuration → Timeout → 30 seconds
- Check CloudWatch logs for specific errors
Environment variables not working:
Verify configuration:
- Configuration → Environment variables
- Check DATABASE_URL includes ?sslmode=require
Deployment package too large:
Reduce size or use Docker:
- Remove unnecessary files from package
- Or deploy Docker image to ECR and use container image
Cold starts are too slow:
Optimization options:
- Set up EventBridge warming (Part I)
- Reduce package size
- Optimize Python imports
Can't connect to database from Lambda:
Check VPC settings:
- Lambda functions can access public internet by default
- Neon is publicly accessible, should work
- Verify DATABASE_URL is correct
Congratulations! You've successfully deployed your Flask MLOps service as a serverless function. Here's what you accomplished:
✅ Serverless Skills Gained
- Lambda Fundamentals: Function-as-a-Service deployment
- API Gateway: HTTP endpoints for serverless functions
- Serverless Architecture: Event-driven, auto-scaling design
- Cost Optimization: Pay-per-use pricing model
- Performance Analysis: EC2 vs Lambda trade-offs
🚀 What You Built
- Serverless MLOps Service: Flask running on AWS Lambda
- API Gateway Endpoint: Public HTTPS endpoint for your service
- Auto-Scaling: Handles 1 to 1,000,000 requests automatically
- Cost-Effective: $0 within free tier, pennies beyond
- Production Comparison: Two deployment strategies to compare
🔑 Key Takeaways
- Serverless = No server management, not "no servers"
- Lambda scales automatically from 0 to millions of requests
- Cold starts are real (~1-2 seconds for first request)
- Cost-effective for low-traffic or unpredictable workloads
- Trade-offs exist between serverless and traditional deployments
- Choose the right tool based on your requirements
📝 Test Your Knowledge
Complete the Lab 8 quiz to test your understanding of serverless deployment with AWS Lambda and API Gateway.
Take Lab 8 Quiz →