Lab 10: Security & Compliance for AI Systems
Learn fundamental security practices for protecting your AI receptionist system, including API authentication, rate limiting, secure configuration management, and GDPR compliance basics.
What You'll Do: Understand security fundamentals for AI systems, implement API key authentication, add rate limiting to prevent abuse, create secure environment variable templates, and learn GDPR compliance basics for AI chat data
What You'll Build:
- Secured MLOps service with API key authentication
- Rate-limited endpoints to prevent abuse
- .env.example template for secure configuration
- Security best practices checklist
Lab Collaborators:
- Edward Lampoh - Software Developer & Collaborator
- Oluwafemi Adebayo, PhD - Academic Professor & Collaborator
Before starting Lab 10, ensure you have:
- Completed Labs 1-9
- Flask MLOps service running locally
- Basic understanding of HTTP requests and headers
- Familiarity with environment variables
Quick Test
# Verify Flask service is running
curl http://localhost:5001/health
# Should return healthy status
Part A: Understanding Security for AI Systems
Learn the fundamentals of protecting AI applications from threats
Application security is the practice of protecting software applications from threats throughout their lifecycle.
The CIA Triad
Security professionals use the CIA Triad to think about security:
Confidentiality
Definition: Only authorized users can access data
Example: API keys prevent unauthorized access to metrics
Integrity
Definition: Data remains accurate and unmodified
Example: Validate metrics data before storing
Availability
Definition: System remains accessible when needed
Example: Rate limiting helps mitigate DoS attacks
API Key Leakage
What it is: Accidentally exposing API keys in code, git commits, or logs
Risk: Attackers can use your API keys to access your system, rack up API costs, or steal sensitive data
Real Example:
A student accidentally committed AWS credentials to GitHub. Within 15 minutes, automated bots found the keys and started mining cryptocurrency. Bill: $2,500.
Denial of Service (DoS)
What it is: Overwhelming your service with requests until it crashes
Risk: Without rate limiting, attackers (or bugs!) can make thousands of requests per second, crash your Flask service, max out your AI API quota, and generate huge cloud bills
SQL Injection
What it is: Inserting malicious SQL code through user input
Risk: Attackers can read your entire database, delete all data, or modify business information
# VULNERABLE CODE (Never do this!)
query = f"SELECT * FROM businesses WHERE name = '{user_input}'"
# If user_input is: "'; DROP TABLE businesses; --"
# The query becomes:
# SELECT * FROM businesses WHERE name = ''; DROP TABLE businesses; --'
Safe Approach: Always use parameterized queries (which our app already does!)
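Here is the same lookup done safely with a parameterized query. This sketch uses Python's built-in sqlite3 so it is self-contained; the placeholder syntax differs by driver (`?` for sqlite3, `%s` for psycopg2), but the principle is identical to what our app does.

```python
import sqlite3

# In-memory database just for this demo
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE businesses (name TEXT)")
conn.execute("INSERT INTO businesses VALUES ('Acme Dental')")

user_input = "'; DROP TABLE businesses; --"

# SAFE: the driver treats user_input as a literal value, never as SQL
rows = conn.execute(
    "SELECT * FROM businesses WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] -- no match, and the businesses table still exists
```

The attack string is simply compared against the name column as data, so the DROP TABLE never executes.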
Security
What: Protecting your system from threats
Example: API key authentication
Who enforces: Attackers exploit weaknesses
Goal: Prevent unauthorized access
Compliance
What: Following laws and regulations
Example: GDPR data protection
Who enforces: Governments fine violations
Goal: Handle data ethically and legally
Part B: Secure Environment Variables
Learn how to manage sensitive configuration securely
Environment variables store sensitive configuration like API keys, database URLs, and secrets.
Bad Practice
# Hardcoded secret in code (NEVER DO THIS!)
API_KEY = "sk-abc123xyz456"
DATABASE_URL = "postgresql://user:password@host/db"
Problems:
- Visible in code
- Committed to git history
- Exposed to anyone with code access
- Can't change without redeploying
Good Practice
# Load from environment variables
import os
API_KEY = os.getenv('API_KEY')
DATABASE_URL = os.getenv('DATABASE_URL')
Benefits:
- Secrets stay out of code
- Different values per environment (dev, staging, production)
- Easy to rotate/change
- Not committed to git
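To catch a missing secret at startup rather than mid-request, a service can fail fast. A minimal sketch (the variable names are just the examples above, not an exhaustive list):

```python
import os

def check_required_env(names):
    """Return the names of required variables that are missing or empty."""
    return [n for n in names if not os.getenv(n)]

# Simulate a partially configured environment for the demo
os.environ["API_KEY"] = "dummy-value-for-demo"
os.environ.pop("DATABASE_URL", None)

missing = check_required_env(["API_KEY", "DATABASE_URL"])
print(missing)  # ['DATABASE_URL']
```

In a real service you would raise an error (or exit) when `missing` is non-empty, so a misconfigured deployment fails immediately with a clear message instead of crashing later.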
Check where your secrets are stored:
Verify .env file and .gitignore:
cd mlops-service
# View your .env file (contains real secrets)
cat .env
# Check if .env is in .gitignore
cat ../.gitignore | grep .env
The .env.example file has already been updated in your mlops-service/ directory.
View the template:
# View the template
cat mlops-service/.env.example
What's in .env.example:
- Template with placeholder values
- Comments explaining each variable
- NO real secrets (safe to commit to git)
- Instructions for developers to copy and fill in
Usage for new team members:
# Copy template and fill in actual values
cp .env.example .env
# Then edit .env with your actual credentials
DO:
- Use environment variables for all secrets
- Add .env to .gitignore
- Use different secrets for dev/staging/production
- Rotate secrets regularly (quarterly)
- Use secret management tools in production
DON'T:
- Hardcode secrets in code
- Commit .env files to git
- Share secrets via email or Slack
- Reuse the same secret across environments
- Log sensitive values (API keys, passwords)
Check Git History for Leaked Secrets
# Search entire git history for .env files
git log --all --full-history -- "*/.env"
# If you find any results, the .env file was committed before!
Expected: No results (clean history). In real projects with leaks, you'd need to rotate all credentials and use tools like git-secrets or BFG Repo-Cleaner.
Part C: API Rate Limiting
Prevent abuse by restricting request frequency
Rate limiting restricts how many requests a user can make in a time period.
Example:
Rate limit: 100 requests per hour
• User makes 100 requests → All allowed
• User makes 101st request → Blocked! (429 Too Many Requests)
• User waits 1 hour → Counter resets, can make 100 more
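The counter-reset behavior above can be sketched as a toy fixed-window limiter. This is a simplified illustration of the concept only, not the flask-limiter implementation our service actually uses:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Toy fixed-window limiter: allow `limit` requests per `window_seconds`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        # key (e.g. client IP) -> [request_count, window_start_time]
        self.counters = defaultdict(lambda: [0, 0.0])

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        count, start = self.counters[key]
        if now - start >= self.window:
            count, start = 0, now   # window expired: reset the counter
        if count >= self.limit:
            return False            # over the limit: respond with HTTP 429
        self.counters[key] = [count + 1, start]
        return True

limiter = FixedWindowLimiter(limit=100, window_seconds=3600)
results = [limiter.allow("1.2.3.4", now=0) for _ in range(101)]
print(results.count(True), results.count(False))  # 100 1
```

The 101st call is rejected, and a call made an hour later starts a fresh window and is allowed again.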
Prevent Abuse
Malicious users can't overwhelm your service. Bugs (infinite loops) won't crash your system.
Control Costs
AI APIs charge per request. Rate limits cap your maximum spend.
Ensure Fair Access
One user can't monopolize resources. All users get reasonable access.
Example Scenario Without Rate Limiting:
User 1: Accidentally creates infinite loop
Sends 10,000 requests in 5 minutes
Result:
- Your Flask service crashes
- AI API bill: $500
- Other users can't access service
Our Flask service now has these rate limits:
- / (Dashboard): No limit (public page, static content)
- /health: 100/minute (health checks should be frequent)
- /health/detailed: 50/minute (more expensive to compute)
- /metrics: No limit (Prometheus scrapes frequently)
- /track: 100/hour (metrics tracking, needs protection)
- /analytics/<id>: 50/hour (database queries, more expensive)
- /refresh-metrics: 10/hour (expensive operation)
Start Flask Service:
cd mlops-service
python app.py
# You should see:
# 🚦 Rate Limiting: ENABLED
Test Normal Request (Within Limit):
# Single request - should work
curl http://localhost:5001/health
Expected: Status 200, JSON response
Test Rate Limit Exceeded:
# Send 101 requests in a loop (exceeds 100/minute limit)
for i in {1..101}; do
curl -s http://localhost:5001/health
echo "Request $i"
done
Expected: First 100 requests succeed (200), 101st request returns 429 Too Many Requests
Response when rate limited:
{
"error": "429 Too Many Requests",
"message": "100 per 1 minute"
}
View Rate Limit Headers:
# Check rate limit status in headers
curl -I http://localhost:5001/health
Response Headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 99
X-RateLimit-Reset: 1704567890
Our implementation uses flask-limiter:
from flask import Flask, jsonify
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

# Initialize rate limiter
limiter = Limiter(
    get_remote_address,  # Track by IP address
    app=app,
    default_limits=["200 per day", "50 per hour"],
    storage_uri="memory://",  # Store in memory
)

# Apply to endpoint
@app.route('/health')
@limiter.limit("100 per minute")
def health_check():
    return jsonify({'status': 'healthy'})
Key Points:
- Tracked by IP address: Each IP has separate limits
- Memory storage: Limits reset when service restarts
- Per-endpoint: Different limits for different endpoints
Production Considerations:
- Use Redis for persistent storage (limits survive restarts)
- Track by user ID instead of IP (more accurate)
- Implement tiered limits (free users: 100/day, paid: 10,000/day)
Part D: API Key Authentication
Protect sensitive endpoints with API key authentication
API keys are secret tokens that identify and authenticate applications.
Analogy:
An API key is like a house key:
- You need the right key to enter
- Don't share your key with strangers
- Change locks if key is stolen
Example API Key:
MLOPS_API_KEY=9a7f2c8e5d1b3f6a4e8c9d2f1a7b5c3e
Without Authentication:
User → Request → Flask → Always accepts
With API Key Authentication:
User → Request with X-API-Key header → Flask → Check key → Allow/Deny
Flow:
- User includes X-API-Key: your-key-here in the request header
- Flask checks if the key matches the MLOPS_API_KEY environment variable
- If match: Allow request
- If no match: Return 401 Unauthorized or 403 Forbidden
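The check itself can be sketched as a small function. This is an illustrative sketch, not the service's actual code; hmac.compare_digest is used because it compares in constant time, so timing differences can't leak information about the key:

```python
import hmac

def check_api_key(headers, expected_key):
    """Return an HTTP status code for the given request headers.

    200 = valid key, 401 = no key supplied, 403 = wrong key.
    """
    supplied = headers.get("X-API-Key")
    if not supplied:
        return 401
    if not hmac.compare_digest(supplied, expected_key):
        return 403
    return 200

print(check_api_key({}, "secret"))                       # 401
print(check_api_key({"X-API-Key": "wrong"}, "secret"))   # 403
print(check_api_key({"X-API-Key": "secret"}, "secret"))  # 200
```

In a Flask app this logic would typically live in a decorator applied to each protected route, with `expected_key` read once from the MLOPS_API_KEY environment variable.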
Generate a random API key:
# Mac/Linux
openssl rand -hex 32
# Output example:
# 9a7f2c8e5d1b3f6a4e8c9d2f1a7b5c3e8f4d2a6b9c1e7f3a5d8b2c6e4f1a3d7
Add your generated API key to .env:
cd mlops-service
nano .env  # or use your preferred editor
Add this line:
MLOPS_API_KEY=your-generated-key-here
Save and restart Flask service:
# Stop current service (Ctrl+C)
# Start again
python app.py
# You should see:
# 🔐 Security: API Key Authentication ENABLED
Request Without API Key (Unauthorized):
# Try to track metrics without API key
curl -X POST http://localhost:5001/track \
-H "Content-Type: application/json" \
-d '{
"business_id": "test",
"response_time_ms": 100,
"tokens_used": 50
}'
Expected Response (401):
{
"error": "Unauthorized",
"message": "API key required. Include X-API-Key header."
}
Request With Wrong API Key (Forbidden):
# Try with incorrect key
curl -X POST http://localhost:5001/track \
-H "Content-Type: application/json" \
-H "X-API-Key: wrong-key-here" \
-d '{
"business_id": "test",
"response_time_ms": 100,
"tokens_used": 50
}'
Expected Response (403):
{
"error": "Forbidden",
"message": "Invalid API key"
}
Request With Correct API Key (Success):
# Replace YOUR_API_KEY with your actual key from .env
curl -X POST http://localhost:5001/track \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-d '{
"business_id": "test-business",
"response_time_ms": 150,
"tokens_used": 75,
"api_cost_usd": 0.002,
"intent_detected": "appointment",
"response_type": "booking",
"appointment_requested": false
}'
Expected Response (200):
{
"status": "success",
"message": "Metrics tracked successfully",
"prometheus_updated": true,
"database_stored": true,
"timestamp": "2024-01-15T10:30:00.000000"
}
Protected Endpoints (API Key Required):
- /track - Metrics submission
- /analytics/<business_id> - Analytics data
- /refresh-metrics - Metrics refresh
Public Endpoints (No API Key):
- / - Dashboard
- /health - Health check
- /metrics - Prometheus metrics
Part E: GDPR Compliance Basics
Learn fundamental privacy and compliance requirements for AI systems
GDPR (General Data Protection Regulation) is a European law that protects people's personal data.
Applies When:
- Your users are in the European Union
- Your company is based in the EU
- You process EU citizens' data
Personal Data (Requires Protection):
- User Messages: "I want to book an appointment"
- Email Addresses: john@example.com
- Names: "John Smith"
- Phone Numbers: +1-555-1234
Non-Personal Data (Anonymous):
- Business ID: uuid-abc-123
- Metrics: Response time, token count
Lawful Basis for Processing
Question: Why are you collecting this data?
Our Answer: Providing the service (appointment booking)
Consent
Requirement: Users must agree to data collection
Implementation example:
☑ I agree to allow this chat to be stored for service purposes.
Privacy Policy
Right to Access
Requirement: Users can request their data
Implementation: Provide endpoint to export user's chat history
Right to Deletion
Requirement: Users can request data deletion
Implementation: Provide "delete my data" button/endpoint
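Both rights can be sketched against a toy chat table. The table and column names here are illustrative, not the app's actual schema, and SQLite stands in for PostgreSQL so the sketch is self-contained:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chat_messages (user_email TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO chat_messages VALUES (?, ?)",
    [("john@example.com", "I want to book an appointment"),
     ("jane@example.com", "What are your hours?")],
)

def export_user_data(conn, email):
    """Right to access: return all of a user's messages as JSON."""
    rows = conn.execute(
        "SELECT message FROM chat_messages WHERE user_email = ?", (email,)
    ).fetchall()
    return json.dumps({"email": email, "messages": [r[0] for r in rows]})

def delete_user_data(conn, email):
    """Right to deletion: remove all of a user's messages, return the count."""
    cur = conn.execute(
        "DELETE FROM chat_messages WHERE user_email = ?", (email,)
    )
    return cur.rowcount

print(export_user_data(conn, "john@example.com"))
print(delete_user_data(conn, "john@example.com"))  # 1
```

In production these would sit behind authenticated endpoints, since an attacker must not be able to export or delete someone else's data.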
Data Retention
Requirement: Don't keep data longer than necessary
Implementation: Delete chat logs after 90 days
For a college project, we'll implement basic compliance:
Inform Users
Add privacy notice: "Your chat messages are stored to provide this service"
Link to privacy policy (can be simple)
Data Minimization
Only collect what's needed (email, name for appointments)
Don't collect unnecessary data (age, address, etc.)
Data Retention
Plan to delete old chats (conceptual for now)
Production would implement: DELETE FROM chats WHERE created_at < NOW() - INTERVAL '90 days'
Secure Storage
Database uses SSL (?sslmode=require in DATABASE_URL)
Environment variables secure API keys
SQL Query to Delete Old Chat Logs:
-- Delete chat messages older than 90 days
DELETE FROM chat_messages
WHERE created_at < NOW() - INTERVAL '90 days';
-- Delete old AI metrics
DELETE FROM ai_metrics
WHERE created_at < NOW() - INTERVAL '90 days';
Scheduled Job (Production):
- Run this query daily via cron job or AWS Lambda
- Log deletions for compliance audit trail
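The retention query can be exercised end-to-end with an in-memory SQLite database. SQLite's datetime('now', '-90 days') stands in for PostgreSQL's NOW() - INTERVAL '90 days'; the table name mirrors the query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chat_messages (body TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO chat_messages VALUES (?, ?)",
    [("old chat", "2023-01-01 10:00:00"),       # well past 90 days
     ("recent chat", "2099-01-01 10:00:00")],   # safely within retention
)

# SQLite equivalent of the PostgreSQL retention query
deleted = conn.execute(
    "DELETE FROM chat_messages WHERE created_at < datetime('now', '-90 days')"
).rowcount
print(deleted)  # 1 -- log this count for the compliance audit trail

remaining = conn.execute("SELECT body FROM chat_messages").fetchall()
print(remaining)  # [('recent chat',)]
```

A scheduled job would run this daily and write `deleted` to an audit log, giving you evidence that the retention policy is actually enforced.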
DO:
- Be transparent about data collection
- Provide privacy policy
- Implement data retention policies
- Use encryption (HTTPS, SSL database connections)
- Give users control over their data
DON'T:
- Collect data you don't need
- Share data with third parties without consent
- Keep data indefinitely
- Log sensitive information (passwords, API keys)
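The "don't log sensitive information" rule can be enforced with a scrubbing step before messages reach the logs. A minimal sketch; these regex patterns are illustrative and far from exhaustive, so real PII redaction needs considerably more care:

```python
import re

# Illustrative patterns only -- real PII detection is a much harder problem
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text):
    """Mask emails and phone numbers before a message is logged."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(scrub("Contact john@example.com or +1-555-1234"))
# Contact [EMAIL] or [PHONE]
```

Wiring `scrub()` into a logging filter means developers can't accidentally leak personal data through an ordinary log statement.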
Part F: Security Best Practices Checklist
Production-ready security measures for AI systems
Use this checklist to verify your AI application is secured:
Environment Variables
- All secrets in environment variables (not hardcoded)
- .env file in .gitignore
- .env.example template in repository
- Different secrets for dev/staging/production
- Secrets rotated quarterly
API Security
- API key authentication on sensitive endpoints
- Rate limiting enabled
- HTTPS only (in production)
- CORS configured properly
- Input validation on all endpoints
Database Security
- SSL/TLS encryption enabled (?sslmode=require)
- Parameterized queries (no SQL injection)
- Database credentials in environment variables
- Least privilege principle (app user can't drop tables)
Logging & Monitoring
- Log authentication attempts
- Don't log sensitive data (API keys, passwords)
- Monitor for suspicious activity
- Set up alerts for security events
Compliance
- Privacy policy published
- User consent for data collection
- Data retention policy defined
- User data export/deletion available
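The "input validation on all endpoints" item can be sketched for a /track-style payload. The field names mirror the curl examples in Part D; the rules themselves are illustrative, not the service's actual validation logic:

```python
def validate_track_payload(data):
    """Return a list of validation errors for a /track-style payload."""
    if not isinstance(data, dict):
        return ["payload must be a JSON object"]
    errors = []
    if not isinstance(data.get("business_id"), str) or not data.get("business_id"):
        errors.append("business_id must be a non-empty string")
    for field in ("response_time_ms", "tokens_used"):
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly
        if not isinstance(value, (int, float)) or isinstance(value, bool) or value < 0:
            errors.append(f"{field} must be a non-negative number")
    return errors

print(validate_track_payload(
    {"business_id": "test", "response_time_ms": 100, "tokens_used": 50}
))  # []
print(validate_track_payload({"business_id": "", "response_time_ms": -5}))
```

An endpoint would reject the request with a 400 and the error list whenever validation fails, instead of storing malformed data.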
What We Implemented in Lab 10:
- API key authentication
- Rate limiting
- Secure environment variables
- .env.example template
- GDPR awareness
What's Left for Real Production:
- OAuth 2.0 / JWT (Industry-standard auth)
- Data Encryption at Rest
- WAF (Web Application Firewall)
- Security Audit Logging
- Penetration Testing
- Container Scanning
- HTTPS Certificates (SSL/TLS)
Flask won't start after adding flask-limiter:
Error: ModuleNotFoundError: No module named 'flask_limiter'
cd mlops-service
pip install flask-limiter==3.5.0
API key authentication not working:
Symptoms: All requests return 401 Unauthorized, even correct API key fails
Check:
- Is MLOPS_API_KEY set in .env? Check with: cat .env | grep MLOPS_API_KEY
- Did you restart Flask after updating .env?
- Is the API key in the header correct?
Rate limit headers not showing:
Symptoms: Requests work but no X-RateLimit-* headers
This is normal! Flask-Limiter only adds headers when rate limit is close to being hit or after hitting the limit.
Development mode - API key not required:
Symptoms: Requests work without API key, logs show "API Key Authentication DISABLED"
This is by design! If MLOPS_API_KEY is not set in .env, authentication is disabled for easier development. To enable: generate API key, add to .env, restart Flask.
Cannot test API key with curl:
Symptoms: curl commands too complex, syntax errors
# Create test file with API key
echo "YOUR_API_KEY" > api_key.txt
# Use file in request
curl -X POST http://localhost:5001/track \
-H "Content-Type: application/json" \
-H "X-API-Key: $(cat api_key.txt)" \
-d '{"business_id":"test","response_time_ms":100,"tokens_used":50}'
Congratulations! You've implemented fundamental security practices for your AI application. Here's what you accomplished:
Security Concepts Learned
- CIA Triad: Confidentiality, Integrity, Availability
- Common Threats: API key leaks, DoS, SQL injection
- Security vs Compliance: Understanding the difference
Practical Implementation
- API key authentication for protected endpoints
- Rate limiting to prevent abuse
- Secure environment variable management
- .env.example template for team collaboration
GDPR Basics
- What personal data is
- User rights (access, deletion, consent)
- Data retention policies
- Privacy best practices
Key Takeaways
- Never commit secrets to git - Use environment variables
- Rate limiting prevents abuse - Protects against DoS and controls costs
- Authentication protects sensitive endpoints - API keys are simplest form
- GDPR requires transparency - Tell users what data you collect
- Security is a process, not a product - Continuous improvement
Test Your Knowledge
Complete the Lab 10 quiz to test your understanding of security and compliance for AI systems.
Quiz Submission Checklist:
- Complete all 5 multiple-choice questions
- Take a screenshot of your results page showing:
- Your name
- Your score (aim for 4/5 or 5/5)
- Session ID
- Timestamp
- Submit screenshot as proof of completion