Lab 10: Security & Compliance for AI Systems

Learn fundamental security practices for protecting your AI receptionist system, including API authentication, rate limiting, secure configuration management, and GDPR compliance basics.

Lab Overview

What You'll Do: Understand security fundamentals for AI systems, implement API key authentication, add rate limiting to prevent abuse, create secure environment variable templates, and learn GDPR compliance basics for AI chat data

What You'll Build:

Secured MLOps service with API key authentication
Rate-limited endpoints to prevent abuse
.env.example template for secure configuration
Security best practices checklist

Lab Collaborators:

• Edward Lampoh - Software Developer & Collaborator
• Oluwafemi Adebayo, PhD - Academic Professor & Collaborator

Note: This lab focuses on practical, fundamental security appropriate for a college course. Production systems require additional security measures beyond this lab's scope.

Prerequisites Required

Complete Labs 1-9 before starting

Before starting Lab 10, ensure you have:

Completed Labs 1-9
Flask MLOps service running locally
Basic understanding of HTTP requests and headers
Familiarity with environment variables

Quick Test

# Verify Flask service is running
curl http://localhost:5001/health

# Should return healthy status

Part A: Understanding Security for AI Systems

Learn the fundamentals of protecting AI applications from threats

1. What is Application Security?

Application security is the practice of protecting software applications from threats throughout their lifecycle.

The CIA Triad

Security professionals use the CIA Triad to think about security:

Confidentiality

Definition: Only authorized users can access data
Example: API keys prevent unauthorized access to metrics

Integrity

Definition: Data remains accurate and unmodified
Example: Validate metrics data before storing

Availability

Definition: System remains accessible when needed
Example: Rate limiting helps mitigate denial-of-service (DoS) attacks
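
The Integrity example above (validating metrics data before storing it) could look something like the following. This is a minimal sketch, not the lab's actual code: the function name validate_metrics is hypothetical, and the choice of required fields is an assumption based on the /track payloads used later in this lab.

```python
def validate_metrics(payload):
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []
    business_id = payload.get("business_id")
    if not isinstance(business_id, str) or not business_id.strip():
        errors.append("business_id must be a non-empty string")
    for field in ("response_time_ms", "tokens_used"):
        value = payload.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value < 0:
            errors.append(f"{field} must be a non-negative number")
    return errors


good = {"business_id": "test", "response_time_ms": 100, "tokens_used": 50}
bad = {"business_id": "test", "response_time_ms": 100, "tokens_used": -5}
print(validate_metrics(good))  # []
print(validate_metrics(bad))   # ['tokens_used must be a non-negative number']
```

An endpoint would typically reject the request with a 400 response when the list is not empty, so bad data never reaches the database.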

2. Common Security Threats for AI Applications

API Key Leakage

What it is: Accidentally exposing API keys in code, git commits, or logs

Risk: Attackers can use your API keys to access your system, rack up API costs, or steal sensitive data

Real Example:

A student accidentally committed AWS credentials to GitHub. Within 15 minutes, automated bots found the keys and started mining cryptocurrency. Bill: $2,500.

Denial of Service (DoS)

What it is: Overwhelming your service with requests until it crashes

Risk: Without rate limiting, attackers (or bugs!) can make thousands of requests per second, crash your Flask service, max out your AI API quota, and generate huge cloud bills

SQL Injection

What it is: Inserting malicious SQL code through user input

Risk: Attackers can read your entire database, delete all data, or modify business information

# VULNERABLE CODE (Never do this!)
query = f"SELECT * FROM businesses WHERE name = '{user_input}'"

# If user_input is: "'; DROP TABLE businesses; --"
# The query becomes:
# SELECT * FROM businesses WHERE name = ''; DROP TABLE businesses; --'

Safe Approach: Always use parameterized queries (which our app already does!)
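
For comparison, here is a small sketch of the safe approach. It uses Python's built-in sqlite3 module and an in-memory table so it runs anywhere; the table layout and the find_business helper are assumptions for illustration, not the lab's real schema. With PostgreSQL drivers such as psycopg2 the placeholder is %s instead of ?, but the idea is the same.

```python
import sqlite3

# In-memory database with a stand-in for the businesses table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE businesses (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO businesses (name) VALUES (?)", ("Acme Dental",))


def find_business(conn, user_input):
    # The ? placeholder sends user_input as data, never as SQL text.
    cursor = conn.execute(
        "SELECT id, name FROM businesses WHERE name = ?", (user_input,)
    )
    return cursor.fetchall()


malicious = "'; DROP TABLE businesses; --"
print(find_business(conn, "Acme Dental"))  # [(1, 'Acme Dental')]
print(find_business(conn, malicious))      # []
# The table still exists after the malicious lookup.
print(conn.execute("SELECT COUNT(*) FROM businesses").fetchone()[0])  # 1
```

The malicious string is simply treated as a business name that does not exist, so nothing is dropped.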

3. Security vs Compliance

Security

What: Protecting your system from threats

Example: API key authentication

Who enforces: No one; attackers simply exploit any weaknesses you leave

Goal: Prevent unauthorized access

Compliance

What: Following laws and regulations

Example: GDPR data protection

Who enforces: Governments fine violations

Goal: Handle data ethically and legally

Both are important! Security keeps attackers out. Compliance keeps you out of legal trouble.

Part B: Secure Environment Variables

Learn how to manage sensitive configuration securely

1. Why Environment Variables Matter

Environment variables store sensitive configuration like API keys, database URLs, and secrets.

Bad Practice

# Hardcoded secret in code (NEVER DO THIS!)
API_KEY = "sk-abc123xyz456"
DATABASE_URL = "postgresql://user:password@host/db"

Problems:

Visible in code
Committed to git history
Exposed to anyone with code access
Can't change without redeploying

Good Practice

# Load from environment variables
import os
API_KEY = os.getenv('API_KEY')
DATABASE_URL = os.getenv('DATABASE_URL')

Benefits:

Secrets stay out of code
Different values per environment (dev, staging, production)
Easy to rotate/change
Not committed to git
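
One caveat with os.getenv is that it quietly returns None when a variable is missing, so the problem only shows up later. A possible way to fail early is sketched below; the require_env helper is hypothetical, and the placeholder value is set only so the example runs on its own.

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Demo only: set a placeholder so the example runs without a real .env file.
os.environ.setdefault("DATABASE_URL", "postgresql://user:password@localhost/db")
DATABASE_URL = require_env("DATABASE_URL")
```

Calling a helper like this at startup means a missing secret stops the service immediately with a readable message, instead of causing a confusing error on the first request.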

2. Review Current Setup

Check where your secrets are stored:

Verify .env file and .gitignore:

cd mlops-service

# View your .env file (contains real secrets)
cat .env

# Check if .env is in .gitignore
grep -F ".env" ../.gitignore

Expected: .env should be listed in .gitignore and won't be committed.

3. Create .env.example Template

The .env.example file has already been updated in your mlops-service/ directory.

View the template:

# View the template (run from the project root)
cat mlops-service/.env.example

What's in .env.example:

Template with placeholder values
Comments explaining each variable
NO real secrets (safe to commit to git)
Instructions for developers to copy and fill in

Usage for new team members:

# Copy template and fill in actual values
cp .env.example .env
# Then edit .env with your actual credentials
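
After copying the template, it can be handy to check that your .env defines every variable the template lists. The sketch below is one way to do that; both helper functions are hypothetical, and the sample text is made up for the demo (in practice you would read the two files from disk).

```python
def parse_env_keys(text):
    """Collect variable names from .env-style text, skipping comments and blanks."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example_text, env_text):
    """Names present in the template but absent from the real .env."""
    return sorted(parse_env_keys(example_text) - parse_env_keys(env_text))


# Made-up file contents for the demo.
example = "# API key for protected endpoints\nMLOPS_API_KEY=your-key-here\nDATABASE_URL=your-database-url\n"
env = "DATABASE_URL=postgresql://localhost/dev\n"
print(missing_keys(example, env))  # ['MLOPS_API_KEY']
```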

4. Security Best Practices for Secrets

DO:

Use environment variables for all secrets
Add .env to .gitignore
Use different secrets for dev/staging/production
Rotate secrets regularly (quarterly)
Use secret management tools in production

DON'T:

Hardcode secrets in code
Commit .env files to git
Share secrets via email or Slack
Reuse the same secret across environments
Log sensitive values (API keys, passwords)
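
If you do need to confirm in a log which key is loaded, one option is to mask most of it first. This is a small sketch with a hypothetical mask_secret helper, using the fake key from the Bad Practice example earlier.

```python
def mask_secret(value, visible=4):
    """Return a log-safe form of a secret, keeping only the last few characters."""
    if not value:
        return "<not set>"
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


print(mask_secret("sk-abc123xyz456"))  # ***********z456
print(mask_secret(""))                 # <not set>
```

Showing only the last few characters is usually enough to tell two keys apart without revealing either one.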

Check Git History for Leaked Secrets

# Search entire git history for .env files (run from the repository root)
git log --all --full-history -- ".env" "*/.env"

# If you find any results, the .env file was committed before!

Expected: No results (clean history). In real projects with leaks, you'd need to rotate all credentials and use tools like git-secrets or BFG Repo-Cleaner.

Part C: API Rate Limiting

Prevent abuse by restricting request frequency

1. What is Rate Limiting?

Rate limiting restricts how many requests a user can make in a time period.

Example:

Rate limit: 100 requests per hour

• User makes 100 requests → All allowed

• User makes 101st request → Blocked! (429 Too Many Requests)

• User waits 1 hour → Counter resets, can make 100 more
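
The counting logic in this example can be sketched in a few lines of Python. This is a simplified fixed-window counter written for illustration, not Flask-Limiter's code; the class name and the injectable clock are assumptions that make the example easy to run and test.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each client key."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.windows = {}  # client key -> (window start time, request count)

    def allow(self, key):
        now = self.clock()
        start, count = self.windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # window expired: reset the counter
        if count >= self.limit:
            return False  # the caller would respond with 429 Too Many Requests
        self.windows[key] = (start, count + 1)
        return True


# Simulated clock so the example runs instantly.
fake_now = [0.0]
limiter = FixedWindowLimiter(limit=100, window_seconds=3600, clock=lambda: fake_now[0])

results = [limiter.allow("203.0.113.7") for _ in range(101)]
print(results.count(True), results[-1])  # 100 False
fake_now[0] = 3600.0                     # one hour later
print(limiter.allow("203.0.113.7"))      # True
```

Each client key (here an IP address) gets its own counter, so one noisy client does not use up another client's allowance.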

2. Why Rate Limiting is Important

Prevent Abuse

Malicious users can't overwhelm your service. Bugs (infinite loops) won't crash your system.

Control Costs

AI APIs typically charge per request or per token. Rate limits put an upper bound on how much a runaway client can cost you.

Ensure Fair Access

One user can't monopolize resources. All users get reasonable access.

Example Scenario Without Rate Limiting:

User 1: Accidentally creates infinite loop
        Sends 10,000 requests in 5 minutes

Result:
- Your Flask service crashes
- AI API bill: $500
- Other users can't access service

3. Understanding Our Rate Limits

Our Flask service now has these rate limits:

/ (Dashboard): No limit - Public page, static content
/health: 100/minute - Health checks should be frequent
/health/detailed: 50/minute - More expensive to compute
/metrics: No limit - Prometheus scrapes frequently
/track: 100/hour - Metrics tracking, needs protection
/analytics/<id>: 50/hour - Database queries, more expensive
/refresh-metrics: 10/hour - Expensive operation

Default for other endpoints: 200/day, 50/hour

4. Test Rate Limiting

Start Flask Service:

cd mlops-service
python app.py

# You should see:
# 🚦 Rate Limiting: ENABLED

Test Normal Request (Within Limit):

# Single request - should work
curl http://localhost:5001/health

Expected: Status 200, JSON response

Test Rate Limit Exceeded:

# Send 101 requests in a loop (exceeds 100/minute limit)
for i in {1..101}; do
  # Print only the HTTP status code for each request
  code=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:5001/health)
  echo "Request $i: $code"
done

Expected: First 100 requests succeed (200), 101st request returns 429 Too Many Requests

Response when rate limited:

{
  "error": "429 Too Many Requests",
  "message": "100 per 1 minute"
}

View Rate Limit Headers:

# Check rate limit status in headers
curl -I http://localhost:5001/health

Response Headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 99
X-RateLimit-Reset: 1704567890

5. How Rate Limiting Works

Our implementation uses flask-limiter:

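from flask import Flask, jsonify
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

# Create the Flask app that the limiter attaches to
app = Flask(__name__)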

# Initialize rate limiter
limiter = Limiter(
    get_remote_address,  # Track by IP address
    app=app,
    default_limits=["200 per day", "50 per hour"],
    storage_uri="memory://",  # Store in memory
)

# Apply to endpoint
@app.route('/health')
@limiter.limit("100 per minute")
def health_check():
    return jsonify({'status': 'healthy'})

Key Points:

Tracked by IP address: Each IP has separate limits
Memory storage: Limits reset when service restarts
Per-endpoint: Different limits for different endpoints

Production Considerations:

Use Redis for persistent storage (limits survive restarts)
Track by user ID instead of IP (more accurate)
Implement tiered limits (free users: 100/day, paid: 10,000/day)

Part D: API Key Authentication

Protect sensitive endpoints with API key authentication

1. What are API Keys?

API keys are secret tokens that identify and authenticate applications.

Analogy:

An API key is like a house key:

You need the right key to enter
Don't share your key with strangers
Change the locks if the key is stolen

Example API Key:

MLOPS_API_KEY=9a7f2c8e5d1b3f6a4e8c9d2f1a7b5c3e

2. How API Key Authentication Works

Without Authentication:

User → Request → Flask → Always accepts

With API Key Authentication:

User → Request with X-API-Key header → Flask → Check key → Allow/Deny

Flow:

User includes X-API-Key: your-key-here in request header
Flask checks if key matches MLOPS_API_KEY environment variable
If match: Allow request
If the key is missing: Return 401 Unauthorized
If the key is wrong: Return 403 Forbidden
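
The flow above can be sketched as a plain function. This is an illustration under assumptions, not the service's actual code: check_api_key is a hypothetical name, and treating a missing MLOPS_API_KEY as "authentication disabled" follows the development-mode behavior described in the Troubleshooting section.

```python
import hmac


def check_api_key(provided_key, expected_key):
    """Return (status_code, body) for a request, mirroring the flow above."""
    if not expected_key:
        # Assumption: no configured key means authentication is disabled.
        return 200, {"status": "auth disabled"}
    if not provided_key:
        return 401, {"error": "Unauthorized",
                     "message": "API key required. Include X-API-Key header."}
    # compare_digest is designed to take the same time whether or not the keys
    # match, which avoids leaking information through response timing.
    if not hmac.compare_digest(provided_key.encode(), expected_key.encode()):
        return 403, {"error": "Forbidden", "message": "Invalid API key"}
    return 200, {"status": "ok"}


print(check_api_key(None, "secret-key")[0])          # 401
print(check_api_key("wrong-key", "secret-key")[0])   # 403
print(check_api_key("secret-key", "secret-key")[0])  # 200
```

In a Flask app this check would usually live in a decorator that reads the X-API-Key header and the MLOPS_API_KEY environment variable, then either calls the endpoint or returns the error response.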

3. Generate Secure API Key

Generate a random API key:

# Mac/Linux
openssl rand -hex 32

# Output example (32 random bytes = 64 hex characters):
# 9a7f2c8e5d1b3f6a4e8c9d2f1a7b5c3e8f4d2a6b9c1e7f3a5d8b2c6e4f1a3d7b

Copy this key - you'll need it in the next step!
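
If openssl is not available (for example on Windows), Python's standard secrets module can generate an equivalent key. This is an alternative sketch, not part of the lab's original steps.

```python
import secrets

# 32 cryptographically secure random bytes, encoded as 64 hex characters
api_key = secrets.token_hex(32)
print(api_key)
print(len(api_key))  # 64
```

The secrets module is meant for security-sensitive values; avoid the random module for keys, since it is not designed to be unpredictable.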

4. Configure API Key

Add your generated API key to .env:

cd mlops-service
nano .env  # or use your preferred editor

Add this line:

MLOPS_API_KEY=your-generated-key-here

Save and restart Flask service:

# Stop current service (Ctrl+C)
# Start again
python app.py

# You should see:
# 🔐 Security: API Key Authentication ENABLED

5. Test API Key Authentication

Request Without API Key (Unauthorized):

# Try to track metrics without API key
curl -X POST http://localhost:5001/track \
  -H "Content-Type: application/json" \
  -d '{
    "business_id": "test",
    "response_time_ms": 100,
    "tokens_used": 50
  }'

Expected Response (401):

{
  "error": "Unauthorized",
  "message": "API key required. Include X-API-Key header."
}

Request With Wrong API Key (Forbidden):

# Try with incorrect key
curl -X POST http://localhost:5001/track \
  -H "Content-Type: application/json" \
  -H "X-API-Key: wrong-key-here" \
  -d '{
    "business_id": "test",
    "response_time_ms": 100,
    "tokens_used": 50
  }'

Expected Response (403):

{
  "error": "Forbidden",
  "message": "Invalid API key"
}

Request With Correct API Key (Success):

# Replace YOUR_API_KEY with your actual key from .env
curl -X POST http://localhost:5001/track \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{
    "business_id": "test-business",
    "response_time_ms": 150,
    "tokens_used": 75,
    "api_cost_usd": 0.002,
    "intent_detected": "appointment",
    "response_type": "booking",
    "appointment_requested": false
  }'

Expected Response (200):

{
  "status": "success",
  "message": "Metrics tracked successfully",
  "prometheus_updated": true,
  "database_stored": true,
  "timestamp": "2024-01-15T10:30:00.000000"
}

Success Check: If you receive a 200 response with a valid API key, authentication is working!

6. Protected vs Public Endpoints

Protected Endpoints (API Key Required):

/track - Metrics submission
/analytics/<business_id> - Analytics data
/refresh-metrics - Metrics refresh

Public Endpoints (No API Key):

/ - Dashboard
/health - Health check
/metrics - Prometheus metrics

Part E: GDPR Compliance Basics

Learn fundamental privacy and compliance requirements for AI systems

2. What Data Does Our AI Chat Collect?

Personal Data (Requires Protection):

User Messages: "I want to book an appointment"
Email Addresses: john@example.com
Names: "John Smith"
Phone Numbers: +1-555-1234

Non-Personal Data (Anonymous):

Business ID: uuid-abc-123
Metrics: Response time, token count

Important: Our AI chat collects personal data! We need to handle it properly.

5. Implementing Data Retention (Conceptual)

SQL Query to Delete Old Chat Logs:

-- Delete chat messages older than 90 days
DELETE FROM chat_messages
WHERE created_at < NOW() - INTERVAL '90 days';

-- Delete old AI metrics
DELETE FROM ai_metrics
WHERE created_at < NOW() - INTERVAL '90 days';

Scheduled Job (Production):

Run this query daily via cron job or AWS Lambda
Log deletions for compliance audit trail
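
A scheduled job along these lines might look like the sketch below. To stay runnable without a database server it uses Python's built-in sqlite3 with an in-memory table, which is an assumption for the demo (the queries above use PostgreSQL syntax); the purge_old_rows helper and the fixed demo date are also hypothetical.

```python
import logging
import sqlite3
from datetime import datetime, timedelta, timezone

logging.basicConfig(level=logging.INFO)
RETENTION_DAYS = 90


def purge_old_rows(conn, table, now=None):
    """Delete rows older than the retention window and log how many were removed."""
    # Table names cannot be passed as query parameters, so use an allow-list.
    if table not in {"chat_messages", "ai_metrics"}:
        raise ValueError(f"Unexpected table: {table}")
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=RETENTION_DAYS)).isoformat()
    cursor = conn.execute(f"DELETE FROM {table} WHERE created_at < ?", (cutoff,))
    conn.commit()
    # Audit trail: record what was deleted and the cutoff that was applied.
    logging.info("Retention purge: deleted %d rows from %s (cutoff %s)",
                 cursor.rowcount, table, cutoff)
    return cursor.rowcount


# Demo with an in-memory table and two rows: one old, one recent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chat_messages (id INTEGER PRIMARY KEY, created_at TEXT)")
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
for age_days in (120, 10):
    conn.execute("INSERT INTO chat_messages (created_at) VALUES (?)",
                 ((now - timedelta(days=age_days)).isoformat(),))
deleted = purge_old_rows(conn, "chat_messages", now=now)
print(deleted)  # 1
```

Only the 120-day-old row is removed; the 10-day-old row stays. The log line gives you the audit trail mentioned above without recording any message content.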

For Lab 10: We won't implement the actual scheduled deletion, but understanding the concept is important!

6. Privacy Best Practices

DO:

Be transparent about data collection
Provide privacy policy
Implement data retention policies
Use encryption (HTTPS, SSL database connections)
Give users control over their data

DON'T:

Collect data you don't need
Share data with third parties without consent
Keep data indefinitely
Log sensitive information (passwords, API keys)

Part F: Security Best Practices Checklist

Production-ready security measures for AI systems

Production Security Checklist

Use this checklist to verify your AI application is secured:

Environment Variables

All secrets in environment variables (not hardcoded)
.env file in .gitignore
.env.example template in repository
Different secrets for dev/staging/production
Secrets rotated quarterly

API Security

API key authentication on sensitive endpoints
Rate limiting enabled
HTTPS only (in production)
CORS configured properly
Input validation on all endpoints

Database Security

SSL/TLS encryption enabled (?sslmode=require)
Parameterized queries (no SQL injection)
Database credentials in environment variables
Least privilege principle (app user can't drop tables)

Logging & Monitoring

Log authentication attempts
Don't log sensitive data (API keys, passwords)
Monitor for suspicious activity
Set up alerts for security events

Compliance

Privacy policy published
User consent for data collection
Data retention policy defined
User data export/deletion available

What We Implemented vs Real Production

What We Implemented in Lab 10:

API key authentication
Rate limiting
Secure environment variables
.env.example template
GDPR awareness

What's Left for Real Production:

OAuth 2.0 / JWT (Industry-standard auth)
Data Encryption at Rest
WAF (Web Application Firewall)
Security Audit Logging
Penetration Testing
Container Scanning
HTTPS Certificates (SSL/TLS)

Remember: Lab 10 covers fundamental security. Production systems need much more!

Troubleshooting

Flask won't start after adding flask-limiter:

Error: ModuleNotFoundError: No module named 'flask_limiter'

cd mlops-service
pip install flask-limiter==3.5.0

API key authentication not working:

Symptoms: Requests are rejected with 401 Unauthorized or 403 Forbidden, even with the correct API key

Check:

Is MLOPS_API_KEY set in .env? grep MLOPS_API_KEY .env
Did you restart Flask after updating .env?
Is the API key in the header correct?

Rate limit headers not showing:

Symptoms: Requests work but no X-RateLimit-* headers

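Flask-Limiter only sends X-RateLimit-* headers when they are enabled, either with headers_enabled=True on the Limiter or with the RATELIMIT_HEADERS_ENABLED config setting. If the headers are missing, check that setting; rate limiting itself still works without them.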

Development mode - API key not required:

Symptoms: Requests work without API key, logs show "API Key Authentication DISABLED"

This is by design! If MLOPS_API_KEY is not set in .env, authentication is disabled for easier development. To enable: generate API key, add to .env, restart Flask.

Cannot test API key with curl:

Symptoms: curl commands too complex, syntax errors

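# Read the key from .env into a shell variable (run inside mlops-service/)
# This avoids writing the secret to an extra file that could be committed
API_KEY=$(grep '^MLOPS_API_KEY=' .env | cut -d= -f2-)

# Use the variable in the request
curl -X POST http://localhost:5001/track \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"business_id":"test","response_time_ms":100,"tokens_used":50}'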

Lab 10 Summary - What You Learned

Congratulations! You've implemented fundamental security practices for your AI application. Here's what you accomplished:

Security Concepts Learned

CIA Triad: Confidentiality, Integrity, Availability
Common Threats: API key leaks, DoS, SQL injection
Security vs Compliance: Understanding the difference

Practical Implementation

API key authentication for protected endpoints
Rate limiting to prevent abuse
Secure environment variable management
.env.example template for team collaboration

GDPR Basics

What personal data is
User rights (access, deletion, consent)
Data retention policies
Privacy best practices

Key Takeaways

Never commit secrets to git - Use environment variables
Rate limiting prevents abuse - Protects against DoS and controls costs
Authentication protects sensitive endpoints - API keys are the simplest form
GDPR requires transparency - Tell users what data you collect
Security is a process, not a product - Continuous improvement

Test Your Knowledge

Complete the Lab 10 quiz to test your understanding of security and compliance for AI systems.

Quiz Submission Checklist:

Complete all 5 multiple-choice questions
Take a screenshot of your results page showing:

Your name
Your score (aim for 4/5 or 5/5)
Session ID
Timestamp

Submit screenshot as proof of completion
