
Lab 10: Security & Compliance for AI Systems

Learn fundamental security practices for protecting your AI receptionist system, including API authentication, rate limiting, secure configuration management, and GDPR compliance basics.

Lab Overview

What You'll Do: Understand security fundamentals for AI systems, implement API key authentication, add rate limiting to prevent abuse, create secure environment variable templates, and learn GDPR compliance basics for AI chat data.

What You'll Build:

  • Secured MLOps service with API key authentication
  • Rate-limited endpoints to prevent abuse
  • .env.example template for secure configuration
  • Security best practices checklist

Lab Collaborators:

  • Edward Lampoh - Software Developer & Collaborator
  • Oluwafemi Adebayo, PhD - Academic Professor & Collaborator
Prerequisites Required
Complete Labs 1-9 before starting

Before starting Lab 10, ensure you have:

  • Completed Labs 1-9
  • Flask MLOps service running locally
  • Basic understanding of HTTP requests and headers
  • Familiarity with environment variables

Quick Test

# Verify Flask service is running
curl http://localhost:5001/health

# Should return healthy status

Part A: Understanding Security for AI Systems

Learn the fundamentals of protecting AI applications from threats

1. What is Application Security?

Application security is the practice of protecting software applications from threats throughout their lifecycle.

The CIA Triad

Security professionals use the CIA Triad to think about security:

Confidentiality

Definition: Only authorized users can access data
Example: API keys prevent unauthorized access to metrics

Integrity

Definition: Data remains accurate and unmodified
Example: Validate metrics data before storing

Availability

Definition: System remains accessible when needed
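Example: Rate limiting helps keep the service available during request floods (DoS). A large distributed attack (DDoS) also needs network-level protection.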

2. Common Security Threats for AI Applications

API Key Leakage

What it is: Accidentally exposing API keys in code, git commits, or logs

Risk: Attackers can use your API keys to access your system, rack up API costs, or steal sensitive data

Real Example:

A student accidentally committed AWS credentials to GitHub. Within 15 minutes, automated bots found the keys and started mining cryptocurrency. Bill: $2,500.

Denial of Service (DoS)

What it is: Overwhelming your service with requests until it crashes

Risk: Without rate limiting, attackers (or bugs!) can make thousands of requests per second, crash your Flask service, max out your AI API quota, and generate huge cloud bills

SQL Injection

What it is: Inserting malicious SQL code through user input

Risk: Attackers can read your entire database, delete all data, or modify business information

# VULNERABLE CODE (Never do this!)
query = f"SELECT * FROM businesses WHERE name = '{user_input}'"

# If user_input is: "'; DROP TABLE businesses; --"
# The query becomes:
# SELECT * FROM businesses WHERE name = ''; DROP TABLE businesses; --'

Safe Approach: Always use parameterized queries (which our app already does!)
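Here is a parameterized version of the vulnerable query above, as a contrast. The sketch uses Python's built-in sqlite3 module and a throwaway in-memory businesses table so it runs on its own. PostgreSQL drivers such as psycopg2 use %s instead of ? as the placeholder, but the principle is the same.

```python
import sqlite3

def find_business(conn: sqlite3.Connection, name: str) -> list:
    # The ? placeholder sends `name` to the database as data, never as SQL,
    # so quotes and semicolons in user input cannot change the query.
    cur = conn.execute("SELECT id, name FROM businesses WHERE name = ?", (name,))
    return cur.fetchall()

# Throwaway in-memory database for demonstration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE businesses (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO businesses (name) VALUES (?)", ("Acme Dental",))

print(find_business(conn, "Acme Dental"))                   # [(1, 'Acme Dental')]
print(find_business(conn, "'; DROP TABLE businesses; --"))  # []

# The table is still there after the injection attempt
print(conn.execute("SELECT COUNT(*) FROM businesses").fetchone()[0])  # 1
```

The malicious string is compared literally against the name column. It matches nothing and never runs as SQL.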

3. Security vs Compliance

Security

What: Protecting your system from threats

Example: API key authentication

Who enforces: Nobody; attackers simply exploit any weaknesses they find

Goal: Prevent unauthorized access

Compliance

What: Following laws and regulations

Example: GDPR data protection

Who enforces: Regulators (for GDPR, national data protection authorities) fine violations

Goal: Handle data ethically and legally

Part B: Secure Environment Variables

Learn how to manage sensitive configuration securely

1. Why Environment Variables Matter

Environment variables store sensitive configuration like API keys, database URLs, and secrets.

Bad Practice

# Hardcoded secret in code (NEVER DO THIS!)
API_KEY = "sk-abc123xyz456"
DATABASE_URL = "postgresql://user:password@host/db"

Problems:

  • Visible in code
  • Committed to git history
  • Exposed to anyone with code access
  • Can't change without redeploying

Good Practice

# Load from environment variables
import os
API_KEY = os.getenv('API_KEY')
DATABASE_URL = os.getenv('DATABASE_URL')

Benefits:

  • Secrets stay out of code
  • Different values per environment (dev, staging, production)
  • Easy to rotate/change
  • Not committed to git
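The os.getenv calls above return None when a variable is missing. That can surface much later as a confusing error. Below is a minimal sketch of a fail-fast alternative. The helper names require_env and optional_env are hypothetical and not part of the lab's codebase.

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable's value, or fail fast if it is missing."""
    value = os.getenv(name)
    if not value:  # treat unset and empty values the same way
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

def optional_env(name: str, default: str) -> str:
    """Return the variable's value, or a non-secret default for optional settings."""
    return os.getenv(name) or default
```

At startup you might write DATABASE_URL = require_env('DATABASE_URL'). A misconfigured deployment then stops immediately instead of failing on its first database query. The error message names the variable but never prints a value.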
2. Review Current Setup

Check where your secrets are stored:

Verify .env file and .gitignore:

cd mlops-service

# View your .env file (contains real secrets)
cat .env

# Check that git ignores .env (prints the matching .gitignore rule; no output means .env is NOT ignored)
git check-ignore -v .env
3. Create .env.example Template

The .env.example file has already been updated in your mlops-service/ directory.

View the template:

# View the template (you are still inside mlops-service/ from the previous step)
cat .env.example

What's in .env.example:

  • Template with placeholder values
  • Comments explaining each variable
  • NO real secrets (safe to commit to git)
  • Instructions for developers to copy and fill in

Usage for new team members:

# Copy template and fill in actual values
cp .env.example .env
# Then edit .env with your actual credentials
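After copying the template, a common mistake is to forget a variable that was added to .env.example later. Below is a minimal sketch that lists names present in the template but missing from .env. The function names are hypothetical. It prints variable names only, never their values.

```python
from pathlib import Path

def env_keys(text: str) -> set:
    """Collect variable names from .env-style text, ignoring comments and blank lines."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        keys.add(line.split("=", 1)[0].strip())
    return keys

def missing_keys(example_path: str = ".env.example", env_path: str = ".env") -> set:
    """Names documented in the template but absent from the real .env file."""
    example = env_keys(Path(example_path).read_text(encoding="utf-8"))
    actual = env_keys(Path(env_path).read_text(encoding="utf-8"))
    return example - actual
```

Run it from inside mlops-service/ with print(sorted(missing_keys())). An empty list means every documented variable has been set.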
4. Security Best Practices for Secrets

DO:

  • Use environment variables for all secrets
  • Add .env to .gitignore
  • Use different secrets for dev/staging/production
  • Rotate secrets regularly (quarterly)
  • Use secret management tools in production

DON'T:

  • Hardcode secrets in code
  • Commit .env files to git
  • Share secrets via email or Slack
  • Reuse the same secret across environments
  • Log sensitive values (API keys, passwords)
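One safeguard for the last rule is to redact known secrets before log records are written. Below is a minimal sketch using the standard logging module. The filter class is hypothetical and assumes you know the secret values, for example MLOPS_API_KEY, at startup.

```python
import logging
import os

class RedactSecretsFilter(logging.Filter):
    """Replace known secret values with a placeholder before a record is emitted."""

    def __init__(self, secrets: list):
        super().__init__()
        # Skip empty values so an unset variable doesn't redact everything
        self.secrets = [s for s in secrets if s]

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges %-style args into the message
        for secret in self.secrets:
            message = message.replace(secret, "[REDACTED]")
        record.msg, record.args = message, None
        return True  # keep the record, now sanitized

logger = logging.getLogger("mlops")
logger.addFilter(RedactSecretsFilter([os.getenv("MLOPS_API_KEY", "")]))
```

A filter attached to a logger only sees records logged directly on that logger. To cover every logger in the app, attach it to your handlers instead. Redaction is a safety net; the primary rule is still not to log secrets at all.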

Check Git History for Leaked Secrets

# Search entire git history for committed .env files (repository root and subdirectories)
git log --all --full-history -- .env "*/.env"

# If you find any results, the .env file was committed before!

Expected: No results (clean history). In real projects with leaks, you'd need to rotate all credentials and use tools like git-secrets or BFG Repo-Cleaner.
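git log only catches committed .env files. Secrets can also be hardcoded inside source files. Below is a minimal sketch of a pattern-based scanner you could run before committing. The patterns are illustrative assumptions: AWS access key IDs begin with AKIA, and many AI provider keys begin with sk-. Dedicated tools such as git-secrets ship far more complete rule sets.

```python
import re
from pathlib import Path

# Illustrative patterns only; dedicated scanners use much larger rule sets
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "sk- style API key": re.compile(r"\bsk-[A-Za-z0-9_-]{10,}"),
    "hardcoded secret assignment": re.compile(
        r"""(?i)\b\w*(api_key|secret|password)\w*\s*=\s*['"][^'"]{8,}['"]"""
    ),
}

def scan_text(text: str) -> list:
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

def scan_tree(root: str = ".") -> dict:
    """Scan every .py file under root, skipping files that can't be read as UTF-8."""
    results = {}
    for path in Path(root).rglob("*.py"):
        try:
            hits = scan_text(path.read_text(encoding="utf-8"))
        except (UnicodeDecodeError, OSError):
            continue
        if hits:
            results[str(path)] = hits
    return results
```

Only .py files are scanned, because .env is expected to contain secrets and should already be ignored by git.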

Part C: API Rate Limiting

Prevent abuse by restricting request frequency

1. What is Rate Limiting?

Rate limiting restricts how many requests a user can make in a time period.

Example:

Rate limit: 100 requests per hour

• User makes 100 requests → All allowed

• User makes 101st request → Blocked! (429 Too Many Requests)

• User waits 1 hour → Counter resets, can make 100 more
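The example above describes a fixed-window counter. Below is a minimal in-memory sketch of that idea. It is not how flask-limiter is implemented internally. The class name is hypothetical, and each client's window starts at its first request.

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window`-second window."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behavior can be tested
        self.counters = {}  # client -> (window_start, count)

    def allow(self, client: str) -> bool:
        now = self.clock()
        start, count = self.counters.get(client, (now, 0))
        if now - start >= self.window:  # window expired: reset the counter
            start, count = now, 0
        if count >= self.limit:
            return False  # caller should respond with 429 Too Many Requests
        self.counters[client] = (start, count + 1)
        return True
```

With FixedWindowLimiter(limit=100, window=3600), the 101st call to allow("1.2.3.4") within the hour returns False. Once the hour has passed, the counter resets.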

2. Why Rate Limiting is Important

Prevent Abuse

Malicious users can't overwhelm your service. Bugs (infinite loops) won't crash your system.

Control Costs

AI APIs bill per request or per token, so every extra call costs money. Rate limits cap the number of calls and therefore your maximum spend.

Ensure Fair Access

One user can't monopolize resources. All users get reasonable access.

Example Scenario Without Rate Limiting:

User 1: Accidentally creates infinite loop
        Sends 10,000 requests in 5 minutes

Result:
- Your Flask service crashes
- AI API bill: $500
- Other users can't access service
3. Understanding Our Rate Limits

Our Flask service now has these rate limits:

  • / (Dashboard): No limit (public page, static content)
  • /health: 100/minute (high limit because health checks run frequently)
  • /health/detailed: 50/minute (more expensive to compute)
  • /metrics: No limit (Prometheus scrapes frequently)
  • /track: 100/hour (metrics tracking needs protection)
  • /analytics/<id>: 50/hour (database queries are more expensive)
  • /refresh-metrics: 10/hour (expensive operation)

4. Test Rate Limiting

Start Flask Service:

cd mlops-service
python app.py

# You should see:
# 🚦 Rate Limiting: ENABLED

Test Normal Request (Within Limit):

# Single request - should work
curl http://localhost:5001/health

Expected: Status 200, JSON response

Test Rate Limit Exceeded:

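# Send 101 requests in a loop (exceeds the 100/minute limit)
# -o /dev/null discards the body; -w prints only the HTTP status code
for i in {1..101}; do
  echo "Request $i: $(curl -s -o /dev/null -w '%{http_code}' http://localhost:5001/health)"
done

Expected: The first 100 requests return 200 and the 101st returns 429 Too Many Requests. Requests you made earlier in the same minute, such as the single test above, count toward the limit, so the first 429 may appear slightly sooner.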

Response when rate limited:

{
  "error": "429 Too Many Requests",
  "message": "100 per 1 minute"
}

View Rate Limit Headers:

# Check rate limit status in headers
curl -I http://localhost:5001/health

Response Headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 99
X-RateLimit-Reset: 1704567890
5. How Rate Limiting Works

Our implementation uses flask-limiter:

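from flask import Flask, jsonify
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

# Initialize rate limiter
limiter = Limiter(
    get_remote_address,  # Track clients by IP address
    app=app,
    default_limits=["200 per day", "50 per hour"],  # Used by routes without their own limit
    storage_uri="memory://",  # Store counters in memory (lost on restart)
    headers_enabled=True,  # Send X-RateLimit-* headers (off by default)
)

# Apply a per-endpoint limit (replaces the defaults for this route)
@app.route('/health')
@limiter.limit("100 per minute")
def health_check():
    return jsonify({'status': 'healthy'})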

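Key Points:

  • Tracked by IP address: Each IP has separate limits
  • Memory storage: Limits reset when service restarts
  • Per-endpoint: Different limits for different endpoints
  • Default limits: Routes without their own @limiter.limit decorator fall back to default_limits, so an endpoint with no limit at all (such as / or /metrics) must be marked with @limiter.exempt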

Production Considerations:

  • Use Redis for persistent storage (limits survive restarts)
  • Track by user ID instead of IP (more accurate)
  • Implement tiered limits (free users: 100/day, paid: 10,000/day)
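The user-ID and tiered-limit ideas above can be combined. Below is a minimal sketch using a sliding-window log, a different technique from flask-limiter's default fixed window. The tier names and limits are hypothetical. The class keeps state in process memory; in production this state would live in Redis, as noted above.

```python
import time
from collections import defaultdict, deque

DAY_SECONDS = 24 * 60 * 60

class TieredSlidingWindowLimiter:
    """Per-user sliding-window log: count each user's requests in the last `window` seconds."""

    def __init__(self, tier_limits: dict, window: float = DAY_SECONDS, clock=time.monotonic):
        self.tier_limits = tier_limits  # e.g. {"free": 100, "paid": 10_000} (hypothetical tiers)
        self.window = window
        self.clock = clock
        self.history = defaultdict(deque)  # user_id -> timestamps of allowed requests

    def allow(self, user_id: str, tier: str) -> bool:
        now = self.clock()
        timestamps = self.history[user_id]
        # Drop requests that have slid out of the window
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.tier_limits[tier]:
            return False
        timestamps.append(now)
        return True
```

A sliding window prevents the burst of up to twice the limit that a fixed window allows around its reset time. The cost is storing one timestamp per request.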

Part D: API Key Authentication

Protect sensitive endpoints with API key authentication

1. What are API Keys?

API keys are secret tokens that identify and authenticate applications.

Analogy:

An API key is like a house key:

  • You need the right key to enter
  • Don't share your key with strangers
  • Change locks if key is stolen

Example API Key:

MLOPS_API_KEY=9a7f2c8e5d1b3f6a4e8c9d2f1a7b5c3e
2. How API Key Authentication Works

Without Authentication:

User → Request → Flask → Always accepts

With API Key Authentication:

User → Request with X-API-Key header → Flask → Check key → Allow/Deny

Flow:

  1. User includes X-API-Key: your-key-here in request header
  2. Flask checks if key matches MLOPS_API_KEY environment variable
  3. If match: Allow request
  4. If the header is missing: Return 401 Unauthorized; if the key does not match: Return 403 Forbidden
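The flow above can be sketched as a framework-independent function. This is a minimal sketch, not the lab's actual code, and the name check_api_key is hypothetical. The error bodies mirror the expected responses in the tests below. Skipping authentication when MLOPS_API_KEY is unset follows the development-mode behavior described in Troubleshooting. In Flask you would call it with request.headers.get('X-API-Key') and os.getenv('MLOPS_API_KEY').

```python
import hmac
from typing import Optional

def check_api_key(provided: Optional[str], expected: Optional[str]) -> tuple:
    """Return (status_code, error_body); status 200 means the request may proceed."""
    if not expected:
        # No MLOPS_API_KEY configured: authentication disabled (development mode)
        return 200, None
    if not provided:
        return 401, {"error": "Unauthorized",
                     "message": "API key required. Include X-API-Key header."}
    # compare_digest is designed to resist timing attacks, so response time
    # doesn't reveal how much of a guessed key was correct
    if not hmac.compare_digest(provided.encode(), expected.encode()):
        return 403, {"error": "Forbidden", "message": "Invalid API key"}
    return 200, None
```

Comparing keys with == can in principle leak information through timing, which is why hmac.compare_digest is the usual choice for secrets.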
3. Generate Secure API Key

Generate a random API key:

# Mac/Linux
openssl rand -hex 32

# Output example (64 hexadecimal characters = 32 random bytes):
# 9a7f2c8e5d1b3f6a4e8c9d2f1a7b5c3e8f4d2a6b9c1e7f3a5d8b2c6e4f1a3d72
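If openssl isn't available (for example on Windows without Git Bash), Python's standard secrets module produces an equivalent key:

```python
import secrets

# token_hex(32) draws 32 random bytes from the operating system's secure
# random source and returns them as 64 hex characters, the same format
# as `openssl rand -hex 32`
api_key = secrets.token_hex(32)
print(f"MLOPS_API_KEY={api_key}")
```

Paste the printed line into .env. Avoid the random module for keys because it is not designed for security.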
4. Configure API Key

Add your generated API key to .env:

cd mlops-service
nano .env  # or use your preferred editor

Add this line:

MLOPS_API_KEY=your-generated-key-here

Save and restart Flask service:

# Stop current service (Ctrl+C)
# Start again
python app.py

# You should see:
# 🔐 Security: API Key Authentication ENABLED
5. Test API Key Authentication

Request Without API Key (Unauthorized):

# Try to track metrics without API key
curl -X POST http://localhost:5001/track \
  -H "Content-Type: application/json" \
  -d '{
    "business_id": "test",
    "response_time_ms": 100,
    "tokens_used": 50
  }'

Expected Response (401):

{
  "error": "Unauthorized",
  "message": "API key required. Include X-API-Key header."
}

Request With Wrong API Key (Forbidden):

# Try with incorrect key
curl -X POST http://localhost:5001/track \
  -H "Content-Type: application/json" \
  -H "X-API-Key: wrong-key-here" \
  -d '{
    "business_id": "test",
    "response_time_ms": 100,
    "tokens_used": 50
  }'

Expected Response (403):

{
  "error": "Forbidden",
  "message": "Invalid API key"
}

Request With Correct API Key (Success):

# Replace YOUR_API_KEY with your actual key from .env
curl -X POST http://localhost:5001/track \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{
    "business_id": "test-business",
    "response_time_ms": 150,
    "tokens_used": 75,
    "api_cost_usd": 0.002,
    "intent_detected": "appointment",
    "response_type": "booking",
    "appointment_requested": false
  }'

Expected Response (200):

{
  "status": "success",
  "message": "Metrics tracked successfully",
  "prometheus_updated": true,
  "database_stored": true,
  "timestamp": "2024-01-15T10:30:00.000000"
}
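To call the protected endpoint from Python instead of curl, the key can be read from the environment rather than pasted into the command. Below is a minimal sketch using only the standard library. The function names are hypothetical, and the base URL assumes the local service on port 5001.

```python
import json
import os
import urllib.request

def build_track_request(payload: dict, base_url: str = "http://localhost:5001") -> urllib.request.Request:
    """Prepare a POST /track request carrying the API key from the environment."""
    api_key = os.environ["MLOPS_API_KEY"]  # KeyError if unset: fail loudly, never hardcode
    return urllib.request.Request(
        f"{base_url}/track",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (raises HTTPError on 401/403/429)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the Flask service running and MLOPS_API_KEY exported in your shell, send(build_track_request({...})) should return the success JSON shown above.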
6. Protected vs Public Endpoints

Protected Endpoints (API Key Required):

  • /track - Metrics submission
  • /analytics/<business_id> - Analytics data
  • /refresh-metrics - Metrics refresh

Public Endpoints (No API Key):

  • / - Dashboard
  • /health - Health check
  • /metrics - Prometheus metrics

Part E: GDPR Compliance Basics

Learn fundamental privacy and compliance requirements for AI systems

1. What is GDPR?

GDPR (General Data Protection Regulation) is a European law that protects people's personal data.

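Applies When:

  • Your company is based (established) in the EU
  • You offer goods or services to people in the EU, even if your company is based elsewhere
  • You monitor the behavior of people in the EU (GDPR protects people located in the EU, regardless of citizenship)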
2. What Data Does Our AI Chat Collect?

Personal Data (Requires Protection):

  • User Messages: "I want to book an appointment"
  • Email Addresses: john@example.com
  • Names: "John Smith"
  • Phone Numbers: +1-555-1234

Non-Personal Data (Anonymous):

  • Business ID: uuid-abc-123
  • Metrics: Response time, token count
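Chat text may be stored for analytics or written to logs. One way to limit the personal data it carries is to mask obvious identifiers first. Below is a minimal regex-based sketch with a hypothetical function name. The patterns are simplifications that will miss some formats and may over-match, since any long digit sequence, such as a date, counts as a phone number.

```python
import re

# Simplified patterns; robust PII detection needs dedicated tooling
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")  # may also match dates or long IDs

def redact_pii(message: str) -> str:
    """Mask email addresses and phone-number-like sequences in a chat message."""
    message = EMAIL_RE.sub("[EMAIL]", message)
    return PHONE_RE.sub("[PHONE]", message)
```

For example, redact_pii("Email john@example.com or call +1-555-1234") returns "Email [EMAIL] or call [PHONE]". Masking must not be applied to data the business needs, such as the contact details of an actual appointment.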
3. GDPR Requirements (Simplified)

Lawful Basis for Processing

Question: Why are you collecting this data?
Our Answer: Providing the service (appointment booking)

Consent

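Requirement: When consent is your lawful basis, users must actively agree to data collection (no pre-ticked boxes) and be able to withdraw consent later

Implementation example (the box starts unticked):

☐ I agree to allow this chat to be stored for service purposes.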
Privacy Policy

Right to Access

Requirement: Users can request their data
Implementation: Provide endpoint to export user's chat history

Right to Deletion

Requirement: Users can request data deletion
Implementation: Provide "delete my data" button/endpoint
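The access and deletion rights above map to two simple queries. Below is a minimal sketch using sqlite3 so it runs standalone; the lab's database is PostgreSQL. The schema is an assumption: a chat_messages table with message and created_at columns plus a hypothetical user_email column linking each message to a person.

```python
import json
import sqlite3

def export_user_data(conn: sqlite3.Connection, email: str) -> str:
    """Right to access: return all of a user's stored messages as JSON."""
    rows = conn.execute(
        "SELECT message, created_at FROM chat_messages WHERE user_email = ? ORDER BY created_at",
        (email,),
    ).fetchall()
    return json.dumps([{"message": m, "created_at": c} for m, c in rows])

def delete_user_data(conn: sqlite3.Connection, email: str) -> int:
    """Right to deletion: remove a user's messages and return how many were deleted."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM chat_messages WHERE user_email = ?", (email,))
    return cur.rowcount
```

In a real service, both operations must first verify that the requester is the person the data belongs to. Otherwise the export endpoint itself becomes a data leak.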

Data Retention

Requirement: Don't keep data longer than necessary
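Implementation: Delete chat logs after a fixed retention period (this project uses 90 days; GDPR leaves the exact period to you, as long as it is justified)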

4. Simple GDPR Checklist for Our Project

For a college project, we'll implement basic compliance:

Inform Users

Add privacy notice: "Your chat messages are stored to provide this service"
Link to privacy policy (can be simple)

Data Minimization

Only collect what's needed (email, name for appointments)
Don't collect unnecessary data (age, address, etc.)

Data Retention

Plan to delete old chats (conceptual for now)
Production would implement: DELETE FROM chat_messages WHERE created_at < NOW() - INTERVAL '90 days'

Secure Storage

Database uses SSL (?sslmode=require in DATABASE_URL)
Environment variables secure API keys

5. Implementing Data Retention (Conceptual)

SQL Query to Delete Old Chat Logs:

-- Delete chat messages older than 90 days
DELETE FROM chat_messages
WHERE created_at < NOW() - INTERVAL '90 days';

-- Delete old AI metrics
DELETE FROM ai_metrics
WHERE created_at < NOW() - INTERVAL '90 days';

Scheduled Job (Production):

  • Run this query daily via cron job or AWS Lambda
  • Log deletions for compliance audit trail
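The scheduled job described above could look roughly like the sketch below. It uses sqlite3 so it runs standalone, whereas the SQL above is PostgreSQL. It assumes created_at is stored as an ISO 8601 string in UTC, so that string comparison orders timestamps correctly. A cron entry or Lambda would call purge_old_rows once a day.

```python
import logging
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

logger = logging.getLogger("retention")

# Table names can't be bound as query parameters, so only allow known tables
RETENTION_TABLES = ("chat_messages", "ai_metrics")

def purge_old_rows(conn: sqlite3.Connection, days: int = 90,
                   now: Optional[datetime] = None) -> dict:
    """Delete rows older than `days` and log counts for the audit trail."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).isoformat()
    deleted = {}
    with conn:  # one transaction: every table is purged, or none is
        for table in RETENTION_TABLES:
            # Safe f-string: `table` comes only from the fixed whitelist above
            cur = conn.execute(f"DELETE FROM {table} WHERE created_at < ?", (cutoff,))
            deleted[table] = cur.rowcount
            logger.info("Retention purge: deleted %d rows from %s older than %s",
                        cur.rowcount, table, cutoff)
    return deleted
```

The audit log records how many rows were removed and when, not the deleted content itself.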
6. Privacy Best Practices

DO:

  • Be transparent about data collection
  • Provide privacy policy
  • Implement data retention policies
  • Use encryption (HTTPS, SSL database connections)
  • Give users control over their data

DON'T:

  • Collect data you don't need
  • Share data with third parties without consent
  • Keep data indefinitely
  • Log sensitive information (passwords, API keys)

Part F: Security Best Practices Checklist

Production-ready security measures for AI systems

Production Security Checklist

Use this checklist to verify your AI application is secured:

Environment Variables

  • All secrets in environment variables (not hardcoded)
  • .env file in .gitignore
  • .env.example template in repository
  • Different secrets for dev/staging/production
  • Secrets rotated quarterly

API Security

  • API key authentication on sensitive endpoints
  • Rate limiting enabled
  • HTTPS only (in production)
  • CORS configured properly
  • Input validation on all endpoints

Database Security

  • SSL/TLS encryption enabled (?sslmode=require)
  • Parameterized queries (no SQL injection)
  • Database credentials in environment variables
  • Least privilege principle (app user can't drop tables)

Logging & Monitoring

  • Log authentication attempts
  • Don't log sensitive data (API keys, passwords)
  • Monitor for suspicious activity
  • Set up alerts for security events

Compliance

  • Privacy policy published
  • User consent for data collection
  • Data retention policy defined
  • User data export/deletion available
What We Implemented vs Real Production

What We Implemented in Lab 10:

  • API key authentication
  • Rate limiting
  • Secure environment variables
  • .env.example template
  • GDPR awareness

What's Left for Real Production:

  • OAuth 2.0 / JWT (Industry-standard auth)
  • Data Encryption at Rest
  • WAF (Web Application Firewall)
  • Security Audit Logging
  • Penetration Testing
  • Container Scanning
  • HTTPS Certificates (SSL/TLS)
Troubleshooting

Flask won't start after adding flask-limiter:

Error: ModuleNotFoundError: No module named 'flask_limiter'

cd mlops-service
pip install flask-limiter==3.5.0

API key authentication not working:

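Symptoms: Requests fail with 401 Unauthorized or 403 Forbidden even though you are using the correct API key

Check:

  • Is MLOPS_API_KEY set in .env? grep MLOPS_API_KEY .env
  • Did you restart Flask after updating .env?
  • Is the header named exactly X-API-Key? A 401 means the server did not receive the header at all
  • Does the header value match .env exactly (no extra spaces or quotes)? A 403 means the key arrived but did not match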

Rate limit headers not showing:

Symptoms: Requests work but no X-RateLimit-* headers

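This is normal unless header injection is turned on. Flask-Limiter only sends X-RateLimit-* headers when headers_enabled=True is passed to Limiter (or RATELIMIT_HEADERS_ENABLED = True is set in the Flask config); the setting is off by default.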

Development mode - API key not required:

Symptoms: Requests work without API key, logs show "API Key Authentication DISABLED"

This is by design! If MLOPS_API_KEY is not set in .env, authentication is disabled for easier development. To enable: generate API key, add to .env, restart Flask.

Cannot test API key with curl:

Symptoms: curl commands too complex, syntax errors

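# Read the key from .env into a shell variable (run inside mlops-service/)
# This avoids writing the secret to a new file that could be committed by accident
API_KEY=$(grep '^MLOPS_API_KEY=' .env | cut -d '=' -f2-)

# Use the variable in the request
curl -X POST http://localhost:5001/track \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"business_id":"test","response_time_ms":100,"tokens_used":50}'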
Lab 10 Summary - What You Learned

Congratulations! You've implemented fundamental security practices for your AI application. Here's what you accomplished:

Security Concepts Learned

  • CIA Triad: Confidentiality, Integrity, Availability
  • Common Threats: API key leaks, DoS, SQL injection
  • Security vs Compliance: Understanding the difference

Practical Implementation

  • API key authentication for protected endpoints
  • Rate limiting to prevent abuse
  • Secure environment variable management
  • .env.example template for team collaboration

GDPR Basics

  • What personal data is
  • User rights (access, deletion, consent)
  • Data retention policies
  • Privacy best practices

Key Takeaways

  • Never commit secrets to git - Use environment variables
  • Rate limiting prevents abuse - Protects against DoS and controls costs
  • Authentication protects sensitive endpoints - API keys are simplest form
  • GDPR requires transparency - Tell users what data you collect
  • Security is a process, not a product - Continuous improvement

Test Your Knowledge

Complete the Lab 10 quiz to test your understanding of security and compliance for AI systems.

Take Lab 10 Quiz →

Quiz Submission Checklist:

  • Complete all 5 multiple-choice questions
  • Take a screenshot of your results page showing:
    • Your name
    • Your score (aim for 4/5 or 5/5)
    • Session ID
    • Timestamp
  • Submit screenshot as proof of completion