A powerful Site Reliability Engineering (SRE) assistant built with Google's Agent Development Kit (ADK), featuring specialized agents for AWS cost analysis, Kubernetes operations, and operational best practices.
- Docker and Docker Compose
- AI Provider API key (see AI Model Configuration below)
- (Optional) AWS credentials and Kubernetes config for respective features
git clone <your-repo-url>
cd sre-bot
# Copy environment files and customize
cp .env.example .env
cp agents/.env.example agents/.env
cp slack_bot/.env.example slack_bot/.env
Edit agents/.env with your AI provider credentials (see AI Model Configuration for details):
# Option 1: Google Gemini (Recommended)
GOOGLE_API_KEY=your_google_api_key_here
GOOGLE_AI_MODEL=gemini-2.0-flash # optional
# Option 2: Anthropic Claude
ANTHROPIC_API_KEY=your_anthropic_api_key_here
ANTHROPIC_MODEL=claude-3-5-sonnet-20240620 # optional
# Option 3: AWS Bedrock (requires AWS credentials)
BEDROCK_INFERENCE_PROFILE=arn:aws:bedrock:us-west-2:812201244513:inference-profile/us.anthropic.claude-opus-4-1-20250805-v1:0
# Optional: AWS and Kubernetes configurations
AWS_PROFILE=your_aws_profile
KUBE_CONTEXT=your_kube_context
# Build and start all services
docker compose build
docker compose up -d
# Check if services are running
docker compose ps
- Web Interface: http://localhost:8000
- API Server: http://localhost:8001
- Health Check: http://localhost:8000/health
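The access points can also be checked programmatically. Below is a minimal smoke test for the documented health endpoint, assuming the `requests` library is installed (it is not a project requirement):

```python
# Minimal smoke test for the documented health endpoint.
# Assumes the stack is running locally and `requests` is installed.
import requests

resp = requests.get("http://localhost:8000/health", timeout=5)
print(resp.status_code, resp.text)
```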
The SRE bot follows a modular architecture with specialized sub-agents:
agents/sre_agent/
├── agent.py          # Main SRE agent orchestrator
├── serve.py          # FastAPI server with health checks
├── utils.py          # Shared utilities
└── sub_agents/
    └── aws_cost/     # AWS cost analysis module
        ├── agent.py  # Agent configuration
        ├── tools/    # Cost analysis tools
        └── prompts/  # Agent instructions
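As a rough illustration of how this layout fits together, the orchestrator in agent.py delegates to specialized sub-agents. The following is a minimal sketch assuming ADK's `Agent` API; the names, models, and instructions are illustrative, not the project's actual wiring:

```python
# Hypothetical sketch of how agent.py might register a sub-agent with the
# root orchestrator using Google ADK. Illustrative only.
from google.adk.agents import Agent

aws_cost_agent = Agent(
    name="aws_cost",
    model="gemini-2.0-flash",
    instruction="Analyze AWS cost data and answer cost-related questions.",
    # tools=[...]  # cost-analysis tools live under sub_agents/aws_cost/tools/
)

root_agent = Agent(
    name="sre_agent",
    model="gemini-2.0-flash",
    instruction="Route SRE requests to the appropriate specialist sub-agent.",
    sub_agents=[aws_cost_agent],
)
```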
- Retrieve and analyze AWS cost data for specific time periods
- Filter costs by services, tags, or accounts
- Calculate cost trends over time
- Provide average daily costs (including or excluding weekends)
- Identify the most expensive AWS accounts
- Compare costs across different time periods
- Generate cost optimization recommendations
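For illustration, a tool backing these capabilities (such as get_cost_for_period, listed in the tool reference below) could be a thin wrapper around the Cost Explorer API. The sketch below uses `boto3` and is hypothetical; the real implementations live under sub_agents/aws_cost/tools/ and may differ:

```python
# Hypothetical sketch of a cost tool like get_cost_for_period using boto3's
# Cost Explorer API. Not the project's actual implementation.
import boto3

def get_cost_for_period(start_date: str, end_date: str) -> dict:
    """Return the total unblended cost between start_date and end_date (YYYY-MM-DD)."""
    ce = boto3.client("ce")
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": start_date, "End": end_date},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
    )
    total = sum(
        float(day["Total"]["UnblendedCost"]["Amount"])
        for day in response["ResultsByTime"]
    )
    return {"start": start_date, "end": end_date, "total_usd": round(total, 2)}
```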
- Infrastructure monitoring and troubleshooting
- Operational best practices and recommendations
- Performance optimization guidance
- Natural language interaction with technical systems
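As an example of the Kubernetes side, a helper answering a question like "How many pods are running in the default namespace?" might look like the sketch below, assuming the official `kubernetes` Python client and a valid kubeconfig (this is not the project's actual tool code):

```python
# Hypothetical Kubernetes helper: count running pods in a namespace.
# Assumes the `kubernetes` Python client and a reachable cluster.
from typing import Optional
from kubernetes import client, config

def count_running_pods(namespace: str = "default", context: Optional[str] = None) -> int:
    config.load_kube_config(context=context)  # pass KUBE_CONTEXT here if set
    pods = client.CoreV1Api().list_namespaced_pod(namespace)
    return sum(1 for pod in pods.items if pod.status.phase == "Running")
```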
# Run linting and formatting
ruff check .
ruff format .
ruff check . --fix
# Run pre-commit hooks manually
pre-commit run --all-files
For rapid development and testing:
# Install dependencies
pip install -r agents/sre_agent/requirements.txt
pip install -r requirements-dev.txt
# Use built-in ADK web interface for rapid bot testing
adk web --session_service_uri=postgresql://postgres:password@localhost:5432/srebot
# Or use custom serve.py for API-only development
cd agents/sre_agent
python serve.py
- sre-bot-web: Web interface using ADK's built-in UI (port 8000)
- sre-bot-api: API-only server using custom serve.py (port 8001)
- slack-bot: Slack integration service (port 8002)
- postgres: PostgreSQL database for session persistence
# Start specific services
docker compose up -d sre-bot-web # Web interface
docker compose up -d sre-bot-api # API server
docker compose up -d slack-bot # Slack bot
# View logs
docker compose logs [service-name]
# Stop services
docker compose down
curl -X POST http://localhost:8001/apps/sre_agent/users/u_123/sessions/s_123 \
-H "Content-Type: application/json" \
-d '{"state": {"key1": "value1"}}'
curl -X POST http://localhost:8001/run \
-H "Content-Type: application/json" \
-d '{
"app_name": "sre_agent",
"user_id": "u_123",
"session_id": "s_123",
"new_message": {
"role": "user",
"parts": [{"text": "How many pods are running in the default namespace?"}]
}
}'
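The same two calls, sketched in Python with the `requests` library (assumed here, not a project dependency):

```python
# Python equivalent of the two curl examples above.
import requests

BASE = "http://localhost:8001"

# 1. Create a session with some initial state
requests.post(
    f"{BASE}/apps/sre_agent/users/u_123/sessions/s_123",
    json={"state": {"key1": "value1"}},
    timeout=10,
)

# 2. Send a message to the agent within that session
reply = requests.post(
    f"{BASE}/run",
    json={
        "app_name": "sre_agent",
        "user_id": "u_123",
        "session_id": "s_123",
        "new_message": {
            "role": "user",
            "parts": [{"text": "How many pods are running in the default namespace?"}],
        },
    },
    timeout=60,
)
print(reply.json())
```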
- Configure Slack App (see detailed instructions below)
- Set environment variables in slack_bot/.env:
  SLACK_BOT_TOKEN=xoxb-your-slack-bot-token
  SLACK_SIGNING_SECRET=your-slack-signing-secret
  SLACK_APP_TOKEN=xapp-your-slack-app-token
- Start the Slack bot:
  docker compose up -d slack-bot
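For reference, a mention handler that forwards Slack messages to the SRE agent API could look roughly like the sketch below, assuming `slack_bolt` and `requests`; the actual slack_bot service may be implemented differently:

```python
# Hypothetical sketch of a Slack mention handler that forwards the message
# text to the SRE agent API. Illustrative only.
import os
import requests
from slack_bolt import App

app = App(
    token=os.environ["SLACK_BOT_TOKEN"],
    signing_secret=os.environ["SLACK_SIGNING_SECRET"],
)

@app.event("app_mention")
def handle_mention(body, say):
    event = body["event"]
    # The session may need to be created first, as in the API usage example above.
    resp = requests.post(
        os.environ.get("SRE_AGENT_API_URL", "http://sre-bot-api:8001") + "/run",
        json={
            "app_name": "sre_agent",
            "user_id": event["user"],
            "session_id": event.get("thread_ts", event["ts"]),
            "new_message": {"role": "user", "parts": [{"text": event["text"]}]},
        },
        timeout=60,
    )
    say(str(resp.json()))

if __name__ == "__main__":
    app.start(port=8002)  # HTTP mode, matching the events request_url in the manifest
```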
- Go to https://api.slack.com/apps and click "Create New App"
- Name it and choose a workspace
- Add Bot Token Scopes:
  - app_mentions:read - View messages that mention the bot
  - chat:write - Send messages
  - channels:join - Join channels
  - chat:write.public - Send messages to channels the bot isn't in
- Install App to Workspace and get approval if needed
- Set up Event Subscriptions pointing to your ngrok URL
- Configure Slash Commands if desired
display_information:
  name: sre-bot
features:
  bot_user:
    display_name: sre-bot
    always_online: false
oauth_config:
  scopes:
    bot:
      - app_mentions:read
      - channels:join
      - channels:history
      - chat:write
      - chat:write.public
      - commands
      - reactions:read
settings:
  event_subscriptions:
    request_url: https://your-ngrok-url.ngrok-free.app/slack/events
    bot_events:
      - app_mention
  org_deploy_enabled: false
  socket_mode_enabled: false
The SRE bot uses separate environment files for better organization:
- .env: Main Docker Compose configuration
- agents/.env: SRE Agent specific settings
- slack_bot/.env: Slack Bot configuration
# Main Configuration (.env)
GOOGLE_API_KEY=your_google_api_key
GOOGLE_AI_MODEL=gemini-2.0-flash
POSTGRES_PASSWORD=postgres
LOG_LEVEL=INFO
# Agent Configuration (agents/.env)
PORT=8000
DB_HOST=localhost
DB_PORT=5432
# Slack Bot Configuration (slack_bot/.env)
SLACK_BOT_TOKEN=xoxb-your-token
SLACK_SIGNING_SECRET=your-secret
SRE_AGENT_API_URL=http://sre-bot-api:8001
The SRE bot supports multiple AI providers with automatic provider detection based on your environment variables. The system checks for API keys in priority order and configures the appropriate model.
Best for: Google Cloud users, fastest setup, most reliable
# Required
GOOGLE_API_KEY=your_google_api_key_here
# Optional (defaults shown)
GOOGLE_AI_MODEL=gemini-2.0-flash
Get API Key: Google AI Studio
Best for: Advanced reasoning tasks, detailed analysis
# Required
ANTHROPIC_API_KEY=your_anthropic_api_key_here
# Optional (defaults shown)
ANTHROPIC_MODEL=claude-3-5-sonnet-20240620
Get API Key: Anthropic Console
Best for: AWS-native deployments, enterprise compliance
# Required
BEDROCK_INFERENCE_PROFILE=arn:aws:bedrock:us-west-2:812201244513:inference-profile/us.anthropic.claude-opus-4-1-20250805-v1:0
# AWS credentials also required (one of the following):
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
# OR
AWS_PROFILE=your_aws_profile
Setup: Configure AWS Bedrock access in your AWS account
The system automatically selects providers in this order:
- Google Gemini (if GOOGLE_API_KEY is set)
- Anthropic Claude (if ANTHROPIC_API_KEY is set)
- AWS Bedrock (if BEDROCK_INFERENCE_PROFILE is set)
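A minimal sketch of this detection logic, for orientation only (the agent's actual implementation may differ):

```python
# Illustrative provider auto-detection: check environment variables in
# priority order and return the first configured provider and model.
import os

def detect_provider() -> tuple[str, str]:
    if os.getenv("GOOGLE_API_KEY"):
        return "google", os.getenv("GOOGLE_AI_MODEL", "gemini-2.0-flash")
    if os.getenv("ANTHROPIC_API_KEY"):
        return "anthropic", os.getenv("ANTHROPIC_MODEL", "claude-3-5-sonnet-20240620")
    if os.getenv("BEDROCK_INFERENCE_PROFILE"):
        return "bedrock", os.environ["BEDROCK_INFERENCE_PROFILE"]
    raise RuntimeError("No AI provider configured!")
```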
# agents/.env
GOOGLE_API_KEY=AIzaSyD4R5T6Y7U8I9O0P1A2S3D4F5G6H7J8K9L0
# agents/.env
ANTHROPIC_API_KEY=sk-ant-api03-A1B2C3D4E5F6G7H8I9J0
ANTHROPIC_MODEL=claude-3-opus-20240229
# agents/.env
BEDROCK_INFERENCE_PROFILE=arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0
AWS_PROFILE=bedrock-user
AWS_REGION=us-east-1
ERROR: No AI provider configured!
Please configure one of the following providers...
Solution: Set at least one API key as shown above.
ERROR: BEDROCK_INFERENCE_PROFILE is set but AWS credentials are not configured
Solution: Configure AWS credentials via environment variables or AWS profiles.
ERROR: Authentication failed with provider
Solution: Verify your API key is correct and has necessary permissions.
| Use Case | Recommended Provider | Model | Why |
|---|---|---|---|
| General SRE Tasks | Google Gemini | gemini-2.0-flash | Fast, reliable, good for operations |
| Complex Analysis | Anthropic Claude | claude-3-5-sonnet-20240620 | Superior reasoning for complex problems |
| Enterprise/AWS | AWS Bedrock | claude-3-opus-* | Enterprise compliance, AWS integration |
| Cost-Sensitive | Google Gemini | gemini-2.0-flash | Most cost-effective for high-volume usage |
- Store sensitive credentials in environment variables
- Use separate credentials for production vs development
- Follow principle of least privilege for AWS and Kubernetes access
- Never commit actual .env files to version control
- Review audit logs periodically
- Service Communication Issues:
  docker compose ps                  # Check if all containers are running
  docker compose logs [service-name] # Check specific service logs
- Database Connection Issues:
  docker compose logs postgres       # Check PostgreSQL logs
- AI Model Configuration Issues:
  docker compose logs sre-bot-api | grep -E "(ERROR|model|provider)"
  Common errors:
  - No AI provider configured! → Set at least one API key
  - Bedrock requires valid AWS credentials → Configure AWS access
  - Authentication failed → Verify API key is valid
  - See AI Model Configuration for detailed setup
# Check overall health
curl http://localhost:8000/health
# Kubernetes readiness/liveness probes
curl http://localhost:8000/health/readiness
curl http://localhost:8000/health/liveness
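For orientation, the health endpoints served by serve.py could be implemented roughly as follows, assuming FastAPI (the project's actual serve.py may differ):

```python
# Hypothetical sketch of FastAPI health endpoints like those in serve.py.
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}

@app.get("/health/readiness")
def readiness() -> dict:
    # In the real service this could also verify the database connection.
    return {"status": "ready"}

@app.get("/health/liveness")
def liveness() -> dict:
    return {"status": "alive"}
```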
- get_cost_for_period - Get costs for specific date ranges
- get_monthly_cost - Monthly cost summaries
- get_cost_trend - Cost trend analysis
- get_cost_by_service - Service-level cost breakdown
- get_cost_by_tag - Tag-based cost analysis
- get_most_expensive_account - Identify highest-cost accounts
- Follow the established code structure and patterns
- Use shared utilities from agents/sre_agent/utils.py
- Run code quality checks before committing:
  ruff check . --fix
  ruff format .
  pre-commit run --all-files
- Test your changes with Docker Compose
- Update documentation as needed
[Add your license here]
Need help? Check the troubleshooting section above or review the service logs with docker compose logs [service-name].