Co-Created with AI Agent:
Alembic Database Connection Fails in Docker Compose Setup
Issue Description
When running Skyvern in Docker using the provided docker-compose.yml, the Alembic database migrations fail with a "connection refused" error, causing the container to restart continuously (400+ times). This happens despite PostgreSQL being healthy and accessible.
Environment
- Skyvern Version: Latest Docker image (public.ecr.aws/skyvern/skyvern:latest)
- Docker Version: 20.10+
- OS: Linux (Ubuntu/ArchLinux)
- Docker Compose Version: 2.0+
Problem Details
Error Message
sqlalchemy.exc.OperationalError: (psycopg.OperationalError) connection failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Root Cause
The issue occurs because:
- The DATABASE_URL environment variable is correctly set to postgresql://skyvern:skyvern@postgres:5432/skyvern
- However, Alembic uses SettingsManager.get_settings().DATABASE_STRING, which may not be reading the environment variable correctly
- The default alembic.ini contains sqlalchemy.url = postgresql+psycopg://skyvern@localhost/skyvern, which points to localhost instead of the postgres service
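A minimal sketch of how this mismatch could arise, assuming the settings layer looks up only DATABASE_STRING and falls back to a localhost default. The function name and the default constant are hypothetical and are not taken from the Skyvern source; the default simply mirrors the URL quoted from alembic.ini above.

```python
import os

# Hypothetical default, mirroring the localhost URL quoted from alembic.ini.
DEFAULT_DATABASE_STRING = "postgresql+psycopg://skyvern@localhost/skyvern"


def resolve_database_string(environ=None):
    """Return DATABASE_STRING if set, otherwise the localhost default.

    DATABASE_URL is deliberately never consulted here; that is the
    suspected behaviour, not confirmed Skyvern code.
    """
    env = os.environ if environ is None else environ
    return env.get("DATABASE_STRING", DEFAULT_DATABASE_STRING)


# Environment as set by the docker-compose.yml in this issue: only DATABASE_URL.
compose_env = {"DATABASE_URL": "postgresql://skyvern:skyvern@postgres:5432/skyvern"}
print(resolve_database_string(compose_env))
# prints: postgresql+psycopg://skyvern@localhost/skyvern
```

Under that assumption, setting DATABASE_URL alone has no effect on the URL Alembic receives, which would explain a "connection refused" against localhost inside the container.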
Symptoms
- Container enters restart loop immediately after "Alembic mode: online"
- Direct Python connection tests work: psycopg.connect('postgresql://skyvern:skyvern@postgres:5432/skyvern') succeeds
- The wrapper script can connect successfully, but Alembic fails immediately after
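One way to see why the direct test succeeds while Alembic fails is to compare the host each URL actually targets. The sketch below uses only the standard library; the helper name target_host is hypothetical, and the two URLs are the ones quoted in this issue.

```python
from urllib.parse import urlsplit


def target_host(url):
    """Return the (host, port) a PostgreSQL URL points at; port defaults to 5432."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or 5432


urls = {
    "DATABASE_URL": "postgresql://skyvern:skyvern@postgres:5432/skyvern",
    "alembic.ini default": "postgresql+psycopg://skyvern@localhost/skyvern",
}

for name, url in urls.items():
    host, port = target_host(url)
    print(f"{name}: {host}:{port}")
# prints:
# DATABASE_URL: postgres:5432
# alembic.ini default: localhost:5432
```

Inside the skyvern container, localhost refers to the container itself, where no PostgreSQL server is listening, so the second URL is refused even though the postgres service is healthy.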
Reproduction Steps
- Use the standard docker-compose.yml:
  services:
    postgres:
      image: postgres:14-alpine
      environment:
        POSTGRES_DB: skyvern
        POSTGRES_USER: skyvern
        POSTGRES_PASSWORD: skyvern
      healthcheck:
        test: ["CMD-SHELL", "pg_isready -U skyvern"]
    skyvern:
      image: public.ecr.aws/skyvern/skyvern:latest
      environment:
        DATABASE_URL: postgresql://skyvern:skyvern@postgres:5432/skyvern
      depends_on:
        postgres:
          condition: service_healthy
- Run docker-compose up -d
- Observe the container restarting: docker ps --filter name=skyvern
- Check the logs: docker logs skyvern-container
Solution/Workaround
Add the DATABASE_STRING environment variable explicitly:
skyvern:
  environment:
    DATABASE_URL: postgresql://skyvern:skyvern@postgres:5432/skyvern
    DATABASE_STRING: postgresql+psycopg://skyvern:skyvern@postgres:5432/skyvern  # Add this line
This ensures that SettingsManager gets the correct connection string for Alembic.
Suggested Fixes
- Option 1: Update SettingsManager to properly read DATABASE_URL and construct DATABASE_STRING from it
- Option 2: Update alembic/env.py to use DATABASE_URL directly instead of SettingsManager
- Option 3: Document the need for DATABASE_STRING in Docker deployment docs
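Option 1 could look roughly like the following sketch, which derives the SQLAlchemy-style string from DATABASE_URL when DATABASE_STRING is not set. The function names and the lookup order are assumptions for illustration, not the existing SettingsManager API; the fallback is the localhost URL quoted from alembic.ini.

```python
import os

# Driver-qualified scheme used by the DATABASE_STRING workaround in this issue.
PSYCOPG_SCHEME = "postgresql+psycopg"
# Hypothetical last-resort default, mirroring the alembic.ini value.
FALLBACK_DATABASE_STRING = "postgresql+psycopg://skyvern@localhost/skyvern"


def to_sqlalchemy_url(database_url):
    """Rewrite a plain postgres:// or postgresql:// URL to the psycopg dialect."""
    scheme, sep, rest = database_url.partition("://")
    if not sep:
        raise ValueError(f"not a URL: {database_url!r}")
    if scheme in ("postgres", "postgresql"):
        return f"{PSYCOPG_SCHEME}://{rest}"
    # Already driver-qualified (or a different database): leave untouched.
    return database_url


def database_string_from_env(environ=None):
    """Prefer DATABASE_STRING, then DATABASE_URL, then the localhost fallback."""
    env = os.environ if environ is None else environ
    if env.get("DATABASE_STRING"):
        return env["DATABASE_STRING"]
    if env.get("DATABASE_URL"):
        return to_sqlalchemy_url(env["DATABASE_URL"])
    return FALLBACK_DATABASE_STRING


print(database_string_from_env(
    {"DATABASE_URL": "postgresql://skyvern:skyvern@postgres:5432/skyvern"}
))
# prints: postgresql+psycopg://skyvern:skyvern@postgres:5432/skyvern
```

Keeping DATABASE_STRING as the highest-priority source means existing deployments that already use the workaround would behave the same.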
Additional Context
- The issue only occurs in containerized environments where service names are used for hostname resolution
- Direct database connections work fine; only Alembic migrations fail
- The health check passes but Alembic still fails to connect
Related Issues
- Similar to issues with localhost vs service name resolution in Docker networks
- May be related to the difference between DATABASE_URL format and SQLAlchemy's required format
Note: This issue causes high CPU usage due to rapid container restarts, making it a critical issue for Docker deployments.
k2riddim