Alembic Database Connection Fails in Docker Compose Setup #3398

@gwpl

Co-Created with AI Agent:

Issue Description

When Skyvern runs in Docker with the provided docker-compose.yml, the Alembic database migrations fail with a "connection refused" error and the container restarts continuously (400+ restarts observed). This happens even though PostgreSQL is healthy and reachable from the container.

Environment

  • Skyvern Version: Latest Docker image (public.ecr.aws/skyvern/skyvern:latest)
  • Docker Version: 20.10+
  • OS: Linux (Ubuntu/ArchLinux)
  • Docker Compose Version: 2.0+

Problem Details

Error Message

sqlalchemy.exc.OperationalError: (psycopg.OperationalError) connection failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
(Background on this error at: https://sqlalche.me/e/20/e3q8)

Root Cause

The issue occurs because:

  1. The DATABASE_URL environment variable is correctly set to postgresql://skyvern:skyvern@postgres:5432/skyvern
  2. However, Alembic uses SettingsManager.get_settings().DATABASE_STRING which may not be reading the environment variable correctly
  3. The default alembic.ini contains sqlalchemy.url = postgresql+psycopg://skyvern@localhost/skyvern which points to localhost instead of the postgres service
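
To make the mismatch concrete, here is a minimal sketch that extracts the host from the two URLs quoted above. It uses only the standard library; `db_host` is a hypothetical helper written for this issue, not part of Skyvern or Alembic.

```python
from typing import Optional
from urllib.parse import urlsplit


def db_host(url: str) -> Optional[str]:
    """Return the hostname a database URL points at (None if absent)."""
    return urlsplit(url).hostname


# URL passed to the container via DATABASE_URL in docker-compose.yml
compose_url = "postgresql://skyvern:skyvern@postgres:5432/skyvern"
# Default sqlalchemy.url quoted from alembic.ini
alembic_ini_url = "postgresql+psycopg://skyvern@localhost/skyvern"

print(db_host(compose_url))      # postgres
print(db_host(alembic_ini_url))  # localhost
```

Inside the skyvern container, localhost refers to the container itself, where no PostgreSQL server is listening, which would be consistent with the "connection refused" error.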

Symptoms

  • Container enters restart loop immediately after "Alembic mode: online"
  • Direct Python connection tests work: psycopg.connect('postgresql://skyvern:skyvern@postgres:5432/skyvern') succeeds
  • The wrapper script can connect successfully, but Alembic fails immediately after
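
One way to check these symptoms from inside the container is a plain TCP probe against both candidate hosts. This is a rough diagnostic sketch using only the standard library; `can_connect` is a hypothetical helper, and the host names and port are taken from the compose file in this issue.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS resolution failures, and timeouts.
        return False
```

Run inside the skyvern container, can_connect("postgres", 5432) should succeed if the service name resolves, while can_connect("localhost", 5432) would be expected to fail if Alembic is in fact falling back to the alembic.ini default.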

Reproduction Steps

  1. Use the standard docker-compose.yml:
services:
  postgres:
    image: postgres:14-alpine
    environment:
      POSTGRES_DB: skyvern
      POSTGRES_USER: skyvern
      POSTGRES_PASSWORD: skyvern
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U skyvern"]
      
  skyvern:
    image: public.ecr.aws/skyvern/skyvern:latest
    environment:
      DATABASE_URL: postgresql://skyvern:skyvern@postgres:5432/skyvern
    depends_on:
      postgres:
        condition: service_healthy
  2. Run docker-compose up -d
  3. Observe container restarting: docker ps --filter name=skyvern
  4. Check logs: docker logs skyvern-container

Solution/Workaround

Add the DATABASE_STRING environment variable explicitly:

skyvern:
  environment:
    DATABASE_URL: postgresql://skyvern:skyvern@postgres:5432/skyvern
    DATABASE_STRING: postgresql+psycopg://skyvern:skyvern@postgres:5432/skyvern  # Add this line

This ensures that SettingsManager gets the correct connection string for Alembic.

Suggested Fixes

  1. Update SettingsManager to read DATABASE_URL and construct DATABASE_STRING from it
  2. Update alembic/env.py to use DATABASE_URL directly instead of SettingsManager
  3. Document the need for DATABASE_STRING in the Docker deployment docs
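
The first suggested fix could look roughly like the sketch below. This is not Skyvern's actual SettingsManager code; `resolve_database_string` and the fallback order are assumptions made for illustration. It prefers an explicit DATABASE_STRING, otherwise derives one from DATABASE_URL by switching to the postgresql+psycopg driver prefix, and only then falls back to the localhost default quoted from alembic.ini.

```python
import os
from typing import Mapping

SQLALCHEMY_PREFIX = "postgresql+psycopg://"
# Default quoted from alembic.ini in this issue
DEFAULT_URL = "postgresql+psycopg://skyvern@localhost/skyvern"


def resolve_database_string(env: Mapping[str, str] = os.environ) -> str:
    """Pick the connection string for Alembic (hypothetical helper)."""
    explicit = env.get("DATABASE_STRING")
    if explicit:
        return explicit

    url = env.get("DATABASE_URL")
    if url:
        # Rewrite plain PostgreSQL schemes to the psycopg driver form.
        for prefix in ("postgresql://", "postgres://"):
            if url.startswith(prefix):
                return SQLALCHEMY_PREFIX + url[len(prefix):]
        return url  # already carries a driver, pass through unchanged

    return DEFAULT_URL
```

With only DATABASE_URL set as in the compose file above, this returns the same value the workaround sets by hand for DATABASE_STRING.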

Additional Context

  • The issue only occurs in containerized environments where service names are used for hostname resolution
  • Direct database connections work fine; only the Alembic migrations fail
  • The health check passes but Alembic still fails to connect

Related Issues

  • Similar to issues with localhost vs service name resolution in Docker networks
  • May be related to the difference between DATABASE_URL format and SQLAlchemy's required format

Note: This issue causes high CPU usage due to rapid container restarts, making it a critical issue for Docker deployments.
