Do you need to ask a question?
- I have searched the existing questions and discussions, and this question is not already answered.
- I believe this is a legitimate question, not just a bug or feature request.
Your Question
I'm having persistent timeout issues when trying to use local Ollama models through LiteLLM-Proxy in my RAG-Anything setup. The API endpoints work fine when tested on their own, but local model calls always time out despite various attempts to fix it.
Additional Context
Environment Setup:
- Ollama Service: Running locally on port 11434
- LiteLLM-Proxy: Running on port 9000, configured with base_url: http://ollama:11434/v1
- RAG-Anything: Running on port 8801, using API base: http://litellm-proxy:9000/v1
- Model: Testing with Qwen3-8B-Q6_K and smaller 4B models
- Hardware: Ubuntu 22.04, RTX 4090 (modified to 48 GB VRAM)
- LiteLLM config:
  - model_name: Qwen3-8B-Q6_K
    litellm_params:
      model: openai/Qwen3-8B-Q6_K
      api_base: http://ollama:11434/v1
      api_key: dummy
      enable_thinking: false
      timeout: 600
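For reference, a minimal script along these lines could reproduce the timeout outside RAG-Anything by timing one short completion against each hop of the chain. This is a rough sketch: the hostnames ollama and litellm-proxy are assumed to resolve from wherever it runs (inside the same Docker network; substitute localhost and the mapped ports otherwise), and Qwen3-8B-Q6_K is assumed to also be the tag Ollama serves on its /v1 endpoint.

```python
# Minimal check outside RAG-Anything: send the same short prompt directly to
# Ollama's OpenAI-compatible endpoint and through the LiteLLM proxy, timing both.
# Hostnames, ports, and the model tag follow the setup above and may need adjusting.
import time
from openai import OpenAI

ENDPOINTS = {
    "ollama-direct": "http://ollama:11434/v1",
    "litellm-proxy": "http://litellm-proxy:9000/v1",
}

for name, base_url in ENDPOINTS.items():
    client = OpenAI(base_url=base_url, api_key="dummy")
    start = time.time()
    try:
        resp = client.chat.completions.create(
            model="Qwen3-8B-Q6_K",
            messages=[{"role": "user", "content": "Reply with the single word: ok"}],
            timeout=120,  # fail fast instead of waiting the full 600 s
        )
        print(f"{name}: {time.time() - start:.1f}s -> {resp.choices[0].message.content!r}")
    except Exception as exc:
        print(f"{name}: failed after {time.time() - start:.1f}s ({exc})")
```

If the direct call returns quickly but the proxied one does not, the stall would be between LiteLLM and Ollama; if both hang, it would point at the Ollama/model side.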
What I've Already Tried:
- Increased timeout settings significantly
- Limited max_workers to reduce load
- Switched to smaller 4B parameter models
- Verified API endpoints work independently
- Confirmed Ollama service responds to direct calls
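Since the logs below show the worker giving up after 210 s while Ollama itself only returns a 500 after the full 10 minutes, a streaming call with a longer prompt could show whether any tokens are produced before the worker limit or nothing arrives at all. Again only a sketch; the endpoint, model tag, and prompt size are assumptions to adjust to the real workload.

```python
# Streaming time-to-first-token check through the proxy with a longer prompt,
# to distinguish slow generation (tokens trickle in) from a request that hangs
# outright. Endpoint and model tag follow the setup above; adjust as needed.
import time
from openai import OpenAI

client = OpenAI(base_url="http://litellm-proxy:9000/v1", api_key="dummy")
prompt = "Summarize the following text in one sentence:\n" + "lorem ipsum " * 500

start = time.time()
stream = client.chat.completions.create(
    model="Qwen3-8B-Q6_K",
    messages=[{"role": "user", "content": prompt}],
    stream=True,
    timeout=300,
)
first_token_at = None
n_chunks = 0
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.time() - start
        n_chunks += 1
print(f"first token after {first_token_at}s, {n_chunks} content chunks, "
      f"total {time.time() - start:.1f}s")
```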
Relevant console log excerpts:
- app |Received empty content from OpenAI API
- app |WARNING: limit_async: Worker timeout for task 140201160394224_2571.427 after 210s
- app |TimeoutError: [LLM func] limit_async: Worker execution timeout after 210s
- ollama | [GIN] 2025/09/10 - 12:32:42 | 500 | 10m0s | 172.18.0.2 | POST "/v1/chat/completions"