Add DSPy GEPA Optimization Tutorial for Mathematical Reasoning #333
Conversation
Introduce a comprehensive notebook demonstrating automated prompt optimization using DSPy's GEPA (Genetic-Pareto) optimizer on the NuminaMath-1.5 dataset.

Key features:
- Complete setup guide for both local (Ollama) and cloud (OpenRouter) LLMs
- Dataset processing and filtering for mathematical problems with numeric answers
- Baseline Chain-of-Thought implementation achieving 42.3% accuracy
- GEPA optimization workflow with an error-driven feedback mechanism
- Performance improvement to 64.0% accuracy (+21.7 percentage points)
- Detailed evaluation and metrics tracking

The notebook shows how GEPA automatically refines prompts by analyzing errors and generating targeted feedback, making it particularly effective for complex reasoning tasks where prompt quality significantly impacts model performance. Includes documentation, code examples, and performance benchmarks demonstrating automated prompt engineering for mathematical reasoning tasks.
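The accuracy metric this commit describes can be sketched in plain Python. The function names, the comma-stripping, and the numeric tolerance are illustrative assumptions, not the notebook's verbatim code:

```python
from typing import Optional

def parse_number(text: str) -> Optional[float]:
    """Parse a model answer into a float, returning None on failure."""
    try:
        return float(text.strip().replace(",", ""))  # tolerate "1,000"-style answers
    except ValueError:
        return None

def metric(gold_answer: str, predicted_answer: str, tol: float = 1e-6) -> float:
    """Score a prediction: 1.0 if it matches the gold numeric answer, else 0.0."""
    gold = parse_number(gold_answer)
    pred = parse_number(predicted_answer)
    if gold is None or pred is None:
        return 0.0
    return 1.0 if abs(gold - pred) <= tol else 0.0
```

Averaging this 0/1 score over the evaluation split yields accuracy figures like the 42.3% baseline quoted above.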
Add author attribution and comprehensive section headers following cookbook standards:
- Include author credit with GitHub profile link
- Add descriptive markdown headers for each major section
- Update metadata with Colab GPU configuration
- Improve overall notebook organization and readability

Sections include:
- Installation and Setup
- Language Model Configuration (Ollama/OpenRouter)
- Dataset Loading and Filtering
- Dataset Preparation Functions
- Baseline Chain-of-Thought Program
- Evaluation Metric
- Baseline Evaluation
- GEPA Optimization
- Optimized Program Evaluation

The enhanced structure makes the notebook more accessible and easier to follow while maintaining consistency with other cookbook tutorials.
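The "Dataset Loading and Filtering" and "Dataset Preparation Functions" steps listed above can be sketched as follows. The record layout (`problem`/`answer` keys), split fractions, and function name are assumptions for illustration, not NuminaMath-1.5's actual schema:

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, str]

def init_dataset(records: List[Record], train_frac: float = 0.8,
                 seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Keep only problems whose answer parses as a number, then split train/val.

    Args:
        records: Raw dataset rows with 'problem' and 'answer' keys.
        train_frac: Fraction of filtered rows assigned to the train split.
        seed: RNG seed so the shuffle is reproducible.

    Returns:
        A (train, val) pair of record lists.
    """
    numeric: List[Record] = []
    for r in records:
        try:
            float(r["answer"].strip())  # discard proof-style or symbolic answers
            numeric.append(r)
        except ValueError:
            continue
    rng = random.Random(seed)
    rng.shuffle(numeric)
    cut = int(len(numeric) * train_frac)
    return numeric[:cut], numeric[cut:]
```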
Enhance code quality with docstrings, type hints, and inline comments:
- is_numeric_answer: type hints (str -> bool) plus a docstring explaining the validation logic
- init_dataset: full type hints plus a comprehensive docstring covering all parameters, returns, and raises
- metric: type hints plus a docstring explaining the evaluation logic and return values
- metric_with_feedback: type hints plus a detailed docstring explaining GEPA feedback generation

All functions now include:
- Google-style docstrings with Args, Returns, and Raises sections
- Type hints for parameters and return values
- Inline comments explaining key logic steps
- Clear parameter descriptions and default values

Improves code readability and maintainability, and serves as an educational reference for DSPy users.
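For instance, the documented style applied to is_numeric_answer might look like this. This is a sketch of the convention the commit describes, not the notebook's verbatim code:

```python
def is_numeric_answer(answer: str) -> bool:
    """Check whether a dataset answer is a plain numeric value.

    Args:
        answer: The raw answer string from the dataset.

    Returns:
        True if the string parses as an int or float, False otherwise.
    """
    try:
        float(answer.strip())  # accepts ints, floats, and scientific notation
        return True
    except (ValueError, AttributeError):  # AttributeError guards non-string input
        return False
```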
Force-pushed from 8904c69 to 8076deb
Remove duplicate 'import dspy' from cell 20 (already imported in cell 2).

Comprehensive verification completed:
- ✅ All markdown headers properly aligned with code cells
- ✅ All imports present and non-duplicated
- ✅ All variables defined in the correct order
- ✅ Code flow is logical and sequential
- ✅ No syntax errors or undefined references
- ✅ Function definitions have proper type hints and docstrings

Notebook structure:
- Installation and Setup (cells 1-2)
- Language Model Configuration (cells 3-4)
- Dataset Loading and Filtering (cells 5-9)
- Dataset Preparation Functions (cells 10-14)
- Baseline Chain-of-Thought Program (cells 15-16)
- Evaluation Metric (cells 17-18)
- Baseline Evaluation (cells 19-20)
- GEPA Optimization (cells 21-25)
- Optimized Program Evaluation (cells 26-27)

The notebook is now ready for production use with no bugs or alignment issues.
Force-pushed from 8076deb to aac07df
Thanks! Could you resolve the conflicts? 🙌
- Add uv installation instructions with a pip alternative
- Add a detailed explanation of GEPA's two-model architecture
- Update the API call ratio to the accurate ~5-10% (not 1%)
- Add a 'Learn more' section with curated resources:
  * DSPy framework documentation and papers
  * Prompt optimization techniques and comparisons
  * Mathematical reasoning datasets and surveys
  * Related techniques (few-shot, self-consistency, ReAct)
  * Tools and platforms
- Add inline resource links throughout the notebook
- Link to the research paper on reflective prompt evolution
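The error-driven feedback that GEPA's reflection model consumes can be illustrated with a plain-Python sketch. DSPy's actual GEPA metric signature differs; the function name, return shape, and feedback wording here are hypothetical:

```python
def metric_with_feedback(gold_answer: str, predicted_answer: str) -> dict:
    """Score a prediction and, on failure, emit textual feedback.

    Returns:
        A dict with a numeric 'score' and a 'feedback' string of the kind a
        GEPA-style optimizer could pass to its reflection model when refining
        the prompt.
    """
    try:
        correct = abs(float(gold_answer) - float(predicted_answer)) < 1e-6
    except ValueError:
        # Non-numeric output: tell the reflection model what format was expected.
        return {"score": 0.0,
                "feedback": (f"Prediction '{predicted_answer}' is not numeric; "
                             "the final answer must be a single number.")}
    if correct:
        return {"score": 1.0, "feedback": "Correct."}
    # Wrong number: point the reflection model at the failure mode.
    return {"score": 0.0,
            "feedback": (f"Expected {gold_answer} but got {predicted_answer}; "
                         "re-check the final arithmetic step.")}
```

This also mirrors the two-model split described above: a cheap model answers the problems, while a stronger reflection model reads feedback strings like these to propose improved prompts.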
- Keep both the DSPy GEPA and GRPO vLLM entries in _toctree.yml
- Keep both entries in the index.md latest-notebooks section
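A _toctree.yml entry of the kind kept here typically looks like the fragment below; the exact title string and key names are assumptions about this repo's schema:

```yaml
- local: dspy_gepa
  title: Prompt Optimization with DSPy GEPA
```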
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Remove the 6th entry as requested by the reviewer, so the list keeps only the last 5 added notebooks.
- Add resource links in the introduction section
- Link to DSPy, the NuminaMath dataset, and OpenRouter
- Add a GEPA optimizer documentation link
Thanks for the addition!!
Summary
This PR adds a comprehensive tutorial demonstrating DSPy's GEPA (Genetic-Pareto) optimizer for improving language model performance on mathematical reasoning tasks.
What's New
notebooks/en/dspy_gepa.ipynb
Key Features
Learning Objectives
Technical Highlights
Implementation Details
Code Quality
Model Configuration
- openrouter/openai/gpt-4.1-nano - Fast, cost-effective inference
- openrouter/qwen/qwen3-next-80b-a3b-thinking - Advanced reasoning for optimization

Dataset Processing

Files Modified
- notebooks/en/dspy_gepa.ipynb - New tutorial notebook
- notebooks/en/index.md - Added to latest notebooks section
- notebooks/en/_toctree.yml - Added to LLM Recipes section

Testing
Checklist
Additional Notes
This tutorial showcases how GEPA's error-driven approach can significantly improve LLM performance through automatic prompt refinement, making it valuable for users working on complex reasoning tasks where prompt quality is critical.