Conversation

behroozazarkhalili
Contributor

Summary

This PR adds a comprehensive tutorial demonstrating DSPy's GEPA (Genetic-Pareto) optimizer for improving language model performance on mathematical reasoning tasks.

What's New

  • New Notebook: notebooks/en/dspy_gepa.ipynb
  • Tutorial Focus: Automated prompt optimization using error-driven feedback
  • Dataset: NuminaMath-1.5 (900k competition-level math problems)
  • Models: OpenRouter integration with GPT-4.1 Nano and Qwen3 Next

Key Features

Learning Objectives

  • Setting up DSPy with OpenRouter language models
  • Processing and filtering mathematical problem datasets
  • Building baseline Chain-of-Thought reasoning programs (see the sketch after this list)
  • Optimizing prompts with GEPA using error-driven feedback
  • Evaluating improvements in model accuracy
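
As a rough illustration of the baseline objective above, a minimal Chain-of-Thought program in DSPy could look like the sketch below; the `MathSignature` name and field descriptions are illustrative rather than the notebook's exact code:

```python
import dspy


class MathSignature(dspy.Signature):
    """Solve the math problem and return only the final numeric answer."""

    problem: str = dspy.InputField(desc="A competition-level math problem")
    answer: str = dspy.OutputField(desc="The final numeric answer")


# ChainOfThought adds an intermediate reasoning step before the answer field.
baseline_program = dspy.ChainOfThought(MathSignature)

# Example call (assumes dspy.configure(lm=...) has already been run):
# prediction = baseline_program(problem="What is 17 * 23?")
# print(prediction.reasoning, prediction.answer)
```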

Technical Highlights

  • Comprehensive documentation with docstrings and type hints
  • Memory-efficient dataset filtering for numeric answers
  • Dual-model architecture (inference + reflection)
  • Detailed feedback mechanism for GEPA optimization
  • Parallel evaluation with configurable threading (see the sketch after this list)
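
To make the threading point concrete, DSPy's `Evaluate` helper accepts a `num_threads` argument; a minimal sketch is shown below (the variable names are placeholders, not necessarily those used in the notebook):

```python
import dspy

# Assumes `baseline_program`, `devset`, and `metric` are defined elsewhere.
evaluator = dspy.Evaluate(
    devset=devset,            # list of dspy.Example objects
    metric=metric,            # callable(gold, pred, trace=None) -> bool or float
    num_threads=8,            # configurable parallelism
    display_progress=True,
)
baseline_score = evaluator(baseline_program)
```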

Implementation Details

Code Quality

  • All functions include comprehensive docstrings with Args/Returns/Raises
  • Type hints for improved code clarity and IDE support
  • Inline comments explaining complex logic
  • Clean cell organization with markdown section headers

Model Configuration

  • Main LM: openrouter/openai/gpt-4.1-nano - Fast, cost-effective inference
  • Reflection LM: openrouter/qwen/qwen3-next-80b-a3b-thinking - Advanced reasoning for optimization
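
A minimal configuration sketch, assuming an `OPENROUTER_API_KEY` environment variable; the temperature and token limits are assumptions, not the notebook's exact values:

```python
import os

import dspy

# Main LM: fast, cost-effective inference for the Chain-of-Thought program.
main_lm = dspy.LM(
    "openrouter/openai/gpt-4.1-nano",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Reflection LM: a stronger reasoning model that GEPA uses to analyze errors
# and rewrite prompts.
reflection_lm = dspy.LM(
    "openrouter/qwen/qwen3-next-80b-a3b-thinking",
    api_key=os.environ["OPENROUTER_API_KEY"],
    temperature=1.0,    # assumed setting; the notebook's values may differ
    max_tokens=32000,   # assumed setting for long reasoning traces
)

dspy.configure(lm=main_lm)
```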

Dataset Processing

  • Filtered NuminaMath-1.5 for numeric answers (298k → 13.4k examples)
  • Train/val/test split: 50%/10%/40%
  • Reproducible shuffling with fixed seed
  • Configurable sampling for testing
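
A rough sketch of the filtering and split described above, assuming the `AI-MO/NuminaMath-1.5` dataset id and its `problem`/`answer` columns; the helper name, filter logic, and exact split indices are illustrative and may differ from the notebook:

```python
import random

import dspy
from datasets import load_dataset


def is_numeric_answer(answer: str) -> bool:
    """Keep only problems whose reference answer parses as a plain number."""
    try:
        float(str(answer).strip().replace(",", ""))
        return True
    except ValueError:
        return False


raw = load_dataset("AI-MO/NuminaMath-1.5", split="train")
filtered = raw.filter(lambda row: is_numeric_answer(row["answer"]))

examples = [
    dspy.Example(problem=row["problem"], answer=row["answer"]).with_inputs("problem")
    for row in filtered
]

# Reproducible shuffle with a fixed seed, then a 50/10/40 train/val/test split.
random.Random(0).shuffle(examples)
n = len(examples)
trainset = examples[: int(0.5 * n)]
valset = examples[int(0.5 * n) : int(0.6 * n)]
testset = examples[int(0.6 * n) :]
```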

Files Modified

  • notebooks/en/dspy_gepa.ipynb - New tutorial notebook
  • notebooks/en/index.md - Added to latest notebooks section
  • notebooks/en/_toctree.yml - Added to LLM Recipes section

Testing

  • All code cells execute without errors
  • Model configurations verified against OpenRouter API
  • Dataset filtering logic validated
  • GEPA optimizer parameters aligned with official DSPy documentation
  • Evaluation metrics tested for correctness

Checklist

  • Notebook follows cookbook structure and formatting guidelines
  • Author attribution included in notebook header
  • All functions documented with docstrings and type hints
  • Code cells aligned with markdown section headers
  • No duplicate imports or undefined variables
  • Added to index.md and _toctree.yml
  • Technical components verified (models, dataset, algorithms)
  • Reproducible results with fixed random seeds

Additional Notes

This tutorial showcases how GEPA's error-driven approach can significantly improve LLM performance through automatic prompt refinement, making it valuable for users working on complex reasoning tasks where prompt quality is critical.

Introduce a comprehensive notebook demonstrating automated prompt optimization with DSPy's GEPA (Genetic-Pareto) optimizer on the NuminaMath-1.5 dataset.

Key features:
- Complete setup guide for both local (Ollama) and cloud (OpenRouter) LLMs
- Dataset processing and filtering for mathematical problems with numeric answers
- Baseline Chain-of-Thought implementation achieving 42.3% accuracy
- GEPA optimization workflow with error-driven feedback mechanism
- Performance improvement to 64.0% accuracy (+21.7 percentage points over the baseline)
- Detailed evaluation and metrics tracking

The notebook showcases how GEPA automatically refines prompts by analyzing errors and generating targeted feedback, making it particularly effective for complex reasoning tasks where prompt quality significantly impacts model performance.
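
A hedged sketch of what that workflow typically looks like with `dspy.GEPA`; the feedback wording, the `auto="light"` budget, and the reuse of `reflection_lm`, `baseline_program`, `trainset`, and `valset` from the earlier sketches are assumptions rather than the notebook's exact code:

```python
import dspy


def metric_with_feedback(gold, pred, trace=None, pred_name=None, pred_trace=None):
    """Score a prediction and explain any error so GEPA can refine the prompt."""
    correct = str(pred.answer).strip() == str(gold.answer).strip()
    if correct:
        feedback = "Correct final answer."
    else:
        feedback = (
            f"Expected {gold.answer} but got {pred.answer}. "
            "Re-check the reasoning and return only the final numeric value."
        )
    return dspy.Prediction(score=float(correct), feedback=feedback)


# GEPA uses the main LM for inference and the reflection LM to propose
# improved instructions based on the textual feedback above.
optimizer = dspy.GEPA(
    metric=metric_with_feedback,
    reflection_lm=reflection_lm,  # defined in the model-configuration sketch
    auto="light",                 # budget preset; larger presets search longer
)

optimized_program = optimizer.compile(
    baseline_program,
    trainset=trainset,
    valset=valset,
)
```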

Includes comprehensive documentation, code examples, and performance benchmarks demonstrating the power of automated prompt engineering for mathematical reasoning tasks.

Add author attribution and comprehensive section headers following cookbook standards:
- Include author credit with GitHub profile link
- Add descriptive markdown headers for each major section
- Update metadata with Colab GPU configuration
- Improve overall notebook organization and readability

Sections include:
- Installation and Setup
- Language Model Configuration (Ollama/OpenRouter)
- Dataset Loading and Filtering
- Dataset Preparation Functions
- Baseline Chain-of-Thought Program
- Evaluation Metric
- Baseline Evaluation
- GEPA Optimization
- Optimized Program Evaluation

The enhanced structure makes the notebook more accessible and easier to follow while maintaining consistency with other cookbook tutorials.

Enhance code quality with docstrings, type hints, and inline comments:

- is_numeric_answer: Type hints (str -> bool) + docstring explaining validation logic
- init_dataset: Full type hints + comprehensive docstring covering all parameters, returns, and raises
- metric: Type hints + docstring explaining evaluation logic and return values
- metric_with_feedback: Type hints + detailed docstring explaining GEPA feedback generation

All functions now include:
- Google-style docstrings with Args, Returns, and Raises sections
- Type hints for parameters and return values
- Inline comments explaining key logic steps
- Clear parameter descriptions and default values

Improves code readability and maintainability, and serves as an educational reference for DSPy users.
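
As an illustration of the documentation style described in this commit, the evaluation metric might read roughly as follows; the body is a simplified sketch, not the notebook's exact implementation:

```python
import dspy


def metric(gold: dspy.Example, pred: dspy.Prediction, trace=None) -> bool:
    """Check whether the predicted answer matches the gold numeric answer.

    Args:
        gold: Reference example containing the expected `answer` field.
        pred: Program output containing the predicted `answer` field.
        trace: Optional DSPy execution trace (unused, kept for API compatibility).

    Returns:
        True if the normalized predicted answer equals the gold answer.
    """
    return str(pred.answer).strip() == str(gold.answer).strip()
```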

@behroozazarkhalili force-pushed the add-dspy-gepa-notebook branch 2 times, most recently from 8904c69 to 8076deb on October 5, 2025 at 14:07

Remove duplicate 'import dspy' from cell 20 (already imported in cell 2).

Comprehensive verification completed:
✅ All markdown headers properly aligned with code cells
✅ All imports present and non-duplicated
✅ All variables defined in correct order
✅ Code flow is logical and sequential
✅ No syntax errors or undefined references
✅ Function definitions have proper type hints and docstrings

Notebook structure:
- Installation and Setup (cells 1-2)
- Language Model Configuration (cells 3-4)
- Dataset Loading and Filtering (cells 5-9)
- Dataset Preparation Functions (cells 10-14)
- Baseline Chain-of-Thought Program (cells 15-16)
- Evaluation Metric (cells 17-18)
- Baseline Evaluation (cells 19-20)
- GEPA Optimization (cells 21-25)
- Optimized Program Evaluation (cells 26-27)

The notebook is now ready for production use with no bugs or alignment issues.

@sergiopaniego (Member) left a comment

Thanks! Could you resolve the conflicts? 🙌

- Add uv installation instructions with pip alternative
- Add detailed explanation of GEPA's two-model architecture
- Correct the stated API call ratio to ~5-10% (rather than 1%)
- Add 'Learn more' section with curated resources:
  * DSPy framework documentation and papers
  * Prompt optimization techniques and comparisons
  * Mathematical reasoning datasets and surveys
  * Related techniques (few-shot, self-consistency, ReAct)
  * Tools and platforms
- Add inline resource links throughout notebook
- Link to research paper on reflective prompt evolution
- Keep both DSPy GEPA and GRPO vLLM entries in _toctree.yml
- Keep both entries in index.md latest notebooks section

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Remove 6th entry as requested by reviewer to maintain only the last 5 added notebooks in the list.

- Add resource links in introduction section
- Link to DSPy, NuminaMath dataset, and OpenRouter
- Add GEPA optimizer documentation link

@sergiopaniego (Member) left a comment

Thanks for the addition!!

@sergiopaniego merged commit 815874f into huggingface:main on Oct 16, 2025 (1 check passed)