Conversation

behroozazarkhalili
Contributor

Summary

This PR adds a comprehensive tutorial demonstrating DSPy's GEPA (Genetic-Pareto) optimizer for improving language model performance on mathematical reasoning tasks.

What's New

  • New Notebook: notebooks/en/dspy_gepa.ipynb
  • Tutorial Focus: Automated prompt optimization using error-driven feedback
  • Dataset: NuminaMath-1.5 (900k competition-level math problems)
  • Models: OpenRouter integration with GPT-4.1 Nano and Qwen3 Next

Key Features

Learning Objectives

  • Setting up DSPy with OpenRouter language models
  • Processing and filtering mathematical problem datasets
  • Building baseline Chain-of-Thought reasoning programs (see the sketch after this list)
  • Optimizing prompts with GEPA using error-driven feedback
  • Evaluating improvements in model accuracy
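
As a rough illustration of the baseline objective above, a minimal Chain-of-Thought program in DSPy could look like the sketch below; the `MathSignature` name and field descriptions are illustrative rather than the notebook's exact code:

```python
import dspy


class MathSignature(dspy.Signature):
    """Solve the math problem and return only the final numeric answer."""

    problem: str = dspy.InputField(desc="A competition-level math problem")
    answer: str = dspy.OutputField(desc="The final numeric answer")


# ChainOfThought adds an intermediate reasoning step before the answer field.
baseline_program = dspy.ChainOfThought(MathSignature)

# Example call (assumes dspy.configure(lm=...) has already been run):
# prediction = baseline_program(problem="What is 17 * 23?")
# print(prediction.reasoning, prediction.answer)
```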

Technical Highlights

  • Comprehensive documentation with docstrings and type hints
  • Memory-efficient dataset filtering for numeric answers
  • Dual-model architecture (inference + reflection)
  • Detailed feedback mechanism for GEPA optimization
  • Parallel evaluation with configurable threading (see the sketch after this list)
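
To make the threading point concrete, DSPy's `Evaluate` helper accepts a `num_threads` argument; a minimal sketch is shown below (the variable names are placeholders, not necessarily those used in the notebook):

```python
import dspy

# Assumes `baseline_program`, `devset`, and `metric` are defined elsewhere.
evaluator = dspy.Evaluate(
    devset=devset,            # list of dspy.Example objects
    metric=metric,            # callable(gold, pred, trace=None) -> bool or float
    num_threads=8,            # configurable parallelism
    display_progress=True,
)
baseline_score = evaluator(baseline_program)
```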

Implementation Details

Code Quality

  • All functions include comprehensive docstrings with Args/Returns/Raises
  • Type hints for improved code clarity and IDE support
  • Inline comments explaining complex logic
  • Clean cell organization with markdown section headers

Model Configuration

  • Main LM: openrouter/openai/gpt-4.1-nano - Fast, cost-effective inference
  • Reflection LM: openrouter/qwen/qwen3-next-80b-a3b-thinking - Advanced reasoning for optimization
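
A minimal configuration sketch, assuming an `OPENROUTER_API_KEY` environment variable; the temperature and token limits are assumptions, not the notebook's exact values:

```python
import os

import dspy

# Main LM: fast, cost-effective inference for the Chain-of-Thought program.
main_lm = dspy.LM(
    "openrouter/openai/gpt-4.1-nano",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Reflection LM: a stronger reasoning model that GEPA uses to analyze errors
# and rewrite prompts.
reflection_lm = dspy.LM(
    "openrouter/qwen/qwen3-next-80b-a3b-thinking",
    api_key=os.environ["OPENROUTER_API_KEY"],
    temperature=1.0,    # assumed setting; the notebook's values may differ
    max_tokens=32000,   # assumed setting for long reasoning traces
)

dspy.configure(lm=main_lm)
```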

Dataset Processing

  • Filtered NuminaMath-1.5 for numeric answers (298k → 13.4k examples)
  • Train/val/test split: 50%/10%/40%
  • Reproducible shuffling with fixed seed
  • Configurable sampling for testing
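
A rough sketch of the filtering and split described above, assuming the `AI-MO/NuminaMath-1.5` dataset id and its `problem`/`answer` columns; the helper name, filter logic, and exact split indices are illustrative and may differ from the notebook:

```python
import random

import dspy
from datasets import load_dataset


def is_numeric_answer(answer: str) -> bool:
    """Keep only problems whose reference answer parses as a plain number."""
    try:
        float(str(answer).strip().replace(",", ""))
        return True
    except ValueError:
        return False


raw = load_dataset("AI-MO/NuminaMath-1.5", split="train")
filtered = raw.filter(lambda row: is_numeric_answer(row["answer"]))

examples = [
    dspy.Example(problem=row["problem"], answer=row["answer"]).with_inputs("problem")
    for row in filtered
]

# Reproducible shuffle with a fixed seed, then a 50/10/40 train/val/test split.
random.Random(0).shuffle(examples)
n = len(examples)
trainset = examples[: int(0.5 * n)]
valset = examples[int(0.5 * n) : int(0.6 * n)]
testset = examples[int(0.6 * n) :]
```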

Files Modified

  • notebooks/en/dspy_gepa.ipynb - New tutorial notebook
  • notebooks/en/index.md - Added to latest notebooks section
  • notebooks/en/_toctree.yml - Added to LLM Recipes section

Testing

  • All code cells execute without errors
  • Model configurations verified against OpenRouter API
  • Dataset filtering logic validated
  • GEPA optimizer parameters aligned with official DSPy documentation
  • Evaluation metrics tested for correctness

Checklist

  • Notebook follows cookbook structure and formatting guidelines
  • Author attribution included in notebook header
  • All functions documented with docstrings and type hints
  • Code cells aligned with markdown section headers
  • No duplicate imports or undefined variables
  • Added to index.md and _toctree.yml
  • Technical components verified (models, dataset, algorithms)
  • Reproducible results with fixed random seeds

Additional Notes

This tutorial showcases how GEPA's error-driven approach can significantly improve LLM performance through automatic prompt refinement, making it valuable for users working on complex reasoning tasks where prompt quality is critical.

Introduce a comprehensive notebook demonstrating automated prompt optimization with DSPy's GEPA (Genetic-Pareto) optimizer on the NuminaMath-1.5 dataset.

Key features:
- Complete setup guide for both local (Ollama) and cloud (OpenRouter) LLMs
- Dataset processing and filtering for mathematical problems with numeric answers
- Baseline Chain-of-Thought implementation achieving 42.3% accuracy
- GEPA optimization workflow with error-driven feedback mechanism
- Performance improvement to 64.0% accuracy (+21.7 percentage points over the baseline)
- Detailed evaluation and metrics tracking

The notebook showcases how GEPA automatically refines prompts by analyzing errors and generating targeted feedback, making it particularly effective for complex reasoning tasks where prompt quality significantly impacts model performance.
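
A hedged sketch of what that workflow typically looks like with `dspy.GEPA`; the feedback wording, the `auto="light"` budget, and the reuse of `reflection_lm`, `baseline_program`, `trainset`, and `valset` from the earlier sketches are assumptions rather than the notebook's exact code:

```python
import dspy


def metric_with_feedback(gold, pred, trace=None, pred_name=None, pred_trace=None):
    """Score a prediction and explain any error so GEPA can refine the prompt."""
    correct = str(pred.answer).strip() == str(gold.answer).strip()
    if correct:
        feedback = "Correct final answer."
    else:
        feedback = (
            f"Expected {gold.answer} but got {pred.answer}. "
            "Re-check the reasoning and return only the final numeric value."
        )
    return dspy.Prediction(score=float(correct), feedback=feedback)


# GEPA uses the main LM for inference and the reflection LM to propose
# improved instructions based on the textual feedback above.
optimizer = dspy.GEPA(
    metric=metric_with_feedback,
    reflection_lm=reflection_lm,  # defined in the model-configuration sketch
    auto="light",                 # budget preset; larger presets search longer
)

optimized_program = optimizer.compile(
    baseline_program,
    trainset=trainset,
    valset=valset,
)
```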

Includes comprehensive documentation, code examples, and performance benchmarks demonstrating the power of automated prompt engineering for mathematical reasoning tasks.

Add author attribution and comprehensive section headers following cookbook standards:
- Include author credit with GitHub profile link
- Add descriptive markdown headers for each major section
- Update metadata with Colab GPU configuration
- Improve overall notebook organization and readability

Sections include:
- Installation and Setup
- Language Model Configuration (Ollama/OpenRouter)
- Dataset Loading and Filtering
- Dataset Preparation Functions
- Baseline Chain-of-Thought Program
- Evaluation Metric
- Baseline Evaluation
- GEPA Optimization
- Optimized Program Evaluation

The enhanced structure makes the notebook more accessible and easier to follow while maintaining consistency with other cookbook tutorials.

Enhance code quality with docstrings, type hints, and inline comments:

- is_numeric_answer: Type hints (str -> bool) + docstring explaining validation logic
- init_dataset: Full type hints + comprehensive docstring covering all parameters, returns, and raises
- metric: Type hints + docstring explaining evaluation logic and return values
- metric_with_feedback: Type hints + detailed docstring explaining GEPA feedback generation

All functions now include:
- Google-style docstrings with Args, Returns, and Raises sections
- Type hints for parameters and return values
- Inline comments explaining key logic steps
- Clear parameter descriptions and default values

Improves code readability and maintainability, and serves as an educational reference for DSPy users.
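
As an illustration of the documentation style described in this commit, the evaluation metric might read roughly as follows; the body is a simplified sketch, not the notebook's exact implementation:

```python
import dspy


def metric(gold: dspy.Example, pred: dspy.Prediction, trace=None) -> bool:
    """Check whether the predicted answer matches the gold numeric answer.

    Args:
        gold: Reference example containing the expected `answer` field.
        pred: Program output containing the predicted `answer` field.
        trace: Optional DSPy execution trace (unused, kept for API compatibility).

    Returns:
        True if the normalized predicted answer equals the gold answer.
    """
    return str(pred.answer).strip() == str(gold.answer).strip()
```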

@behroozazarkhalili force-pushed the add-dspy-gepa-notebook branch 2 times, most recently from 8904c69 to 8076deb on October 5, 2025 at 14:07

Remove duplicate 'import dspy' from cell 20 (already imported in cell 2).

Comprehensive verification completed:
✅ All markdown headers properly aligned with code cells
✅ All imports present and non-duplicated
✅ All variables defined in correct order
✅ Code flow is logical and sequential
✅ No syntax errors or undefined references
✅ Function definitions have proper type hints and docstrings

Notebook structure:
- Installation and Setup (cells 1-2)
- Language Model Configuration (cells 3-4)
- Dataset Loading and Filtering (cells 5-9)
- Dataset Preparation Functions (cells 10-14)
- Baseline Chain-of-Thought Program (cells 15-16)
- Evaluation Metric (cells 17-18)
- Baseline Evaluation (cells 19-20)
- GEPA Optimization (cells 21-25)
- Optimized Program Evaluation (cells 26-27)

The notebook is now ready for production use with no bugs or alignment issues.

@sergiopaniego (Member) left a comment

Thanks! Could you resolve the conflicts? 🙌

- Add uv installation instructions with pip alternative
- Add detailed explanation of GEPA's two-model architecture
- Correct the stated API call ratio to ~5-10% (rather than 1%)
- Add 'Learn more' section with curated resources:
  * DSPy framework documentation and papers
  * Prompt optimization techniques and comparisons
  * Mathematical reasoning datasets and surveys
  * Related techniques (few-shot, self-consistency, ReAct)
  * Tools and platforms
- Add inline resource links throughout notebook
- Link to research paper on reflective prompt evolution
- Keep both DSPy GEPA and GRPO vLLM entries in _toctree.yml
- Keep both entries in index.md latest notebooks section

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Remove 6th entry as requested by reviewer to maintain only the last 5 added notebooks in the list.

- Add resource links in introduction section
- Link to DSPy, NuminaMath dataset, and OpenRouter
- Add GEPA optimizer documentation link

@sergiopaniego (Member) left a comment

Thanks for the addition!!

@sergiopaniego merged commit 815874f into huggingface:main on Oct 16, 2025 (1 check passed)