A Python web scraper built with Scrapy to extract data from the Python Enhancement Proposals (PEP) website.
It collects structured information on all PEPs and saves it into two CSV files:
- 📄 PEP List — number, title, and status for each proposal
- 📊 Status Summary — aggregated statistics by status type
Features:
- 🕸️ Full PEP parsing from the official Python website
- 📂 CSV export — structured results for further analysis
- 📊 Status aggregation with counts by type
- ⚙️ ItemLoaders for structured and normalized data (see the sketch after this list)
- 🛠️ Configurable settings in `settings.py`
- 📑 Detailed logging for debugging and transparency
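The normalization layer relies on Scrapy ItemLoaders. Below is a minimal sketch of how the item and loader might be wired together; the field names match the CSV columns described above, but the actual definitions in `pep_parse/items.py` may differ:

```python
import scrapy
from scrapy.loader import ItemLoader
from itemloaders.processors import MapCompose, TakeFirst


class PepParseItem(scrapy.Item):
    # One row of the PEP list CSV: number, title, and status.
    number = scrapy.Field()
    title = scrapy.Field()
    status = scrapy.Field()


class PepLoader(ItemLoader):
    # Strip whitespace from every extracted string and keep only the
    # first value, so pipelines receive clean scalar fields.
    default_item_class = PepParseItem
    default_input_processor = MapCompose(str.strip)
    default_output_processor = TakeFirst()
```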
Installation:
- Clone the repository:

  ```bash
  git clone https://github.com/Riadnov-dev/scrapy_parser_pep.git
  cd scrapy_parser_pep
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
Run the pep spider:

```bash
scrapy crawl pep
```
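For orientation, a spider with this name typically follows the pattern below. This is an illustrative sketch only: the start URL and CSS selectors are assumptions, not necessarily the ones used in `pep_parse/spiders/pep.py`.

```python
import scrapy


class PepSpider(scrapy.Spider):
    name = "pep"  # the name referenced by `scrapy crawl pep`
    allowed_domains = ["peps.python.org"]
    start_urls = ["https://peps.python.org/"]

    def parse(self, response):
        # Follow every PEP link found on the index page
        # (this selector is an assumption for illustration).
        for href in response.css("a[href^='pep-']::attr(href)").getall():
            yield response.follow(href, callback=self.parse_pep)

    def parse_pep(self, response):
        # Extract the fields for one proposal page; the real spider
        # would populate an ItemLoader as sketched in the features section.
        yield {
            "number": response.url.rstrip("/").split("-")[-1],
            "title": response.css("h1::text").get(),
            "status": response.css("abbr::attr(title)").get(),
        }
```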
After execution, two CSV files will be created inside the results/ directory:
- `pep_<datetime>.csv` — full list of PEPs with number, title, and status
- `status_summary_<datetime>.csv` — aggregated count of PEPs by status
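The status summary is the kind of aggregation an item pipeline handles well. Here is a minimal sketch, assuming the output columns are Status and Count and a timestamped filename as shown in the project tree below; the actual `pep_parse/pipelines.py` may organize this differently:

```python
import csv
import datetime as dt
from collections import defaultdict


class StatusSummaryPipeline:
    """Count PEPs per status and dump the totals to a timestamped CSV."""

    def open_spider(self, spider):
        self.counts = defaultdict(int)

    def process_item(self, item, spider):
        # Tally each scraped item by its status field.
        self.counts[item["status"]] += 1
        return item

    def close_spider(self, spider):
        # Write the aggregated counts once the crawl finishes.
        timestamp = dt.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
        path = f"results/status_summary_{timestamp}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["Status", "Count"])
            writer.writerows(sorted(self.counts.items()))
            writer.writerow(["Total", sum(self.counts.values())])
```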
Project structure:

```
scrapy_parser_pep/
├── pep_parse/
│   ├── spiders/
│   │   ├── __init__.py
│   │   └── pep.py
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   └── settings.py
├── results/
│   ├── pep_<datetime>.csv
│   └── status_summary_<datetime>.csv
├── tests/
├── .flake8
├── .gitignore
├── README.md
├── pytest.ini
├── requirements.txt
└── scrapy.cfg
```
Author: Nikita Riadnov
GitHub: https://github.com/Riadnov-dev