A Python web scraper built with Scrapy to extract data from the Python Enhancement Proposals (PEP) website.
It collects structured information on all PEPs and saves it into two CSV files:
- 📄 PEP List — number, title, and status for each proposal
- 📊 Status Summary — aggregated statistics by status type
Features:
- 🕸️ Full PEP parsing from the official Python website
- 📂 CSV export — structured results for further analysis
- 📊 Status aggregation with counts by type
- ⚙️ ItemLoaders for structured and normalized data (see the sketch after this list)
- 🛠️ Configurable settings in `settings.py`
- 📑 Detailed logging for debugging and transparency
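The normalization layer relies on Scrapy ItemLoaders. Below is a minimal sketch of how the item and loader might be wired together; the field names match the CSV columns described above, but the actual definitions in `pep_parse/items.py` may differ:

```python
import scrapy
from scrapy.loader import ItemLoader
from itemloaders.processors import MapCompose, TakeFirst


class PepParseItem(scrapy.Item):
    # One row of the PEP list CSV: number, title, and status.
    number = scrapy.Field()
    title = scrapy.Field()
    status = scrapy.Field()


class PepLoader(ItemLoader):
    # Strip whitespace from every extracted string and keep only the
    # first value, so pipelines receive clean scalar fields.
    default_item_class = PepParseItem
    default_input_processor = MapCompose(str.strip)
    default_output_processor = TakeFirst()
```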
Installation:
- Clone the repository:

  ```bash
  git clone https://github.com/Riadnov-dev/scrapy_parser_pep.git
  cd scrapy_parser_pep
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
Run the pep spider:

```bash
scrapy crawl pep
```
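For orientation, a spider with this name typically follows the pattern below. This is an illustrative sketch only: the start URL and CSS selectors are assumptions, not necessarily the ones used in `pep_parse/spiders/pep.py`.

```python
import scrapy


class PepSpider(scrapy.Spider):
    name = "pep"  # the name referenced by `scrapy crawl pep`
    allowed_domains = ["peps.python.org"]
    start_urls = ["https://peps.python.org/"]

    def parse(self, response):
        # Follow every PEP link found on the index page
        # (this selector is an assumption for illustration).
        for href in response.css("a[href^='pep-']::attr(href)").getall():
            yield response.follow(href, callback=self.parse_pep)

    def parse_pep(self, response):
        # Extract the fields for one proposal page; the real spider
        # would populate an ItemLoader as sketched in the features section.
        yield {
            "number": response.url.rstrip("/").split("-")[-1],
            "title": response.css("h1::text").get(),
            "status": response.css("abbr::attr(title)").get(),
        }
```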
After execution, two CSV files will be created inside the results/ directory:
- `pep_<datetime>.csv` — full list of PEPs with number, title, and status
- `status_summary_<datetime>.csv` — aggregated count of PEPs by status
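The status summary is the kind of aggregation an item pipeline handles well. Here is a minimal sketch, assuming the output columns are Status and Count and a timestamped filename as shown in the project tree below; the actual `pep_parse/pipelines.py` may organize this differently:

```python
import csv
import datetime as dt
from collections import defaultdict


class StatusSummaryPipeline:
    """Count PEPs per status and dump the totals to a timestamped CSV."""

    def open_spider(self, spider):
        self.counts = defaultdict(int)

    def process_item(self, item, spider):
        # Tally each scraped item by its status field.
        self.counts[item["status"]] += 1
        return item

    def close_spider(self, spider):
        # Write the aggregated counts once the crawl finishes.
        timestamp = dt.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
        path = f"results/status_summary_{timestamp}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["Status", "Count"])
            writer.writerows(sorted(self.counts.items()))
            writer.writerow(["Total", sum(self.counts.values())])
```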
Project structure:

```
scrapy_parser_pep/
├── pep_parse/
│   ├── spiders/
│   │   ├── __init__.py
│   │   └── pep.py
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   └── settings.py
├── results/
│   ├── pep_<datetime>.csv
│   └── status_summary_<datetime>.csv
├── tests/
├── .flake8
├── .gitignore
├── README.md
├── pytest.ini
├── requirements.txt
└── scrapy.cfg
```
Author: Nikita Riadnov
GitHub: https://github.com/Riadnov-dev