A professional, modular data exploration platform for Afrobarometer Round 9 survey data, featuring both a Jupyter notebook for detailed analysis and a refactored Streamlit web application for interactive exploration.
This application was originally built for the UN Office of the Special Adviser on Africa to support exploration and analysis of Afrobarometer survey data. It has since been released as open source for researchers, analysts, and anyone interested in exploring African public opinion data.
- Dataset: Afrobarometer Round 9 survey data only
- Geographic Coverage: 39 African countries
- Survey Period: 2021-2023
- Total Observations: 53,444 respondents
- Variables: 426 variables including Q3-Q116 survey questions
Afrobarometer/
├── app/                                  # Main application package
│   ├── components/                       # UI components
│   │   └── sidebar.py                    # Sidebar with filters and controls
│   ├── pages/                            # Page components
│   │   ├── overview.py                   # Dataset overview page
│   │   └── visualizations.py             # Data visualizations page
│   ├── utils/                            # Utility modules
│   │   ├── data_loader.py                # Original data loading and processing
│   │   ├── preprocessed_data_loader.py   # Efficient preprocessed data loader
│   │   ├── visualizations.py             # Chart creation functions
│   │   └── export.py                     # Data export functionality
│   └── __init__.py
├── config/                               # Configuration
│   ├── settings.py                       # Application settings
│   └── __init__.py
├── data/                                 # Data files
│   ├── raw_data/                         # Original SPSS files
│   ├── processed/                        # Preprocessed datasets
│   └── reference/                        # Codebooks and documentation
├── tests/                                # Test files
├── docs/                                 # Documentation
├── afrobarometer_env/                    # Virtual environment
├── app.py                                # Main Streamlit application
├── afrobarometer_data_analysis.ipynb     # Jupyter notebook
├── preprocess_data.py                    # Data preprocessing pipeline
├── requirements.txt                      # Python dependencies
├── run_app.sh                            # Application launcher
└── README.md                             # This file
# Clone or navigate to the project directory
cd Afrobarometer
# Run the launcher script (handles everything automatically)
./run_app.sh

# Create and activate virtual environment
python -m venv afrobarometer_env
source afrobarometer_env/bin/activate # On Windows: afrobarometer_env\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Preprocess the data (one-time setup)
python preprocess_data.py
# Run the application
streamlit run app.py

The app will open in your browser at: http://localhost:8501
- 5 Comprehensive Tabs: Overview, Visualizations, Data Explorer, Summary Stats, Export
- Real-time Filtering: Filter by country and variable type
- Responsive Design: Works on desktop and mobile devices
- Professional UI: Clean, intuitive interface with custom styling
- Country Selection: Filter data by any of the 39 countries
- Variable Analysis: Interactive charts for numeric and categorical variables
- Data Comparison: Cross-variable analysis and correlation
- Missing Data Visualization: Heatmaps and comprehensive analysis
- Distribution Plots: Histograms and bar charts
- Box Plots: Statistical summaries for numeric variables
- Scatter Plots: Correlation analysis between variables
- Pie Charts: Categorical variable distributions
- Interactive Charts: Powered by Plotly for zooming and filtering
- Multiple Formats: CSV, Excel, JSON
- Filtered Data: Export only selected country/variables
- Metadata Export: Include variable labels in Excel files
- Summary Reports: Generate comprehensive markdown reports
The Jupyter notebook is designed for researchers and analysts who prefer working with notebooks over web applications. It provides direct access to the processed Afrobarometer data from the GitHub repository, enabling detailed statistical analysis and custom research workflows.
- Direct GitHub Access: Loads data directly from the repository without local file downloads
- Comprehensive Analysis: Dataset overview, country distribution, survey questions analysis
- Interactive Tools: Built-in functions for variable analysis, country filtering, and topic-based question search
- Advanced Visualizations: Professional charts and graphs for research presentations
- Flexible Export: Multiple export formats (CSV, Excel, Parquet) with timestamped filenames
- Survey Focus: Specialized analysis for Q3-Q116 survey questions
- Dataset Overview: Complete statistics and data quality assessment
- Country Analysis: Geographic distribution and UN country grouping analysis
- Survey Questions: Automatic identification and analysis of Q3-Q116 questions
- Data Quality: Missing values analysis and completeness metrics
- Interactive Functions:
  - analyze_variable() - Deep dive into any variable
  - filter_by_country() - Country-specific analysis
  - get_questions_by_topic() - Topic-based question discovery
- Export Tools: Export filtered data and summary statistics
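
The data quality item above (missing values and completeness metrics) can be illustrated with a small pandas sketch. This is not the notebook's own code: `completeness_report` is a hypothetical helper, and the sample frame is made up.

```python
import pandas as pd


def completeness_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-variable missing counts and completeness, worst first."""
    missing = df.isna().sum()
    report = pd.DataFrame(
        {
            "missing": missing,
            "completeness_pct": (1 - missing / len(df)) * 100,
        }
    )
    return report.sort_values("completeness_pct")


# Tiny made-up frame standing in for the survey data.
sample = pd.DataFrame(
    {
        "COUNTRY": ["Nigeria", "Ghana", "Kenya", "Mali"],
        "Q3": [1.0, None, 3.0, None],
        "Q4": [2.0, 2.0, None, 1.0],
    }
)
report = completeness_report(sample)
print(report)
```

Sorting by completeness puts the sparsest variables first, which is usually where a quality review starts.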
- Research Flexibility: Custom analysis workflows and statistical modeling
- Reproducible Research: Shareable code and analysis steps
- Educational Value: Learn data analysis techniques with real survey data
- No Setup Required: Direct access to processed data from GitHub
- Version Control: Track analysis changes and collaborate with others
The project includes a comprehensive data preprocessing pipeline (preprocess_data.py) that optimizes the Afrobarometer dataset for efficient web application performance:
- Label Application: Converts numeric codes to readable labels using the official codebook
- Country Integration: Merges with UN country grouping data for enhanced analysis
- Format Optimization: Compresses data to 8.1MB (85% size reduction) using Parquet + GZIP
- Performance Boost: Reduces load time from 5+ seconds to <1 second
- Metadata Preservation: Maintains all variable and value labels from original SPSS file
- Read .sav File: Loads original SPSS data with metadata
- Parse Codebook: Extracts variable and value labels from Excel codebook
- Apply Labels: Converts numeric codes to descriptive text
- Merge Country Data: Adds UN country grouping information
- Optimize Format: Saves in space-efficient Parquet format
- Generate Metadata: Creates comprehensive metadata file
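
The label, merge, and save steps listed above could look roughly like the following. This is a minimal sketch, not the contents of preprocess_data.py: the function names, the `region` column, and the sample rows are all assumptions. In a real run, a code-to-label mapping of this shape could plausibly come from the metadata returned by `pyreadstat.read_sav` (`meta.variable_value_labels`).

```python
import pandas as pd


def apply_value_labels(df, value_labels):
    """Replace numeric codes with text labels, column by column.

    Codes that have no label are kept as-is.
    """
    labelled = df.copy()
    for column, mapping in value_labels.items():
        if column in labelled.columns:
            labelled[column] = labelled[column].map(
                lambda code, m=mapping: m.get(code, code)
            )
    return labelled


def merge_country_groups(df, groups, key="COUNTRY"):
    """Left-join grouping columns so every respondent row is kept."""
    return df.merge(groups, on=key, how="left")


def save_parquet(df, path):
    """Write a GZIP-compressed Parquet file (requires pyarrow).

    Columns that mix text and numbers may need casting before writing.
    """
    df.to_parquet(path, compression="gzip", index=False)


# Made-up rows and labels standing in for the .sav data and codebook.
raw = pd.DataFrame({"COUNTRY": [1, 2, 1], "Q3": [1, 2, 9]})
labels = {
    "COUNTRY": {1: "Nigeria", 2: "Ghana"},
    "Q3": {1: "Agree", 2: "Disagree"},
}
groups = pd.DataFrame(
    {"COUNTRY": ["Nigeria", "Ghana"], "region": ["West Africa", "West Africa"]}
)

processed = merge_country_groups(apply_value_labels(raw, labels), groups)
print(processed)
```

A left join is used so that respondents from a country missing in the grouping table are kept rather than silently dropped.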
- File Size: 8.1MB vs 67MB original (85% reduction)
- Load Time: <1 second vs 5+ seconds
- Memory Usage: Optimized for web applications
- Streamlit Compatible: Pre-cleaned for immediate use
- Separation of Concerns: Each module has a specific responsibility
- Reusable Components: UI components can be easily modified
- Clean Code: Well-documented, maintainable codebase
- Error Handling: Robust error handling throughout
- Original Loader (app/utils/data_loader.py): Loads SPSS files with pyreadstat
- Preprocessed Loader (app/utils/preprocessed_data_loader.py): Efficient loading of preprocessed data
- Preserves variable labels and value labels
- Provides data filtering and processing functions
- Caches data for performance
- Creates interactive charts using Plotly
- Handles both numeric and categorical variables
- Provides consistent styling and formatting
- Supports various chart types
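
One way to handle numeric and categorical variables differently is to pick the chart type from the column's dtype. The sketch below is an assumption about how such a dispatch might work, not the logic in app/utils/visualizations.py; `pick_chart_kind` and its `max_pie_slices` threshold are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def pick_chart_kind(series: pd.Series, max_pie_slices: int = 6) -> str:
    """Suggest a chart type from a column's dtype and cardinality."""
    if is_numeric_dtype(series):
        return "histogram"
    # Few categories read well as a pie; many are clearer as bars.
    if series.nunique(dropna=True) <= max_pie_slices:
        return "pie"
    return "bar"


ages = pd.Series([25, 31, 47])
answers = pd.Series(["Agree", "Disagree", "Agree"])
many = pd.Series([f"category_{i}" for i in range(10)])

print(pick_chart_kind(ages), pick_chart_kind(answers), pick_chart_kind(many))
```

The returned string could then be mapped to a Plotly Express call such as `px.histogram`, `px.pie`, or `px.bar`.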
- Exports data in multiple formats
- Generates summary reports
- Handles metadata inclusion
- Provides filename generation
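
Filename generation for exports might be sketched as below, using only the standard library. The helper name, the timestamp layout, and the extension table are assumptions for illustration; the actual code in app/utils/export.py may differ.

```python
from datetime import datetime
from typing import Optional

# Extensions for the export formats this README lists for the app.
EXTENSIONS = {"csv": ".csv", "excel": ".xlsx", "json": ".json"}


def make_export_filename(base: str, fmt: str, now: Optional[datetime] = None) -> str:
    """Build a name like 'nigeria_analysis_20240131_093000.csv'."""
    if fmt not in EXTENSIONS:
        raise ValueError(f"Unsupported export format: {fmt!r}")
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    safe_base = base.strip().replace(" ", "_")
    return f"{safe_base}_{stamp}{EXTENSIONS[fmt]}"


fixed = datetime(2024, 1, 31, 9, 30, 0)
print(make_export_filename("nigeria analysis", "csv", now=fixed))
```

Accepting `now` as a parameter keeps the function deterministic in tests while defaulting to the current time in normal use.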
- Centralized application settings
- File paths and constants
- Visualization settings
- Export configurations
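
A centralized settings module of this kind is often a single immutable object that derives paths from the project root. The following is a hypothetical sketch; the class and field names are illustrative and are not taken from config/settings.py.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; field names are illustrative."""

    project_root: Path = Path(".")
    default_port: int = 8501
    export_formats: tuple = ("csv", "excel", "json")

    @property
    def raw_data_dir(self) -> Path:
        # Where the original .sav file is expected.
        return self.project_root / "data" / "raw_data"

    @property
    def processed_dir(self) -> Path:
        # Where the preprocessed Parquet output is written.
        return self.project_root / "data" / "processed"


settings = Settings()
print(settings.raw_data_dir)
```

Freezing the dataclass prevents modules from mutating shared configuration at runtime.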
- Preprocessed Data: 85% file size reduction with Parquet + GZIP compression
- Fast Loading: <1 second load time vs 5+ seconds for original .sav files
- Data Caching: Streamlit caches loaded data
- Lazy Loading: Variables loaded on demand
- Efficient Filtering: Pandas-based filtering
- Memory Management: Optimized for large datasets
- Select Country: Use sidebar to filter by specific country
- Choose Variables: Select variable type and specific variables
- Explore Data: Use tabs to view overview, visualizations, and statistics
- Export Results: Download filtered data or summary reports
- Open the Notebook: Launch afrobarometer_data_analysis.ipynb in Jupyter Lab/Notebook
- Install Dependencies: Run the first cell to install required packages
- Load Data: Execute cells to load data directly from GitHub repository
- Explore Overview: Review dataset statistics and structure
# Analyze any variable in detail
analyze_variable('Country (COUNTRY)')
# Filter data for specific country
nigeria_data = filter_by_country('Nigeria')
# Find questions by topic
democracy_questions = get_questions_by_topic('democracy')
# Export filtered data
export_data(nigeria_data, "nigeria_analysis")

- Data Exploration: Use built-in functions to explore the dataset
- Country Analysis: Filter and analyze specific countries
- Survey Questions: Focus on Q3-Q116 questions for your research
- Custom Analysis: Add your own analysis cells
- Export Results: Save data and visualizations for your research
- Share Notebook: Collaborate with other researchers
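
For custom analysis cells, topic-based question discovery can be approximated with a case-insensitive keyword search over variable labels. This is a stand-in sketch and not the notebook's get_questions_by_topic(); the function name, variable names, and labels below are invented.

```python
def find_questions_by_keyword(variable_labels, keyword):
    """Return {variable: label} for labels containing the keyword."""
    needle = keyword.casefold()
    return {
        name: label
        for name, label in variable_labels.items()
        if needle in label.casefold()
    }


# Invented labels; real ones come from the dataset's metadata.
example_labels = {
    "QA": "Overall direction of the country",
    "QB": "Support for democracy",
    "QC": "Satisfaction with democracy",
}
matches = find_questions_by_keyword(example_labels, "Democracy")
print(sorted(matches))
```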
- New Visualizations: Add functions to app/utils/visualizations.py
- New Pages: Create components in app/pages/
- New Utilities: Add functions to app/utils/
- Configuration: Update config/settings.py
# Run tests (when implemented)
pytest tests/
# Check code quality
flake8 app/

- Follow PEP 8 guidelines
- Use type hints where appropriate
- Document functions with docstrings
- Keep functions focused and small
- pandas: Data manipulation and analysis
- numpy: Numerical operations
- pyreadstat: SPSS file reading
- streamlit: Web application framework
- plotly: Interactive visualizations
- matplotlib: Static plotting
- seaborn: Statistical visualizations
- openpyxl: Excel file support
- pyarrow: Parquet file support
- scipy: Statistical analysis
- Dataset: Afrobarometer Round 9 (39 countries)
- Data File: R9.Merge_39ctry.20Nov23.final_.release_Updated.4Jun25-3.sav
- Preprocessed Format: Parquet with GZIP compression (85% size reduction)
- Loading: Uses preprocessed data loader for optimal performance
- Codebook: AB_R9.MergeCodebook_25Jun24.final_.pdf
- Country Data: UN Country_grouping.csv for enhanced analysis
- Official Data Source: Afrobarometer Merged Data
- Source: Afrobarometer
The original .sav file is not included in this repository due to its large size (67MB). Instead, we use a preprocessed format that achieves 85% size reduction while preserving all data and metadata.
Setup Process:
- Download Data: Visit the Afrobarometer Merged Data page and save the .sav file to data/raw_data/
- Preprocess Data: Run python preprocess_data.py (one-time setup)
- Run Application: The app will automatically use the preprocessed data
File Information:
- Original .sav: 67.3 MB
- Preprocessed format: 8.1 MB (Parquet with GZIP compression)
- Size reduction: 85%
- All metadata preserved from original file
- Labels applied: Numeric codes converted to readable text
- Country data merged: Enhanced with UN country groupings
- Streamlit optimized for immediate use
The preprocessing pipeline creates a ready-to-use dataset with all labels applied and country information merged.
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes
- Test thoroughly
- Commit your changes: git commit -m "Add feature"
- Push to the branch: git push origin feature-name
- Submit a pull request
This project is for educational and research purposes. Please refer to Afrobarometer's data usage policies for the survey data.
- Data file not found: Ensure the .sav file is in data/raw_data/
- Import errors: Make sure the virtual environment is activated
- Port conflicts: Change the port in run_app.sh if 8501 is busy
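
To confirm a port conflict before editing the launcher, a quick standard-library check like the one below can help. It is a sketch with hypothetical helper names and is not part of the project.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 8501, attempts: int = 20) -> int:
    """Return the first port at or above `start` with no listener."""
    for port in range(start, start + attempts):
        if not is_port_in_use(port):
            return port
    raise RuntimeError(f"No free port in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    print(first_free_port())
```

The port it reports can be passed to Streamlit directly, for example `streamlit run app.py --server.port 8502`.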
- Check the Jupyter notebook for detailed analysis examples
- Review the Streamlit app documentation
- Consult the Afrobarometer codebook for variable definitions
- Check the requirements.txt for dependency issues
- Advanced statistical analysis tools
- Machine learning integration
- Custom dashboard creation
- Data quality assessment tools
- Automated report generation
- Multi-language support
- Database integration for large datasets
- Advanced caching strategies
- Parallel processing for computations
- Memory optimization
Built with ❤️ for data exploration and analysis
This project demonstrates professional software development practices with clean architecture, modular design, and comprehensive documentation.