UN-OSAA/Afrobarometer-Data-Explorer

🌍 Afrobarometer Data Explorer

A professional, modular data exploration platform for Afrobarometer Round 9 survey data, featuring both a Jupyter notebook for detailed analysis and a refactored Streamlit web application for interactive exploration.

📋 About This Application

This application was originally built for the UN Office of the Special Adviser on Africa (OSAA) to support exploration and analysis of Afrobarometer survey data. Recognizing the value of open-source tools for research and development, it has since been made publicly available to researchers, analysts, and anyone interested in exploring African public opinion data.

📊 Data Scope

  • Dataset: Afrobarometer Round 9 survey data only
  • Geographic Coverage: 39 African countries
  • Survey Period: 2021-2023
  • Total Observations: 53,444 respondents
  • Variables: 426 variables including Q3-Q116 survey questions

πŸ—οΈ Project Structure

Afrobarometer/
β”œβ”€β”€ app/                          # Main application package
β”‚   β”œβ”€β”€ components/              # UI components
β”‚   β”‚   └── sidebar.py          # Sidebar with filters and controls
β”‚   β”œβ”€β”€ pages/                  # Page components
β”‚   β”‚   β”œβ”€β”€ overview.py         # Dataset overview page
β”‚   β”‚   └── visualizations.py   # Data visualizations page
β”‚   β”œβ”€β”€ utils/                  # Utility modules
β”‚   β”‚   β”œβ”€β”€ data_loader.py      # Original data loading and processing
β”‚   β”‚   β”œβ”€β”€ preprocessed_data_loader.py  # Efficient preprocessed data loader
β”‚   β”‚   β”œβ”€β”€ visualizations.py   # Chart creation functions
β”‚   β”‚   └── export.py           # Data export functionality
β”‚   └── __init__.py
β”œβ”€β”€ config/                     # Configuration
β”‚   β”œβ”€β”€ settings.py            # Application settings
β”‚   └── __init__.py
β”œβ”€β”€ data/                      # Data files
β”‚   β”œβ”€β”€ raw_data/             # Original SPSS files
β”‚   β”œβ”€β”€ processed/            # Preprocessed datasets
β”‚   └── reference/            # Codebooks and documentation
β”œβ”€β”€ tests/                    # Test files
β”œβ”€β”€ docs/                     # Documentation
β”œβ”€β”€ afrobarometer_env/        # Virtual environment
β”œβ”€β”€ app.py                    # Main Streamlit application
β”œβ”€β”€ afrobarometer_data_analysis.ipynb  # Jupyter notebook
β”œβ”€β”€ preprocess_data.py        # Data preprocessing pipeline
β”œβ”€β”€ requirements.txt          # Python dependencies
β”œβ”€β”€ run_app.sh               # Application launcher
└── README.md                # This file

🚀 Quick Start

1. Setup Environment

# Clone or navigate to the project directory
cd Afrobarometer

# Run the launcher script (handles everything automatically)
./run_app.sh

2. Manual Setup (Alternative)

# Create and activate virtual environment
python -m venv afrobarometer_env
source afrobarometer_env/bin/activate  # On Windows: afrobarometer_env\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Preprocess the data (one-time setup)
python preprocess_data.py

# Run the application
streamlit run app.py

3. Access the Application

The app will open in your browser at: http://localhost:8501

📊 Features

🌐 Streamlit Web Application

Interactive Dashboard

  • 5 Comprehensive Tabs: Overview, Visualizations, Data Explorer, Summary Stats, Export
  • Real-time Filtering: Filter by country and variable type
  • Responsive Design: Works on desktop and mobile devices
  • Professional UI: Clean, intuitive interface with custom styling

Data Exploration

  • Country Selection: Filter data by any of the 39 countries
  • Variable Analysis: Interactive charts for numeric and categorical variables
  • Data Comparison: Cross-variable analysis and correlation
  • Missing Data Visualization: Heatmaps and comprehensive analysis
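As a rough illustration of the missing-data and cross-variable features listed above, the pandas sketch below computes per-column missingness and a row-normalized cross-tabulation. The function names and toy columns are hypothetical and are not taken from the app's code; the "Label (CODE)" column style only mirrors the 'Country (COUNTRY)' name used in the notebook example.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Missing-value count and percentage per column, most-missing first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


def compare_variables(df: pd.DataFrame, row_var: str, col_var: str) -> pd.DataFrame:
    """Share of each col_var answer within each row_var group (each row sums to 1).

    pd.crosstab drops respondents where either variable is missing.
    """
    return pd.crosstab(df[row_var], df[col_var], normalize="index")


# Toy data with hypothetical columns; these are not real survey responses.
toy = pd.DataFrame({
    "Country (COUNTRY)": ["Ghana", "Ghana", "Kenya", "Kenya"],
    "Area (URBRUR)": ["Urban", "Rural", "Urban", None],
})
print(missing_summary(toy))
print(compare_variables(toy, "Country (COUNTRY)", "Area (URBRUR)"))
```

The missingness table is the kind of input a heatmap or bar chart of data completeness would be drawn from.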

Visualizations

  • Distribution Plots: Histograms and bar charts
  • Box Plots: Statistical summaries for numeric variables
  • Scatter Plots: Correlation analysis between variables
  • Pie Charts: Categorical variable distributions
  • Interactive Charts: Powered by Plotly for zooming and filtering

Data Export

  • Multiple Formats: CSV, Excel, JSON
  • Filtered Data: Export only selected country/variables
  • Metadata Export: Include variable labels in Excel files
  • Summary Reports: Generate comprehensive markdown reports
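A minimal sketch of exporting a filtered subset, assuming the export step serializes a pandas DataFrame in memory. `export_bytes` is a hypothetical helper, not the function in app/utils/export.py, and Excel output (which would use `DataFrame.to_excel` with openpyxl) is left out to keep it dependency-free.

```python
import pandas as pd


def export_bytes(df: pd.DataFrame, fmt: str) -> bytes:
    """Serialize a filtered DataFrame to CSV or JSON bytes, e.g. for a download button."""
    if fmt == "csv":
        return df.to_csv(index=False).encode("utf-8")
    if fmt == "json":
        # One JSON object per respondent; keep non-ASCII country names readable.
        return df.to_json(orient="records", force_ascii=False).encode("utf-8")
    raise ValueError(f"Unsupported export format: {fmt!r}")


filtered = pd.DataFrame({"Country (COUNTRY)": ["Côte d'Ivoire"], "score": [3]})
print(export_bytes(filtered, "json").decode("utf-8"))
```

In a Streamlit page, bytes like these could be handed to `st.download_button` as its `data` argument.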

📓 Jupyter Notebook (afrobarometer_data_analysis.ipynb)

Purpose & Target Audience

The Jupyter notebook is designed for researchers and analysts who prefer working with notebooks over web applications. It provides direct access to the processed Afrobarometer data from the GitHub repository, enabling detailed statistical analysis and custom research workflows.

Key Features

  • 🌐 Direct GitHub Access: Loads data directly from the repository without local file downloads
  • 📊 Comprehensive Analysis: Dataset overview, country distribution, survey questions analysis
  • 🔍 Interactive Tools: Built-in functions for variable analysis, country filtering, and topic-based question search
  • 📈 Advanced Visualizations: Professional charts and graphs for research presentations
  • 💾 Flexible Export: Multiple export formats (CSV, Excel, Parquet) with timestamped filenames
  • 🎯 Survey Focus: Specialized analysis for Q3-Q116 survey questions

Analysis Capabilities

  • Dataset Overview: Complete statistics and data quality assessment
  • Country Analysis: Geographic distribution and UN country grouping analysis
  • Survey Questions: Automatic identification and analysis of Q3-Q116 questions
  • Data Quality: Missing values analysis and completeness metrics
  • Interactive Functions:
    • analyze_variable() - Deep dive into any variable
    • filter_by_country() - Country-specific analysis
    • get_questions_by_topic() - Topic-based question discovery
  • Export Tools: Export filtered data and summary statistics
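The Q3–Q116 identification and topic search described above could work roughly as follows. This sketch assumes column names follow the "Label (VARNAME)" pattern seen in 'Country (COUNTRY)' and that sub-items carry letter suffixes such as Q12A; `survey_question_columns` and `find_questions_by_topic` are hypothetical names, and the notebook's own `get_questions_by_topic()` may match differently.

```python
import re

# Assumed naming pattern: "Question label text (Q12A)" at the end of the column name.
_QUESTION_CODE = re.compile(r"\((Q(\d+)[A-Z]*)\)\s*$")


def survey_question_columns(columns, first=3, last=116):
    """Columns whose variable code is a survey question between Q{first} and Q{last}."""
    selected = []
    for name in columns:
        match = _QUESTION_CODE.search(name)
        if match and first <= int(match.group(2)) <= last:
            selected.append(name)
    return selected


def find_questions_by_topic(columns, keyword):
    """Case-insensitive keyword search over the labels of survey-question columns."""
    keyword = keyword.lower()
    return [name for name in survey_question_columns(columns) if keyword in name.lower()]


# Made-up labels for illustration; they do not reflect the real questionnaire.
columns = [
    "Country (COUNTRY)",
    "Support for democracy (Q21)",
    "Age (Q1)",
    "Satisfaction with democracy (Q36)",
    "Trust in president (Q41A)",
    "Out of range (Q117)",
]
print(find_questions_by_topic(columns, "democracy"))
```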

Why Use the Notebook?

  • 🔬 Research Flexibility: Custom analysis workflows and statistical modeling
  • 📝 Reproducible Research: Shareable code and analysis steps
  • 🎓 Educational Value: Learn data analysis techniques with real survey data
  • ⚑ No Data Setup Required: Direct access to processed data from GitHub
  • 🔄 Version Control: Track analysis changes and collaborate with others

🔧 Technical Architecture

Data Preprocessing Pipeline

The project includes a comprehensive data preprocessing pipeline (preprocess_data.py) that optimizes the Afrobarometer dataset for efficient web application performance:

Preprocessing Features

  • Label Application: Converts numeric codes to readable labels using the official codebook
  • Country Integration: Merges with UN country grouping data for enhanced analysis
  • Format Optimization: Compresses data to 8.1MB (85% size reduction) using Parquet + GZIP
  • Performance Boost: Reduces load time from 5+ seconds to <1 second
  • Metadata Preservation: Maintains all variable and value labels from original SPSS file

Preprocessing Process

  1. Read .sav File: Loads original SPSS data with metadata
  2. Parse Codebook: Extracts variable and value labels from Excel codebook
  3. Apply Labels: Converts numeric codes to descriptive text
  4. Merge Country Data: Adds UN country grouping information
  5. Optimize Format: Saves in space-efficient Parquet format
  6. Generate Metadata: Creates comprehensive metadata file
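Steps 3 and 4 can be sketched in pandas as below. This is a minimal illustration, not the code in preprocess_data.py: it assumes value labels arrive as a `{column: {code: label}}` dictionary (the same shape as pyreadstat's `meta.variable_value_labels`) and that the country grouping table has one row per country name. The helper names and the `region` column are hypothetical.

```python
import pandas as pd


def apply_value_labels(df: pd.DataFrame, value_labels: dict) -> pd.DataFrame:
    """Replace numeric codes with text labels; codes without a label are kept as-is."""
    out = df.copy()
    for column, mapping in value_labels.items():
        if column in out.columns:
            out[column] = out[column].map(lambda value: mapping.get(value, value))
    return out


def merge_country_groups(df: pd.DataFrame, groups: pd.DataFrame,
                         country_col: str = "COUNTRY") -> pd.DataFrame:
    """Left-join grouping columns so no respondents are dropped.

    validate="many_to_one" raises if a country appears twice in the grouping table.
    """
    return df.merge(groups, on=country_col, how="left", validate="many_to_one")


# Toy inputs; real codes and labels come from the SPSS file and codebook.
raw = pd.DataFrame({"COUNTRY": [1, 2, 1], "URBRUR": [1, 2, 9]})
labels = {"COUNTRY": {1.0: "Ghana", 2.0: "Kenya"}, "URBRUR": {1.0: "Urban", 2.0: "Rural"}}
groups = pd.DataFrame({"COUNTRY": ["Ghana", "Kenya"], "region": ["West Africa", "East Africa"]})

labelled = apply_value_labels(raw, labels)
merged = merge_country_groups(labelled, groups)
print(merged)
```

A left join is used so that a country missing from the grouping file shows up as an empty group rather than silently removing its respondents.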

Performance Benefits

  • File Size: 8.1MB vs 67MB original (85% reduction)
  • Load Time: <1 second vs 5+ seconds
  • Memory Usage: Optimized for web applications
  • Streamlit Compatible: Pre-cleaned for immediate use

Modular Design

  • Separation of Concerns: Each module has a specific responsibility
  • Reusable Components: UI components can be easily modified
  • Clean Code: Well-documented, maintainable codebase
  • Error Handling: Robust error handling throughout

Key Modules

Data Loading

  • Original Loader (app/utils/data_loader.py): Loads SPSS files with pyreadstat
  • Preprocessed Loader (app/utils/preprocessed_data_loader.py): Efficient loading of preprocessed data
  • Preserves variable labels and value labels
  • Provides data filtering and processing functions
  • Caches data for performance
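To show the load-once idea behind the caching bullet, here is a minimal sketch using the standard library's `functools.lru_cache`. In the app itself Streamlit's caching decorators (such as `st.cache_data`) would normally fill this role; the CSV reader stands in for the Parquet loader only so the example needs no extra dependencies, and `load_dataset` is a hypothetical name.

```python
import tempfile
from functools import lru_cache
from pathlib import Path

import pandas as pd


@lru_cache(maxsize=4)
def load_dataset(path: str) -> pd.DataFrame:
    """Read a dataset from disk once per path; repeat calls return the cached frame.

    The same DataFrame object is returned each time, so callers should call
    .copy() before modifying it.
    """
    return pd.read_csv(path)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample.csv"
    csv_path.write_text("COUNTRY,URBRUR\nGhana,Urban\n", encoding="utf-8")
    first = load_dataset(str(csv_path))
    second = load_dataset(str(csv_path))  # served from the cache, no second disk read
    print(load_dataset.cache_info())
```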

Visualizations (app/utils/visualizations.py)

  • Creates interactive charts using Plotly
  • Handles both numeric and categorical variables
  • Provides consistent styling and formatting
  • Supports various chart types
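One way to handle numeric and categorical variables with a single entry point is to branch on the column's dtype before building a figure. The sketch below is hypothetical (`chart_spec` is not from app/utils/visualizations.py) and stops at the data preparation step; its output could then be passed to `plotly.express.histogram` or `plotly.express.bar`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def chart_spec(series: pd.Series, max_categories: int = 20) -> dict:
    """Pick a chart type and the data it needs based on the variable's dtype."""
    clean = series.dropna()
    if is_numeric_dtype(clean):
        # Numeric variables: raw values for a histogram or box plot.
        return {"kind": "histogram", "values": clean.tolist()}
    # Categorical variables: most frequent answers as percentages for a bar chart.
    shares = clean.value_counts(normalize=True).head(max_categories) * 100
    return {
        "kind": "bar",
        "categories": shares.index.tolist(),
        "percent": shares.round(1).tolist(),
    }


print(chart_spec(pd.Series(["Yes", "No", "Yes", None])))
print(chart_spec(pd.Series([1.0, None, 3.0])))
```

Capping the number of categories keeps bar charts readable for questions with many answer options.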

Export (app/utils/export.py)

  • Exports data in multiple formats
  • Generates summary reports
  • Handles metadata inclusion
  • Provides filename generation
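Filename generation with timestamps, as mentioned here and for the notebook's exports, might look like the sketch below. The `export_filename` helper and its `<base>_<YYYYmmdd_HHMMSS>.<ext>` layout are assumptions, not the project's actual naming scheme; `now` is injectable so the output is reproducible.

```python
import re
from datetime import datetime
from typing import Optional


def export_filename(base: str, extension: str, now: Optional[datetime] = None) -> str:
    """Build a filesystem-safe, timestamped name such as 'nigeria_analysis_20240101_093000.csv'."""
    now = now or datetime.now()
    # Collapse runs of characters other than letters, digits, '_' or '-' into '_'.
    safe_base = re.sub(r"[^\w-]+", "_", base).strip("_") or "export"
    return f"{safe_base}_{now:%Y%m%d_%H%M%S}.{extension.lstrip('.')}"


print(export_filename("nigeria analysis", ".csv"))
```

Because `\w` is Unicode-aware in Python 3, accented country names such as "Côte d'Ivoire" keep their letters and only the space and apostrophe are replaced.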

Configuration (config/settings.py)

  • Centralized application settings
  • File paths and constants
  • Visualization settings
  • Export configurations
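A centralized settings module could be as small as a frozen dataclass. In this sketch the directory layout follows the project tree, the port matches the default 8501, and the export formats match the CSV/Excel/JSON list; the class, its field names, and the `chart_height` value are hypothetical and not taken from config/settings.py.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    """Hypothetical central settings object; field names are illustrative."""

    base_dir: Path = Path(".")
    server_port: int = 8501
    export_formats: tuple = ("csv", "xlsx", "json")
    chart_height: int = 500  # hypothetical visualization default, in pixels

    @property
    def raw_data_dir(self) -> Path:
        return self.base_dir / "data" / "raw_data"

    @property
    def processed_dir(self) -> Path:
        return self.base_dir / "data" / "processed"

    @property
    def reference_dir(self) -> Path:
        return self.base_dir / "data" / "reference"


SETTINGS = Settings()
print(SETTINGS.processed_dir)
```

Deriving the data paths from one `base_dir` means only a single value changes when the project is moved or deployed.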

Performance Optimizations

  • Preprocessed Data: 85% file size reduction with Parquet + GZIP compression
  • Fast Loading: <1 second load time vs 5+ seconds for original .sav files
  • Data Caching: Streamlit caches loaded data
  • Lazy Loading: Variables loaded on demand
  • Efficient Filtering: Pandas-based filtering
  • Memory Management: Optimized for large datasets
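One common pandas memory optimization for labelled survey data, offered here as an assumption rather than a description of what this project does, is storing repetitive text answers as the `category` dtype, which keeps each distinct label once plus small integer codes. `to_categories` is a hypothetical helper.

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def to_categories(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Convert text columns with relatively few distinct values to 'category' dtype."""
    out = df.copy()
    if len(out) == 0:
        return out
    for column in out.columns:
        if not is_string_dtype(out[column].dtype):
            continue  # leave numeric, datetime and existing categorical columns alone
        if out[column].nunique(dropna=True) / len(out) <= max_unique_ratio:
            out[column] = out[column].astype("category")
    return out


df = pd.DataFrame({"country": ["Ghana", "Kenya", "Nigeria"] * 1000, "id": range(3000)})
optimized = to_categories(df)
print(df.memory_usage(deep=True).sum(), optimized.memory_usage(deep=True).sum())
```

The `max_unique_ratio` threshold avoids converting free-text or ID-like columns, where a category dtype would save little or nothing.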

📈 Usage Examples

Streamlit App

  1. Select Country: Use sidebar to filter by specific country
  2. Choose Variables: Select variable type and specific variables
  3. Explore Data: Use tabs to view overview, visualizations, and statistics
  4. Export Results: Download filtered data or summary reports

Jupyter Notebook

Getting Started

  1. Open the Notebook: Launch afrobarometer_data_analysis.ipynb in JupyterLab or Jupyter Notebook
  2. Install Dependencies: Run the first cell to install required packages
  3. Load Data: Execute cells to load data directly from GitHub repository
  4. Explore Overview: Review dataset statistics and structure

Interactive Analysis

# Analyze any variable in detail
analyze_variable('Country (COUNTRY)')

# Filter data for specific country
nigeria_data = filter_by_country('Nigeria')

# Find questions by topic
democracy_questions = get_questions_by_topic('democracy')

# Export filtered data
export_data(nigeria_data, "nigeria_analysis")
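For readers who want to reproduce similar helpers outside the notebook, the sketch below approximates what country filtering and a per-variable summary might do. `filter_country` and `summarize_variable` are hypothetical stand-ins, not the notebook's `filter_by_country()` or `analyze_variable()`; the only assumption carried over from the example calls above is the 'Country (COUNTRY)' column name.

```python
import pandas as pd

COUNTRY_COLUMN = "Country (COUNTRY)"  # column name as used in the example calls above


def filter_country(df: pd.DataFrame, country: str) -> pd.DataFrame:
    """Rows for one country, matched case-insensitively on the labelled country name."""
    mask = df[COUNTRY_COLUMN].astype(str).str.casefold() == country.casefold()
    return df.loc[mask].copy()


def summarize_variable(df: pd.DataFrame, column: str) -> dict:
    """Counts plus mean/median for numeric columns, or top answers for categorical ones."""
    series = df[column]
    summary = {"n": int(series.notna().sum()), "missing": int(series.isna().sum())}
    if pd.api.types.is_numeric_dtype(series):
        summary.update(mean=float(series.mean()), median=float(series.median()))
    else:
        summary["top_answers"] = series.value_counts().head(5).to_dict()
    return summary


# Toy data; not real survey responses.
sample = pd.DataFrame({
    COUNTRY_COLUMN: ["Nigeria", "Ghana", "nigeria"],
    "Age": [30, 40, None],
})
print(summarize_variable(filter_country(sample, "Nigeria"), "Age"))
```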

Research Workflow

  1. Data Exploration: Use built-in functions to explore the dataset
  2. Country Analysis: Filter and analyze specific countries
  3. Survey Questions: Focus on Q3-Q116 questions for your research
  4. Custom Analysis: Add your own analysis cells
  5. Export Results: Save data and visualizations for your research
  6. Share Notebook: Collaborate with other researchers

🛠️ Development

Adding New Features

  1. New Visualizations: Add functions to app/utils/visualizations.py
  2. New Pages: Create components in app/pages/
  3. New Utilities: Add functions to app/utils/
  4. Configuration: Update config/settings.py

Testing

# Run tests (when implemented)
pytest tests/

# Check code quality
flake8 app/

Code Style

  • Follow PEP 8 guidelines
  • Use type hints where appropriate
  • Document functions with docstrings
  • Keep functions focused and small

📚 Dependencies

Core Libraries

  • pandas: Data manipulation and analysis
  • numpy: Numerical operations
  • pyreadstat: SPSS file reading
  • streamlit: Web application framework

Visualization

  • plotly: Interactive visualizations
  • matplotlib: Static plotting
  • seaborn: Statistical visualizations

Export

  • openpyxl: Excel file support
  • pyarrow: Parquet file support

Analysis

  • scipy: Statistical analysis

📄 Data Source

  • Dataset: Afrobarometer Round 9 (39 countries)
  • Data File: R9.Merge_39ctry.20Nov23.final_.release_Updated.4Jun25-3.sav
  • Preprocessed Format: Parquet with GZIP compression (85% size reduction)
  • Loading: Uses preprocessed data loader for optimal performance
  • Codebook: AB_R9.MergeCodebook_25Jun24.final_.pdf
  • Country Data: UN Country_grouping.csv for enhanced analysis
  • Official Data Source: Afrobarometer Merged Data
  • Source: Afrobarometer

📥 Data Setup & Preprocessing

The original .sav file is not included in this repository due to its large size (67MB). Instead, we use a preprocessed format that achieves 85% size reduction while preserving all data and metadata.

Setup Process:

  1. Download Data: Download the Round 9 merged .sav file from the Afrobarometer Merged Data page and save it to data/raw_data/
  2. Preprocess Data: Run python preprocess_data.py (one-time setup)
  3. Run Application: The app will automatically use the preprocessed data
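Before preprocessing, it can help to confirm that exactly one .sav file is in place. The sketch below is a hypothetical pre-flight check, not part of preprocess_data.py; the file it returns could then be read with `pyreadstat.read_sav`, which returns a `(DataFrame, metadata)` pair.

```python
import tempfile
from pathlib import Path


def find_raw_sav(raw_dir="data/raw_data"):
    """Return the single .sav file in raw_dir, or raise an error that says what to do."""
    raw_path = Path(raw_dir)
    candidates = sorted(raw_path.glob("*.sav"))
    if not candidates:
        raise FileNotFoundError(
            f"No .sav file found in {raw_path}/. Download the Afrobarometer Round 9 "
            "merged data file and save it there before running preprocess_data.py."
        )
    if len(candidates) > 1:
        names = ", ".join(p.name for p in candidates)
        raise RuntimeError(f"Expected one .sav file in {raw_path}/, found: {names}")
    return candidates[0]


# Demonstration in a throwaway directory instead of the real data folder.
with tempfile.TemporaryDirectory() as tmp:
    try:
        find_raw_sav(tmp)
    except FileNotFoundError as err:
        print(err)
    (Path(tmp) / "round9_merged.sav").touch()
    print(find_raw_sav(tmp).name)
```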

File Information:

  • Original .sav: 67.3 MB
  • Preprocessed format: 8.1 MB (Parquet with GZIP compression)
  • Size reduction: 85%
  • All metadata preserved from original file
  • Labels applied: Numeric codes converted to readable text
  • Country data merged: Enhanced with UN country groupings
  • Streamlit optimized for immediate use

The preprocessing pipeline creates a ready-to-use dataset with all labels applied and country information merged.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes
  4. Test thoroughly
  5. Commit your changes: git commit -m "Add feature"
  6. Push to the branch: git push origin feature-name
  7. Submit a pull request

📄 License

This project is for educational and research purposes. Please refer to Afrobarometer's data usage policies for the survey data.

🆘 Support

Common Issues

  1. Data file not found: Ensure the .sav file is in data/raw_data/
  2. Import errors: Make sure virtual environment is activated
  3. Port conflicts: Change port in run_app.sh if 8501 is busy
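To check whether port 8501 is actually taken before editing run_app.sh, a small standard-library sketch like the one below can find a free port. The helper names are hypothetical; the chosen port can then be passed to Streamlit with `streamlit run app.py --server.port <port>`. A port in the TCP TIME_WAIT state may briefly be reported as busy.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(start: int = 8501, attempts: int = 20) -> int:
    """Scan upward from start and return the first port that is free."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"No free port in range {start}-{start + attempts - 1}")


print(port_is_free(8501))
```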

Getting Help

  1. Check the Jupyter notebook for detailed analysis examples
  2. Review the Streamlit app documentation
  3. Consult the Afrobarometer codebook for variable definitions
  4. Check the requirements.txt for dependency issues

🎯 Roadmap

Planned Features

  • Advanced statistical analysis tools
  • Machine learning integration
  • Custom dashboard creation
  • Data quality assessment tools
  • Automated report generation
  • Multi-language support

Performance Improvements

  • Database integration for large datasets
  • Advanced caching strategies
  • Parallel processing for computations
  • Memory optimization

Built with ❤️ for data exploration and analysis

This project demonstrates professional software development practices with clean architecture, modular design, and comprehensive documentation.

About

a data explorer for Afrobarometer round 9 data

Languages

  • Python 80.5%
  • Jupyter Notebook 19.0%
  • Shell 0.5%