atlanhq/Fox-Onsite
V1 Clean - Nielsen Viewership Data for Cortex Analyst

A streamlined Python toolkit for processing Nielsen television viewership data and creating Cortex Analyst semantic models in Snowflake.

🎯 Overview

This repository contains scripts to:

  • Consolidate multiple CSV files into a single Snowflake table
  • Create network dimension tables for enhanced analytics
  • Generate and deploy semantic models for Snowflake Cortex Analyst
  • Enable natural language querying of television ratings data

📋 Prerequisites

  • Python 3.8+
  • Snowflake account with Cortex Analyst access
  • Required Python packages (see requirements section)
  • Nielsen viewership CSV files

🛠 Installation

  1. Clone/Download this repository

    git clone <repository-url>
    cd v1-clean
  2. Create Python virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install required packages

    pip install pandas snowflake-connector-python python-dotenv
  4. Configure environment variables Create a .env file in the root directory:

    SNOWFLAKE_ACCOUNT=your_account_identifier
    SNOWFLAKE_USER=your_username
    SNOWFLAKE_PASSWORD=your_password
    SNOWFLAKE_WAREHOUSE=your_warehouse
    SNOWFLAKE_ROLE=your_role
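The setup scripts read these variables with python-dotenv before connecting. A minimal sketch of that pattern (the helper name `connection_params` is illustrative, not a function from this repository):

```python
import os

try:
    from dotenv import load_dotenv  # from the python-dotenv package
except ImportError:                 # fall back to plain environment variables
    def load_dotenv() -> bool:
        return False

def connection_params() -> dict:
    """Collect the Snowflake settings listed in .env into connector kwargs."""
    load_dotenv()  # reads .env from the current working directory, if present
    keys = ("ACCOUNT", "USER", "PASSWORD", "WAREHOUSE", "ROLE")
    return {k.lower(): os.environ[f"SNOWFLAKE_{k}"] for k in keys}
```

The resulting dict can be passed straight to the connector, e.g. `snowflake.connector.connect(**connection_params())`.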

πŸ“ Project Structure

v1-clean/
├── README.md                    # This file
├── complete_setup.py            # Full setup script
├── partial_setup.py             # Partial setup (skip CSV processing)
├── fox_v1.yaml                  # Cortex Analyst semantic model
├── csv_inputs/                  # Directory for Nielsen CSV files
│   ├── ratings-dataset-1.csv
│   ├── ratings-dataset-2.csv
│   └── ... (additional CSV files)
├── .env                         # Environment configuration
└── requirements.txt             # Python dependencies

📊 Data Requirements

CSV Input Files

Place your Nielsen viewership CSV files in the csv_inputs/ directory. The scripts expect:

  • CSV files with consistent column structures
  • Required columns: PROGRAM_DISTRIBUTOR, AVERAGE_AUDIENCE_PROJECTION
  • Header row with column names
  • Any number of CSV files (automatically combined)
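The consolidation behavior described above can be sketched with pandas roughly as follows; `combine_csvs` is a hypothetical helper, not the repository's actual function, but it mirrors the expectations listed (align columns by name, combine every file, enforce the required columns):

```python
from pathlib import Path

import pandas as pd

# Columns the scripts require in the combined data.
REQUIRED = {"PROGRAM_DISTRIBUTOR", "AVERAGE_AUDIENCE_PROJECTION"}

def combine_csvs(input_dir: str = "csv_inputs") -> pd.DataFrame:
    """Combine every CSV in input_dir into one frame, aligning columns by name."""
    files = sorted(Path(input_dir).glob("*.csv"))
    if not files:
        raise FileNotFoundError(f"No CSV files found in {input_dir}")
    frames = [pd.read_csv(f) for f in files]
    # concat aligns on column names; files missing a column get NaN there.
    combined = pd.concat(frames, ignore_index=True, sort=False)
    missing = REQUIRED - set(combined.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    return combined
```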

Snowflake Setup

  • Database: FOX
  • Schema: V1_CLEAN (automatically created from directory name)
  • Required permissions: CREATE TABLE, CREATE STAGE, USAGE on database/schema

🚀 Usage

Option 1: Complete Setup (First Time)

Use this when setting up everything from scratch:

python complete_setup.py

What it does:

  1. Creates schema FOX.V1_CLEAN
  2. Combines all CSV files from csv_inputs/
  3. Creates consolidated table: VW_NIELSEN_PROGRAM_VIEWERSHIP_DAILY
  4. Creates dimension table: DIM_CONFORMED_NETWORK
  5. Creates Snowflake stage for semantic models
  6. Uploads fox_v1.yaml semantic model to Cortex Analyst

Option 2: Partial Setup (Update Only)

Use this when the main viewership table already exists:

python partial_setup.py

What it does:

  1. Verifies existing VW_NIELSEN_PROGRAM_VIEWERSHIP_DAILY table
  2. Recreates DIM_CONFORMED_NETWORK dimension table
  3. Updates semantic model in Cortex Analyst
  4. Preserves existing viewership data

📋 Generated Tables

VW_NIELSEN_PROGRAM_VIEWERSHIP_DAILY

Consolidated viewership data from all CSV files.

  • Purpose: Main fact table containing all program ratings
  • Key Columns: PROGRAM_DISTRIBUTOR, AVERAGE_AUDIENCE_PROJECTION
  • Rows: All rows from every input CSV file, combined

DIM_CONFORMED_NETWORK

Network dimension for consistent reporting.

  • Purpose: Standardized network lookup table
  • Structure:
    • NETWORK_CODE: Unique program distributor identifier
    • NETWORK_NAME: Friendly network name (e.g., "Fox Broadcasting Company")

Join Pattern

SELECT *
FROM VW_NIELSEN_PROGRAM_VIEWERSHIP_DAILY v
LEFT JOIN DIM_CONFORMED_NETWORK n
  ON v.PROGRAM_DISTRIBUTOR = n.NETWORK_CODE
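Building on that join, a typical rollup (the kind of SQL Cortex Analyst generates for "top networks by total audience") might look like the following; `top_networks_sql` is an illustrative helper, not part of the repository:

```python
def top_networks_sql(limit: int = 10) -> str:
    """Build a rollup of total audience per network over the joined tables."""
    return f"""
        SELECT COALESCE(n.NETWORK_NAME, v.PROGRAM_DISTRIBUTOR) AS network,
               SUM(v.AVERAGE_AUDIENCE_PROJECTION)              AS total_audience
        FROM FOX.V1_CLEAN.VW_NIELSEN_PROGRAM_VIEWERSHIP_DAILY v
        LEFT JOIN FOX.V1_CLEAN.DIM_CONFORMED_NETWORK n
          ON v.PROGRAM_DISTRIBUTOR = n.NETWORK_CODE
        GROUP BY network
        ORDER BY total_audience DESC
        LIMIT {limit}
    """
```

The `COALESCE` keeps distributors that have no row in the dimension table from disappearing into a NULL network bucket.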

🧠 Cortex Analyst Integration

After running the setup scripts:

  1. Access Cortex Analyst

    • Go to Snowsight → AI & ML → Cortex Analyst
  2. Load Semantic Model

    • Database: FOX
    • Schema: V1_CLEAN
    • Stage: SEMANTIC_MODELS
    • File: fox_v1.yaml
  3. Example Natural Language Queries

    What are the top networks by total audience?
    Compare Fox viewership to other broadcasters
    Show me audience trends by network
    Which programs have the highest ratings?
    

🔧 Configuration

Environment Variables (.env)

# Snowflake Connection
SNOWFLAKE_ACCOUNT=abc123.us-east-1
SNOWFLAKE_USER=your_username
SNOWFLAKE_PASSWORD=your_password
SNOWFLAKE_WAREHOUSE=COMPUTE_WH
SNOWFLAKE_ROLE=SYSADMIN

# Optional: Custom database name
# DATABASE_NAME=FOX

Semantic Model (fox_v1.yaml)

The semantic model defines:

  • Table relationships and joins
  • Metrics and dimensions for analysis
  • Synonyms for natural language queries
  • Custom instructions for query generation
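For orientation, a semantic model covering the tables above is shaped roughly like this (a hedged sketch of the general Cortex Analyst format; the actual fox_v1.yaml in this repository is the source of truth):

```yaml
name: fox_v1
tables:
  - name: viewership
    base_table:
      database: FOX
      schema: V1_CLEAN
      table: VW_NIELSEN_PROGRAM_VIEWERSHIP_DAILY
    dimensions:
      - name: program_distributor
        expr: PROGRAM_DISTRIBUTOR
        data_type: TEXT
        synonyms: ["network", "broadcaster"]
    measures:
      - name: average_audience_projection
        expr: AVERAGE_AUDIENCE_PROJECTION
        data_type: NUMBER
        default_aggregation: sum
```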

🚨 Troubleshooting

Common Issues

1. CSV Files Not Found

❌ No CSV files found in csv_inputs
  • Solution: Add CSV files to the csv_inputs/ directory

2. Snowflake Connection Failed

❌ Failed to connect to Snowflake
  • Solution: Verify .env file configuration and network access

3. Column Mismatch in CSVs

⚠️ Column mismatch in filename.csv
  • Solution: Scripts auto-align columns, but verify CSV structure

4. Semantic Model Validation Error

❌ Invalid YAML file - missing required fields
  • Solution: Ensure fox_v1.yaml contains valid YAML with name: and tables: fields

Logs and Debugging

  • Scripts provide detailed logging with timestamps
  • Check Snowflake query history for SQL execution details
  • Verify stage contents: LIST @FOX.V1_CLEAN.SEMANTIC_MODELS

📈 Performance Considerations

  • Batch Processing: CSV data inserted in 1,000-row batches
  • Data Types: Automatic inference with NUMBER(15,2) for metrics
  • Clustering: Snowflake has no traditional indexes; for large datasets, consider a clustering key on PROGRAM_DISTRIBUTOR
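The 1,000-row batching can be sketched as a simple chunking helper (`batches` is illustrative; the actual insert logic lives in the setup scripts):

```python
from typing import Iterator, Sequence

def batches(rows: Sequence, size: int = 1000) -> Iterator[list]:
    """Yield rows in fixed-size chunks, matching the scripts' 1,000-row INSERT batches."""
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])

# With a live connection, each chunk would go through executemany, e.g.:
#   for chunk in batches(rows):
#       cur.executemany(
#           "INSERT INTO VW_NIELSEN_PROGRAM_VIEWERSHIP_DAILY VALUES (%s, %s)",
#           chunk,
#       )
```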

🤝 Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature-name
  3. Commit changes: git commit -am 'Add feature'
  4. Push to branch: git push origin feature-name
  5. Submit pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📞 Support

For issues and questions:

  • Check the troubleshooting section above
  • Review script logs for detailed error messages
  • Verify Snowflake permissions and connectivity

πŸ“ Example Workflow

# 1. Setup environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# 2. Configure Snowflake connection
cp .env.example .env
# Edit .env with your Snowflake credentials

# 3. Add your CSV files
cp /path/to/your/nielsen/*.csv csv_inputs/

# 4. Run complete setup
python complete_setup.py

# 5. Access Cortex Analyst in Snowsight
# Database: FOX, Schema: V1_CLEAN
# Load semantic model: fox_v1.yaml

# 6. Start querying with natural language!

Sample Cortex Analyst Questions:

  • "What are the top 10 Fox programs by audience?"
  • "Compare CBS and NBC viewership performance"
  • "Show me total audience by network type"
  • "Which demographic groups watch the most Fox content?"

Built for Nielsen viewership analysis and Snowflake Cortex Analyst integration


