A streamlined Python toolkit for processing Nielsen television viewership data and creating Cortex Analyst semantic models in Snowflake.
This repository contains scripts to:
- Consolidate multiple CSV files into a single Snowflake table
- Create network dimension tables for enhanced analytics
- Generate and deploy semantic models for Snowflake Cortex Analyst
- Enable natural language querying of television ratings data
- Python 3.8+
- Snowflake account with Cortex Analyst access
- Required Python packages (see requirements section)
- Nielsen viewership CSV files
-
Clone/Download this repository
git clone <repository-url> cd v1-clean
-
Create Python virtual environment
python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
-
Install required packages
pip install pandas snowflake-connector-python python-dotenv
-
Configure environment variables Create a
.env
file in the root directory:SNOWFLAKE_ACCOUNT=your_account_identifier SNOWFLAKE_USER=your_username SNOWFLAKE_PASSWORD=your_password SNOWFLAKE_WAREHOUSE=your_warehouse SNOWFLAKE_ROLE=your_role
v1-clean/
βββ README.md # This file
βββ complete_setup.py # Full setup script
βββ partial_setup.py # Partial setup (skip CSV processing)
βββ fox_v1.yaml # Cortex Analyst semantic model
βββ csv_inputs/ # Directory for Nielsen CSV files
β βββ ratings-dataset-1.csv
β βββ ratings-dataset-2.csv
β βββ ... (additional CSV files)
βββ .env # Environment configuration
βββ requirements.txt # Python dependencies
Place your Nielsen viewership CSV files in the csv_inputs/
directory. The scripts expect:
- CSV files with consistent column structures
- Required columns:
PROGRAM_DISTRIBUTOR
,AVERAGE_AUDIENCE_PROJECTION
- Header row with column names
- Any number of CSV files (automatically combined)
- Database:
FOX
- Schema:
V1_CLEAN
(automatically created from directory name) - Required permissions: CREATE TABLE, CREATE STAGE, USAGE on database/schema
Use this when setting up everything from scratch:
python complete_setup.py
What it does:
- Creates schema
FOX.V1_CLEAN
- Combines all CSV files from
csv_inputs/
- Creates consolidated table:
VW_NIELSEN_PROGRAM_VIEWERSHIP_DAILY
- Creates dimension table:
DIM_CONFORMED_NETWORK
- Creates Snowflake stage for semantic models
- Uploads
fox_v1.yaml
semantic model to Cortex Analyst
Use this when the main viewership table already exists:
python partial_setup.py
What it does:
- Verifies existing
VW_NIELSEN_PROGRAM_VIEWERSHIP_DAILY
table - Recreates
DIM_CONFORMED_NETWORK
dimension table - Updates semantic model in Cortex Analyst
- Preserves existing viewership data
Consolidated viewership data from all CSV files.
- Purpose: Main fact table containing all program ratings
- Key Columns:
PROGRAM_DISTRIBUTOR
,AVERAGE_AUDIENCE_PROJECTION
- Rows: Combination of all input CSV files
Network dimension for consistent reporting.
- Purpose: Standardized network lookup table
- Structure:
NETWORK_CODE
: Unique program distributor identifierNETWORK_NAME
: Friendly network name (e.g., "Fox Broadcasting Company")
SELECT *
FROM VW_NIELSEN_PROGRAM_VIEWERSHIP_DAILY v
LEFT JOIN DIM_CONFORMED_NETWORK n
ON v.PROGRAM_DISTRIBUTOR = n.NETWORK_CODE
After running the setup scripts:
-
Access Cortex Analyst
- Go to Snowsight β AI & ML β Cortex Analyst
-
Load Semantic Model
- Database:
FOX
- Schema:
V1_CLEAN
- Stage:
SEMANTIC_MODELS
- File:
fox_v1.yaml
- Database:
-
Example Natural Language Queries
What are the top networks by total audience? Compare Fox viewership to other broadcasters Show me audience trends by network Which programs have the highest ratings?
# Snowflake Connection
SNOWFLAKE_ACCOUNT=abc123.us-east-1
SNOWFLAKE_USER=your_username
SNOWFLAKE_PASSWORD=your_password
SNOWFLAKE_WAREHOUSE=COMPUTE_WH
SNOWFLAKE_ROLE=SYSADMIN
# Optional: Custom database name
# DATABASE_NAME=FOX
The semantic model defines:
- Table relationships and joins
- Metrics and dimensions for analysis
- Synonyms for natural language queries
- Custom instructions for query generation
1. CSV Files Not Found
β No CSV files found in csv_inputs
- Solution: Add CSV files to the
csv_inputs/
directory
2. Snowflake Connection Failed
β Failed to connect to Snowflake
- Solution: Verify
.env
file configuration and network access
3. Column Mismatch in CSVs
β οΈ Column mismatch in filename.csv
- Solution: Scripts auto-align columns, but verify CSV structure
4. Semantic Model Validation Error
β Invalid YAML file - missing required fields
- Solution: Ensure
fox_v1.yaml
contains valid YAML withname:
andtables:
fields
- Scripts provide detailed logging with timestamps
- Check Snowflake query history for SQL execution details
- Verify stage contents:
LIST @FOX.V1_CLEAN.SEMANTIC_MODELS
- Batch Processing: CSV data inserted in 1,000-row batches
- Data Types: Automatic inference with NUMBER(15,2) for metrics
- Indexing: Consider adding indexes on
PROGRAM_DISTRIBUTOR
for large datasets
- Fork the repository
- Create feature branch:
git checkout -b feature-name
- Commit changes:
git commit -am 'Add feature'
- Push to branch:
git push origin feature-name
- Submit pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions:
- Check the troubleshooting section above
- Review script logs for detailed error messages
- Verify Snowflake permissions and connectivity
# 1. Setup environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# 2. Configure Snowflake connection
cp .env.example .env
# Edit .env with your Snowflake credentials
# 3. Add your CSV files
cp /path/to/your/nielsen/*.csv csv_inputs/
# 4. Run complete setup
python complete_setup.py
# 5. Access Cortex Analyst in Snowsight
# Database: FOX, Schema: V1_CLEAN
# Load semantic model: fox_v1.yaml
# 6. Start querying with natural language!
Sample Cortex Analyst Questions:
- "What are the top 10 Fox programs by audience?"
- "Compare CBS and NBC viewership performance"
- "Show me total audience by network type"
- "Which demographic groups watch the most Fox content?"
Built for Nielsen viewership analysis and Snowflake Cortex Analyst integration
This README provides comprehensive documentation for the v1-clean repository as a standalone project, including setup instructions, usage examples, troubleshooting, and integration details for Cortex Analyst.