Skip to main content
  1. ~/archivo # Case Studies/

Two Decades of NIH Funding in Kentucky

A SQL case study on twenty years of NIH grant funding in Kentucky, walked through phase by phase.

At a Glance
#

This case study takes a 13,876-project slice of NIH RePORTER, every research grant awarded to a Kentucky institution from fiscal year 2005 through 2025, and walks through how to reason about it. Source, schema, exploration, findings: four phases, each short enough to read on its own, together documenting the full process from a blank query on a federal data portal through to the SQL patterns that surface the interesting answers. The SQLite database produced at the end of phase 02 is queryable directly in the browser via Datasette Lite, so any reader can re-run every query in the case study.

I picked Kentucky because I was born and raised there. The full reasoning, including why this state is well-suited to a single-file SQLite exercise and the export limit that shaped the scope, lives in phase 01.

The Phases
#

Source

12 mins
Phase 1: Source data and provenance
NIH RePORTER · CSV · Public Data · Data Quality

Schema

14 mins
Phase 2: Schema design from EDA
SQL · SQLite · Schema Design · Python · sqlite-utils

Exploration

12 mins
Phase 3: First-pass queries
SQL · SQLite · Exploratory Analysis · Datasette

Findings

12 mins
Phase 4: Window functions and CTEs
SQL · SQLite · Window Functions · CTEs · Datasette