Two Years of NCLEX-RN Outcomes at Penobscot College of Nursing

A SQL case study on two years of NCLEX-RN outcomes across nineteen campuses, walked through phase by phase.

At a Glance

This case study takes a 7,635-attempt slice of synthetic NCLEX-RN data (every test attempt by every student across the 19 campuses of a multi-campus nursing college, winter 2024 through fall 2025) and walks through how to reason about it. It has four phases: source, schema, exploration, and findings. Each is short enough to read on its own. Together they document the full process, from a flat CSV through to the SQL patterns that surface the interesting answers. The SQLite database produced at the end of phase 02 can be queried directly in the browser via Datasette Lite, so any reader can re-run every query in the case study.
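The first step, turning a flat CSV into a queryable SQLite table, can be sketched with nothing beyond the standard library `csv` and `sqlite3` modules. This is a minimal illustration under loud assumptions. The `attempts` table, its columns (`student_id`, `campus_id`, `term`, `attempt_number`, `passed`), the primary key, the `load_attempts` helper, and the four toy rows are all hypothetical and do not come from the dataset. The actual phase 02 schema (three tables, twelve derived columns, one lookup) is not reproduced here.

```python
import csv
import io
import sqlite3

# Toy rows for illustration only (hypothetical columns, not case-study data).
TOY_CSV = """student_id,campus_id,term,attempt_number,passed
S001,C01,2024-winter,1,1
S002,C01,2024-winter,1,0
S002,C01,2024-spring,2,1
S003,C02,2024-spring,1,1
"""


def load_attempts(conn: sqlite3.Connection, csv_file) -> int:
    """Load a flat CSV of attempts into a hypothetical `attempts` table.

    Returns the number of rows in the table after loading.
    """
    # CHECK constraints catch bad values at load time instead of at query time.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS attempts (
            student_id     TEXT    NOT NULL,
            campus_id      TEXT    NOT NULL,
            term           TEXT    NOT NULL,
            attempt_number INTEGER NOT NULL CHECK (attempt_number >= 1),
            passed         INTEGER NOT NULL CHECK (passed IN (0, 1)),
            PRIMARY KEY (student_id, attempt_number)
        )
        """
    )
    reader = csv.DictReader(csv_file)
    # Cast numeric fields explicitly; csv yields every value as a string.
    rows = [
        (r["student_id"], r["campus_id"], r["term"],
         int(r["attempt_number"]), int(r["passed"]))
        for r in reader
    ]
    with conn:  # commit all inserts as a single transaction
        conn.executemany("INSERT INTO attempts VALUES (?, ?, ?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM attempts").fetchone()[0]


conn = sqlite3.connect(":memory:")
print(load_attempts(conn, io.StringIO(TOY_CSV)))  # 4
```

Enforcing the 0/1 outcome and the one-row-per-attempt key in the table definition means that malformed CSV rows fail loudly during the load. The alternative is discovering them later as skewed pass rates.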

The data is synthetic. It was derived from a real institutional engagement, with every identifier randomized and every outcome value perturbed. The methodology is not synthetic. The point of the SQL-primary approach is to rebut the assumption that statistical work needs a procedural language: confidence intervals, group comparisons, and counterfactual aggregations are all expressible directly in SQL. R appears only in phase 04, where logistic regression hits the SQL ceiling. The full reasoning, including why the dataset is synthetic and what that does and does not change, lives in phase 01.
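As an illustration of that claim, here is a minimal sketch that computes two of those statistics in a single SQLite query: per-campus pass rates with a 95% Wilson score interval, and each campus's difference from the pooled rate. The query runs from Python's standard `sqlite3` module. Several pieces are hypothetical: the `attempts(campus_id, passed)` table, the `campus_pass_rates` helper, and the toy rows. The Wilson interval is one reasonable choice and is not necessarily the method the case study uses. `math.sqrt` is registered as `sqrt` because SQLite only provides built-in math functions when it is compiled with them.

```python
import math
import sqlite3

# Hypothetical schema: attempts(campus_id TEXT, passed INTEGER 0/1).
WILSON_SQL = """
WITH per_campus AS (
    SELECT campus_id,
           COUNT(*)    AS n,
           AVG(passed) AS p            -- AVG always returns a float in SQLite
    FROM attempts
    GROUP BY campus_id
),
overall AS (
    SELECT AVG(passed) AS p_all FROM attempts
),
params AS (
    SELECT 1.96 AS z                   -- two-sided 95% normal quantile
)
SELECT campus_id,
       n,
       p,
       -- Wilson score interval: shifted centre minus/plus half-width,
       -- both divided by the same (1 + z^2/n) denominator
       ((p + z*z/(2.0*n)) - z*sqrt(p*(1 - p)/n + z*z/(4.0*n*n))) / (1 + z*z/n) AS ci_low,
       ((p + z*z/(2.0*n)) + z*sqrt(p*(1 - p)/n + z*z/(4.0*n*n))) / (1 + z*z/n) AS ci_high,
       p - p_all AS diff_from_overall  -- group comparison against the pooled rate
FROM per_campus
CROSS JOIN overall
CROSS JOIN params
ORDER BY campus_id
"""


def campus_pass_rates(conn: sqlite3.Connection):
    """Return (campus_id, n, p, ci_low, ci_high, diff_from_overall) rows."""
    # Make sqrt() available even in SQLite builds without math functions.
    conn.create_function("sqrt", 1, math.sqrt)
    return conn.execute(WILSON_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attempts (campus_id TEXT, passed INTEGER)")
# Toy rows for illustration only, not drawn from the case-study data.
toy = [("C01", 1)] * 8 + [("C01", 0)] * 2 + [("C02", 1)] * 5 + [("C02", 0)] * 5
conn.executemany("INSERT INTO attempts VALUES (?, ?)", toy)
for row in campus_pass_rates(conn):
    print(row)
```

The sketch uses the Wilson interval rather than the simpler Wald interval, p ± 1.96·√(p(1 − p)/n). The Wilson bounds always stay inside [0, 1]. That property matters for small campuses or pass rates near 0 or 1.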

The Phases

Phase 01: Source (10 mins)
Synthetic source, de-identification, and the first thread
NCLEX · Nursing Education · CSV · Public Data · Data Quality

Phase 02: Schema (10 mins)
Three tables, twelve derived columns, one lookup
SQL · SQLite · Schema Design · Python · ETL

Phase 03: Exploration (16 mins)
Six queries that get the shape of the data
SQL · SQLite · Exploratory Analysis · Datasette

Phase 04: Findings (17 mins)
Three findings, one R supplement, an honest accounting of what the data can and cannot say
SQL · R · Logistic Regression · NCLEX · Nursing Education · Predictive Modeling