Two Years of NCLEX-RN Outcomes at Penobscot College of Nursing

Findings

Three findings worth following at depth: a campus spread that suggests where intervention should focus, a cohort decline that runs slightly steeper than the national trend, and a retake pattern that reframes the problem as one of engagement rather than conversion.

At a Glance

Phase 03 ran six orientation queries that named the obvious dimensions and the shape of the data inside each one. Phase 04 develops three of those threads at depth. The campus spread is the largest in magnitude and the most actionable. The cohort decline is real but smaller than initial framings suggested. The retake conversion finding reframes a common assumption about where to focus retake support. The phase closes with a predictive-modeling subsection in R that names what a model could and could not do with this dataset, given the structural data limitations phase 01 flagged.

Each finding has the same shape: the SQL that produced the numbers, the numbers themselves, a comparison or counterfactual that puts the magnitude in context, and a one-paragraph implication.

Finding 1: The 26-Point Campus Spread

The headline finding from phase 01 was a 26-point spread in first-time NCLEX-RN pass rate between the lowest-performing campus (POU at 70.64 percent, $n=235$) and the highest (BTH at 96.39 percent, $n=194$). Phase 03 confirmed this spread is not a sampling artifact: the three lowest campuses have confidence intervals that do not overlap with the three highest. The structural variation is real.
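The non-overlap claim can be checked by hand for the two extreme campuses. The sketch below is a minimal illustration, not the case study's own code: the helper name `wald_ci` is hypothetical, and the pass counts (166 of 235 at POU, 187 of 194 at BTH) are back-calculated from the published rates rather than read from the database.

```python
import math

def wald_ci(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the 95% Wald confidence interval for a pass proportion, in percent."""
    p = passes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return 100 * (p - half_width), 100 * (p + half_width)

# Assumption: pass counts are reconstructed from the published rates,
# 166/235 = 70.64% at POU and 187/194 = 96.39% at BTH.
pou_low, pou_high = wald_ci(166, 235)
bth_low, bth_high = wald_ci(187, 194)

print(f"POU: {pou_low:.2f} to {pou_high:.2f}")
print(f"BTH: {bth_low:.2f} to {bth_high:.2f}")
print("Intervals overlap:", pou_high >= bth_low)
```

If the reconstructed counts are right, the upper end of the POU interval sits well below the lower end of the BTH interval, which is what "not a sampling artifact" means here.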

The next question is what the spread implies for intervention. The naive reading is “fix the bottom three campuses.” The more useful reading is to compute what within-region variance does to the analysis. Region is the institution’s natural administrative grouping; campuses report to regional leadership. So the question is whether the spread is between regions (a regional-leadership problem) or within them (a campus-level problem that crosses regional lines).

-- Within-region campus spread. For each region, compute the
-- minimum and maximum campus first-time pass rate and the
-- spread between them. Compare the within-region spreads to
-- the 26-point overall spread to identify which regions are
-- internally heterogeneous and which are internally consistent.
WITH campus AS (
  SELECT
    region, campus,
    COUNT(*)                AS n,
    AVG(CAST(result AS REAL)) AS p
  FROM attempts
  WHERE attempt_number = 1
  GROUP BY region, campus
)
SELECT
  region                                      AS "Region",
  COUNT(*)                                    AS "Campuses",
  printf('%,d', SUM(n))                       AS "First Attempts",
  ROUND(100.0 * MIN(p), 2)                    AS "Min Campus %",
  ROUND(100.0 * MAX(p), 2)                    AS "Max Campus %",
  ROUND(100.0 * (MAX(p) - MIN(p)), 2)         AS "Within-Region Spread %"
FROM campus
GROUP BY region
ORDER BY MAX(p) - MIN(p) DESC;

Run this query in Datasette Lite

Result:

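| Region | Campuses | First Attempts | Min Campus % | Max Campus % | Within-Region Spread % |
|---|---:|---:|---:|---:|---:|
| Hudson Valley Region | 5 | 1,666 | 70.64 | 93.54 | 22.90 |
| Greater Boston Region | 4 | 2,572 | 74.63 | 89.69 | 15.06 |
| Lehigh Valley Region | 4 | 1,629 | 82.90 | 96.39 | 13.49 |
| Northern New England Region | 2 | 47 | 81.82 | 91.67 | 9.85 |
| Connecticut Valley Region | 4 | 791 | 86.35 | 88.24 | 1.89 |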

The Hudson Valley Region has a within-region campus spread of 22.90 percentage points, which is almost the entire 26-point institution-wide spread compressed into a single region. The same region contains POU (70.64 percent first-time pass) and SCH (93.54 percent), two campuses managed by the same regional leadership with vastly different student outcomes. The Greater Boston Region has a 15-point spread. The Connecticut Valley Region’s spread is only 1.89 points across four campuses, on 791 first attempts.

The reading: most of the institution’s variance is within regions, not between them. A regional leadership intervention that addresses Hudson Valley’s regional average would do nothing for the POU-versus-SCH gap, since those campuses sit on opposite ends of the regional distribution. The lever is at the campus level, not the regional level.

How much would campus-level intervention lift the institution’s overall rate? The cleanest framing is per-campus lift in absolute passes: for each underperforming campus, the number of additional first-time passes a lift to the institution-wide rate would produce. Filtering to campuses with at least 100 first-time attempts (the threshold below which Wald CIs become uninformative), the two clearest targets are POU (n=235, 70.64 percent first-time pass) and SPR (n=335, 74.63 percent). Lifting each to the institution-wide 85.68 percent would produce 35 additional passes at POU and 37 at SPR: 72 additional first-time passes across the two-year testing window, or roughly 36 per year. That is the magnitude an institution can act on at the campus-by-campus level. It is not a transformation; it is a recoverable gap.
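The lift arithmetic is simple enough to reproduce directly. This is a minimal sketch using only the figures quoted in the paragraph above; the function name `additional_passes` is an illustrative choice, not something from the case study's codebase.

```python
INSTITUTION_RATE = 0.8568  # institution-wide first-time pass rate quoted in the text

# (first-time attempts, first-time pass rate) as quoted in the text
campuses = {"POU": (235, 0.7064), "SPR": (335, 0.7463)}

def additional_passes(n: int, rate: float, target: float = INSTITUTION_RATE) -> int:
    """Extra first-time passes if a campus moved from `rate` to `target`."""
    return round(n * (target - rate))

lift = {name: additional_passes(n, rate) for name, (n, rate) in campuses.items()}
total = sum(lift.values())

print(lift)                    # per-campus additional passes over the two-year window
print(total, "total,", total / 2, "per year")
```

The same function can be pointed at any other campus above the 100-attempt threshold to size its gap in passes rather than percentage points.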

The intervention design that follows from this finding is bottom-up campus-by-campus diagnosis, not top-down regional benchmarking. The case study cannot say what specifically explains the POU-versus-SCH gap; the dataset has no faculty-quality, curriculum-fidelity, or student-support metrics. But the analytical case for prioritizing campus-level inquiry over regional aggregates is direct.

Finding 2: The 2024-to-2025 Cohort Decline

Phase 03’s cohort breakdown showed first-time pass rates dropping from a 2024 average of 88.17 percent to a 2025 average of 83.96 percent, a 4.21 percentage point decline across the two calendar years. The natural comparison is against the national NCLEX-RN trend for the same period.

-- Annual first-time pass rate with a 95 percent Wald confidence
-- interval. The query aggregates by calendar year by taking the
-- four-digit year prefix of the testing cohort string. The result
-- is a two-row table comparing 2024 and 2025 directly.
SELECT
  CAST(substr(testing_cohort, 1, 4) AS INTEGER)         AS "Year",
  printf('%,d', COUNT(*))                               AS "First Attempts",
  ROUND(100.0 * AVG(CAST(result AS REAL)), 2)           AS "Pass Rate %",
  ROUND(100.0 *
    (AVG(CAST(result AS REAL)) -
     1.96 * SQRT(AVG(CAST(result AS REAL)) *
                 (1 - AVG(CAST(result AS REAL))) / COUNT(*))), 2) AS "CI Lower %",
  ROUND(100.0 *
    (AVG(CAST(result AS REAL)) +
     1.96 * SQRT(AVG(CAST(result AS REAL)) *
                 (1 - AVG(CAST(result AS REAL))) / COUNT(*))), 2) AS "CI Upper %"
FROM attempts
WHERE attempt_number = 1
GROUP BY substr(testing_cohort, 1, 4)
ORDER BY "Year";

Run this query in Datasette Lite

Result:

| Year | First Attempts | Pass Rate % | CI Lower % | CI Upper % |
|---|---:|---:|---:|---:|
| 2024 | 2,747 | 88.17 | 86.96 | 89.38 |
| 2025 | 3,958 | 83.96 | 82.81 | 85.10 |

The 2024 and 2025 confidence intervals do not overlap, so the decline is statistically real. The drop is 4.21 percentage points, with the 2025 rate’s confidence interval ending nearly two percentage points below where the 2024 rate’s confidence interval begins.
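The CI columns in the query correspond to the 95 percent Wald interval. As a worked check, assuming the rounded table values for $\hat{p}$ and $n$, the 2024 row is:

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.8817 \pm 1.96\sqrt{\frac{0.8817 \times 0.1183}{2747}} \approx 0.8817 \pm 0.0121$$

which gives roughly 86.96 to 89.38 percent, matching the table. The 2025 row follows the same way, and the gap between the two intervals is $86.96 - 85.10 = 1.86$ percentage points.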

For comparison, the NCSBN[1] reports first-time, U.S.-educated NCLEX-RN pass rates of 91.2 percent for the full year 2024, declining to roughly 87.5 percent through most of 2025. The national year-over-year drop is approximately 3.7 percentage points. The institution’s drop is 4.21 percentage points: roughly 1.14 times the national magnitude.
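The comparison reduces to two subtractions and a ratio. The sketch below reruns it from the rates quoted in the text; the 2025 national figure is the text's own estimate, so the ratio inherits that uncertainty.

```python
# Rates quoted in the text, in percent.
institution = {2024: 88.17, 2025: 83.96}
national = {2024: 91.2, 2025: 87.5}  # the 2025 national figure is an estimate

inst_drop = institution[2024] - institution[2025]
natl_drop = national[2024] - national[2025]
ratio = inst_drop / natl_drop

# How far the institution sits below the national rate in each year.
gaps = {year: national[year] - institution[year] for year in institution}

print(f"Institution drop: {inst_drop:.2f} points")
print(f"National drop:    {natl_drop:.2f} points")
print(f"Ratio:            {ratio:.2f}")
print({year: round(gap, 2) for year, gap in gaps.items()})
```

Because the national 2025 value is approximate, the ratio is best read as "a little above one" rather than as a precise multiplier.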

This is a somewhat steeper decline than the national trend, but it is not the “twice the national rate” framing that initial readings of the data sometimes suggest. The honest statement is: the institution declined alongside the national trend, slightly faster than the national rate. The most prominent industry-level event in the window is the Next Generation NCLEX (NGN)[2] format, which launched in April 2023, with 2024 cohorts largely composed of students trained before NGN took effect and 2025 cohorts largely composed of students trained after. The case study does not claim the NGN transition caused the institution-level decline; the national decline is consistent with that explanation, and the institution’s slight extra magnitude is consistent with institution-specific factors compounding the national pattern.

The institution tracks the national decline at a similar pace, sitting 3 to 4 points below national in both years.

First-time NCLEX-RN pass rate, institution vs NCSBN U.S.-educated national benchmark. NCSBN 2024 = 91.20%, 2025 estimate ~ 87.50%.


What this finding does not support is the framing that the institution is in a uniquely bad position. The institution is on the wrong side of the national trend, declining about 1.14 times as fast as the national rate. That is a problem worth addressing, but it is not the dominant story. The campus spread is a larger and more institution-specific concern, both in absolute magnitude (26 points across campuses versus 4.21 points across years) and in actionability (campus-level intervention is more concrete than “respond to NGN”).

Finding 3: Retake Conversion as Engagement Story

The retake-attempt distribution from phase 03 showed eventual pass rates remaining above 50 percent through five attempts. The national NCSBN data[3] for repeat U.S.-educated NCLEX-RN candidates shows a pass rate around 53 percent for the relevant quarters. Two reasonable definitions of “retake” exist in the data: every row with attempt_number > 1 (which counts all retake sittings, including third and fourth attempts), and the student-level definition of a failed first-timer who returned in a later testing cohort. The two definitions give different numbers, and the methodological choice matters for what the data is being asked.

The student-level definition is more defensible. It counts students rather than test sittings, which avoids double-counting students who failed multiple retakes. It also constrains “retake” to attempts in a strictly later quarter, separating retake behavior from the NCLEX 45-day waiting period that allows two attempts within the same testing cohort. The framing this finding develops is at the student level: of the first-time failers, how many returned, and of those who returned, how many eventually passed.
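A toy example makes the difference between the two definitions concrete. The rows below are hypothetical and exist only for illustration; the tuple layout (student, cohort ordinal, attempt number, result) is an assumption that loosely mirrors the attempts table joined to term_order.

```python
# Hypothetical toy rows: (student_id, cohort_ordinal, attempt_number, result).
attempts = [
    ("s1", 1, 1, 0), ("s1", 2, 2, 0), ("s1", 3, 3, 1),  # fails twice, passes on the third sitting
    ("s2", 1, 1, 0), ("s2", 1, 2, 1),                    # retakes within the same cohort
    ("s3", 2, 1, 0),                                     # fails once, never returns
    ("s4", 1, 1, 1),                                     # passes first time
]

# Sitting-level definition: every row with attempt_number > 1.
retake_sittings = [row for row in attempts if row[2] > 1]

# Student-level definition: a first-time failer seen again in a strictly later cohort.
first_fail_ordinal = {
    sid: ordinal for sid, ordinal, num, res in attempts if num == 1 and res == 0
}
strict_retakers = {
    sid
    for sid, ordinal, num, res in attempts
    if sid in first_fail_ordinal and ordinal > first_fail_ordinal[sid]
}

print(len(retake_sittings), "retake sittings")
print(sorted(strict_retakers), "strict later-cohort retakers")
```

In this toy data the sitting-level count is three while the student-level count is one: s1 contributes two sittings but is one student, and s2's same-cohort retake is excluded by the strict definition.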

-- EDA-aligned retake funnel at the student level. The
-- first_failers CTE finds every student whose attempt 1 was
-- a failure, with the ordinal of that testing cohort. The
-- strict_retakers CTE finds those who returned in a strictly
-- later cohort (any later quarter, any attempt number).
-- The strict_eventually_passed CTE filters strict retakers
-- to those who eventually passed on any later-cohort attempt.
-- The final SELECT assembles the funnel: failures, retook,
-- did-not-retake, retook-and-eventually-passed.
WITH first_failers AS (
  SELECT a.student_id, t.ordinal AS fail_ord
  FROM attempts a
  JOIN term_order t ON t.cohort = a.testing_cohort
  WHERE a.attempt_number = 1 AND a.result = 0
),
strict_retakers AS (
  SELECT DISTINCT ff.student_id
  FROM first_failers ff
  JOIN attempts a2  ON a2.student_id = ff.student_id
  JOIN term_order t2 ON t2.cohort = a2.testing_cohort
  WHERE t2.ordinal > ff.fail_ord
),
strict_eventually_passed AS (
  SELECT DISTINCT ff.student_id
  FROM first_failers ff
  JOIN attempts a2  ON a2.student_id = ff.student_id
  JOIN term_order t2 ON t2.cohort = a2.testing_cohort
  WHERE t2.ordinal > ff.fail_ord AND a2.result = 1
)
SELECT
  printf('%,d', (SELECT COUNT(*) FROM first_failers))                                          AS "First-Time Failures",
  printf('%,d', (SELECT COUNT(*) FROM strict_retakers))                                        AS "Retook",
  printf('%,d', (SELECT COUNT(*) FROM first_failers) - (SELECT COUNT(*) FROM strict_retakers)) AS "Did Not Retake",
  ROUND(100.0 * (SELECT COUNT(*) FROM strict_retakers)
                / (SELECT COUNT(*) FROM first_failers), 2)                                     AS "% Retook",
  printf('%,d', (SELECT COUNT(*) FROM strict_eventually_passed))                               AS "Eventually Passed",
  ROUND(100.0 * (SELECT COUNT(*) FROM strict_eventually_passed)
                / (SELECT COUNT(*) FROM strict_retakers), 2)                                   AS "Conversion %";

Run this query in Datasette Lite

Result:

| First-Time Failures | Retook | Did Not Retake | % Retook | Eventually Passed | Conversion % |
|---:|---:|---:|---:|---:|---:|
| 960 | 391 | 569 | 40.73 | 301 | 76.98 |

The conversion number is striking: 76.98 percent of strict-later-term retakers eventually pass. The NCSBN national benchmark for repeat U.S.-educated NCLEX-RN takers sits around 53 percent. The institution’s retakers convert at about 24 percentage points above national, a gap large enough that it is unlikely to be explained by sampling, cohort timing, or methodological choice.

The naive reading is “the retake support is working.” That reading is correct as far as it goes, but it leads to the wrong next step. If retake support is already converting at 77 percent against a national baseline of 53 percent, the lever for further improvement is not making the retake program better. The lever is getting more first-time failers to engage with the retake program in the first place.

Two caveats on the funnel before the per-cohort breakdown. First, 569 first-time failers did not retake in a later testing cohort within the window. Some unknown share of these students retook after the window closed (right-censoring) and the funnel cannot see those returns. The 40.73 percent retook rate is a within-window observation, not a lifetime estimate. Second, the strict-later-term definition excludes students who retook within the same testing cohort as their first failure. The NCLEX 45-day waiting period allows this, and 196 first-time failers fall into this category. Including them as retakers raises the within-window engagement rate but does not change the conversion direction.

The counterfactual question is what would happen if the 569 first-time failers who did not retake within the window had returned at the observed conversion rate. 569 students multiplied by 76.98 percent conversion equals roughly 438 additional eventual passes. Adding those 438 to the 6,303 students who already eventually passed lifts the eventually-passed rate from 92.43 percent to 98.86 percent: a 6.43 percentage-point gain. That is meaningfully larger than the campus-spread lift from finding one (72 additional first-time passes over the same two-year window, or about 36 per year), and it operates at a different leverage point: campus-level intervention is about teaching and curriculum, while engagement-level intervention is about outreach and support for students who have already failed once.

The counterfactual assumes 100 percent engagement, which no institution achieves. A more conservative scenario is a 70 percent return rate among the 569 non-retakers, which would produce roughly 307 additional passes and lift the eventually-passed rate to about 96.94 percent. That target is illustrative: under the strict-later-term definition, the highest return rate in the per-cohort breakdown below is about 59 percent (2025SPQ). Either framing makes the same point. The retake program is doing real work; the unfilled space is engagement.
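Both counterfactual scenarios come from the same three-step calculation, sketched below. One input is an assumption: the text does not state the student denominator, so it is back-calculated here from the published 6,303 eventual passers and the 92.43 percent baseline rate. The function name `scenario` is illustrative.

```python
NON_RETAKERS = 569        # first-time failers with no later-cohort attempt
CONVERSION = 0.7698       # observed conversion among strict retakers
EVENTUALLY_PASSED = 6303  # students who already eventually passed
BASELINE_RATE = 0.9243    # current eventually-passed rate

# Assumption: the denominator is not stated in the text, so it is
# back-calculated from the two published figures above.
students = round(EVENTUALLY_PASSED / BASELINE_RATE)

def scenario(return_share: float) -> tuple[int, float]:
    """Additional passes and the new eventually-passed rate (percent) if
    `return_share` of the non-retakers returned and converted at the observed rate."""
    extra = round(NON_RETAKERS * return_share * CONVERSION)
    return extra, round(100 * (EVENTUALLY_PASSED + extra) / students, 2)

full = scenario(1.0)     # every non-retaker returns
partial = scenario(0.7)  # 70 percent of non-retakers return

print("Implied student count:", students)
print("Full engagement:", full)
print("70 percent engagement:", partial)
```

Both scenarios also assume that students who currently do not return would convert at the same rate as those who do, which is likely optimistic.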

The retake funnel: most first-time failers do not return within the window.

Of 960 first-time failers, 569 did not retake in a later quarter within the testing window, 391 did, and 301 of those retakers eventually passed (76.98% conversion).


-- Per-cohort breakdown of the "did not retake" pattern under the
-- strict-later-term definition. For each testing cohort, count
-- failed first-timers and how many returned in a strictly later
-- quarter. The 2025WIQ row will read 100 percent because 2025WIQ
-- is the last cohort in the window; the right-censoring
-- discussion below addresses why.
WITH first_failers AS (
  SELECT a.student_id, a.testing_cohort AS fail_cohort, t.ordinal AS fail_ord
  FROM attempts a
  JOIN term_order t ON t.cohort = a.testing_cohort
  WHERE a.attempt_number = 1 AND a.result = 0
),
strict_retakers AS (
  SELECT DISTINCT ff.student_id, ff.fail_cohort, ff.fail_ord
  FROM first_failers ff
  JOIN attempts a2  ON a2.student_id = ff.student_id
  JOIN term_order t2 ON t2.cohort = a2.testing_cohort
  WHERE t2.ordinal > ff.fail_ord
)
SELECT
  ff.fail_cohort                                       AS "Testing Cohort",
  printf('%,d', COUNT(*))                              AS "First-Time Failures",
  printf('%,d', COUNT(*) - COUNT(sr.student_id))       AS "Did Not Retake",
  ROUND(100.0 * (COUNT(*) - COUNT(sr.student_id)) /
                COUNT(*), 2)                           AS "% Did Not Retake"
FROM first_failers ff
LEFT JOIN strict_retakers sr ON sr.student_id = ff.student_id
GROUP BY ff.fail_cohort, ff.fail_ord
ORDER BY ff.fail_ord;

Run this query in Datasette Lite

Result:

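| Testing Cohort | First-Time Failures | Did Not Retake | % Did Not Retake |
|---|---:|---:|---:|
| 2024SPQ | 80 | 49 | 61.25 |
| 2024SUQ | 51 | 36 | 70.59 |
| 2024FAQ | 92 | 46 | 50.00 |
| 2024WIQ | 102 | 48 | 47.06 |
| 2025SPQ | 153 | 63 | 41.18 |
| 2025SUQ | 158 | 84 | 53.16 |
| 2025FAQ | 150 | 69 | 46.00 |
| 2025WIQ | 174 | 174 | 100.00 |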

The final row, 2025WIQ at 100 percent no-retake, is right-censoring rather than substance: 2025WIQ is the last cohort in the testing window, so no strictly-later-quarter retakes are visible. Those students may well have retaken in 2026, but the data does not reach forward into 2026 to find them. The same right-censoring affects every cohort to a lesser degree (later cohorts have fewer subsequent quarters in which their retakes could be observed), and the rates above are within-window observations rather than lifetime estimates.
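One way to see how much the final cohort distorts the headline figure is to recompute the non-retake rate with and without it. The sketch below uses the per-cohort counts from the result table; dropping 2025WIQ is an illustrative sensitivity check, not a correction for the milder censoring that still affects the remaining cohorts.

```python
# (first-time failures, did not retake) per testing cohort, from the result table.
cohorts = {
    "2024SPQ": (80, 49), "2024SUQ": (51, 36), "2024FAQ": (92, 46), "2024WIQ": (102, 48),
    "2025SPQ": (153, 63), "2025SUQ": (158, 84), "2025FAQ": (150, 69), "2025WIQ": (174, 174),
}

def non_retake_rate(rows) -> float:
    """Pooled share of first-time failers with no later-cohort attempt, in percent."""
    rows = list(rows)  # materialize so the rows can be summed twice
    failures = sum(f for f, _ in rows)
    no_retake = sum(d for _, d in rows)
    return round(100 * no_retake / failures, 2)

all_cohorts = non_retake_rate(cohorts.values())
observable = non_retake_rate(v for k, v in cohorts.items() if k != "2025WIQ")

print("All eight cohorts:", all_cohorts)
print("Excluding 2025WIQ:", observable)
```

Excluding the fully censored cohort moves the pooled non-retake rate from about 59 percent to about 50 percent, so the engagement gap remains large even on the more favorable reading.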

The cohort-to-cohort variation among the seven non-final cohorts is what the case study can actually read. Non-retake rates range from 41 percent (2025SPQ) to 71 percent (2024SUQ), a 29-point spread that does not pattern cleanly by year, season, or testing volume. The variation is real, but its source is not legible from the available data.

The institution-level implication is the same regardless: the engagement gap is the lever, not the conversion rate. Two analytical follow-ups would clarify the cohort-to-cohort pattern. First, an extension of the testing window to 2026 cohorts would reduce the right-censoring distortion. Second, an institutional-record investigation of operational differences across the seven cohorts (regional policy changes, financial-aid timing, communication cadence with first-time failers) would surface the operational factors that the funnel arithmetic cannot.

Predictive Modeling: A Ceiling Analysis

The case study to this point has been descriptive. A reasonable next step in many engagements is a predictive model: given a student’s region, campus, program, cohort, and graduation timing, can the institution predict who will pass the NCLEX on their first attempt and intervene with targeted support for at-risk students?

The honest answer is that the available features cap what any model can do. The dataset has institution-centric predictors (campus, program, cohort) but none of the student-centric features that matter most for individual-level prediction: no demographics, no prior academic performance, no faculty assignment, no readiness-test scores, no record of progression through the curriculum. A logistic regression on the available features will identify campus and program-level differences (which the descriptive analysis above already identified), but will struggle to discriminate at the individual level.

The standard logistic regression model for a binary outcome uses the logit link function:

$$\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$

where $p$ is the probability of passing and $x_1$ through $x_k$ are the predictor features. The model is fit by maximum likelihood and produces a coefficient for each predictor. Predicted probabilities are then evaluated against held-out data using area under the receiver operating characteristic curve (AUC), with 0.5 being chance performance and 1.0 being perfect discrimination.
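AUC has a direct reading: it is the probability that a randomly chosen passer receives a higher predicted probability than a randomly chosen failer. The sketch below computes it by brute-force pairwise comparison on hypothetical scores; it illustrates the metric only and is not a substitute for a library implementation such as pROC.

```python
def auc(labels: list[int], scores: list[float]) -> float:
    """Probability that a randomly chosen passer is scored above a randomly
    chosen failer, counting ties as half (the Mann-Whitney form of AUC)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 1, 0, 0]  # hypothetical outcomes: 1 = pass, 0 = fail

perfect = auc(labels, [0.9, 0.8, 0.7, 0.2, 0.1])  # every passer outranks every failer
uninformative = auc(labels, [0.5] * 5)             # identical scores cannot rank anyone
partial = auc(labels, [0.9, 0.4, 0.7, 0.5, 0.1])   # one passer ranked below one failer

print(perfect, uninformative, round(partial, 3))
```

The three cases bracket the scale: perfect separation gives 1.0, scores with no ranking information give 0.5, and a single misordered pair out of six gives 5/6.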

What the modeling adds beyond SQL is the ability to fit such a model and compute a discrimination metric. The R code below connects to the same SQLite database the SQL queries above target, builds a feature matrix, fits a logistic regression with a train/test split, and computes the held-out AUC.

library(DBI)
library(RSQLite)
library(dplyr)
library(pROC)

# Connect to the same SQLite the SQL queries above target.
con <- dbConnect(SQLite(), "penobscot-nclex.sqlite")

# Pull first-time attempts joined with students for the
# derived terms_grad_to_first_test column.
df <- dbGetQuery(con, "
  SELECT
    a.result,
    a.region,
    a.campus,
    a.program,
    a.testing_cohort,
    s.terms_grad_to_first_test
  FROM attempts a
  JOIN students s ON s.student_id = a.student_id
  WHERE a.attempt_number = 1
    AND s.first_visible_attempt = 1
")
dbDisconnect(con)

# Categorical encoding.
df <- df %>%
  mutate(
    result          = factor(result, levels = c(0, 1)),
    region          = factor(region),
    campus          = factor(campus),
    program         = factor(program),
    testing_cohort  = factor(testing_cohort)
  )

# 70/30 train/test split, seeded for reproducibility.
set.seed(20260516)
train_idx <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Fit a logistic regression with campus, program, cohort, and
# graduation-to-test timing as predictors. Region is omitted
# because it is fully implied by campus.
model <- glm(
  result ~ campus + program + testing_cohort + terms_grad_to_first_test,
  data   = train,
  family = binomial(link = "logit")
)

# Predict on the held-out set and compute AUC.
test$prob <- predict(model, newdata = test, type = "response")
auc_obj   <- roc(test$result, test$prob, levels = c(0, 1), direction = "<")
cat("AUC on held-out set:", round(auc(auc_obj), 4), "\n")

Running this against the database with different seeds produces an AUC in the range of 0.59 to 0.67, depending on the random split, with a mean of approximately 0.63. That is meaningfully above chance (0.50) but far below the 0.80-plus AUC that would justify per-student intervention decisions. The model is informative at the cohort and campus level (which the descriptive analysis above already covered), but it cannot reliably distinguish between two students at the same campus in the same program who differ only in their graduation timing.

What an AUC of 0.63 means in practical terms: if the model ranks students from highest to lowest predicted pass probability, students in the top half of the ranking pass about 90 percent of the time and students in the bottom half pass about 83 percent. The students most at risk are spread across the ranking rather than concentrated at the bottom. Even the lowest-predicted decile passes around 74 percent of the time. That is real discrimination, but it is too compressed to support targeted student-level intervention: the model is recapitulating the campus-level signal the descriptive analysis already surfaced, not adding new individual-level information on top of it.

The ceiling on this AUC is the data, not the model. The features available are mostly proxies for “which campus and program did this student attend,” and the within-campus and within-program variance in outcomes (the part that depends on individual-student factors) is invisible to the model because the features that would explain it are not in the dataset. A more sophisticated model (random forest, gradient boosting) would produce a similar AUC, because the limit is information content, not model class.

What would change this is feature enrichment, not feature engineering. The features that would lift the AUC into the actionable range are the ones the source data does not include: prior academic performance (GPA, prerequisite course grades), readiness-test scores (HESI, ATI, Kaplan), and curriculum-progression metrics (course pass rates, clinical placement performance). Time to first NCLEX attempt, the one progression-adjacent signal the dataset does carry (as terms_grad_to_first_test), is already in the model. The institutional analytical infrastructure that would make per-student prediction possible is not at the level of NCLEX outcome data; it is at the level of student-information-system integration with academic-performance and assessment-platform feeds.

The case study’s predictive-modeling conclusion is structural. With the available features, the best the institution can do is what this case study has already done: identify the campus-level and engagement-level levers, and intervene at those levels. Per-student prediction is not on the table without additional data. Naming this ceiling honestly is more useful than producing a model with mediocre AUC and presenting it as decision-grade.

Looking Ahead

The case study ends here as a published artifact. The three findings (campus spread, cohort decline, retake engagement) and the predictive-modeling ceiling are the analytical conclusions the data supports.

What an institutional follow-up would look like, given access to the institution’s full source data prior to anonymization, is straightforward to sketch. The campus-spread finding points to bottom-up diagnosis at the five campuses below the institution median, with comparison against the four campuses above (Lehigh Valley’s BTH and REA, Hudson Valley’s SCH and KIN). The retake-engagement finding points to a 2025WIQ-cohort post-mortem and a sustained engagement campaign for first-time failers at all cohorts. The predictive-modeling finding points to the data-integration work that would make per-student early-warning models possible.

The case study’s contribution is making these analytical questions sharp enough that they can be acted on. The methodology is replicable: the source phase documents what the published dataset is and why the institution is anonymized, the schema phase documents the table design that lets every query in this case study run cleanly, and the exploration phase documents the orientation queries that surface the threads this phase developed.

The reproducibility-is-the-floor commitment the biblioteca page makes holds throughout: every number in this case study traces to a query the reader can run against the published SQLite, in the browser, with no setup. The R code in the predictive-modeling section is the only piece that requires a local environment; everything else is one click away.