Data & ML · Cerabyte Health · Data Science Intern · 2022

A no-show prediction model that looked great — until I found it was reading the future

Shared by M. Alvarez · ex-health-tech Data Science Intern

The sharer presented this exact project in their health-tech interview and went on to receive the internship offer.

4 real follow-ups from the actual loop · 1 hard · ~12 min

How they told it

A small model to predict clinic no-shows hit suspiciously high accuracy in week one. The story is really about the leak I found and how I caught it.

At a health-tech startup, my project was a model to predict which patients would no-show for appointments so the clinic could send extra reminders. My first version had an AUC around 0.94, which honestly felt too good for a summer intern. When I looked at feature importance, one feature dominated: an appointment 'status' field. It turned out that field was updated after the visit, so it already encoded whether the person showed up. Classic leakage: the model was reading the answer. I dropped every feature that was written at or after appointment time and rebuilt using only things known when the appointment was booked: lead time, prior no-show count, day of week, distance bucket, appointment type. AUC fell to about 0.71. Less impressive, but real. Because the classes were imbalanced, I picked the probability threshold from the reminder capacity the front desk actually had, not by maximizing accuracy. I wrote up the leakage story explicitly in my final doc because I wanted the next person not to trust that status field. The clinic ran a small pilot sending reminders to the top-risk bucket. I framed it as a promising signal, not a solved problem.
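
The two checks in this telling can be sketched in a few lines of plain Python. This is a minimal illustration, not the sharer's code: the data is made up, the names (`flag_suspect_features`, `threshold_for_capacity`, `status_code`, `lead_time_days`) are hypothetical, and single-feature AUC stands in for the feature-importance check described above. The first helper flags any one feature that nearly separates the classes on its own, a cue to ask when that field gets written. The second turns a reminder capacity into a probability cutoff.

```python
from typing import Dict, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance that a random no-show (label 1) outscores a random show (label 0).

    Ties count as half a win. Quadratic in the data size, fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def flag_suspect_features(
    columns: Dict[str, Sequence[float]],
    labels: Sequence[int],
    limit: float = 0.9,  # assumed cutoff for "too good for one feature"
) -> List[str]:
    """Return features whose single-feature AUC is at least `limit` in either direction."""
    suspects = []
    for name, values in columns.items():
        a = auc(values, labels)
        if max(a, 1.0 - a) >= limit:
            suspects.append(name)
    return suspects


def threshold_for_capacity(probs: Sequence[float], capacity: int) -> float:
    """Lowest predicted probability that still gets a reminder.

    With tied probabilities at the cutoff, slightly more than `capacity`
    patients can end up flagged.
    """
    if capacity <= 0 or not probs:
        raise ValueError("need a positive capacity and at least one prediction")
    ranked = sorted(probs, reverse=True)
    return ranked[min(capacity, len(ranked)) - 1]


# Made-up toy data: 1 = no-show, 0 = showed up.
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
features = {
    "status_code": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],      # written after the visit
    "lead_time_days": [30, 2, 5, 21, 7, 1, 14, 10, 3, 28],  # known at booking
}
suspects = flag_suspect_features(features, labels)

# Hypothetical model outputs; the front desk can send 3 extra reminders.
predicted = [0.82, 0.10, 0.35, 0.67, 0.22, 0.05, 0.41, 0.58, 0.15, 0.30]
cutoff = threshold_for_capacity(predicted, capacity=3)
flagged = [p for p in predicted if p >= cutoff]

print(suspects, cutoff, len(flagged))  # ['status_code'] 0.58 3
```

A flag from the first helper is only a prompt to investigate. Confirming leakage still means checking when the field is written relative to the appointment, which is what settled it in the telling above.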

What they actually got asked

How exactly did you confirm the status field was leaking rather than just being predictive? (hard)

Why AUC, and why not just report accuracy? (medium)

How did you choose the classification threshold? (medium)

What was the actual impact versus what you assumed going in? (easy)