Getting Started
Your data is in a database. Use SQL or Python to explore it:
SQL: SELECT * FROM case_data LIMIT 10
Python: df = pd.read_csv('case_data.csv')
See data_dictionary.md for column descriptions.
Watch candidates solve real business cases in a notebook with an AI assistant.
See every action and every prompt — learn how they think, and how they work with AI.
We capture every action, every line of code, every prompt.
Your data is in a database. Use SQL or Python to explore it:
SQL: SELECT * FROM case_data LIMIT 10
Python: df = pd.read_csv('case_data.csv')
See data_dictionary.md for column descriptions.
How they frame an ambiguous problem, what tradeoffs they prioritize, and whether they can tell when an answer is actually useful — not just technically correct.
Did they verify the AI's output? Did they catch its mistakes? Did they over-trust it? We watch the full session — every prompt, every accepted suggestion, every override.
Research-backed evaluation rubrics. Every score traces back to the exact evidence.
Sample interview · Assessment submitted Apr 30, 2026, 6:08 PM
Strong fit for the analyst role's core ask — the candidate framed the business question before reaching for tooling and used AI as a focused executor. Worth probing in interview: recommendations stayed at the diagnostic level and didn't push into the operational cutoffs this role owns.
Reaches a defensible, evidence-backed recommendation but stops short of an operational cutoff.
Delegated one well-scoped task to AI and read the output critically instead of paraphrasing it.
Candidate-authored work shows solid domain framing; independent statistical depth is good, not standout.
The notebook reaches a defensible recommendation — a basic lift check kills the initial hypothesis and a controlled model isolates the real driver. The work stops at the diagnostic level; an operational cutoff is left for a follow-up.
Used AI as a precise executor: one well-scoped prompt with named variables and the exact output wanted, then read the result table critically rather than paraphrasing it.
Candidate-authored work shows solid domain framing — quiz answers name industry factors not visible in the data, and the written findings reflect stakeholder constraints. Independent statistical reasoning is good, not standout.
Case
A short business framing for the case appears here — who the team is, what they're trying to decide, and the constraints they're operating under. Two or three sentences set the stage without prescribing the answer.
[Framing question 1 goes here — sets up the business hypothesis the candidate needs to push back on.]
[Sample candidate response — typically 2–4 sentences naming the candidate's mental model, the variables they'd reach for, and the assumption they want to test. Long-form free text, no character cap.]
[Framing question 2 goes here — asks the candidate where they'd start the analysis and why.]
[Sample candidate response — describes the first cut of the data they'd run, what they'd compare it against, and which secondary check would either confirm or kill the hypothesis.]
[Framing question 3 goes here — drops a partial statistic on the candidate and asks what they'd interpret and check next.]
[Sample candidate response — names the missing baseline, describes the lift calculation they'd want to do, and adds a sanity-check on a likely confounder.]
SELECT * FROM case_data LIMIT 10
df = pd.read_csv('case_data.csv')
data_dictionary.md for column descriptions.
import pandas as pd df = pd.read_csv('case_data.csv') # Headline numbers outcome_rate = df['target'].mean() segment_share = (df['segment_flag'] == 1).mean() print(f'overall outcome rate: {outcome_rate:.4f}') print(f'segment share overall: {segment_share:.4f}') # Lift: P(segment | event) / P(segment) events = df[df['target'] == 1] seg_among = (events['segment_flag'] == 1).mean() print(f'lift: {seg_among / segment_share:.3f}x')
overall outcome rate: 0.0xxx segment share overall: 0.xxxx lift: 1.0xx (>1 means over-represented)
# Outcome rate stratified by quartile of candidate variable df['var_q'] = pd.qcut(df['candidate_var'], 4, labels=['Q1 low','Q2','Q3','Q4 high']) print(df.groupby('var_q')['target'].mean().round(4))
var_q Q1 low 0.0xxx Q2 0.0xxx Q3 0.0xxx Q4 high 0.xxxx Name: target, dtype: float64
import statsmodels.api as sm features = ['segment_flag', 'candidate_var', 'control_a', 'control_b'] X = sm.add_constant(df[features]) y = df['target'] model = sm.Logit(y, X).fit(disp=False) print(model.summary())
coef OR ci_low ci_high p_value const -x.xxxx 0.0xx 0.0xx 0.0xx 0.000 segment_flag 0.0xxx 1.0xx 0.8xx 1.4xx 0.6xx candidate_var x.xxxx xx.xx xx.xx xxx.xx 0.000 control_a -0.0000 1.000 1.000 1.000 0.3xx control_b 0.0000 1.000 1.000 1.000 0.2xx
[Final-answer prompt goes here — asks the candidate to summarize findings and make a recommendation to a named stakeholder.]
[Sample candidate final answer — leads with the recommendation, then 2–3 supporting bullets, then a confidence note and a proposed validation step.] What I found: — First supporting bullet: the original hypothesis didn't survive the basic lift check. — Second bullet: the real signal sits on a different variable, with a clean monotonic gradient across quartiles. — Third bullet: a multivariate model confirms the direction — the focal variable is non-significant once controls are added; the alternative is strongly significant. Recommendation: ship the alternative routing rule; hold off on the original. Confidence: high on direction, medium on cutoff — next step is a holdout validation on a recent vintage.
[Resume-claim verification question — references a specific result the candidate cited and asks them to walk through how they validated it.]
Resume · prior role[Process-probe question — references a project on the resume and asks how the candidate handled the operational constraints around it.]
Resume · prior project[Critical-eval question — surfaces a specific output the candidate accepted at face value and asks them to interpret it more carefully.]
Cell #4 at min 6[Depth-check question — acknowledges the candidate stopped at a coarse cut and asks how they'd land a specific operational threshold with more time.]
Section 2 Q1[Hold-out-concern question — surfaces a model-quality metric the candidate didn't flag and asks whether it would change their ship/no-ship call.]
Cell #4 at min 6The same business problems your team faces — modeled in synthetic data engineered for real analytical depth.
Talk through a problem. No hands on the data, no AI.
Not just talk. Not another coding test. A customized data project.
LeetCode-style algorithm tasks without AI.
AI allowed, the goal is still get the code right.
![]() |
![]() |
|||
|---|---|---|---|---|
| Case content | SQL + Python algorithm tasks | Standardized DS task batteries | Interviewer-brought notebooks or take-homes | Real-world DS cases — messy data, business framing |
| AI policy | Banned or flagged | Limited, discouraged | Up to the interviewer | AI required — full notebook + assistant, same as the actual job |
| What's measured | Code correctness + speed | Benchmarked task performance | Code quality + communication (interviewer-scored) | 3-axis rubric — Final Deliverable, AI Collaboration, Domain Expertise |
| Workflow realism | Single coding window | Guided single-task workspace | Live notebook + chat | Framing quiz → AI-native IDE → written findings (full loop) |
| Report evidence | Score + a few snippets | Rubric score, percentile | Interviewer notes | Every scoring result cites a specific cell, prompt, or quiz answer |
| Hiring-fit read | Generic percentile | Standardized benchmark | Interviewer judgment | Summary anchored against your JD + hiring priorities |
| Interview prep | None | None | Live-session notes | 5 case + 5 resume follow-up questions, evidence-tagged |
The data job has been quietly rebuilt around AI. Writing code isn't the work anymore — it's framing the right question, judging what the model gives back, and knowing when to push back. That's not on a résumé. You only see it in the work.