Builders' Echo

Senior/Staff Applied Machine Learning Scientist

Judging rubric

Criteria and level definitions for this challenge. Weights always sum to 100% across criteria.

Calibration quality

25.0%

A reliability diagram is included and shows close alignment between mean predicted probability and observed click rate across at least 10 equal-frequency bins. Expected calibration error (ECE) is explicitly computed and reported. An excellent submission has ECE < 0.01 on the local validation split and discusses why calibration matters for bid pricing.

Reviewers use a 0–100 score for this criterion.
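
The rubric does not fix an exact ECE formula, so the following is only a minimal sketch under one common reading: sort predictions, split them into equal-frequency bins, and take the size-weighted average of the absolute gap between mean predicted probability and observed click rate. The function name and its `n_bins` parameter are illustrative assumptions, not part of the challenge.

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Return (ECE, per-bin rows) using equal-frequency bins.

    Each row is (mean predicted probability, observed click rate, bin size),
    which is also the data needed to draw a reliability diagram.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Sorting by predicted probability and splitting the index array evenly
    # yields bins that hold (almost) the same number of rows.
    order = np.argsort(y_prob, kind="stable")
    bins = np.array_split(order, n_bins)
    n = len(y_true)
    ece = 0.0
    rows = []
    for idx in bins:
        if len(idx) == 0:
            continue  # happens only when there are fewer rows than bins
        mean_pred = y_prob[idx].mean()
        obs_rate = y_true[idx].mean()
        ece += len(idx) / n * abs(mean_pred - obs_rate)
        rows.append((mean_pred, obs_rate, len(idx)))
    return ece, rows
```

Plotting the first two columns of `rows` against each other gives the reliability diagram; points on the diagonal indicate that predicted probabilities match observed click rates.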

Predictive performance on hidden holdout

25.0%

Log-Loss on the platform's hidden holdout (rows 5M–6M of Criteo train.txt). Score is cross-referenced with the candidate's self-reported validation Log-Loss to detect leakage. An excellent submission is within 1% of the best holdout Log-Loss seen across all submissions and shows no sign of overfitting or leakage.

Reviewers use a 0–100 score for this criterion.
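
As a rough illustration of the quantities this criterion refers to, the sketch below computes binary Log-Loss and a relative gap between two Log-Loss values. How the platform actually cross-references scores is not described here; `relative_gap` is a hypothetical helper showing one way the "within 1%" comparison, or a validation-versus-holdout comparison, could be expressed.

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def relative_gap(loss, reference_loss):
    """Fractional amount by which `loss` exceeds `reference_loss`.

    Hypothetical helper: a value of 0.01 means `loss` is 1% worse than the
    reference. A large gap between a self-reported validation loss and the
    holdout loss may hint at overfitting or leakage, but the threshold is a
    judgment call.
    """
    return (loss - reference_loss) / reference_loss
```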

Feature engineering depth and justification

20.0%

At least one non-trivial encoding technique is applied correctly to high-cardinality categorical features. The README explains why that technique was chosen over alternatives, quantifies its impact (e.g., ablation Log-Loss delta), and discusses risks such as target leakage in target encoding. An excellent submission includes a small ablation table.

Reviewers use a 0–100 score for this criterion.
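
The target-leakage risk mentioned above can be made concrete with a small sketch of out-of-fold target encoding with additive smoothing. This is one possible technique, not one the rubric prescribes; the function name, the fold assignment, and the `smoothing` default are assumptions chosen for illustration. The key property is that a row's encoded value never uses that row's own label.

```python
import numpy as np
from collections import defaultdict

def oof_target_encode(categories, y, n_folds=5, smoothing=20.0, seed=0):
    """Out-of-fold smoothed target encoding for one categorical column.

    Assumes n_folds >= 2 and smoothing > 0. Each row is encoded with
    statistics computed only from rows in the other folds.
    """
    categories = np.asarray(categories)
    y = np.asarray(y, dtype=float)
    n = len(y)
    rng = np.random.default_rng(seed)
    fold_of = rng.permutation(n) % n_folds  # roughly balanced random folds
    encoded = np.empty(n, dtype=float)
    for k in range(n_folds):
        train = fold_of != k
        prior = y[train].mean()  # global click rate outside fold k
        sums = defaultdict(float)
        counts = defaultdict(int)
        for c, t in zip(categories[train], y[train]):
            sums[c] += t
            counts[c] += 1
        for i in np.flatnonzero(~train):
            c = categories[i]
            # Rare or unseen categories shrink toward the prior.
            encoded[i] = (sums[c] + smoothing * prior) / (counts[c] + smoothing)
    return encoded
```

An ablation would then compare validation Log-Loss with and without the encoded column, holding everything else fixed.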

Code quality and reproducibility

20.0%

train.py runs end-to-end with a single command; all random seeds are pinned; the serialized artifact is loadable by predict.py in a fresh environment; requirements are fully specified. An excellent submission produces identical validation metrics on two independent runs and passes a basic lint check.

Reviewers use a 0–100 score for this criterion.
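
A minimal sketch of seed pinning and a two-run determinism check follows. `pin_seeds` and `toy_validation_metric` are hypothetical names, and the toy metric merely stands in for whatever train.py actually reports. Frameworks with their own random state (for example gradient-boosting or deep-learning libraries) would need their seeds set separately.

```python
import os
import random
import numpy as np

def pin_seeds(seed: int = 42) -> np.random.Generator:
    """Pin the standard-library and NumPy random state; return a seeded Generator."""
    random.seed(seed)
    np.random.seed(seed)
    # Note: PYTHONHASHSEED only takes effect for interpreters started after
    # it is set (e.g. subprocesses), not for the current process.
    os.environ["PYTHONHASHSEED"] = str(seed)
    return np.random.default_rng(seed)

def toy_validation_metric(seed: int = 42) -> float:
    """Stand-in for a training run: returns a Log-Loss on seeded random data."""
    rng = pin_seeds(seed)
    y = rng.integers(0, 2, size=1000)
    p = rng.uniform(0.01, 0.99, size=1000)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Two independent runs with the same seed should report the same metric.
run_a = toy_validation_metric()
run_b = toy_validation_metric()
assert run_a == run_b
```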

Engineering judgment and production readiness write-up

10.0%

README section on production considerations addresses at minimum: handling feature distribution shift, latency constraints in a real-time bidder, and a model monitoring strategy. An excellent submission proposes a concrete online update or drift-detection approach relevant to programmatic advertising.

Reviewers use a 0–100 score for this criterion.
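
One possible drift-detection approach, sketched here purely as an example, is the population stability index (PSI) between a reference window and a recent window of a numeric feature or of the model's predicted probabilities. The rubric does not name PSI; the function name, bin count, and `eps` floor are assumptions.

```python
import numpy as np

def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two samples, using quantile bins fitted on the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior quantile edges of the reference define n_bins buckets;
    # values outside the reference range fall into the first or last bucket.
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))[1:-1]
    ref_counts = np.bincount(
        np.searchsorted(edges, reference, side="right"), minlength=n_bins
    )
    cur_counts = np.bincount(
        np.searchsorted(edges, current, side="right"), minlength=n_bins
    )
    # A small floor keeps empty buckets from producing log(0).
    ref_frac = np.clip(ref_counts / ref_counts.sum(), eps, None)
    cur_frac = np.clip(cur_counts / cur_counts.sum(), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))
```

In a monitoring job, this could be computed per feature on a schedule, with an alert when the value crosses a chosen threshold. A frequently quoted rule of thumb treats values above roughly 0.25 as a significant shift, but any threshold should be tuned against the traffic actually observed.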
