Feature Engineering Quality
15.0% Reviewers look for thoughtful, justified feature construction: time-decay transformations, interaction terms, text-derived signals, and proper handling of data leakage. Great submissions explain why each feature group was chosen and show feature importance or ablation results.
Reviewers use a 0–100 score for this criterion.
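As one illustration of the time-decay transformation and leakage handling named in this criterion, the following is a minimal sketch rather than a required implementation. The function names, the 24-hour half-life, and the choice of exponential decay are assumptions made for the example; the rubric does not prescribe any of them.

```python
from datetime import datetime, timedelta, timezone


def time_decay_weight(event_time, as_of, half_life_hours=24.0):
    """Exponential decay weight: 1.0 at age zero, 0.5 after one half-life.

    half_life_hours is a hypothetical default, not a value from the rubric.
    """
    age_hours = (as_of - event_time).total_seconds() / 3600.0
    if age_hours < 0:
        # An event later than the reference time would leak future information.
        raise ValueError("event_time is later than as_of")
    return 0.5 ** (age_hours / half_life_hours)


def decayed_count(event_times, as_of, half_life_hours=24.0):
    """Sum of decay weights over events at or before as_of (later events are ignored)."""
    return sum(
        time_decay_weight(t, as_of, half_life_hours)
        for t in event_times
        if t <= as_of
    )


if __name__ == "__main__":
    as_of = datetime(2024, 1, 2, tzinfo=timezone.utc)
    events = [as_of, as_of - timedelta(hours=24), as_of + timedelta(hours=1)]
    print(decayed_count(events, as_of))
```

Filtering on `t <= as_of` keeps the feature restricted to information available at prediction time, which is the leakage concern the criterion mentions.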
Model Design and Evaluation Rigor
15.0% At least two distinct model families are trained, compared, and evaluated using NDCG@10 and AUC-ROC on a held-out split. Great submissions include a calibration check, discuss overfitting mitigations, and show the candidate understands ranking vs. classification objectives.
Reviewers use a 0–100 score for this criterion.
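For reference, the two metrics named in this criterion can be sketched in plain Python as follows. This is an illustrative sketch, not the evaluation code reviewers use: it assumes linear gain in the DCG, binary labels for AUC-ROC, and a single ranked list, and the function names are made up for the example. A real submission would more likely rely on an established library implementation.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(labels, scores, k=10):
    """NDCG@k for one ranked list: DCG of the predicted order over DCG of the ideal order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(ndcg_at_k(labels, scores), auc_roc(labels, scores))
```

NDCG@10 rewards placing relevant items near the top of a list, while AUC-ROC measures pairwise separation of positives from negatives regardless of position, which is one way to frame the ranking vs. classification distinction.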
Production-Readiness of Serving Endpoint
30.0% The FastAPI endpoint is correct, containerized via Docker, handles edge cases (empty list, missing fields), and includes a latency benchmark showing per-request inference time. Great submissions also discuss how the endpoint would change under high QPS at Reddit scale.
Reviewers use a 0–100 score for this criterion.
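The edge cases and latency benchmark named in this criterion might look roughly like the sketch below. It is deliberately framework-agnostic: `rank_posts` stands in for the logic a FastAPI route handler would call, and the field names `post_id` and `score`, the placeholder scoring, and the choice to skip malformed items rather than reject the request are all assumptions made for illustration.

```python
import statistics
import time

# Hypothetical required fields; the rubric does not define the request schema.
REQUIRED_FIELDS = ("post_id", "score")


def rank_posts(posts):
    """Return post_ids ordered by a placeholder score, skipping malformed items."""
    if not posts:
        # Empty or missing list: return an empty ranking instead of failing.
        return {"ranked": [], "skipped": 0}
    valid = [
        p for p in posts
        if isinstance(p, dict) and all(field in p for field in REQUIRED_FIELDS)
    ]
    ranked = sorted(valid, key=lambda p: p["score"], reverse=True)
    return {"ranked": [p["post_id"] for p in ranked], "skipped": len(posts) - len(valid)}


def benchmark(handler, payload, n_requests=1000):
    """Time repeated in-process calls and return p50 and p95 latency in milliseconds."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        handler(payload)
        latencies.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {"p50_ms": cuts[49], "p95_ms": cuts[94]}


if __name__ == "__main__":
    payload = [{"post_id": "a", "score": 0.2}, {"post_id": "b"}, {"post_id": "c", "score": 0.9}]
    print(rank_posts(payload))
    print(benchmark(rank_posts, payload))
```

An in-process timing like this excludes network and serialization overhead, so a submission would typically also report latency measured against the running container.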
Reproducibility and Code Quality
20.0% The repo runs end-to-end with a single command on a fresh environment. Code is clean, modular, typed where appropriate, and follows good software engineering practices. Dependencies are pinned. Great submissions include a brief CI config or at least a linting step.
Reviewers use a 0–100 score for this criterion.
System Scalability Reasoning in README
20.0% The README section on scale demonstrates genuine understanding of distributed ML systems: feature stores, offline vs. online feature computation, model serving latency budgets, Spark/Kafka pipelines, and A/B testing frameworks. Great submissions make concrete, realistic proposals rather than vague buzzword lists.
Reviewers use a 0–100 score for this criterion.
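The rubric gives each criterion a percentage weight and a 0–100 score but does not state how the two are combined. Assuming the overall result is a weight-averaged score, which is an assumption and not something stated above, the arithmetic could be sketched as follows. The weights are taken from the criteria listed; the function name is hypothetical.

```python
# Weights copied from the rubric; they sum to 100.
WEIGHTS = {
    "Feature Engineering Quality": 15.0,
    "Model Design and Evaluation Rigor": 15.0,
    "Production-Readiness of Serving Endpoint": 30.0,
    "Reproducibility and Code Quality": 20.0,
    "System Scalability Reasoning in README": 20.0,
}


def overall_score(scores):
    """Weighted average of per-criterion 0-100 scores (assumed combination rule)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"score for {name!r} must be between 0 and 100")
    return sum(WEIGHTS[n] * scores[n] for n in WEIGHTS) / sum(WEIGHTS.values())


if __name__ == "__main__":
    example = {name: 80 for name in WEIGHTS}
    print(overall_score(example))
```

Under this assumed rule, the serving endpoint criterion alone contributes up to 30 of the 100 available points, twice the contribution of either modeling criterion.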