Machine Learning Engineer

Judging rubric

Criteria and level definitions for this challenge. Weights always sum to 100% across criteria.

Feature Engineering Quality

15.0%

Reviewers look for thoughtful, justified feature construction: time-decay transformations, interaction terms, text-derived signals, and proper handling of data leakage. Great submissions explain why each feature group was chosen and show feature importance or ablation results.

Reviewers use a 0–100 score for this criterion.
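As an illustration of what a time-decay transformation with a basic leakage guard might look like, here is a minimal sketch. The function names, the 24-hour half-life, and the choice of exponential decay are assumptions made for this example; the rubric does not prescribe a particular form.

```python
from typing import Iterable


def time_decay_weight(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: the weight halves every `half_life_hours`.

    `age_hours` is the time between the event and the prediction timestamp.
    A negative age means the event happened after prediction time, which
    would leak future information into the feature, so it is rejected.
    """
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    if age_hours < 0:
        raise ValueError("event is newer than prediction time (possible leakage)")
    return 0.5 ** (age_hours / half_life_hours)


def decayed_count(event_ages_hours: Iterable[float], half_life_hours: float = 24.0) -> float:
    """Sum of decayed weights, e.g. a recency-weighted count of past interactions."""
    return sum(time_decay_weight(age, half_life_hours) for age in event_ages_hours)
```

The half-life is a hyperparameter; an ablation over a few values is one way to justify the choice.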

Model Design and Evaluation Rigor

15.0%

At least two distinct model families are trained, compared, and evaluated using NDCG@10 and AUC-ROC on a held-out split. Great submissions include a calibration check, discuss overfitting mitigations, and show that the candidate understands the difference between ranking and classification objectives.

Reviewers use a 0–100 score for this criterion.
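For reference, NDCG@10 can be computed along the lines of the sketch below. It assumes a linear gain (the relevance label itself) and a log2 rank discount; some implementations use an exponential gain of 2^rel - 1 instead, so submissions should state which variant they report. The function names are illustrative only.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k items, linear gain."""
    # rank is 0-based, so the discount for the first item is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances_in_ranked_order: Sequence[float], k: int = 10) -> float:
    """NDCG@k for one query.

    `relevances_in_ranked_order` holds the true relevance labels, ordered by
    the model's predicted score (best first).
    """
    ideal = dcg_at_k(sorted(relevances_in_ranked_order, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items: define NDCG as 0 for this query
    return dcg_at_k(relevances_in_ranked_order, k) / ideal
```

The per-query values are typically averaged over all queries in the held-out split. AUC-ROC, by contrast, scores pairwise ordering across the whole set without a position discount, which is one reason the two metrics can disagree.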

Production-Readiness of Serving Endpoint

30.0%

The FastAPI endpoint is correct, containerized via Docker, handles edge cases (empty list, missing fields), and includes a latency benchmark showing per-request inference time. Great submissions also discuss how the endpoint would change under high QPS at Reddit scale.

Reviewers use a 0–100 score for this criterion.
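A per-request latency benchmark could be as simple as the following sketch. It times an in-process callable rather than a full HTTP round trip, and the `benchmark_latency` helper, its parameters, and the reported percentiles are assumptions for this example, not a required format.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_latency(
    predict: Callable[[], object],
    n_requests: int = 200,
    warmup: int = 20,
) -> Dict[str, float]:
    """Call `predict` repeatedly and report latency statistics in milliseconds."""
    if n_requests < 2:
        raise ValueError("n_requests must be at least 2 to compute percentiles")

    # Warm-up calls are excluded so one-time costs (lazy loading, caches)
    # do not distort the measurement.
    for _ in range(warmup):
        predict()

    samples_ms = []
    for _ in range(n_requests):
        start = time.perf_counter()
        predict()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    # quantiles(n=100) returns 99 cut points; index 94 is p95, index 98 is p99.
    cuts = statistics.quantiles(samples_ms, n=100)
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "p50_ms": statistics.median(samples_ms),
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
    }
```

For example, `benchmark_latency(lambda: model_fn(sample_request))` would time a hypothetical `model_fn`. Tail percentiles matter more than the mean when reasoning about behavior under high QPS.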

Reproducibility and Code Quality

20.0%

The repo runs end-to-end with a single command on a fresh environment. Code is clean, modular, typed where appropriate, and follows good software engineering practices. Dependencies are pinned. Great submissions include a brief CI config or at least a linting step.

Reviewers use a 0–100 score for this criterion.

System Scalability Reasoning in README

20.0%

The README section on scale demonstrates genuine understanding of distributed ML systems: feature stores, offline vs. online feature computation, model serving latency budgets, Spark/Kafka pipelines, and A/B testing frameworks. Great submissions make concrete, realistic proposals rather than vague buzzword lists.

Reviewers use a 0–100 score for this criterion.
