Table of Contents (draft)
Your dev and test sets should come from the same distribution
Establish a single-number evaluation metric for your team to optimize
If you have a large dev set, split it into two subsets, only one of which you look at