Theory and Practice of Efficient and Accurate Dataset Construction