Bucketing

Data for training or inference is often processed in batches. However, individual examples can have different lengths; sentences, for example, vary in length. The usual approach is to pad every shorter example up to the length of the longest example in its batch. In this article we explore the probability distribution of lengths in batched data, and ways to bucket the data so that lengths within a batch vary less and less padding is needed.
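As a minimal sketch of the idea above, the Python below pads each batch to its own longest example and counts how many pad tokens that costs. It then compares consecutive batching against a naive form of bucketing: sort all examples by length, then cut them into batches. The pad id `PAD = 0` and the helper names `pad_batch`, `padding_cost`, `chunk` and `bucket_by_length` are hypothetical illustrations, not part of any particular library.

```python
from typing import List, Sequence

PAD = 0  # hypothetical padding token id


def pad_batch(batch: Sequence[Sequence[int]], pad: int = PAD) -> List[List[int]]:
    """Pad every sequence in the batch up to the length of the longest one."""
    longest = max(len(seq) for seq in batch)
    return [list(seq) + [pad] * (longest - len(seq)) for seq in batch]


def padding_cost(batches: Sequence[Sequence[Sequence[int]]]) -> int:
    """Total number of pad tokens added when each batch is padded to its own max."""
    total = 0
    for batch in batches:
        longest = max(len(seq) for seq in batch)
        total += sum(longest - len(seq) for seq in batch)
    return total


def chunk(data: Sequence[Sequence[int]], batch_size: int) -> List[List[Sequence[int]]]:
    """Split data into consecutive batches in its original order (no bucketing)."""
    return [list(data[i:i + batch_size]) for i in range(0, len(data), batch_size)]


def bucket_by_length(data: Sequence[Sequence[int]], batch_size: int) -> List[List[Sequence[int]]]:
    """Naive bucketing: sort by length, then split into consecutive batches,
    so each batch holds sequences of similar length."""
    return chunk(sorted(data, key=len), batch_size)


if __name__ == "__main__":
    lengths = [2, 9, 3, 8, 1, 10]
    data = [[1] * n for n in lengths]
    print(padding_cost(chunk(data, 2)))             # 21 pad tokens
    print(padding_cost(bucket_by_length(data, 2)))  # 7 pad tokens
```

Here the unbucketed batches have lengths (2, 9), (3, 8) and (1, 10), which need 21 pad tokens. The sorted batches (1, 2), (3, 8) and (9, 10) need only 7. A global sort makes batch contents deterministic, so in practice the resulting batches are usually shuffled before training. This is a design choice the sketch leaves out.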

Let X be the discrete random variable denoting the length of a datum.
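The sketch below is one way to continue from this definition. It assumes the n lengths in a batch are drawn independently from the distribution of X, which holds for plain random batching but not for bucketed batching. Let M = max(X_1, ..., X_n) be the padded length of a batch and let F be the CDF of X. Then P(M ≤ k) = F(k)^n, so P(M = k) = F(k)^n − F(k−1)^n. The expected number of pad tokens per batch is n(E[M] − E[X]). The helper names below are hypothetical.

```python
from typing import Dict


def max_length_pmf(pmf: Dict[int, float], n: int) -> Dict[int, float]:
    """PMF of M = max(X_1, ..., X_n) for n i.i.d. lengths with the given PMF.

    Uses P(M <= k) = F(k)**n, so P(M = k) = F(k)**n - F(k-1)**n,
    where F is the CDF of X (assumes i.i.d. draws).
    """
    result: Dict[int, float] = {}
    cdf_prev = 0.0  # F at the previous support point, equal to F(k-1)
    cdf = 0.0
    for k in sorted(pmf):
        cdf += pmf[k]
        result[k] = cdf ** n - cdf_prev ** n
        cdf_prev = cdf
    return result


def expected_padding(pmf: Dict[int, float], n: int) -> float:
    """Expected pad tokens per batch of size n: n * (E[M] - E[X])."""
    mean_x = sum(k * p for k, p in pmf.items())
    mean_m = sum(k * p for k, p in max_length_pmf(pmf, n).items())
    return n * (mean_m - mean_x)


if __name__ == "__main__":
    pmf = {1: 0.5, 2: 0.5}  # toy example: lengths 1 and 2, equally likely
    print(max_length_pmf(pmf, 2))    # {1: 0.25, 2: 0.75}
    print(expected_padding(pmf, 2))  # 0.5
```

In the toy example, the four equally likely batches (1, 1), (1, 2), (2, 1) and (2, 2) need 0, 1, 1 and 0 pad tokens respectively, which averages to 0.5 and matches the formula.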

TODO: Finish
