Kaggle Tricks

The following are key lessons learned from participating in Kaggle competitions:

  1. When the training loss and validation loss are not converging or not decreasing, it is usually a learning-rate issue.

  2. Try a few candidate learning rates (e.g., 5e-5, 5e-6, 6e-6, ...).

  3. Use a cosine learning-rate scheduler with warm-up.

  4. Tune the weight decay. (Points 2-4 are illustrated in the optimizer sketch after this list.)

  5. When you hit a GPU out-of-memory error, adjust the following parameters (see the gradient-accumulation sketch after this list):

    1. Batch size

    2. Gradient accumulation steps

  6. When training any model, carve your own training and validation splits out of the provided training dataset.

  7. Be careful about data leakage.

  8. Techniques to detect data leakage:

    a. Check whether column values from the training set also appear in the corresponding test columns (see the overlap sketch after this list).

    b. Identify the columns through which leakage could occur.

    c. Split the data into train/validation folds using a group k-fold split, so that the data points of a group that occurs in training never appear in validation.

    d. If the normal train/validation split yields a noticeably higher metric score than the group k-fold split, there is a chance of data leakage; handle it carefully (see the comparison sketch after this list).

  9. To detect and understand data leakage, a trick called leaderboard probing can be used: where the values of a suspect column in the test samples match the training data, submit random values; otherwise, submit the original predictions. If there is leakage, the leaderboard score will drop to near zero; otherwise it will stay positive. (See the final sketch below.)
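
The following is a minimal sketch of the optimizer setup from points 1-4: AdamW with weight decay and a cosine schedule with linear warm-up, written in PyTorch. The model, step counts, and hyperparameter values are placeholders chosen for illustration.

```python
import math
import torch

model = torch.nn.Linear(10, 1)            # stand-in for the real model
num_training_steps = 1000                 # illustrative values
num_warmup_steps = 100
base_lr = 5e-5                            # one of the candidate rates from point 2

# AdamW with weight decay (point 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=0.01)

def cosine_with_warmup(step):
    # linear warm-up, then cosine decay down to zero (point 3)
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=cosine_with_warmup)
# call scheduler.step() once per optimizer step in the training loop
```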
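
For point 5, a sketch of gradient accumulation: a smaller per-step batch is run several times before each optimizer step, so the effective batch size stays large while peak GPU memory drops. The model, data, and accumulation count below are placeholders.

```python
import torch

model = torch.nn.Linear(10, 1)                      # stand-in for the real model
criterion = torch.nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# a per-step batch of 4 accumulated 8 times behaves like an effective batch of 32
accumulation_steps = 8
loader = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(32)]  # dummy data

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x), y) / accumulation_steps  # scale so gradients average
    loss.backward()
    if (step + 1) % accumulation_steps == 0:            # update every N micro-batches
        optimizer.step()
        optimizer.zero_grad()
```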
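
For points 8a and 8b, a sketch that measures, per candidate column, what fraction of training values reappears in the test set; identifier-like columns with high overlap are leakage suspects. The file names and column names here are hypothetical.

```python
import pandas as pd

train = pd.read_csv("train.csv")                 # hypothetical file names
test = pd.read_csv("test.csv")

for col in ["patient_id", "session_id"]:         # hypothetical suspect columns
    overlap = train[col].isin(test[col]).mean()  # fraction of train values seen in test
    print(f"{col}: {overlap:.1%} of train values also appear in test")
```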
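
For points 6, 8c, and 8d, a sketch that compares a plain shuffled k-fold score against a GroupKFold score, where each group lives entirely in either the training or the validation fold. A plain-split score clearly above the grouped score hints at leakage. The data and group column are synthetic stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                 # synthetic features
y = rng.integers(0, 2, size=500)              # synthetic binary target
groups = rng.integers(0, 50, size=500)        # e.g. one id per patient or session

model = RandomForestClassifier(n_estimators=100, random_state=0)
plain = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
grouped = cross_val_score(model, X, y, groups=groups, cv=GroupKFold(n_splits=5))

print(f"plain KFold:  {plain.mean():.3f}")
print(f"GroupKFold:   {grouped.mean():.3f}")  # a large gap hints at leakage
```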
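
Finally, one reading of the leaderboard-probing trick in point 9: wherever a suspected leakage key in the test set matches the training data, replace the model's prediction with a random value and submit. If the public score collapses toward zero, the matched rows were carrying the score, i.e. the model was exploiting leakage; otherwise the score stays positive. The file and column names below are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
train = pd.read_csv("train.csv")              # hypothetical file names
test = pd.read_csv("test.csv")
preds = np.load("preds.npy")                  # the model's test predictions

# where the suspect key in test matches any training row, swap in random values
matched = test["suspect_key"].isin(train["suspect_key"]).to_numpy()
probe = np.where(matched, rng.random(len(test)), preds)

pd.DataFrame({"id": test["id"], "target": probe}).to_csv("submission.csv", index=False)
# if the public score drops to near zero after this submission, the matched rows
# were driving the score, which points to leakage
```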