After a job is submitted, revising or deleting the code does not affect the already-submitted job.
Setting pin_memory=True can speed up the DataLoader. It is mentioned here.
In the DataLoader, pin_memory cannot be used together with persistent_workers. The issue is discussed here:
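A minimal sketch of enabling pinned memory, assuming PyTorch is available (the toy dataset and shapes are made up for illustration; persistent_workers is left at its default of False, per the note above):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 8 samples with 3 features each (illustrative only).
ds = TensorDataset(torch.randn(8, 3), torch.randint(0, 2, (8,)))

# pin_memory=True copies each batch into page-locked (pinned) host memory,
# which enables faster, asynchronous host-to-GPU transfers via
# batch.to("cuda", non_blocking=True).
loader = DataLoader(ds, batch_size=4, pin_memory=True)

x, y = next(iter(loader))
print(x.shape)  # torch.Size([4, 3])
```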
When loading models trained in parallel (e.g., with nn.DataParallel), an extra step is required: here
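The extra step is typically stripping the "module." prefix that nn.DataParallel adds to every state-dict key, so the checkpoint can be loaded into an unwrapped model. A minimal pure-Python sketch (`strip_module_prefix` is a hypothetical helper name, and plain strings stand in for real tensors):

```python
def strip_module_prefix(state_dict):
    """Remove the leading 'module.' that DataParallel adds to each key,
    so the checkpoint can be loaded into an unwrapped model."""
    return {k[len("module."):] if k.startswith("module.") else k: v
            for k, v in state_dict.items()}

# In practice state_dict would come from torch.load(checkpoint_path);
# plain strings stand in for tensors here.
ckpt = {"module.fc.weight": "w", "module.fc.bias": "b"}
print(strip_module_prefix(ckpt))  # {'fc.weight': 'w', 'fc.bias': 'b'}
```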
tensor.detach() returns a new tensor (detached from the computation graph), as discussed here.
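A quick illustration of that behavior, assuming PyTorch is available:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.detach()  # new tensor object, excluded from the autograd graph

print(y.requires_grad)  # False
# The new tensor shares storage with the original, so in-place edits
# to y are visible through x as well.
y[0] = 5.0
print(x[0].item())  # 5.0
```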
Runtime on Sulis: since we can only use the home/ folder, runtime depends heavily on the load on the file system. When the load on the home/ directory is high, our code may take much longer than when the load is low.
FP16 vs FP32: https://datascience.stackexchange.com/questions/73107/fp16-fp32-what-is-it-all-about-or-is-it-just-bitsize-for-float-values-pytho
get time in bash: https://unix.stackexchange.com/questions/428217/current-time-date-as-a-variable-in-bash-and-stopping-a-program-with-a-script
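A small bash sketch of the pattern from that link (variable names are illustrative):

```shell
#!/usr/bin/env bash
# Timestamp suitable for log file names.
run_stamp=$(date +"%Y-%m-%d_%H-%M-%S")
echo "log_${run_stamp}.txt"

# Measure elapsed wall-clock seconds around a command.
start=$(date +%s)
sleep 1                      # stand-in for the real workload
end=$(date +%s)
echo "elapsed: $((end - start))s"
```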
When enumerating a data loader while using multiple GPUs, it can take a long time.
The issue is discussed here, and the solution (a multi-epoch data loader) works very well. Experimental results are as follows.
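The multi-epoch idea, as a minimal framework-free sketch: keep one long-lived iterator over the data so per-epoch setup (in PyTorch, respawning worker processes) happens only once. Class and method names are illustrative, not the exact code from the linked discussion:

```python
class MultiEpochLoader:
    """Wrap an iterable and reuse a single never-ending iterator across
    epochs, instead of rebuilding iteration state at every epoch start."""

    def __init__(self, data):
        self.data = data
        self._it = self._repeat()  # created once, reused for all epochs

    def _repeat(self):
        while True:                # cycle over the data forever
            yield from self.data

    def __len__(self):
        return len(self.data)

    def __iter__(self):
        # One "epoch" = len(self.data) items pulled from the shared iterator.
        for _ in range(len(self)):
            yield next(self._it)

loader = MultiEpochLoader([1, 2, 3])
print([list(loader) for _ in range(2)])  # [[1, 2, 3], [1, 2, 3]]
```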