Don't use Hadoop for cold data storage.
Other storage options are available; refer to the HPC Storage page.
If you only need Spark, consider using Greene instead. Your job may run faster there.