Cluster Job Issues

General Job Issues

When writing code and sbatch scripts, add the smallest piece of code that moves you toward your goal, run it to confirm it works, then add the next small piece and repeat. Building up incrementally keeps your code working at every step and makes it easy to pinpoint where a problem was introduced.
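As a rough illustration of this incremental approach, here is a minimal sketch of an sbatch script grown one step at a time. The job name, time limit, output file name, and the commented-out program are hypothetical placeholders, not values required by the clusters; adjust them for your own work.

```shell
#!/bin/bash
#SBATCH --job-name=increment-test        # hypothetical job name
#SBATCH --ntasks=1
#SBATCH --time=00:05:00                  # short limit while testing
#SBATCH --output=increment-test-%j.out   # %j is replaced by the job ID

# Stop at the first failing command so the broken step is obvious.
set -euo pipefail

# Step 1: confirm the job starts and report which node it runs on.
step=1
echo "Step ${step}: job started on $(uname -n)"

# Step 2: added only after step 1 ran cleanly.
step=2
echo "Step ${step}: working directory is $(pwd)"

# Step 3 (next increment): uncomment once steps 1 and 2 succeed.
# ./my_program input.dat    # hypothetical program and input file
```

Submit the script with sbatch after each change and read the output file before adding the next step. If a step fails, the last "Step" line in the output shows how far the job got.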

If you are having trouble using one of the clusters, the following process will help you narrow down where the problem comes from. Work through the list in order.



If you get the error "User not found on host," go to go.pdx.edu/help, click Common Requests, and then click "Get IT Help." Fill out the form and, in the Summary of Request line, enter "Research Computer Help: 'User not found on host.'" In the description, include the entire block of "User not found on host" error messages.

Performance Issues

If your job is not running as fast as you expected, there are several ways to investigate the cause. Try them after submitting the job and connecting with ssh to the compute node(s) it is running on. For example, if squeue shows your job on compute[125-126] or compute126, you would connect to compute126 with ssh compute126.
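The lookup described above might go roughly as follows. This is a sketch, not an official procedure: expand_nodes is a hypothetical helper written for this example, and its fallback handles only a single simple range such as compute[125-126]. On a Slurm system, scontrol show hostnames does the expansion properly.

```shell
#!/bin/bash
# List your running jobs with their nodes (%i = job ID, %N = node list).
# Guarded so the sketch is harmless on a machine without Slurm.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER" -t RUNNING -o "%i %N"
fi

# Hypothetical helper: print one node name per line for a node list.
expand_nodes() {
    local list="$1"
    local re='^([^[]+)\[([0-9]+)-([0-9]+)\]$'
    if command -v scontrol >/dev/null 2>&1; then
        # Slurm's own expansion; handles every node list format.
        scontrol show hostnames "$list"
    elif [[ $list =~ $re ]]; then
        # Fallback for one simple range without zero padding.
        local prefix="${BASH_REMATCH[1]}" i
        for ((i = 10#${BASH_REMATCH[2]}; i <= 10#${BASH_REMATCH[3]}; i++)); do
            echo "${prefix}${i}"
        done
    else
        # Already a single node name such as compute126.
        echo "$list"
    fi
}

# Pick a node from the list, then connect to it by hand with ssh.
node="$(expand_nodes 'compute[125-126]' | tail -n 1)"
echo "Connect with: ssh ${node}"
```

Any node in the list is a valid target; the sketch simply picks the last one to match the compute126 example.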