where the executable train.sh is:
#!/bin/bash
source /vol/vssp/ucdatasets/abcde/QLENV3/bin/activate
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/vol/vssp/localsoft/External/cudnn/v7.1.4/cuda-9.2/lib64/
cd /user/HS204/MyPythonCodeDir
python3 TrainModel.py
A docker submit file example
universe = docker
# docker_image = nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04
docker_image = registry.eps.surrey.ac.uk/samplernn-pytorch:4659
executable = /vol/vssp/ucdatasets/abcde/QLENV3/bin/python3
arguments = /user/HS204/MyPythonCodeDir/TrainModel.py
environment = "mount=/vol/vssp/smile,/vol/vssp/signsrc,/vol/vssp/ucdatasets,/vol/vssp/xyz12345/mnist"
log = $(cluster).$(process).log~
output = $(cluster).$(process).out~
error = $(cluster).$(process).error~
should_transfer_files = YES
transfer_output_files = Models
request_GPUs = 1
request_CPUs = 2
request_memory = 400
queue 1
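Once a submit file like the one above is saved, it is handed to the scheduler with condor_submit. The sketch below wraps that call and extracts the cluster id from the usual "N job(s) submitted to cluster M." message. The function names and the file name docker.submit are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Sketch: submit a job and print its cluster id.
# Assumes condor_submit is on PATH and prints its usual summary line,
# e.g. "1 job(s) submitted to cluster 1234."

# Read condor_submit output on stdin and print only the cluster id.
parse_cluster_id() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

# Submit the given submit file; print the cluster id, or fail if none was reported.
submit_and_report() {
    local submit_file="$1"
    local cluster_id
    cluster_id=$(condor_submit "$submit_file" | parse_cluster_id)
    if [ -z "$cluster_id" ]; then
        echo "submission failed: no cluster id reported" >&2
        return 1
    fi
    echo "$cluster_id"
}
```

A possible use is cid=$(submit_and_report docker.submit) && condor_q "$cid", which submits the job and then lists only that cluster in the queue.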
To remotely access the university HTCondor
1. First, ssh to the Surrey access server:
ssh username@access.eps.surrey.ac.uk
Two-step authentication must be set up beforehand.
2. Then ssh to the condor server:
ssh condor
3. A job can now be submitted to condor.
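Steps 1 and 2 can possibly be combined into a single command with the -J (jump host) option of OpenSSH. This is only a sketch: it assumes OpenSSH 7.3 or later on the local machine, that the host name condor resolves from the access server as in step 2, and that the site's two-step authentication allows jump connections. The function name condor_login is hypothetical.

```shell
#!/bin/bash
# Sketch: reach the condor server through the access server in one hop.
# $1 is the university username (placeholder).
condor_login() {
    local user="$1"
    # -J connects to the access server first, then forwards to "condor" from there.
    ssh -J "${user}@access.eps.surrey.ac.uk" "${user}@condor"
}
```

Calling condor_login username should then prompt for authentication on the access server and land on the condor server.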
Example of submitting a MATLAB process using a Windows system
The executable file is:
#!/bin/bash
exec /opt/bin/matlab -nodisplay -nojvm -nosplash -r \
"run('PreprocessObservationData.m'); quit"
The submit file is:
#
# Example Job for HTCondor
#
####################
# --------------------------------------------
# Executable and its arguments
executable = ACUjob.sh
arguments =
# ---------------------------------------------------
# Universe: vanilla runs the executable directly on the execute node.
# (For the docker universe, a docker_image must also be specified; the
# executable can then be omitted if the image already states one.)
universe = vanilla
# -------------------------------------------------
# Input, Output and Log files
log = c$(cluster).p$(process).log
output = c$(cluster).p$(process).out
error = c$(cluster).p$(process).error
should_transfer_files = YES
transfer_input_files= PreprocessObservationData.m
when_to_transfer_output = ON_EXIT
# -------------------------------------
# Requirements for the Job
requirements = ( HasStornext == true )
# --------------------------------------
# Resource requirements
request_GPUs = 0
request_CPUs = 2
request_memory = 8192
# -----------------------------------
# Queue commands
queue 1
If the main entry script PreprocessObservationData.m calls local functions in the current directory, these must also be listed in transfer_input_files. For instance, if a function abc is called:
transfer_input_files = PreprocessObservationData.m,Funcs/abc.m
Adding addpath(genpath('./')) to PreprocessObservationData.m does not work in this situation, presumably because only the listed files are copied to the execute node.
After logging into the condor server and trying to submit the process, an error came up that
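When many helper functions are involved, typing the list by hand is error-prone. The sketch below generates the transfer_input_files line from the entry script plus every .m file found under a helper directory. The function name build_transfer_list is hypothetical, and the directory name Funcs is taken from the example above.

```shell
#!/bin/bash
# Sketch: print a transfer_input_files line for a MATLAB job.
# $1 = entry script, $2 = directory holding the helper .m files.
build_transfer_list() {
    local entry="$1" func_dir="$2"
    local helpers
    # Collect the helper files in sorted order and join them with commas.
    helpers=$(find "$func_dir" -type f -name '*.m' | sort | paste -sd, -)
    if [ -n "$helpers" ]; then
        echo "transfer_input_files = ${entry},${helpers}"
    else
        echo "transfer_input_files = ${entry}"
    fi
}
```

Running build_transfer_list PreprocessObservationData.m Funcs in the job directory prints a line that can be pasted into the submit file. Note that HTCondor places individually listed files in the top level of the job's working directory, so helper files sharing the same base name would overwrite each other.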