SGE (Sun Grid Engine) is distributed resource management software used to schedule and manage workloads in a cluster environment. Think of it as a scheduler that decides when and where each task runs, aiming for fair and efficient resource allocation across all users.
How does it work?
SGE handles jobs in a cluster environment by managing their submission, prioritization, and execution based on resource availability. It ensures that jobs are executed efficiently by balancing the workload across the cluster.
Key Concepts
Job submission: Tasks (jobs) are submitted to SGE with details about resource needs and expected runtime.
Queue: SGE organizes submitted jobs into queues based on resource requirements and job priorities.
Resource allocation: Available resources are allocated to jobs in the queue, considering factors like priority and fairness.
Job execution: SGE executes jobs on allocated resources and monitors their progress.
SGE simplifies job scheduling in cluster environments, promoting efficient utilization of computational resources and ensuring fair access for all users.
Submitting Jobs
To submit a job to SGE, create a shell script that contains the commands to run, with scheduler options embedded as lines beginning with #$, then submit it using the qsub command.
qsub myscript.sh
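On success, qsub prints a confirmation message that includes the numeric job ID, which later commands such as qstat -j and qdel need. The sketch below shows one way to pull that ID out of the message. The helper name parse_job_id and the sample message are illustrative assumptions, and the exact message wording may differ between Grid Engine versions.

```shell
#!/bin/bash
# Extract the numeric job ID from qsub's confirmation message.
# Assumed message format: Your job 12345 ("myjob") has been submitted
parse_job_id() {
    # Array jobs are reported as "Your job-array 12345.1-10:1 (...)";
    # in both cases only the leading digits of the ID are kept.
    printf '%s\n' "$1" | sed -n -E 's/^Your job(-array)? ([0-9]+).*/\2/p'
}

# Sample message for illustration; on a real cluster this would be:
#   msg=$(qsub myscript.sh)
msg='Your job 12345 ("myjob") has been submitted'
job_id=$(parse_job_id "$msg")
echo "Submitted job ${job_id}"
```

If your installation supports it, qsub -terse prints only the job ID, which makes the parsing step unnecessary.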
Example SGE Script
#!/bin/bash
#$ -N myjob # Job name
#$ -q compute # Queue name
#$ -pe smp 4 # Number of CPU cores
#$ -l h_rt=01:00:00 # Walltime (runtime limit)
#$ -cwd # Run job in current working directory
# Commands to execute:
echo "Hello, SGE!"
Key Options
-N: Specifies the job name.
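-q: Specifies the queue to submit the job to. Queue names such as compute are site-specific.
-pe: Requests a parallel environment and a number of slots (typically CPU cores). Parallel environment names such as smp are defined by the cluster administrator.
-l h_rt: Sets the maximum runtime for the job in HH:MM:SS format. A job that exceeds this limit is terminated.
-cwd: Runs the job in the directory from which it was submitted, rather than in your home directory.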
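The options above only reserve resources; the job's own commands still have to use them. As a rough sketch, a job script can read the environment variables that SGE sets at run time (NSLOTS, JOB_ID, JOB_NAME) instead of hard-coding the core count. The slots helper and the use of OMP_NUM_THREADS are illustrative assumptions for an OpenMP-style program, and the smp parallel environment is taken from the example script.

```shell
#!/bin/bash
#$ -N myjob
#$ -pe smp 4
#$ -cwd

# NSLOTS is set by SGE to the number of slots granted to the job.
# The ":-1" fallback lets the script also run outside the scheduler.
slots() {
    echo "${NSLOTS:-1}"
}

# OMP_NUM_THREADS is read by OpenMP programs; tie it to the granted
# slots so the job does not use more cores than it requested.
export OMP_NUM_THREADS="$(slots)"
echo "Job ${JOB_NAME:-interactive} (ID ${JOB_ID:-none}) using ${OMP_NUM_THREADS} thread(s)"
```

Keeping the thread count tied to NSLOTS means that changing the -pe request is the only edit needed when scaling the job up or down.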
Monitoring and Managing Jobs
Checking job status: Use qstat to view job status and queue information.
qstat
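The state column of qstat output uses short codes such as qw (queued and waiting) and r (running). The following sketch translates those codes into plain descriptions. The helper names describe_state and summarize_qstat are hypothetical, the sample listing is made up for illustration, and the column layout is assumed to match the default qstat output, with the state in the fifth column.

```shell
#!/bin/bash
# Translate a qstat state code into a short description.
describe_state() {
    case "$1" in
        qw)    echo "queued, waiting" ;;
        hqw)   echo "queued, on hold" ;;
        Eqw)   echo "queued, in error state" ;;
        t)     echo "transferring to an execution host" ;;
        r)     echo "running" ;;
        dr|dt) echo "marked for deletion" ;;
        *)     echo "other ($1)" ;;
    esac
}

# Print "job-ID: description" for each job line of qstat output on stdin.
# The first two lines (column header and dashed separator) are skipped;
# the state is assumed to be the fifth whitespace-separated column.
summarize_qstat() {
    awk 'NR > 2 && NF >= 5 { print $1, $5 }' | while read -r id state; do
        echo "${id}: $(describe_state "$state")"
    done
}

# Illustrative sample; on a real cluster: qstat | summarize_qstat
summarize_qstat <<'EOF'
job-ID  prior   name       user         state submit/start at     queue            slots ja-task-ID
---------------------------------------------------------------------------------------------------
    231 0.55500 myjob      alice        r     01/15/2024 10:00:00 compute@node01       4
    232 0.00000 myjob      alice        qw    01/15/2024 10:05:00                      4
EOF
```

To restrict the listing to one user's jobs, qstat accepts a -u option followed by the user name.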
Cancelling jobs: Use qdel with the job ID to cancel a job.
qdel job_id
Job details: Use qstat -j with the job ID to show detailed information about a job.
qstat -j job_id
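A common use of qstat -j in scripts is waiting for a job to finish before starting a follow-up step. The sketch below polls until the job is no longer known to the scheduler. It assumes that qstat -j exits with a non-zero status once the job has left the system, which is worth verifying on your installation; the function name wait_for_job and the 30-second default interval are illustrative choices.

```shell
#!/bin/bash
# Block until the given job is no longer listed by the scheduler.
# Assumption: "qstat -j <id>" returns a non-zero exit status when the
# job does not exist (for example, after it has finished).
wait_for_job() {
    local job_id="$1"
    local interval="${2:-30}"   # seconds between polls
    while qstat -j "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done
}

# Usage on a cluster (job ID is a placeholder):
#   wait_for_job 12345 && echo "Job 12345 has left the queue"
```

Polling too often puts load on the scheduler, so a long interval is usually the better default. The function only reports that the job is gone, not whether it succeeded.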