Batch/Load Share for ClusterGateOrg
Condor. The goal of the Condor® Project is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Guided by both the technological and sociological challenges of such a computing environment, the Condor Team has been building software tools that enable scientists and engineers to increase their computing throughput.
Portable Batch System (PBS from NASA). The Portable Batch System (PBS) project was initiated to create a flexible, extensible batch processing system to meet the unique demands of emerging heterogeneous computing networks. You can submit your "batch job" on any machine and PBS will run your script on the machine you request when the resources you need are available. Your PBS system administrator can define the method used to choose what jobs to run where, and in what order. PBS can be installed on almost any UNIX machine, ranging from single-processor workstations to workstation clusters and massively parallel supercomputers. It's portable. It's flexible. It's distributed. And it's available absolutely free to U.S. beta test sites.
OpenPBS (Main open PBS site)
Torque. TORQUE is an open source resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original *PBS project and, with more than 1,200 patches, has incorporated significant advances in the areas of scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S. Dept of Energy, Sandia, PNNL, U of Buffalo, TeraGrid, and many other leading edge HPC organizations.
Sun Gridware page (long before, the product was CODINE: COmputing in DIstributed Network Environments)
Platform Computing: Load Share Facility - LSF. The Platform LSF family of products delivers a superior grid-enabled solution that is optimized for solving technical computing problems - for example, the electronics industry including semiconductor design, government and research for aerospace and defense contractors, automotive industrial manufacturers, and life sciences organizations such as biotechnology firms.
Platform LSF fully utilizes all IT resources regardless of operating system, including desktops, servers and mainframes to ensure policy-driven, prioritized service levels for always-on access to resources.
FBSNG -- Next Generation of FBS (at FNAL). FBSNG is a batch system designed for a farm architecture. Traditional batch system features such as job submission, job queueing, and load balancing are built into the system. FBSNG inherits fundamental design ideas from its predecessor, FBS. The most important difference between FBSNG and FBS is that FBSNG is a complete batch system and does not rely on any external software such as LSF.
SLURM is an open-source resource manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work. Also see Slurm Workload Manager.
And SLURM on SourceForge.
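The three functions listed for SLURM (allocating nodes, tracking running work, and queueing pending work) can be illustrated with a toy model. This is only a sketch under stated assumptions: the names Job and ToyResourceManager are hypothetical, the policy is strict first-in-first-out with exclusive node access, and none of it reflects how SLURM or any other system on this page is actually implemented.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    """A hypothetical job that needs a fixed number of nodes."""
    name: str
    nodes_needed: int


@dataclass
class ToyResourceManager:
    """Toy model of a resource manager (illustrative only, not SLURM code)."""
    free_nodes: set
    pending: deque = field(default_factory=deque)  # queue of waiting jobs
    running: dict = field(default_factory=dict)    # job name -> allocated nodes

    def submit(self, job):
        """Queue a job, then start whatever fits."""
        self.pending.append(job)
        self._schedule()

    def finish(self, job_name):
        """Release the nodes held by a running job, then reschedule."""
        self.free_nodes |= self.running.pop(job_name)
        self._schedule()

    def _schedule(self):
        # Strict FIFO: start the job at the head of the queue only while
        # enough free nodes exist; later jobs never jump ahead of it.
        while self.pending and self.pending[0].nodes_needed <= len(self.free_nodes):
            job = self.pending.popleft()
            # Exclusive allocation: take nodes in sorted order so the
            # result is deterministic.
            allocated = set(sorted(self.free_nodes)[:job.nodes_needed])
            self.free_nodes -= allocated
            self.running[job.name] = allocated
```

With three free nodes, submitting two jobs that each need two nodes starts the first and leaves the second in the pending queue until finish() is called for the first. Real batch systems layer priorities, time limits, and site-defined policies on top of this basic loop.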
© 2009-2024
Andrey Ye. Shevel