This site has been moved to http://sites.google.com/site/webspherelibrary

What is a Job Management System?

A job management system helps implement enterprise growth and change rather than dictating the way the enterprise operates. Unicenter AutoSys JM is an automated job control system for scheduling, monitoring, and reporting. These jobs can reside on any configured machine that is attached to a network.

What is a Job?

You can use the following methods to create job definitions:

AutoSys Features
- Graphical Calendar Facility
- Automated Restart and Recovery
- High Availability
- Fault-tolerance
- Load Balancing and Queue Management
- Framework Management Integration
- ERP Adapters Integration
- Multiple Time Zone Support
- Reporting Capabilities

AutoSys for Unix/NT Environments
- Mix and match across UNIX and NT platforms with complete interoperability.
- Supports Windows, UNIX (Sun Solaris, AIX, HP-UX), and Linux (Red Hat, SUSE).

The AutoSys Architecture

The main Unicenter AutoSys JM system components are as follows:
· Event Server (AutoSys database)
· Event Processor
· Remote Agent

In addition, Unicenter AutoSys JM provides utilities to help you define, run, and maintain instances and jobs. The included utilities are platform-specific; however, all platforms include the graphical user interface (GUI) and Job Information Language (JIL). Both the GUI and JIL enable you to define, manage, monitor, and report on jobs.

Event Server

Event Processor

=========================
Remote Agent

On a UNIX machine, the remote agent is a temporary process started by the event processor to perform a specific task on a remote, or client, machine. On a Windows NT machine, the remote agent is a Windows NT service running on a client machine that is directed by the event processor to perform specific tasks.
The remote agent starts the command specified for a given job, sends running and completion information about a task to the event server, and then exits. If the remote agent is unable to transfer the information, it waits and tries again until it can successfully communicate with the database.

EXAMPLE Scenario in UNIX

Explanation

2. The event processor communicates with the remote agent on WorkStation_2. As soon as the remote agent receives the instructions from the event processor, the connection between the two processes is dropped. After the connection is dropped, the job will run to completion, even if the event processor stops running.
3. The remote agent performs resource checks, such as ensuring that the minimum specified number of processes are available, then "forks" a child process that will actually run the specified command.
4. The command completes and exits, and the remote agent captures the command's exit code.
5. The remote agent communicates the event (exit code, status, and so forth) directly to the event server. If the database is unavailable for any reason, the remote agent will go into a wait and resend cycle until it can deliver the message.

Only two processes need to be running: the event processor and the event server.

Machines

· Server machine
The server is the machine on which the event processor or the event server (database) resides.
· Client machine
The client is the machine on which the remote agent software resides, and where jobs run. A remote agent must be installed on the machine with the event processor, and it can also be installed on separate physical client machines.

Instance

An instance is defined by the following:
- An instance ID, an uppercase three-alphanumeric identifier defined by the AUTOSERV environment variable. You set the instance ID during installation and cannot change it.
- The $AUTOUSER/config.$AUTOSERV configuration file.
- At least one event server.
- At least one event processor.

Environment

Autosys Directory Structure
=================================== How Autosys Connects to the Database=================================== All information is kept in a relational database (RDBMS) called the event server, which is configured for Unicenter AutoSys JM. Access to Unicenter AutoSys JM requires a connection to this database. That is, you must connect to the database to add, modify, control, report on, or monitor jobs, and to change certain configuration settings. The following figure shows the scenario for connecting to an ORACLE database.
===================================
Running Multiple Instances
===================================

You can run multiple instances of Unicenter AutoSys JM on the same network at the same time. Some of the reasons you may want to run multiple instances are listed below:
- Your processing volume is large, and you want to distribute the load down to the departmental level.
- You want each department in your company to be insulated from what happens in other departments.
- You want to separate or test your development and production environments.

Each instance must have its own event processor specified in the configuration file and it must have its own instance-specific event servers installed. Event processors from multiple instances can access the same client machines to start jobs. To enable this, you must install a remote agent on the client machine for each instance that will run jobs on that machine.

The following figure shows two instances of Unicenter AutoSys JM, each with a single event server. Both instances can send jobs to the same client machine as long as both instances have a remote agent installed or configured on that client machine.
===================================
Running Cross-Instance Job Dependencies
===================================

A job defined to run on one instance can have as a starting condition the successful completion of a job running on a different instance. The specification for such a job dependency may appear as the following:

    condition: success(jobA) AND success(jobB^PRD)

In this example, the success(jobB^PRD) condition specifies the successful completion of a job named jobB running on a different instance, identified by the three-alphanumeric ID PRD. If the dependency specification does not include a caret (^) and a different instance ID, the current instance is used by default.

Each time a cross-instance job dependency is encountered, Unicenter AutoSys JM sends an EXTERNAL_DEPENDENCY event from the requesting instance. If the target instance cannot be reached, Unicenter AutoSys JM issues an INSTANCE_UNAVAILABLE alarm.

=====================================================================
Event Processors and Cross-Dependencies

When you implement cross-instance job dependencies, event processors can do the following:
- Run on different server machines or on the same server machine.
- Access the same client machines to start jobs.
- Send events to other instances.

Note: If the event server of a target instance is down, the event processor will try to resend events every five minutes until it can reach the other instance's event server.

============================================
High-Availability Options
============================================

Unicenter AutoSys JM provides two high-availability options that let Unicenter AutoSys JM keep processing even if an event server or event processor fails due to hardware or connection problems. These high-availability options are dual event servers and a shadow event processor.
You can install and configure the high-availability options at the same time you install, or you can modify an existing installation to add the high availability options. ============================================ DUAL Event Server============================================ One way that Unicenter AutoSys JM provides high-availability is by running two event servers. The two event servers contain identical information, including job definitions and events, because Unicenter AutoSys JM reads and writes to both servers simultaneously. Unicenter AutoSys JM also keeps both event servers synchronized and provides complete recovery when one server becomes unusable, disabled, or corrupted. When processing events, the event processor reads from both event servers. If it detects an event on one server and not the other, it will copy the missing event to the other server. In this way, a temporary problem in getting events to one of the servers will not interrupt processing.
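
The reconciliation step described above (an event seen on one event server but not the other is copied across) can be illustrated with a minimal sketch. This is a toy model, not AutoSys code: each event server is represented as an in-memory dictionary keyed by a hypothetical event ID, and the function name `reconcile` is an assumption made for illustration.

```python
def reconcile(server_a, server_b):
    """Copy events that exist on only one event server to the other.

    Both arguments are dicts mapping a hypothetical event ID to event data.
    Returns a copy of the merged view that both servers now share.
    """
    # Events present on A but missing from B are copied to B.
    for event_id, event in list(server_a.items()):
        server_b.setdefault(event_id, event)
    # Events present on B but missing from A are copied to A.
    for event_id, event in list(server_b.items()):
        server_a.setdefault(event_id, event)
    return dict(server_a)


# Illustrative data: the second event never reached the secondary server.
primary = {1: "STARTJOB jobA", 2: "CHANGE_STATUS jobA RUNNING"}
secondary = {1: "STARTJOB jobA"}
merged = reconcile(primary, secondary)
```

After the call both dictionaries hold the same two events, which mirrors the idea that a temporary delivery problem on one server does not interrupt processing.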
Running with Dual-Event Servers
============================================

When Unicenter AutoSys JM is running in dual-event server mode and the event processor detects an unrecoverable condition on one of the event servers, it automatically rolls over to single-server mode. A rollover results from one of the following conditions:
- The connection to the database is lost, and after the configured number of reconnect attempts, the database remains unconnected.
- The database has an unrecoverable error, for example, the database is corrupt or a media failure occurs.

============================================
Shadow Event Processor
============================================

Another way that Unicenter AutoSys JM provides high availability is by running with a shadow event processor. The shadow event processor is designed to take over event processing if the primary event processor fails. The following figure illustrates a typical configuration running with primary and shadow event processors as well as dual-event servers. The shadow event processor and dual-event servers are independent features, but you can run them together.
Running with a Shadow Event Processor
============================================

The shadow event processor is normally in idle mode, listening for routine pings from the primary event processor, which indicate all is well. If the shadow event processor stops receiving this signal, it assumes the primary event processor has failed and checks the third machine (defined in the configuration file) for the .dibs file:
- If it cannot connect to the third machine, the shadow event processor shuts down.
- If it can connect and cannot locate the .dibs file, the shadow event processor creates the file, attempts to signal the primary event processor to stop, and takes over processing the events.
- If the file already exists, it shuts down.

Similarly, if the primary event processor cannot locate and signal the shadow event processor, the primary processor checks the third machine for the .dibs file and follows the same procedure as the shadow event processor.

The shadow event processor is designed primarily for the situation where the machine on which the primary event processor runs goes down, or the network on which this processor runs goes down. Particular care is given to ensuring that both event processors never take over at the same time. To achieve this, Unicenter AutoSys JM uses the third machine and the existence of the .dibs file to resolve contentions and to eliminate the case where one processor takes over because its own network is down.

The shadow event processor is not guaranteed to take over in 100% of the cases where it theoretically could. For example, in the case of network problems, Unicenter AutoSys JM may not be able to determine which event processor is the functional one. In this case, both processors will shut down.
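
The .dibs contention rules above can be summarized as a small decision function. The sketch below is a simplified model under stated assumptions: `ThirdMachine`, `Action`, and `on_lost_heartbeat` are hypothetical names, the third machine is an in-memory object rather than a real host, and the text does not describe how the real product makes the check-and-create step safe against races.

```python
from enum import Enum


class Action(Enum):
    TAKE_OVER = "take over event processing"
    SHUT_DOWN = "shut down"


class ThirdMachine:
    """Hypothetical stand-in for the third machine that holds the .dibs file."""

    def __init__(self, reachable=True):
        self.reachable = reachable
        self.has_dibs = False


def on_lost_heartbeat(third):
    """Decide what an event processor does after losing contact with its peer."""
    if not third.reachable:
        # Cannot reach the third machine: this processor's own network may be down.
        return Action.SHUT_DOWN
    if third.has_dibs:
        # The other processor already claimed control.
        return Action.SHUT_DOWN
    # Claim control by creating the .dibs file, then take over.
    third.has_dibs = True
    return Action.TAKE_OVER


# Both processors lose sight of each other; the shadow checks first.
third = ThirdMachine(reachable=True)
shadow_action = on_lost_heartbeat(third)
primary_action = on_lost_heartbeat(third)
```

In this run the shadow takes over and the primary shuts down, so only one processor is ever active. If neither processor can reach the third machine, both calls return `Action.SHUT_DOWN`, which matches the case where both processors shut down.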
============================================
Commands
============================================

=====================================================================
archive_events

Function: Removes old information from the database. archive_events will optionally copy the information to an archive directory before deletion.

Syntax: archive_events {-n num_of_days | -j num_of_days | -l num_of_days} [-A] [-d directory_name] [-B batch_size] [-D data_server:database | -D TNSname] [-t timeout_in_secs]

Note: The -t option is available on UNIX only.

Description: archive_events removes data older than the specified number of days from various database tables. You use this command to prevent the database from becoming full. If the -A option is used, the data is archived before it is deleted. It is copied into a default directory unless you specify a different directory with the -d option. The -n option removes events and any alarms associated with them from the event table. The -j option removes information from the job_runs table. In Dual-Server mode, the data is archived from both servers at the same time.

If information from these tables is not regularly purged from the database, the database can fill up rather quickly, stopping all processing. We highly recommend that you run archive_events during the database maintenance cycle.

=================================================
autocal_asc

Function: Adds, deletes, and prints custom calendar definitions.

Syntax: autocal_asc

Description: autocal_asc provides a text-based, command-line mechanism for creating, deleting, and printing custom calendars, which can be used to specify the days on which to start jobs, or days on which a job should not be started (for example, holidays). Each calendar has a unique name and a list of days. Once created, calendars can be referenced in a job definition. Use one of the following methods to apply a calendar:
1. In the Job Definition Date/Time Options dialog, enter a calendar name in the Run on Days in Calendar field or the Do NOT Run on Days in Calendar Exclude field.
2. With JIL, enter a calendar name in the run_calendar or exclude_calendar attribute.

Whenever a calendar is updated, Unicenter AutoSys JM recalculates the starting times for all jobs that use that calendar.

==================================================
autocons

Function: Starts the Scheduler Console.

Syntax: autocons

Description: The autocons command starts the Operator Console in UNIX, or the Scheduler Console in Windows, for monitoring AutoSys jobs in real time. You can also start the console by clicking Ops Console in the GUI Control Panel. The consoles let you specify job selection criteria, which can be changed dynamically, to control which jobs you want to view. These criteria include the current job state, the job name (with wildcarding), and the machine on which the job runs. You can select any job and view more detailed information about it, including its starting conditions, dependent jobs, and autorep reports.

The Operator Console and the Scheduler Console provide an Alarm Manager, which allows the monitoring of alarms as they are generated. In the Alarm Manager, you can do the following:
- Enter responses to alarms.
- Set the alarm's state to either acknowledged or closed.

===================================================
autoping

Function: Verifies that the various communication facilities are correctly configured and functioning.

Syntax: autoping -m {machine|ALL} [-A] [-D]

Description: autoping verifies that the server and client machines are properly configured and are communicating successfully. It also checks and verifies that the Remote Agent and the Remote Agent's database connection are functioning correctly. If you are running Dual-Event Servers, it checks both database connections. If requested, it generates an alarm when problems are detected.
Since these client/server communication facilities are critical to functioning, autoping provides valuable information for troubleshooting, and should always be used early in that process.

When autoping is executed, the server (the machine from which autoping is issued) establishes a connection with the client machine and waits for the Remote Agent to respond. If successful, the following message will be displayed on standard output at the server:

Example

To check all machines and verify their database access, enter:

=====================================================================
autorep

Function: Reports information about a job, jobs within boxes, machines, and machine status. Also reports information about job overrides and global variables.

Syntax: autorep {-J job_name | -M machine_name | -G global_name} [-s | -w | -d | -q | -o over_num] [-R run_num] [-L print_level] [-N Retry] [-t] [-D data_server:database | -D TNSname]

Description: autorep lists a variety of information about jobs, machines, and global variables currently defined in the database. You can use it to list a summary of all currently defined jobs, or to display current machine load information. autorep serves as a problem tracking tool by listing all relevant event information for the last run of any given job, or a specified job run. You can also use it to extract job definitions in JIL script format and save them to an output file for later reloading into AutoSys, as a means of backing up job definitions.

autorep retrieves data from the database to formulate the reports. Any data that has been archived with archive_events will not appear in the reports. When listing nested jobs, subordinate jobs are indented to illustrate the hierarchy. The following sections describe the types of autorep reports.

Columns in the autorep Report

The columns in an autorep report vary with the type of report requested. The following table describes the columns.

==============================================================
Status Abbreviations

The following table lists the abbreviations used in the ST (status) column of the autorep report, and gives the status for each abbreviation.

===============================================================
Event State Abbreviations

The following table lists the abbreviations used in the ES (event state) column of the autorep report, and gives the status for each abbreviation.
The following summary report is for a run of the Nightly_Download example. This command requests the report:

    autorep -J Nightly_Download

    Job Name          Last Start        Last End          ST  Run    Pri/Xit
    ________________  ________________  ________________  __  _____  _______
    Nightly_Download  11/08/2009 17:00  11/08/2009 17:52  SU  101/1
     Watch_4_file     11/08/2009 17:00  11/08/2009 17:13  SU  101/1
     filter_data      11/08/2009 17:13  11/08/2009 17:24  SU  101/1
     update_DBMS      11/08/2009 17:24  11/08/2009 17:52  SU  101/1

The following example lists all machines defined on the data server. This command requests the report:

    autorep -M ALL

    Machine Name    Max Load  Current Load  Factor  O/S
    ______________  ________  ____________  ______  ____
    london          100       0             1.00    Unix
    berlin          90        0             0.90    NT
    v_italy.rome    0         0             0.00    Unix
    v_italy.venice  0         0             0.00    Unix
    v_france.paris  100       0             1.00    NT

To list the value of all global variables that have been set, enter:

    autorep -G ALL

The output from this command would look similar to the following:

    Global Name  Value       Last Changed
    -----------  ----------  -------------------
    AUDIT_DIR    /usr/audit  11/12/1997 12:41:00

You can use the autorep command to extract job definitions in JIL script format and direct the output to a file. The following example shows how to save all job definitions to a file.

    autorep -J ALL -q > dump_file

The output of this command is formatted exactly as a JIL job definition script, like the following:

    insert_job: test_job
    job_type: c
    command: sleep 60
    machine: juno
    #owner: jerry@jupiter
    permission: gx,ge,wx
    alarm_if_fail: 1

======================================================================
autosyslog

Function: Displays the Event Processor and Remote Agent log files.

Syntax: autosyslog [-e | -J job_name] [-p]

Description: autosyslog is used to view either the event processor log file or the Remote Agent log file for the specified job. Both the Remote Agent and Event Processor write diagnostic messages to their respective logs, as part of their normal operations and in response to detected error conditions.
autosyslog provides useful troubleshooting information because the event processor logs all events it processes and provides a detailed trace of its activities. If Unicenter AutoSys JM appears to be behaving abnormally, these logs are the first places you should look. Using autosyslog to view the event processor log is the same as issuing the following command:

    tail -f $AUTOUSER/out/event_demon.$AUTOSERV

The last 10 lines of the event processor log file are displayed when the autosyslog command is issued. The log file is updated continually as processing occurs. To terminate the display of the log, press Ctrl+C in the display window.

Remote Agent log

The autosyslog utility can be a useful diagnostic tool when jobs fail. This command, when provided with the name of a job, displays the log of the job's most recent run. By default, the Remote Agent's log file is deleted automatically after a successful job run, but it is kept if the job ended with a FAILURE status.

========================================================
autosys_secure

Function: Maintains the Edit and Exec superuser ownerships, remote authentication methods, and database password. Also maintains Windows user IDs and passwords, which are required for jobs to run on Windows client machines, and performs eTrust AC administrative tasks.

Syntax: autosys_secure
or: autosys_secure [-h] [-q] {-a | -c | -d} {-u | -editu | -execu} user@host_or_domain [-o old_password] [-p password] [-host domain_or_host]

Description: You use the autosys_secure command to specify the Edit Superuser and Exec Superuser, the database password, remote authentication method, and Windows user IDs and passwords. You can also use autosys_secure to enable eTrust security within Unicenter AutoSys JM and perform basic eTrust administration operations.

Edit Superuser and Exec Superuser

Two users have administrator privileges: the Edit Superuser and the Exec Superuser.
The Edit Superuser is the only user with permission to do the following:
- Edit or delete any job, regardless of who owns it and what permissions are set for it.
- Change the owner of a job.
- Change the database password, remote authentication method, and Windows user passwords.

The Exec Superuser is the only user with permission to do the following:
- Start or kill any job, regardless of the execution permissions on the specified job. This user can affect how jobs run, typically by issuing the sendevent command.
- Shut down the Event Processor (by sending the STOP_DEMON event).

=====================================================================
System States

Events

The following is the list of events that Unicenter AutoSys JM processes. Some of these events are generated internally, while some only occur when sent manually using the sendevent command. In effect, manual events are runtime commands for the event processor. In the listing below, each event's internal code assignment is provided next to the event in parentheses. This code number is used for viewing the event in the database event table. For more information, see the chapter "Commands," in this guide.

ALARM (106)
An alarm is an informational event only; it invokes no action on its own. The type of alarm is further qualified by the value of the alarm, described later in this appendix. An alarm is generally an internal event, but an alarm can also be sent manually if an application wants to alert an operator.

CHANGE_PRIORITY (120)
Changes the priority of a job. If the job is in the QUE_WAIT state, the priority changes immediately, and the job may start. If the job is not yet in the QUE_WAIT state, the priority changes for the next run of the job only. A permanent change of priority can be made by editing the job definition.

CHANGE_STATUS (101)
Changes the value of the status for a specific job. When the event processor processes this event, it initiates any actions that depend on this status of this job. The values of status are listed later in this appendix.

CHECK_HEARTBEAT (116)
Instructs the event processor to check all jobs that have specified a heartbeat interval to see if any are missing. If so, a MISSING_HEARTBEAT alarm will be sent. If the event processor is configured to do so, it will perform this check automatically.

CHK_BOX_TERM (118)
An internally generated event that instructs the event processor to check if a box job has run for more than its Maximum Runtime (max_run_time) value.

CHK_MAX_ALARM (114)
An internally generated event that instructs the event processor to check if a job has run for more than its Maximum Runtime value.

CHK_RUN_WINDOW (122)
A future event set to run at the end of a job's run window, to see if the job has run or not.

COMMENT (117)
For information purposes only. This event can be associated with a job and, as a result, is displayed on reports (autorep). It is a method for generating comments at runtime and associating them with a specific run of a job.

DELETEJOB (119)
Tells AutoSys to delete this job. If the job is a box, it deletes everything within the box.

EXTERNAL_DEPENDENCY (127)
Sent from an issuing instance to a different, receiving instance to signal that a cross-instance dependency has been dispatched.

FORCE_STARTJOB (108)
Event to start a job, regardless of any conditions on this job. This event is never generated internally, and should be used only in the event of system problems. Using this event, it is possible to start the same job twice and, as a result, have two instances of the job running at the same time. For this reason, we recommend that this command be used only with extreme caution.

Note: If you FORCE_START a job that has a status of ON_ICE or ON_HOLD, upon completion (either success or failure), the status/condition does not change back to the previous condition. For example: You scheduled Job-1 to run every Monday at 3:00 A.M.; however, on Sunday you placed this job ON_HOLD. If you FORCE_START Job-1 on Wednesday at 2:00 P.M., Job-1 will run to completion (either success or failure), and then run again as scheduled on Monday at 3:00 A.M.

HEARTBEAT (115)
The event sent from the Remote Agent posting a heartbeat for a given job. This event is internally generated.

JOB_ON_ICE (110)
Event that instructs the event processor to place a job ON_ICE. If the job is in the STARTING or RUNNING state, it will not place the job ON_ICE. This event is manually generated.

JOB_OFF_ICE (111)
Event that instructs the event processor to take a job OFF_ICE. If the job is in a RUNNING box, it will attempt to start it, conditions permitting. This event is manually generated.

JOB_ON_HOLD (112)
Event that instructs the event processor to place a job ON_HOLD. If the job is in the STARTING or RUNNING state, it will not place the job ON_HOLD. This event is manually generated.

JOB_OFF_HOLD (113)
Event to take the job OFF_HOLD. The starting of the job will continue as it was before it was placed ON_HOLD. Use this event to take a job OFF_HOLD when using the AutoHold feature.

KILLJOB (105)
Instructs the event processor to kill a specific job. If the specified job is a box, it will change the box status to TERMINATED and, if so configured, kill the jobs within it. This event is manually generated.

STARTJOB (107)
Event to start a job, if and only if the starting conditions are satisfied, and if it is not already running. STARTJOB is the recommended way to start a job.

======================================================================
ALARMS

The following is a list of the alarms that may be generated.

AUTO_PING (526)
The autoping -M -A command cannot connect to a client machine. The name of the machine is listed.

CHASE (514)
The chase command has found a problem with a job that is supposedly running. The job and the problem are listed.
DATABASE_COMM (516)
The Remote Agent had trouble sending an event to the database. The job probably ran successfully. Inspect the Remote Agent log file to determine what happened.

DB_PROBLEM (523)
There is a problem with one of the databases, such as a lack of free space. This alarm can trigger a user-specified notification procedure.

DB_ROLLOVER (519)
Unicenter AutoSys JM has rolled over from Dual-Server to Single-Server Mode. This alarm can trigger a user-specified notification procedure.

DUPLICATE_EVENT (524)
Duplicate events have been received in the Event Server. Typically, this means that two event processors are up and running, although duplicate events can also be caused by Event Server configuration errors.

EP_HIGH_AVAIL (522)
Can mean that the Third Machine for resolving contentions between two event processors cannot be reached, that the event processor is shutting down, or that there are other event processor takeover problems. This alarm can trigger a user-specified notification procedure.

EP_ROLLOVER (520)
The Shadow event processor is taking over processing. This alarm can trigger a user-specified notification procedure.

EP_SHUTDOWN (521)
The event processor is shutting down. This may be due to a normal shutdown (SEND_EVENT triggered by sendevent -E STOP_DEMON), or due to an error condition. This alarm can trigger a user-specified notification procedure.

EVENT_HDLR_ERROR (507)
The event processor had an error while processing an event. The job associated with that event should be inspected to see if manual intervention is required.

EVENT_QUE_ERROR (508)
An event could not be marked as processed. This is usually due to a problem with the Event Server. Contact Computer Associates Technical Support.

Exit Codes

When you use the autosyslog -J command to display the Remote Agent log file for a specified job, you may see an entry containing one of the following exit codes. If the exit code contains two numbers in parentheses, for example (0 1), the first number is the UNIX signal, and the second number is the exit code. If a job is killed or terminated, the exit code remains at zero, which is what it was set to when the job started.
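
As an illustration of the two-number form described above, the following sketch pulls the signal and exit code out of a string such as "(0 1)". It is a minimal example under stated assumptions: the helper name `parse_exit_pair` and the `ExitInfo` type are hypothetical, and the only input format assumed is the parenthesized pair quoted in the text, not the layout of a real Remote Agent log line.

```python
import re
from typing import NamedTuple, Optional


class ExitInfo(NamedTuple):
    signal: int      # first number: the UNIX signal
    exit_code: int   # second number: the exit code


# Matches two whitespace-separated integers inside parentheses, e.g. "(0 1)".
_PAIR = re.compile(r"\(\s*(\d+)\s+(\d+)\s*\)")


def parse_exit_pair(text: str) -> Optional[ExitInfo]:
    """Return the (signal, exit code) pair found in text, or None if absent."""
    match = _PAIR.search(text)
    if match is None:
        return None
    return ExitInfo(signal=int(match.group(1)), exit_code=int(match.group(2)))


info = parse_exit_pair("(0 1)")
```

Here `info.signal` is 0 and `info.exit_code` is 1. A string with no parenthesized pair yields `None`, so callers can tell a plain exit code apart from the two-number form.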












