OMS Does Not Start After Server is on New Subnet

References

Oracle® Database Net Services Reference 11g Release 1 (11.1)

Oracle® Database Reference 11g Release 1 (11.1)

Oracle® Database SQL Language Reference 11g Release 1 (11.1)

Oracle® Enterprise Manager Administration 10g Release 5 (10.2.0.5)

RHEL 5.0 Deployment Guide

Overview

GRIDCTRL could not start the OMS and OMA after I moved the server from the 10.1.1.0/24 subnet to the 192.168.1.0/24 subnet.

The root cause was an entry in /etc/hosts pointing to the old subnet.

Review of Changes Made

I updated the DNS files on GRIDCTRL to refer to the new subnet:

  • /var/named/chroot/etc/named.conf
  • /var/named/chroot/var/named/yaocm.id.au.rr.zone
  • /var/named/chroot/var/named/yaocm.id.au.zone

The named service was restarted via the Services Configuration Tool.

Analysis

Starting OMS

When I tried to start OMS on GRIDCTRL (see "Using emctl to Start, Stop, and Check the Status of the Oracle Management Service"), I got the following messages:

[oracle@gridctrl ~]$ emctl start oms
Oracle Enterprise Manager 10g Release 5 Grid Control
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
opmnctl: opmn is already running
Starting HTTP Server ...
Starting Oracle Management Server ...
Checking Oracle Management Server Status ...
Oracle Management Server is Down.

Review Log

The following messages appeared in /opt/oracle/app/OracleHomes/oms10g/sysman/log/emoms.trc whenever the OMS tried to start:

2012-02-02 17:21:22,320 [Orion Launcher] WARN jdbc.ConnectionCache _getConnection.352 - Io exception: The Network Adapter could not establish the connection
java.sql.SQLException: Io exception: The Network Adapter could not establish the connection
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:137)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:174)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:286)
    at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:332)
    at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:429)
    at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:152)
    at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:31)
    at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:608)
    at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:217)
    at oracle.jdbc.pool.OracleConnectionPoolDataSource.getPhysicalConnection(OracleConnectionPoolDataSource.java:113)
    at oracle.jdbc.pool.OracleConnectionPoolDataSource.getPooledConnection(OracleConnectionPoolDataSource.java:76)
    at oracle.jdbc.pool.OracleImplicitConnectionCache.makeCacheConnection(OracleImplicitConnectionCache.java:1361)
    at oracle.jdbc.pool.OracleImplicitConnectionCache.getCacheConnection(OracleImplicitConnectionCache.java:439)
    at oracle.jdbc.pool.OracleImplicitConnectionCache.getConnection(OracleImplicitConnectionCache.java:334)
    at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:285)
    at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:253)
    at oracle.sysman.util.jdbc.ConnectionCache._getConnection(ConnectionCache.java:336)
    at oracle.sysman.util.jdbc.ConnectionCache._getConnection(ConnectionCache.java:322)
    at oracle.sysman.util.jdbc.ConnectionCache.getUnwrappedConnection(ConnectionCache.java:575)
    at oracle.sysman.emSDK.svc.conn.FGAConnectionCache.getFGAConnection(FGAConnectionCache.java:218)
    at oracle.sysman.emSDK.svc.conn.ConnectionService.getPrivateConnection(ConnectionService.java:1162)
    at oracle.sysman.emSDK.svc.conn.ConnectionService.getRepositoryVersionAndMode(ConnectionService.java:762)
    at oracle.sysman.emSDK.svc.conn.ConnectionService.verifyRepositoryEx(ConnectionService.java:840)
    at oracle.sysman.emSDK.svc.conn.ConnectionService.verifyRepository(ConnectionService.java:934)
    at oracle.sysman.eml.app.ContextInitializer.contextInitialized(ContextInitializer.java:301)
    at com.evermind.server.http.HttpApplication.initDynamic(HttpApplication.java:1020)
    at com.evermind.server.http.HttpApplication.<init>(HttpApplication.java:560)
    at com.evermind.server.Application.getHttpApplication(Application.java:915)
    at com.evermind.server.http.HttpServer.getHttpApplication(HttpServer.java:707)
    at com.evermind.server.http.HttpSite.initApplications(HttpSite.java:637)
    at com.evermind.server.http.HttpSite.setConfig(HttpSite.java:278)
    at com.evermind.server.http.HttpServer.setSites(HttpServer.java:278)
    at com.evermind.server.http.HttpServer.setConfig(HttpServer.java:179)
    at com.evermind.server.ApplicationServer.initializeHttp(ApplicationServer.java:2435)
    at com.evermind.server.ApplicationServer.setConfig(ApplicationServer.java:1592)
    at com.evermind.server.ApplicationServerLauncher.run(ApplicationServerLauncher.java:92)
    at java.lang.Thread.run(Thread.java:534)
2012-02-02 17:21:22,709 [Orion Launcher] WARN jdbc.ConnectionCache _getConnection.353 - Got a fatal exeption when getting a connection; Error code = 17002; Cleaning up cache and retrying
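Because the stack trace buries the one useful detail (the error code), it can help to pull the codes out of the trace file first. The following is a minimal sketch, not part of the original investigation; the function name jdbc_error_codes is hypothetical, and it assumes the trace lines carry the text "Error code = NNNNN" as in the message above.

```shell
#!/bin/sh
# jdbc_error_codes TRACE_FILE
# Print each distinct JDBC error code found in the trace file, followed by
# the number of lines that reported it (for example "17002 12").
# Assumption: codes appear as "Error code = NNNNN", as in the emoms.trc
# message quoted in this note.
jdbc_error_codes() {
    sed -n 's/.*Error code = \([0-9][0-9]*\).*/\1/p' "$1" |
        sort |
        uniq -c |
        awk '{ print $2, $1 }'
}

# Example (path taken from this note):
# jdbc_error_codes /opt/oracle/app/OracleHomes/oms10g/sysman/log/emoms.trc
```

A trace dominated by code 17002 points at the network path to the repository rather than at credentials or the database itself.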

Check the Listener Log

This looks like a problem connecting to the repository database via the listener.

Listener Logs Not in ADR Structure

There are two (2) problems with the listener as shown by the output from the lsnrctl status command:

    1. The logs are not in the right place for adrci;
    2. The repos database instance is not registered with the listener.

[oracle@gridctrl ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.1.0.7.0 - Production on 02-FEB-2012 10:34:01
Copyright (c) 1991, 2008, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.1.0.7.0 - Production
Start Date                02-FEB-2012 06:22:40
Uptime                    0 days 4 hr. 11 min. 21 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/app/OracleHomes/db11g/network/admin/listener.ora
Listener Log File         /opt/oracle/app/OracleHomes/db11g/log/diag/tnslsnr/gridctrl/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=gridctrl.yaocm.id.au)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

Change Location of Listener Log

The problem with the location of the logs was fixed by adding the following line to /opt/oracle/app/OracleHomes/db11g/install/unix/scripts/seedstup:

export ORACLE_BASE=/opt/oracle/app

After GRIDCTRL was restarted, the logs were in the correct place for ADR, as shown by the lsnrctl status command:

[oracle@gridctrl ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.1.0.7.0 - Production on 02-FEB-2012 10:49:49
Copyright (c) 1991, 2008, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.1.0.7.0 - Production
Start Date                02-FEB-2012 10:42:34
Uptime                    0 days 0 hr. 7 min. 15 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/app/OracleHomes/db11g/network/admin/listener.ora
Listener Log File         /opt/oracle/app/diag/tnslsnr/gridctrl/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=gridctrl.yaocm.id.au)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

However, the listener log did not show any connection errors, which suggests that the OMS connection attempts were not reaching the listener at all.

Registering Repository Database with Listener

I forced the REPOS database instance to register with the listener by setting the LOCAL_LISTENER system parameter.

tnsnames.ora contains the following entry:

REPOS =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
    )
    (CONNECT_DATA =
      (SID = repos)
    )
  )

I changed the system parameter as follows and registered the instance with the listener via the ALTER SYSTEM command:

SQL> alter system set local_listener='REPOS' scope=both;
System altered.
SQL> alter system register;
System altered.

The database instance is now registered with the listener as shown by the lsnrctl services command:

LSNRCTL for Linux: Version 11.1.0.7.0 - Production on 02-FEB-2012 10:51:12
Copyright (c) 1991, 2008, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0
         LOCAL SERVER
Service "repos.yaocm.id.au" has 1 instance(s).
  Instance "repos", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
Service "repos_XPT.yaocm.id.au" has 1 instance(s).
  Instance "repos", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
The command completed successfully
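Checking the registration by eye works for a single listener, but the same check can be scripted. The following is a small sketch, assuming the "Instance "name", status STATE, ..." line format shown in the listener output in this note; the function name instance_status is hypothetical.

```shell
#!/bin/sh
# instance_status INSTANCE
# Read `lsnrctl services` or `lsnrctl status` output on stdin and print the
# status word (READY, UNKNOWN, ...) of every line that reports INSTANCE.
# Prints nothing when the instance is not registered with the listener.
# Assumption: INSTANCE contains no characters that are special in a sed
# regular expression.
instance_status() {
    sed -n "s/.*Instance \"$1\", status \([A-Z]*\),.*/\1/p"
}

# Example:
# lsnrctl services | instance_status repos
```

An empty result for repos corresponds to the second problem listed earlier: the repository instance is not registered with the listener.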

Attempt to Start OMS Again

Even with the database service registered, the startup of OMS still fails.

Root Cause Identified

I searched Google for the phrase "[Orion Launcher] WARN jdbc.ConnectionCache _getConnection.353 - Got a fatal exeption when getting a connection; Error code = 17002; Cleaning up cache and retrying" and found a hint about /etc/hosts in "Grid Control windows installation fails" (Business & Enterprise Application).

I checked /etc/hosts on GRIDCTRL and found the problem in the following line, which still mapped the host name to its address on the old subnet:

10.1.1.252 gridctrl.yaocm.id.au gridctrl

You really should use the Managing Hosts utility instead of editing /etc/hosts directly.

Changed this line to:

192.168.1.252 gridctrl.yaocm.id.au gridctrl
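If the entry is changed by hand rather than through the Managing Hosts utility, keeping a backup copy makes the change easy to undo. The following is a minimal sketch under that assumption; the function name retarget_host and the .bak suffix are hypothetical choices, not something the original fix used.

```shell
#!/bin/sh
# retarget_host HOSTS_FILE OLD_IP NEW_IP
# Replace OLD_IP with NEW_IP on every line whose first field is exactly
# OLD_IP. The original file is first copied to HOSTS_FILE.bak.
# Note: awk rebuilds a changed line with single spaces between fields;
# unchanged lines (including comments) are written back untouched.
retarget_host() {
    file=$1
    old=$2
    repl=$3
    cp -p "$file" "$file.bak" || return 1
    awk -v old="$old" -v repl="$repl" \
        '$1 == old { $1 = repl } { print }' "$file.bak" > "$file"
}

# Example with the addresses from this note (run as root):
# retarget_host /etc/hosts 10.1.1.252 192.168.1.252
```

Matching on the whole first field, rather than on a substring, avoids touching unrelated addresses that merely contain the old one.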

The startup of OMS was then successful (see "Using emctl to Start, Stop, and Check the Status of the Oracle Management Service"):

[oracle@gridctrl ~]$ emctl start oms
Oracle Enterprise Manager 10g Release 5 Grid Control
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
opmnctl: opmn is already running
Starting HTTP Server ...
Starting Oracle Management Server ...
Checking Oracle Management Server Status ...
Oracle Management Server is Up.

Conclusion

Although I had changed the DNS correctly, I forgot to check the /etc/hosts file for entries that still referred to the old subnet.
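A check like this could be scripted for the next move. The following is a rough sketch, not something that was run at the time; the function name find_old_subnet is hypothetical, and it assumes a /24 subnet identified by its first three octets.

```shell
#!/bin/sh
# find_old_subnet PREFIX FILE...
# Print "file:line:text" for every line that still mentions an address in
# the old subnet. PREFIX is the first three octets, e.g. 10.1.1 for
# 10.1.1.0/24. Reverse-zone names such as 1.1.10.in-addr.arpa are NOT
# matched and need a separate look.
find_old_subnet() {
    prefix=$1
    shift
    # Escape the dots so that they match literally.
    escaped=$(printf '%s' "$prefix" | sed 's/\./\\./g')
    # The leading group stops 110.1.1.5 from matching prefix 10.1.1.
    # Appending /dev/null makes grep print the file name even when only
    # one file is given.
    grep -nE "(^|[^0-9.])${escaped}\.[0-9]+" "$@" /dev/null
}

# Example with the files named in this note:
# find_old_subnet 10.1.1 /etc/hosts \
#     /var/named/chroot/etc/named.conf \
#     /var/named/chroot/var/named/yaocm.id.au.zone
```

Any output means that at least one file still carries an address from the old subnet.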

The change to the LOCAL_LISTENER parameter was unnecessary because the listener is using the default port of 1521. I have left it there in order to maintain the IPC backdoor.