List HACMP version:
# lslpp -L | grep cluster.es.server.rte
cluster.es.server.rte 5.4.1.4 C F ES Base Server Runtime
#
or
# lslpp -L | grep cluster
and then look for server.rte in the output
or
# lslpp -L | grep cluster | grep -i server.rte
cluster.es.server.rte 5.4.1.4 C F ES Base Server Runtime
#
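For use in scripts, the level can be pulled out of that output directly. This is a minimal sketch, assuming the lslpp -L layout shown above (fileset name in the first column, level in the second); hacmp_version is a hypothetical helper name, not an HACMP command.

```shell
# Hypothetical helper: reads `lslpp -L` output on stdin and prints the
# level of cluster.es.server.rte.
# Assumption: fileset name is column 1 and level is column 2, as in the listing above.
hacmp_version() {
    awk '$1 == "cluster.es.server.rte" { print $2; exit }'
}

# Usage on an AIX node:
#   lslpp -L | hacmp_version
```

Matching the whole first column avoids picking up other filesets whose names merely contain "server.rte".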
Extend a cluster filesystem:
Increase a Filesystem in a Cluster (HACMP)
List the status of the cluster heartbeat services:
# clstat
clstat - HACMP Cluster Status Monitor
-------------------------------------
Cluster: cl_hacmp (1230610296)
Tue Aug 9 11:38:19 GMT-03:00 2011
State: UP Nodes: 2
SubState: STABLE
Node: nodo1 State: UP
Interface: nodo1-stby1 (0) Address: 192.168.1.123
State: UP
Interface: nodo1-stby2 (0) Address: 192.168.2.123
State: UP
Interface: nodo1-stby3 (0) Address: 192.168.3.123
State: UP
Interface: nodo1_hdisk1_01 (1) Address: 0.0.0.0
State: UP
Interface: nodo1_hdisk2 (2) Address: 0.0.0.0
State: UP
Interface: nodo1-svc (0) Address: 10.1.1.123
State: UP
Resource Group: rg_nodo1 State: On line
Node: nodo2 State: UP
Interface: nodo2-stby1 (0) Address: 192.168.1.124
State: UP
Interface: nodo2-stby2 (0) Address: 192.168.2.124
State: UP
Interface: nodo2-stby3 (0) Address: 192.168.3.124
State: UP
Interface: nodo2_hdisk1_01 (1) Address: 0.0.0.0
State: UP
Interface: nodo2_hdisk2 (2) Address: 0.0.0.0
State: UP
Interface: nodo2-svc2 (0) Address: 10.1.1.124
State: UP
Resource Group: rg_nodo2 State: On line
******* f/forward, b/back, r/refresh, q/quit *******************************
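Reading this screen by eye gets tedious on larger clusters. The sketch below filters clstat-style text and prints only the interfaces that are not UP. It assumes the layout shown above (an "Interface:" line followed by its own "State:" line); clstat_not_up is a hypothetical helper name, and the one-shot mode mentioned in the usage comment should be checked against the clstat man page for your HACMP level.

```shell
# Hypothetical helper: reads clstat-style output on stdin and prints
# "<interface> <state>" for every interface whose state is not UP.
# Assumption: each "Interface:" line is followed by a "State:" line for it.
clstat_not_up() {
    awk '
        /Interface:/        { name = $2; pending = 1; next }
        pending && /State:/ { if ($2 != "UP") print name, $2; pending = 0 }
    '
}

# Usage on a cluster node (assumes your clstat supports a single-snapshot option):
#   clstat -o | clstat_not_up
```

No output means every interface in the snapshot reported UP.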
List the status of the cluster subsystems:
# clshowsrv -v
Status of the RSCT subsystems used by HACMP:
Subsystem Group PID Status
topsvcs topsvcs 213214 active
grpsvcs grpsvcs 516192 active
grpglsm grpsvcs inoperative
emsvcs emsvcs 667784 active
emaixos emsvcs inoperative
ctrmc rsct 278664 active
Status of the HACMP subsystems:
Subsystem Group PID Status
clcomdES clcomdES 307420 active
clstrmgrES cluster 356528 active
Status of the optional HACMP subsystems:
Subsystem Group PID Status
clinfoES cluster 684146 active
#
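To spot stopped subsystems quickly, the same output can be filtered on its last column. This is a rough sketch, assuming the column layout shown above, where a stopped subsystem has no PID and its line ends in "inoperative"; inactive_subsystems is a hypothetical helper name.

```shell
# Hypothetical helper: reads `clshowsrv -v` output on stdin and prints the
# name of every subsystem whose last column is "inoperative".
inactive_subsystems() {
    awk '$NF == "inoperative" { print $1 }'
}

# Usage on a cluster node:
#   clshowsrv -v | inactive_subsystems
```

With the listing above it would report grpglsm and emaixos. Whether an inoperative subsystem is actually a problem depends on the cluster configuration.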
Error when querying/listing the cluster's FSs, VGs, and LVs:
# /usr/es/sbin/cluster/sbin/cl_showfs2
connect: Connection refused
Unable to retrieve filesystems from node nodo1.
connect: A remote host refused an attempted connect operation.
Unable to retrieve filesystems from node nodo2.
No filesystems retrieved.
Solution:
# startsrc -s clcomdES
or (on versions where the subsystem is named clcomd)
# startsrc -s clcomd
Verify the subsystem status:
# lssrc -s clcomdES
Subsystem Group PID Status
clcomdES clcomdES 307420 active
Verify that the command now works correctly:
# /usr/es/sbin/cluster/sbin/cl_showfs2
# Resource Group File System
rg_apl1 /software/sistema
rg_apl1 /software/sistema/historico
rg_apl2 /software2/sistema
#
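The check-then-start sequence above can be wrapped so that the subsystem is started only when it is actually down. This is a minimal sketch, assuming the lssrc output layout shown above; subsystem_is_active is a hypothetical helper name.

```shell
# Hypothetical helper: reads `lssrc -s <subsystem>` output on stdin and
# returns exit status 0 only if some line ends in "active".
subsystem_is_active() {
    awk '$NF == "active" { found = 1 } END { exit !found }'
}

# Usage on a cluster node (starts clcomdES only when it is not running):
#   lssrc -s clcomdES | subsystem_is_active || startsrc -s clcomdES
```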
topsvcs -- NIM thread blocked
---------------------------------------------------------------------------
LABEL: TS_NIM_ERROR_STUCK_
IDENTIFIER: 3D32B80D
Date/Time: Sat May 26 04:10:43 GMT-03:00 2012
Sequence Number: 12314
Machine Id: 00CB7E024C00
Node Id: myServer
Class: S
Type: PERM
WPAR: Global
Resource Name: topsvcs
Description
NIM thread blocked
Probable Causes
A thread in a Topology Services Network Interface Module (NIM) process
was blocked
Topology Services NIM process cannot get timely access to CPU
The system clock was set forward
User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention
The system clock was manually set forward
Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists
Failure Causes
Excessive virtual memory activity prevents NIM from making progress
Excessive disk I/O traffic is interfering with paging I/O
Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists
Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.41,7916
ERROR ID
6BUfAx.n56kD/LtU1ZFE.8....................
REFERENCE CODE
Thread which was blocked
send thread
Interval in seconds during which process was blocked
11
Interface name
rhdisk2
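When this error repeats, it helps to see at a glance which interface was blocked and for how long. The sketch below pulls those two values out of a detailed errpt report. It assumes the Detail Data layout shown above, where each value sits on the line after its heading; nim_block_summary is a hypothetical helper name.

```shell
# Hypothetical helper: reads a detailed errpt report on stdin and prints
# "<interface> <seconds blocked>" for each entry.
# Assumption: each value is on the line right after its heading, as shown above.
nim_block_summary() {
    awk '
        grab == "secs"  { secs = $1; grab = ""; next }
        grab == "iface" { print $1, secs; grab = ""; next }
        /Interval in seconds during which process was blocked/ { grab = "secs" }
        /Interface name/ { grab = "iface" }
    '
}

# Usage on an AIX node, filtering by the error identifier from the report above:
#   errpt -a -j 3D32B80D | nim_block_summary
```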
Solution:
Preliminary note: NIM here stands for Network Interface Module. The clarification is worth making, since it can be confused with the other meaning of NIM: Network Installation Manager.
Going back to the entry obtained from the AIX error log (errpt), we can see that it was logged at 4:10 am. In my experience, this message appears on machines running an HACMP cluster under very heavy load on CPU, memory, or I/O; under that load the cluster has trouble handling failover between its nodes.
The solution is therefore to identify what is causing that unwanted resource consumption. Personally, I recommend nmon for performance analysis, but the standard commands can perfectly well be run directly from the console. Keep in mind that unless you have a script wrapping those commands, the monitoring will be interactive only, whereas nmon, with a simple configuration, gives you a much more complete batch capture. See: nmon config on aix/linux host
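As a rough illustration of the nmon batch mode, the sketch below builds the command line for a given capture window. It uses the usual nmon recording flags (-f to write a .nmon file, -s for the interval in seconds, -c for the number of snapshots); nmon_batch_cmd is a hypothetical helper name, and the window and interval are example values, not a recommendation from the original notes.

```shell
# Hypothetical helper: prints an nmon command line that takes one snapshot
# every <interval> seconds for <hours> hours.
#   -f  record to a .nmon file instead of the interactive screen
#   -s  seconds between snapshots
#   -c  number of snapshots to take
nmon_batch_cmd() {
    hours=$1
    interval=$2
    count=$(( hours * 3600 / interval ))
    echo "nmon -f -s $interval -c $count"
}

# Example: one day of data at one-minute resolution
#   nmon_batch_cmd 24 60
```

A capture that covers the time of the errpt entry (4:10 am here) is what makes the batch mode useful for this kind of problem.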
Using commands: