HACMP

HACMP OPERATIONS

List HACMP version:

# lslpp -L | grep cluster.es.server.rte

cluster.es.server.rte 5.4.1.4 C F ES Base Server Runtime

#

or

# lslpp -L | grep cluster

and then look for server.rte in the output,

or

# lslpp -L | grep cluster | grep -i server.rte

cluster.es.server.rte 5.4.1.4 C F ES Base Server Runtime

#
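If only the level number is needed (for example in a script), the same listing can be filtered with awk. This is a minimal sketch: the function name hacmp_version is made up for illustration, and it assumes the fileset name is the first column and the level the second, as in the sample line above.

```shell
# Print the level (second column) of the cluster.es.server.rte fileset.
# Reads `lslpp -L` style output on stdin.
hacmp_version() {
    awk '$1 == "cluster.es.server.rte" { print $2; exit }'
}

# On a cluster node:  lslpp -L | hacmp_version
# Here it is fed the sample line shown above:
printf '%s\n' 'cluster.es.server.rte 5.4.1.4 C F ES Base Server Runtime' | hacmp_version
```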


Extend a cluster filesystem:

See: Increasing a Filesystem in a Cluster (HACMP)


List the status of the cluster heartbeat services:

# clstat

clstat - HACMP Cluster Status Monitor

-------------------------------------

Cluster: cl_hacmp (1230610296)

Tue Aug 9 11:38:19 GMT-03:00 2011

State: UP Nodes: 2

SubState: STABLE

Node: nodo1 State: UP

Interface: nodo1-stby1 (0) Address: 192.168.1.123

State: UP

Interface: nodo1-stby2 (0) Address: 192.168.2.123

State: UP

Interface: nodo1-stby3 (0) Address: 192.168.3.123

State: UP

Interface: nodo1_hdisk1_01 (1) Address: 0.0.0.0

State: UP

Interface: nodo1_hdisk2 (2) Address: 0.0.0.0

State: UP

Interface: nodo1-svc (0) Address: 10.1.1.123

State: UP

Resource Group: rg_nodo1 State: On line

Node: nodo2 State: UP

Interface: nodo2-stby1 (0) Address: 192.168.1.124

State: UP

Interface: nodo2-stby2 (0) Address: 192.168.2.124

State: UP

Interface: nodo2-stby3 (0) Address: 192.168.3.124

State: UP

Interface: nodo2_hdisk1_01 (1) Address: 0.0.0.0

State: UP

Interface: nodo2_hdisk2 (2) Address: 0.0.0.0

State: UP

Interface: nodo2-svc2 (0) Address: 10.1.1.124

State: UP

Resource Group: rg_nodo2 State: On line

******* f/forward, b/back, r/refresh, q/quit *******************************
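With many interfaces it is easy to overlook one that is not UP. The sketch below scans clstat-style text and prints only the nodes and interfaces in another state. The function name clstat_not_up is hypothetical, and it assumes the layout shown above, where each Interface line is followed by its own State line. On a node it could be fed from a one-shot run, assuming your release offers one (for example clstat -o).

```shell
# Report every node or interface whose state is not UP.
# Reads clstat-style text on stdin; prints nothing when all are UP.
clstat_not_up() {
    awk '
        $1 == "Node:"      { if ($4 != "UP") print "node " $2 " is " $4; next }
        $1 == "Interface:" { iface = $2; next }
        $1 == "State:" && iface != "" {
            if ($2 != "UP") print "interface " iface " is " $2
            iface = ""
        }
    '
}

# Demo with a shortened, modified sample (one interface forced to DOWN):
printf '%s\n' \
    'Node: nodo1 State: UP' \
    'Interface: nodo1-stby1 (0) Address: 192.168.1.123' \
    'State: DOWN' \
    'Interface: nodo1-svc (0) Address: 10.1.1.123' \
    'State: UP' | clstat_not_up
```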


List the status of the cluster subsystems:

# clshowsrv -v

Status of the RSCT subsystems used by HACMP:

Subsystem Group PID Status

topsvcs topsvcs 213214 active

grpsvcs grpsvcs 516192 active

grpglsm grpsvcs inoperative

emsvcs emsvcs 667784 active

emaixos emsvcs inoperative

ctrmc rsct 278664 active

Status of the HACMP subsystems:

Subsystem Group PID Status

clcomdES clcomdES 307420 active

clstrmgrES cluster 356528 active

Status of the optional HACMP subsystems:

Subsystem Group PID Status

clinfoES cluster 684146 active

#
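In the listing above some subsystems (grpglsm, emaixos) are inoperative while the cluster is up, so a blanket search for "inoperative" is not a useful health check. A possible approach is to name the subsystems you care about and report only those that are not active. This is a sketch with a hypothetical function name, and the list of required subsystems is an assumption to adapt to your cluster.

```shell
# Print the name of every listed subsystem that is not reported as active.
# Reads `clshowsrv -v` (or lssrc) style text on stdin.
# Usage on a node: clshowsrv -v | inactive_subsystems topsvcs grpsvcs clcomdES clstrmgrES
inactive_subsystems() {
    awk -v wanted="$*" '
        BEGIN { n = split(wanted, names, " ") }
        { status[$1] = $NF }          # first column: name, last column: status
        END {
            for (i = 1; i <= n; i++)
                if (status[names[i]] != "active") print names[i]
        }
    '
}

# Demo with lines taken from the listing above:
printf '%s\n' \
    'topsvcs topsvcs 213214 active' \
    'grpglsm grpsvcs inoperative' \
    'clcomdES clcomdES 307420 active' | inactive_subsystems topsvcs grpglsm clcomdES
```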


Error when querying/listing the cluster's FSs, VGs and LVs:

# /usr/es/sbin/cluster/sbin/cl_showfs2

connect: Connection refused

Unable to retrieve filesystems from node nodo1.

connect: A remote host refused an attempted connect operation.

Unable to retrieve filesystems from node nodo2.

No filesystems retrieved.

Solution:

# startsrc -s clcomdES

or, if the subsystem is named clcomd on your release:

# startsrc -s clcomd

Check the subsystem status:

# lssrc -s clcomdES

Subsystem Group PID Status

clcomdES clcomdES 307420 active

Verify that the command now works correctly:

# /usr/es/sbin/cluster/sbin/cl_showfs2

# Resource Group File System

rg_apl1 /software/sistema

rg_apl1 /software/sistema/historico

rg_apl2 /software2/sistema

#
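The check and the start can be combined so that startsrc is only called when the subsystem is not already active. This is a minimal sketch: ensure_subsystem is a made-up helper name, and it assumes lssrc prints the subsystem name in the first column and the status in the last one, as in the output above.

```shell
# Start an SRC subsystem only when lssrc does not already report it active.
# Usage (as root on a cluster node): ensure_subsystem clcomdES
ensure_subsystem() {
    name=$1
    if lssrc -s "$name" 2>/dev/null |
        awk -v s="$name" '$1 == s && $NF == "active" { ok = 1 } END { exit !ok }'
    then
        echo "$name already active"
    else
        startsrc -s "$name"
    fi
}
```

After it returns, re-run cl_showfs2 as shown above to confirm that the nodes answer again.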


topsvcs -- NIM thread blocked

---------------------------------------------------------------------------

LABEL: TS_NIM_ERROR_STUCK_

IDENTIFIER: 3D32B80D

Date/Time: Sat May 26 04:10:43 GMT-03:00 2012

Sequence Number: 12314

Machine Id: 00CB7E024C00

Node Id: myServer

Class: S

Type: PERM

WPAR: Global

Resource Name: topsvcs

Description

NIM thread blocked

Probable Causes

A thread in a Topology Services Network Interface Module (NIM) process

was blocked

Topology Services NIM process cannot get timely access to CPU

The system clock was set forward

User Causes

Excessive memory consumption is causing high memory contention

Excessive disk I/O is causing high memory contention

The system clock was manually set forward

Recommended Actions

Examine I/O and memory activity on the system

Reduce load on the system

Tune virtual memory parameters

Call IBM Service if problem persists

Failure Causes

Excessive virtual memory activity prevents NIM from making progress

Excessive disk I/O traffic is interfering with paging I/O

Recommended Actions

Examine I/O and memory activity on the system

Reduce load on the system

Tune virtual memory parameters

Call IBM Service if problem persists

Detail Data

DETECTING MODULE

rsct,nim_control.C,1.39.1.41,7916

ERROR ID

6BUfAx.n56kD/LtU1ZFE.8....................

REFERENCE CODE

Thread which was blocked

send thread

Interval in seconds during which process was blocked

11

Interface name

rhdisk2

Solution:

Preliminary note: here NIM stands for Network Interface Module. The clarification is worthwhile because it is easily confused with the other NIM abbreviation, Network Installation Manager.

Returning to the message, which was taken from the AIX error log (errpt), we can see that it was logged at 4:10 am. In my experience this message appears on machines running an HACMP cluster under very heavy CPU, memory or I/O load, which then have trouble completing a failover between the cluster nodes.
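To see when such entries occur and how long the thread was blocked each time, the detailed error report can be condensed to one line per entry. This is a sketch under assumptions: the function name nim_blocked_summary is hypothetical, and it relies on the field labels appearing as in the entry above, with each Detail Data value on the line after its label. On a node, the input would come from errpt -a -j 3D32B80D, using the identifier shown in the entry.

```shell
# Summarize "NIM thread blocked" entries: timestamp, blocked interval in
# seconds and interface name, one line per entry.
# Reads `errpt -a` style text on stdin.
nim_blocked_summary() {
    awk '
        /^Date\/Time:/         { sub(/^Date\/Time:[ \t]*/, ""); when = $0; next }
        /^Interval in seconds/ { want = "secs";  next }
        /^Interface name/      { want = "iface"; next }
        want != "" && NF > 0 {                 # first non-empty line after a label
            if (want == "secs") secs = $1
            else print when " blocked=" secs "s interface=" $1
            want = ""
        }
    '
}

# Demo with the relevant lines of the entry above:
printf '%s\n' \
    'Date/Time: Sat May 26 04:10:43 GMT-03:00 2012' \
    'Interval in seconds during which process was blocked' \
    '' \
    '11' \
    'Interface name' \
    '' \
    'rhdisk2' | nim_blocked_summary
```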

The solution, therefore, is to identify what is causing this unwanted resource consumption. Personally I recommend the nmon tool for performance analysis, although the standard commands can just as well be run directly from the console. Keep in mind that unless you wrap those commands in a script, the monitoring is interactive only, whereas nmon, with a simple configuration, collects a much more complete batch recording. See: nmon config on aix/linux host
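As an illustration of how little configuration a batch recording needs, the helper below builds an nmon command line from a snapshot interval and a duration. In nmon, -f writes the samples to a file, -s sets the seconds between snapshots and -c the number of snapshots. The helper name and the values used (60 seconds, 24 hours) are illustrative assumptions, not settings from the linked page.

```shell
# Build an nmon batch-recording command line.
# $1 = seconds between snapshots, $2 = hours to record.
nmon_batch_cmd() {
    interval=$1
    hours=$2
    count=$(( hours * 3600 / interval ))   # number of snapshots to take
    echo "nmon -f -s $interval -c $count"
}

# One snapshot per minute for a full day:
nmon_batch_cmd 60 24
```

A recording that covers a full day would include the hour at which the error was logged (4:10 am in the entry above).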

Using commands: