
Cluster Commands:-

Useful Commands


/usr/es/sbin/cluster/clstat - Shows cluster status (relies on SNMP, so it may not
work).

/usr/es/sbin/cluster/utilities/clRGinfo - Shows where the resource group is running
(shows both sides of the cluster). Add -m to show the status of application
monitoring.

/usr/es/sbin/cluster/utilities/cltopinfo - Lists cluster topology information in an
alternative format that is easier to read and understand.

/usr/es/sbin/cluster/utilities/clshowres - Shows useful information about the
cluster resource groups.

/usr/es/sbin/cluster/utilities/clfindres - Shows where the resource is currently
running (only shows one server, unlike clRGinfo).
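
For illustration only, clRGinfo output on a hypothetical two-node cluster looks
roughly like this (the node and resource group names are made up):

# /usr/es/sbin/cluster/utilities/clRGinfo
-----------------------------------------------------------------------------
Group Name     State                        Node
-----------------------------------------------------------------------------
app_rg         ONLINE                       nodeA
               OFFLINE                      nodeB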

lssrc -ls topsvcs | grep -i instance - Shows the configuration instance number of
the cluster (i.e. how many times it has been synchronized in the past). Run it on
both sides; see the example below. IMPORTANT: THE NUMBERS MUST BE THE SAME. If not,
you may have a broken cluster.
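
For example (illustrative output only; the actual instance number will differ per
cluster, but it should match on both nodes):

nodeA # lssrc -ls topsvcs | grep -i instance
   Configuration Instance = 14
nodeB # lssrc -ls topsvcs | grep -i instance
   Configuration Instance = 14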

lsvg - Shows available volume groups. Hopefully the cluster volume group is
shown.

lsvg -o - Shows which volume groups are online. The cluster volume group must only
be online on one side of the cluster at a time.

lsvg -l <vg> - Shows all logical volumes and mounted filesystems for the resource.
Make sure they are open/syncd.
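
For example, on a hypothetical cluster whose shared volume group is called appvg,
you would expect something like this (appvg online on one node only):

nodeA # lsvg -o
appvg
rootvg
nodeB # lsvg -o
rootvg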

smitty clstop - Stops the cluster. The defaults will do it gracefully. If the
resource is up on this side, it will fail over at this point unless you take the
resource offline manually.

smitty clstart - Starts the cluster. This will automatically start the resource
unless either

(1) The resource is already up on the other side of the cluster or

(2) It's not configured in the cluster to start automatically.

Manually starting or stopping the resource group will differ depending on the
version of HACMP you’re using.

On HACMP V4.5 do this (very old version but it should be similar):


smitty hacmp

Cluster System Management


Cluster Resource Group Management
Bring a Resource Group Online
Bring a Resource Group Offline
Move a Resource Group
Depending on the cluster setup, the resource group WILL NOT fail over to the
standby server because you have manually taken the resource offline. If you want it
to fail over, stop the cluster using smitty clstop, or manually bring the resource
online on the other node.

Log files


/tmp/hacmp.out - The default log for HACMP. It can be hard to follow what's going on in it.

/var/hacmp/log/hacmp.out - Another location for hacmp.out

/var/hacmp/clcomd/clcomd.log - Contains important communications information


for the cluster

/var/hacmp/clverify/clverify.log - Log for cluster verification

/var/ha/log/topsvcs.default - check for disk heartbeating

All the application and other log files are under /var/hacmp/log (the same place
as hacmp.out). If you run ls -alrt there, you will see the newest files at the
bottom of the listing.
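
For example, to see just the most recently updated logs (illustrative):

# ls -alrt /var/hacmp/log | tail -5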

Check this directory too:

/var/hacmp/log/clappmond_failed_logs.<date_time>

Look for the error log that it was moaning about in the hacmp.out log file, e.g. if
it's moaning about mon_longtime_db, look for the logfile
clappmond.mon_longtime_db.<resource_name>.log

Cluster start and stop scripts


These are usually provided by the application teams so check with them before
you make any changes. You can find them here (but this could be anywhere):

cd /usr/es/local/script

If you get issues and need to escalate to IBM, they will ask you to do a snap -e.
Rename and send the output off to the usual place.
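
A rough sketch of what that looks like (the exact output location can vary by AIX
level, so treat the path below as an assumption):

# snap -r        # clear out data from any previous snap
# snap -e        # collect the HACMP/cluster information
# ls -l /tmp/ibmsupt        # the collected data usually ends up here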

Patching HACMP


HACMP cluster nodes can run on different versions of the OS.

HACMP can also run with different versions of the cluster software on each node
(for example, during a rolling upgrade).


You might find that when you attempt to patch HACMP, you get errors. You may
have e-fixes installed. These are specific patches released by IBM to resolve
specific issues on your cluster. They will be locked, so they will cause an upgrade
attempt to fail. If you need to upgrade a server, you will need to unlock the
e-fix, do the upgrade and then re-lock the e-fix. This could still cause the cluster
to break and may need to be re-raised with IBM to supply a new fix. If in any doubt,
call IBM.

To check if you have an e-fix installed run:

emgr -l
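
As a sketch of the remove/upgrade/re-install flow (IZ12345 is a made-up e-fix
label and file name; check the emgr man page and IBM's instructions before running
this in anger):

# emgr -l                       # list installed e-fixes and note the LABEL column
# emgr -r -L IZ12345            # remove the e-fix so the locked filesets can be updated
  ... apply the HACMP updates ...
# emgr -e /tmp/IZ12345.epkg.Z   # re-install the e-fix afterwards if IBM say it is still needed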

Other checks


Check the application log files. Could be anywhere but try here:

/var/hacmp/log

/usr/es/sbin/cluster/netmon.cf - Pingable addresses

Cluster manager process:

/usr/es/sbin/cluster/clstrmgr

/usr/sbin/gsclvmd - This process MUST be running (lazy update)
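
A quick way to confirm both processes are there (illustrative):

# ps -ef | grep clstrmgr | grep -v grep
# ps -ef | grep gsclvmd | grep -v grep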

CTRMC process must be running to allow DLPAR on the server:

#lssrc -s ctrmc
Subsystem Group PID Status
ctrmc rsct 348334 active
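
If ctrmc shows as inoperative, it can usually be started through the SRC (a sketch
only; work out why it stopped before relying on this):

# startsrc -s ctrmc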

Problem determination


Always difficult to keep this section updated, but here are some things to try:

Problem: Cluster down on both nodes

Checks: Have both servers rebooted? Bring up the cluster using 'smitty clstart'.
Once you know it's stable, bring up the resources.

Problem: Resource won't start

Checks: Start the resources up outside of the cluster: vary on the VG, add the IP
alias, then start up the service (see the worked example below).
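
A minimal sketch of doing that by hand (the volume group, mount point, interface,
service address and start script names are all hypothetical; substitute your own):

# varyonvg appvg                                        # vary on the shared volume group
# mount /appdata                                        # mount the resource group's filesystems
# ifconfig en0 alias 10.1.1.50 netmask 255.255.255.0    # add the service IP as an alias
# /usr/es/local/script/start_app.sh                     # start the application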

# smitty hacmp
~~~~~~~~~~~~~~~~~~~~~~~~~
Initialization and Standard Configuration
Extended Configuration
System Management (C-SPOC) <---
Problem Determination Tools
~~~~~~~~~~~~~~~~~~~~~~~~~
HACMP Communication Interface Management <---
HACMP Resource Group and Application Management
HACMP Log Viewing and Management
HACMP File Collection Management
HACMP Security and Users Management
HACMP Logical Volume Management
HACMP Concurrent Logical Volume Management
HACMP Physical Volume Management
Configure GPFS
Open a SMIT Session on a Node
~~~~~~~~~~~~~~~~~~~~~~~~~
Manage HACMP Services
Start Cluster Services <---
Stop Cluster Services
Show Cluster Services
~~~~~~~~~~~~~~~~~~~~~~~~~
* Start now, on system restart or both                [now] +
  Start Cluster Services on these nodes               [AIX_Box_1] +
* Manage Resource Groups                              [Automatically] +
  BROADCAST message at startup?                       [true] +
  Startup Cluster Information Daemon?                 [true] +
  Ignore verification errors?                         [false] +
  Automatically correct errors found during           [Interactively] + <---
  cluster start?
~~~~~~~~~~~~~~~~~~~~~~~~~
