Veritas Clustering On Linux
User’s Guide
Linux
N16857H
July 2005
Disclaimer
The information contained in this publication is subject to change without notice. VERITAS Software
Corporation makes no warranty of any kind with regard to this manual, including, but not limited to,
the implied warranties of merchantability and fitness for a particular purpose. VERITAS Software
Corporation shall not be liable for errors contained herein or for incidental or consequential damages
in connection with the furnishing, performance, or use of this manual.
Third-Party Copyrights
Apache Software
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that
entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity,
whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such
entity.
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source,
and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled
object code, generated documentation, and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice
that is included in or attached to the work.
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial
revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License,
Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and
Derivative Works thereof.
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or
Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal
Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal,
or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source
code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving
the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a
Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide,
non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform,
sublicense, and distribute the Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide,
non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell,
import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily
infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If
You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution
incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for
that Work shall terminate as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You meet the following conditions:
(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from
the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable
copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works,
in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such
third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may
add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work,
provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use,
reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and
distribution of the Work otherwise complies with the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to
the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above,
nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such
Contributions.
6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except
as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides
its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including,
without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR
PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated
with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by
applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including
any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to
use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other
commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a
fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting
such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You
agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by
reason of your accepting any such warranty or additional liability.
Data Encryption Standard (DES)
Support for data encryption in VCS is based on the MIT Data Encryption Standard (DES) under the following copyright:
Copyright © 1990 Dennis Ferguson. All rights reserved.
Commercial use is permitted only if products that are derived from or include this software are made available for purchase and/or use in
Canada. Otherwise, redistribution and use in source and binary forms are permitted.
Copyright 1985, 1986, 1987, 1988, 1990 by the Massachusetts Institute of Technology. All rights reserved.
Export of this software from the United States of America may require a specific license from the United States Government. It is the responsibility
of any person or organization contemplating export to obtain such a license before exporting.
WITHIN THAT CONSTRAINT, permission to use, copy, modify, and distribute this software and its documentation for any purpose and without
fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice
appear in supporting documentation, and that the name of M.I.T. not be used in advertising or publicity pertaining to distribution of the software
without specific, written prior permission. M.I.T. makes no representations about the suitability of this software for any purpose. It is provided as
is without express or implied warranty.
SNMP Software
SNMP support in VCS is based on CMU SNMP v2 under the following copyright:
Copyright 1989, 1991, 1992 by Carnegie Mellon University
All Rights Reserved
Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided
that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting
documentation, and that the name of CMU not be used in advertising or publicity pertaining to distribution of the software without specific,
written prior permission.
CMU DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL CMU BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL
DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
OF THIS SOFTWARE.
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
How This Guide is Organized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
Getting Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
Documentation Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
Chapter 4. Configuration Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
The VCS Configuration Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
The main.cf File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
The types.cf File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Keywords/Reserved Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Managing the VCS Configuration File: The hacf Utility . . . . . . . . . . . . . . . . . . . . . . . . . 50
Web Console Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
Navigating the Web Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
Reviewing Web Console Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
Administering Users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
Administering Cluster Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
Administering Service Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
Administering Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
Administering Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
Editing Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
Querying the Cluster Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Customizing the Web Console with myVCS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
Customizing the Log Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
Monitoring Alerts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Integrating the Web Console with VERITAS Traffic Director . . . . . . . . . . . . . . . . . . . 285
Monitoring Aggregate Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
Configuring Notification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Bringing a Resource Online . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
Taking a Resource Offline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
Bringing a Service Group Online . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
Taking a Service Group Offline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
Detecting Resource Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
Detecting System Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559
Detecting Network Link Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559
When a System Panics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560
Time Taken for a Service Group Switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
Time Taken for a Service Group Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
Scheduling Class and Priority Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562
Monitoring CPU Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565
VCS Agent Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681
Preface
This guide provides information on how to use and configure VERITAS Cluster Server
(VCS) version 4.1 on the Linux operating system.
If this document is dated more than six months prior to the date you are installing the
enterprise agent, contact VERITAS Technical Support to confirm you have the latest
supported versions of the application and operating system.
How This Guide is Organized
Chapter 9. “Using VCS Configuration Wizards” on page 287 describes the Apache,
Application, and NFS wizards and provides instructions on how to use the wizards to
create and modify service groups.
Chapter 10. “VCS Communications, Membership, and I/O Fencing” on page 311
describes how the VCS engine, HAD, communicates with the various components of VCS.
This chapter also explains how VCS behaves during failures in fenced and non-fenced
environments.
Chapter 11. “Controlling VCS Behavior” on page 341 describes the default behavior of
resources and service groups when they fail. This chapter also explains the load-balancing
mechanism and how VCS employs this functionality at the service group level.
Chapter 12. “The Role of Service Group Dependencies” on page 385 defines the role of
service group dependencies and describes how to link service groups.
Chapter 13. “Notification” on page 415 explains how VCS uses SNMP and SMTP to
notify administrators of important events, such as resource or system faults. This chapter
also describes the notifier component, consisting of the VCS notifier process and the
hanotify utility.
Chapter 14. “VCS Event Triggers” on page 431 describes how event triggers work and
how they enable the administrator to take specific actions in response to particular events.
This chapter also includes a description of each event trigger, including usage and
location.
Chapter 15. “Connecting Clusters–Introducing the Global Cluster Option” on page 445
explains global clustering and presents key terms.
Chapter 16. “Administering Global Clusters from the Command Line” on page 471
provides instructions on how to perform administrative tasks on global clusters from the
command line.
Chapter 17. “Administering Global Clusters from Cluster Manager (Java Console)” on
page 483 provides instructions on how to perform administrative tasks on global clusters
from Cluster Manager (Java Console).
Chapter 18. “Administering Global Clusters from Cluster Manager (Web Console)” on
page 503 provides instructions on how to perform administrative tasks on global clusters
from Cluster Manager (Web Console).
Chapter 19. “Setting Up Replicated Data Clusters” on page 523 describes how to set up a
replicated data cluster configuration.
Chapter 20. “Setting Up Campus Clusters” on page 533 describes how to set up a campus
cluster configuration.
Chapter 21. “Predicting VCS Behavior Using VCS Simulator” on page 539 introduces
VCS Simulator and describes how to simulate cluster configurations.
Chapter 22. “VCS Performance Considerations” on page 553 describes the impact of VCS
on system performance.
Chapter 23. “Troubleshooting and Recovery for VCS” on page 569 explains VCS unified
logging and defines the message format. This chapter also describes how to troubleshoot
common problems in VCS.
Appendix A. “VCS User Privileges—Administration Matrices” on page 595 describes
user privileges for VCS operations.
Appendix B. “Cluster and System States” on page 607 describes the various cluster and
system states and the order in which they transition from one state to another.
Appendix C. “VCS Attributes” on page 613 lists the VCS attributes for each cluster
object, including service groups, resources, resource types, systems, and clusters.
Appendix D. “Administering VERITAS Web Server” on page 653 describes the VERITAS
Web Server component VRTSweb and explains how to configure it. Cluster Manager (Web
Console) uses VRTSweb.
Appendix E. “Accessibility and VCS” on page 675 describes VCS accessibility features
and compliance.
Conventions

monospace    Used for path names, commands, output, directory and file names,
             functions, and parameters.
             Examples: Read tunables from the /etc/vx/tunefstab file.
             See the ls(1) manual page for more information.

italic       Identifies book titles, new terms, emphasized text, and variables
             replaced with a name or value.
             Examples: See the User’s Guide for details.
             The variable system_name indicates the system on which to enter
             the command.

bold         Depicts GUI objects, such as fields, list boxes, menu selections,
             etc. Also depicts GUI commands.
             Examples: Enter your password in the Password field.
             Press Return.

blue text    Indicates hypertext links.
             Example: See “Getting Help” on page xix.
Getting Help
For technical assistance, visit http://support.veritas.com and select phone or email
support. This site also provides access to resources such as TechNotes, product alerts,
software downloads, hardware compatibility lists, and our customer email notification
service. Use the Knowledge Base Search feature to access additional product information,
including current and past releases of VERITAS documentation.
Additional Resources
For license information, software updates and sales contacts, visit
https://my.veritas.com/productcenter/ContactVeritas.jsp. For information on
purchasing product documentation, visit http://webstore.veritas.com.
Documentation Feedback
Your feedback on product documentation is important to us. Send suggestions for
improvements and reports on errors or omissions to clusteringdocs@veritas.com.
Include the title and part number of the document (located in the lower left corner of the
title page), and chapter and section titles of the text on which you are reporting. Our goal
is to ensure customer satisfaction by providing effective, quality documentation. For
assistance with topics other than documentation, visit http://support.veritas.com.
What is a Cluster?
VERITAS Cluster Server (VCS) connects, or clusters, multiple, independent systems into a
management framework for increased availability. Each system, or node, runs its own
operating system and cooperates at the software level to form a cluster. VCS links
commodity hardware with intelligent software to provide application failover and
control. When a node or a monitored application fails, other nodes can take predefined
actions to take over and bring up services elsewhere in the cluster.
Detecting Failure
VCS can detect application failure and node failure among cluster members.
(Figure: two clustered servers, each running an application with its own IP address and storage.)
Switchover
A switchover is an orderly shutdown of an application and its supporting resources on
one server and a controlled startup on another server. Typically this means unassigning
the virtual IP, stopping the application, and deporting shared storage. On the other
server, the process is reversed. Storage is imported, file systems are mounted, the
application is started, and the virtual IP address is brought up.
Failover
A failover is similar to a switchover, except the ordered shutdown of applications on the
original node may not be possible, so the services are started on another node. The process
of starting the application on the node is identical in a failover or switchover.
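Both operations can also be initiated manually. As a sketch of the command-line form
(using the hagrp utility described later in this guide; service_group and system are
placeholders):
# hagrp -switch service_group -to system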
Understanding Cluster Components
Resources
Resources are hardware or software entities, such as disk groups and file systems,
network interface cards (NIC), IP addresses, and applications. Controlling a resource
means bringing it online (starting), taking it offline (stopping), and monitoring the
resource.
Resource Dependencies
Resource dependencies determine the order in which resources are brought online or
taken offline when their associated service group is brought online or taken offline. For
example, a disk group must be imported before volumes in the disk group start, and
volumes must start before file systems are mounted. Conversely, file systems must be
unmounted before volumes stop, and volumes must stop before disk groups are
deported.
In VCS terminology, resources are categorized as parents or children. Child resources must
be online before parent resources can be brought online, and parent resources must be
taken offline before child resources can be taken offline.
(Figure: resource dependency tree. An application requires a database and an IP address; the database requires a file system, which requires a disk group; the IP address requires a network card.)
In the preceding figure, the disk group and the network card can be brought online
concurrently because they have no interdependencies. When each child resource required
by the parent is brought online, the parent is brought online, and so on up the tree, until
finally the application program is started. Conversely, when deactivating a service, the
VCS engine, HAD, begins at the top. In this example, the application is stopped first,
followed by the database and the IP address, and so on down the tree until finally the
disk group is taken offline.
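In main.cf, these parent/child relationships are expressed with requires statements. The
following sketch matches the figure above; the resource names are illustrative, not from a
real configuration:
Application_app requires Database_db
Application_app requires IP_addr
Database_db requires Mount_fs
IP_addr requires NIC_card
Mount_fs requires DG_diskgroup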
Resource Categories
Different types of resources require different levels of control. In VCS there are three
categories of resources:
◆ On-Off. VCS starts and stops On-Off resources as required. For example, VCS
imports a disk group when required, and deports it when it is no longer needed.
◆ On-Only. VCS starts On-Only resources, but does not stop them. For example, VCS
requires NFS daemons to be running to export a file system. VCS starts the daemons if
required, but does not stop them if the associated service group is taken offline.
◆ Persistent. These resources cannot be brought online or taken offline. For example, a
network interface card cannot be started or stopped, but it is required to configure an
IP address. A Persistent resource has an operation value of None. VCS monitors
Persistent resources to ensure their status and operation. Failure of a Persistent
resource triggers a service group failover.
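The category of a resource type is determined by its static Operations attribute in the type
definition. A minimal, abridged sketch for a persistent type (based on the bundled NIC
type; see the VERITAS Bundled Agents Reference Guide for the authoritative definition):
type NIC (
    static str Operations = None
    static str ArgList[] = { Device }
    str Device
)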
Service Groups
A service group is a logical grouping of resources and resource dependencies. It is a
management unit that controls resource sets.
For example, a database service group may be composed of resources that manage logical
network (IP) addresses, the database management software (DBMS), the underlying file
systems, the logical volumes, and a set of physical disks managed by the volume manager
(typically VERITAS Volume Manager in a VCS cluster).
A single node may host any number of service groups, each providing a discrete service to
networked clients. Each service group is monitored and managed independently.
Independent management enables a group to be failed over automatically or manually
idled for administration or maintenance without necessarily affecting other service
groups.
If a hybrid service group faults and cannot fail over within its system zone, the nofailover
trigger is invoked on the lowest numbered node. Hybrid service groups adhere to the
same rules governing group dependencies as do parallel groups. See “Categories of
Service Group Dependencies” on page 387 for more information.
Agents
Agents are VCS processes that manage resources of predefined resource types according
to commands received from the VCS engine, HAD. A system has one agent per resource
type that monitors all resources of that type; for example, a single IP agent manages all IP
resources.
When the agent is started, it obtains the necessary configuration information from VCS. It
then periodically monitors the resources, and updates VCS with the resource status.
The agent provides the type-specific logic to control resources. The action required to
bring a resource online or take it offline differs significantly for each resource type. VCS
employs agents to handle this functional disparity between resource types. For example,
bringing a disk group online requires importing the disk group, but bringing a database
online requires starting the database manager process and issuing the appropriate startup
commands.
VCS agents are multithreaded, meaning a single VCS agent monitors multiple resources
of the same resource type on one host. For example, the IP agent monitors all IP resources.
VCS monitors resources when they are online and offline to ensure they are not started on
systems on which they are not supposed to run. For this reason, VCS starts the agent for
any resource configured to run on a system when the cluster is started. If no resources of a
particular type are configured, the agent is not started. For example, if there are no Oracle
resources in your configuration, the Oracle agent is not started on the system.
Agent Operations
Agents carry out specific operations on resources on behalf of the cluster engine. The
functions an agent performs are entry points, code sections that carry out specific
functions, such as online, offline, and monitor. Entry points can be compiled into the
agent itself or can be implemented as individual Perl scripts. For details on any of the
following entry points, see the VERITAS Cluster Server Agent Developer’s Guide.
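For illustration, a monitor entry point is an executable that reports resource status
through its exit code; by VCS convention, 110 indicates the resource is online and 100
indicates it is offline. A minimal shell sketch, assuming a hypothetical resource type
whose ArgList passes a pid file path:
#!/bin/sh
# Hypothetical monitor entry point for a script-based agent.
# VCS invokes it as: monitor <resource_name> <ArgList values...>
RESNAME=$1
PIDFILE=$2    # assumed ArgList attribute for this sketch
if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    exit 110    # resource is online
else
    exit 100    # resource is offline
fi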
Agent Classifications
Bundled Agents
Bundled agents are packaged with VCS. They include agents for Disk, Mount, IP, and
various other resource types. See the VERITAS Bundled Agents Reference Guide for a
complete list.
Enterprise Agents
Enterprise agents control third-party applications and are licensed separately. These
include agents for Oracle, NetBackup, and Sybase. Each enterprise agent includes
instructions on installing and configuring the agent. Contact your VERITAS sales
representative for more information.
Custom Agents
Custom agents can be developed by you or by VERITAS consultants. Typically, agents are
developed because the user requires control of an application that is not covered by
current bundled or enterprise agents. See the VERITAS Cluster Server Agent Developer’s
Guide for information on developing a custom agent, or contact VERITAS Enterprise
Consulting Services.
(Figure: NFS_Group resource dependency tree. nfs_ip requires nfs_nic and home_share; home_share requires home_mount and NFS_nfs_group_16; home_mount requires shared_dg1.)
VCS starts the agents for disk group, mount, share, NFS, NIC, and IP on all systems
configured to run NFS_Group. The resource dependencies are configured as:
◆ The /home file system, home_mount, requires the disk group, shared_dg1, to be
online before mounting.
◆ The NFS export of the home file system requires the file system to be mounted and the
NFS daemons be running.
◆ The high-availability IP address, nfs_ip, requires the file system to be shared and the
network interface to be up, represented as nfs_nic.
◆ The NFS daemons and the disk group have no child dependencies, so they can start in
parallel.
◆ The NIC resource is a persistent resource and does not require starting.
The service group NFS_Group can be configured to start automatically on either node in
the preceding example. It can then move or fail over to the second node on command or
automatically if the first node fails. Upon failover or relocation, VCS takes the resources
offline beginning at the top of the graph and starts them on the second node beginning at
the bottom.
Basic Failover Configurations
(Figure: asymmetric failover. The application moves from the failed primary server to the redundant standby server.)
Asymmetric Failover
This configuration is the simplest and most reliable. The redundant server is on stand-by
with full performance capability. If other applications are running, they present no
compatibility issues.
(Figure: symmetric failover. Each server runs one application; when a server fails, its application moves to the surviving server, which then runs both.)
Symmetric Failover
Symmetric configurations appear more efficient in terms of hardware utilization. In the
asymmetric example, the redundant server requires only as much processor power as its
peer. On failover, performance remains the same. In the symmetric example, the
redundant server requires not only enough processor power to run the existing
application, but also enough to run the new application it takes over.
Further issues can arise in symmetric configurations when multiple applications running
on the same system do not co-exist properly. Some applications work well with multiple
copies started on the same system, but others fail. Issues can also arise when two
applications with different I/O and memory requirements run on the same system.
N-to-1 Configuration
An N-to-1 failover configuration reduces the cost of hardware redundancy and still
provides a potential, dedicated spare. In an asymmetric configuration there is no
performance penalty and there are no issues with multiple applications running on the
same system; however, the drawback is the 100 percent redundancy cost at the server
level.
(Figure: N-to-1 configuration. Multiple active servers share a single redundant server.)
The problem with this design is the issue of failback. When the original, failed server is
repaired, all services normally hosted on the server must be failed back to free the spare
server and restore redundancy to the cluster.
(Figure: N-to-1 failover. Services from the failed server move to the redundant server until they can be failed back.)
N + 1 Configuration
With the capabilities introduced by storage area networks (SANs), you can not only create
larger clusters, but more importantly, can connect multiple servers to the same storage.
(Figure: N+1 configuration. A SAN connects all servers to the same storage; one server provides spare capacity.)
A dedicated, redundant server is no longer required in the configuration. Instead of
N-to-1 configurations, there is N+1. In advanced N+1 configurations, an extra server in
the cluster is spare capacity only.
When a server fails, the application service group restarts on the spare. After the failed
server is repaired, it becomes the new spare. This configuration avoids the need to fail the
service group back to the primary system, which would mean a second application
outage. Any server can provide redundancy to any other server.
(Figure: N+1 failover. The failed server’s service group restarts on the spare; after repair, the failed server becomes the new spare.)
N-to-N Configuration
An N-to-N configuration refers to multiple service groups running on multiple servers,
with each service group capable of being failed over to different servers in the cluster. For
example, consider a four-node cluster with each node supporting three critical database
instances.
(Figure: N-to-N configuration. A four-node cluster with each node running three service groups.)
If any node fails, each instance is started on a different node, ensuring no single node
becomes overloaded. This configuration is a logical evolution of N + 1: it provides cluster
standby capacity instead of a standby server.
(Figure: N-to-N failover. The failed node’s service groups are redistributed across the surviving nodes so that no single node is overloaded.)
N-to-N configurations require careful testing to ensure all applications are compatible.
Administrators must also have complete control over where service groups fail over
when an event occurs.
(Figure: replicated data cluster spanning Site A and Site B, with service groups at each site and storage replication between the sites.)
Global Cluster
A global cluster links clusters at separate locations and enables wide-area failover and
disaster recovery.
Local clustering provides local failover for each site or building. Campus and replicated
cluster configurations offer protection against disasters affecting limited geographic
regions. Large scale disasters such as major floods, hurricanes, and earthquakes can cause
outages for an entire city or region. In such situations, data availability can be ensured by
migrating applications to sites located considerable distances apart.
(Figure: global cluster. When Cluster A fails, clients are redirected over the public network to Cluster B, where the Oracle service group comes online using replicated data on separate storage.)
In a global cluster, if an application or a system fails, the application is migrated to
another system within the same cluster. If the entire cluster fails, the application is
migrated to a system in another cluster. Clustering on a global level also requires
replicating shared data to the remote site. See “How VCS Global Clusters Work” on
page 446 for more information.
The VCS Configuration Language
Include Clauses
Include clauses incorporate additional configuration files into main.cf. These additional
files typically contain type definitions, including the types.cf file. Other type definitions
must be included as required. Typically, custom agents add type definitions in their own
files. Most customers and VERITAS consultants do not modify the types.cf file, but
instead create additional type files.
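For example, the top of a main.cf that uses a custom type file might begin as follows (the
second file name is illustrative):
include "types.cf"
include "mytypes.cf"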
Cluster Definition
This section of main.cf defines the attributes of the cluster, including the cluster name and
the names of the cluster users.
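A minimal sketch of this section (the cluster name, user name, and encrypted password
value are placeholders):
cluster demo_cluster (
    UserNames = { admin = cDRpdxPmHpzS }
    Administrators = { admin }
    )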
System Definition
Each system designated as part of the cluster is listed in this section of main.cf. The names
listed as system names must match the name returned by the command uname -n.
System names are preceded with the keyword “system.” For any system to be used in a
service group definition, it must be defined in this section. Each service group can be
configured to run on a subset of systems defined in this section.
SystemList Attribute
The SystemList attribute designates all systems on which a service group can come online.
By default, the order of systems in the list defines the priority of systems used in a
failover. For example, the definition SystemList = { SystemA, SystemB, SystemC }
configures SystemA to be the first choice on failover, followed by SystemB and then
SystemC.
System priority may also be assigned explicitly in the SystemList attribute by assigning
numeric values to each system name. For example: SystemList = { SystemA=0,
SystemB=1, SystemC=2 }
If you assign numeric priority values, VCS assigns a priority to the system without a
number by adding 1 to the priority of the preceding system. For example, if the
SystemList is defined as SystemList = { SystemA, SystemB=2, SystemC }, VCS
assigns the values SystemA = 0, SystemB = 2, SystemC = 3.
Note that a duplicate numeric priority value may be assigned when the following occurs:
SystemList = { SystemA, SystemB=0, SystemC }
The numeric values assigned are SystemA = 0, SystemB = 0, SystemC = 1.
To avoid the same priority number being assigned to more than one system, do not assign
any numbers or assign different numbers to each system in SystemList.
AutoStartList Attribute
The AutoStartList attribute lists the systems on which the service group is started when
VCS starts (usually at system boot). For example, if a system is a member of a failover
service group’s AutoStartList attribute, and if the service group is not already running on
another system in the cluster, the group is brought online when the system is started.
Resource Definition
This section in main.cf defines each resource used in a particular service group. Resources
can be added in any order; the hacf utility arranges the resources alphabetically the
first time the configuration is loaded.
system Server1
system Server2

group NFS_group1 (
    SystemList = { Server1, Server2 }
    AutoStartList = { Server1 }
    )

DiskGroup DG_shared1 (
    DiskGroup = shared1
    )

IP IP_nfs1 (
    Device = eth0
    Address = "192.168.1.3"
    )

Mount Mount_home (
    MountPoint = "/export/home"
    BlockDevice = "/dev/vx/dsk/shared1/home_vol"
    FsckOpt = "-y"
    MountOpt = rw
    )

NFS NFS_group1_16 (
    Nproc = 16
    )

NIC NIC_group1_eth0 (
    Device = eth0
    )

Share Share_home (
    PathName = "/export/home"
    Client = "*"
    )
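A complete group definition also records the dependencies described earlier in this
chapter. The following sketch shows the requires statements that would typically close
this sample, following the dependency rules of the NFS example:
IP_nfs1 requires Share_home
IP_nfs1 requires NIC_group1_eth0
Share_home requires Mount_home
Share_home requires NFS_group1_16
Mount_home requires DG_shared1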
The types.cf File
The following excerpt from types.cf shows the DiskGroup resource type definition:

type DiskGroup (
    static int NumThreads = 1
    static int OnlineRetryLimit = 1
    static str ArgList[] = { DiskGroup, StartVolumes, StopVolumes,
        MonitorOnly, MonitorReservation, tempUseFence }
    str DiskGroup
    boolean StartVolumes = 0
    boolean StopVolumes = 0
    boolean MonitorReservation = 0
    temp str tempUseFence = INVALID
)
The types definition performs two important functions. First, it defines the type of values
that may be set for each attribute. In the DiskGroup example, the NumThreads and
OnlineRetryLimit attributes are both classified as int, or integer. The DiskGroup attribute
is defined as str, or string, and the StartVolumes and StopVolumes attributes as boolean.
See “Attribute Data Types” on page 45 for more information.
The second critical piece of information provided by the type definition is the ArgList
attribute. The line static str ArgList[] = { xxx, yyy, zzz } defines the order
in which parameters are passed to the agents for starting, stopping, and monitoring
resources. For example, when VCS wants to bring the disk group shared_dg1 online, it
passes the following arguments to the online command for the DiskGroup agent:
shared_dg1 1 1 <null>
The sequence of arguments indicates the online command, the name of the resource, then
the contents of the ArgList. Since MonitorOnly is not set, it is passed as a null. This is
always the order: command, resource name, ArgList.
For another example, review the following main.cf and types.cf representing an IP
resource:
main.cf
IP nfs_ip1 (
    Device = eth0
    Address = "192.168.1.201"
    )
types.cf
type IP (
    static str ArgList[] = { Device, Address, NetMask, Options }
    NameRule = IP_ + resource.Address
    str Device
    str Address
    str NetMask
    str Options
)
Attributes
VCS components are configured using attributes. Attributes contain data about the cluster,
systems, service groups, resources, resource types, agents, and heartbeats if using global
clusters. For example, the value of a service group’s SystemList attribute specifies on
which systems the group is configured and the priority of each system within the group.
Each attribute has a definition and a value. Attributes also have default values assigned
when a value is not specified.
Integer Signed integer constants are a sequence of digits from 0 to 9. They may be
preceded by a dash, and are interpreted in base 10. Integers cannot exceed the
value of a 32-bit signed integer: 2147483647.
Boolean A boolean is an integer, the possible values of which are 0 (false) and 1 (true).
Attribute Dimensions
Dimension Description
Scalar A scalar has only one value. This is the default dimension.
Vector A vector is an ordered list of values. Each value is indexed using a positive
integer beginning with zero. A set of brackets ([]) denotes that the dimension is
a vector. Brackets are specified after the attribute name on the attribute
definition. For example, an agent’s ArgList is defined as:
static str ArgList[] = { RVG, DiskGroup, Primary, SRL,
RLinks }
Keylist A keylist is an unordered list of strings, and each string is unique within the list.
For example, to designate the list of systems on which a service group will be
started with VCS (usually at system boot):
AutoStartList = { SystemA, SystemB, SystemC }
Type-Dependent Attributes
Type-dependent attributes apply to a particular resource type. For example, the
MountPoint attribute applies only to the Mount resource type. Similarly, the Address
attribute applies only to the IP resource type.
Type-Independent Attributes
Type-independent attributes apply to all resource types. This means there is a set of
attributes that all agents understand, regardless of resource type. These attributes are
coded into the agent framework when the agent is developed. Attributes such as
RestartLimit and MonitorInterval can be set for any resource type.
Resource-Specific Attributes
Resource-specific attributes apply to a specific resource. They are discrete values that
define the “personality” of a given resource. For example, the IP agent knows how to use
the Address attribute. Setting an IP address is done only within a specific resource
definition. Resource-specific attributes are set in the main.cf file.
Type-Specific Attributes
Type-specific attributes are set for all resources of a specific type. For example, setting
MonitorInterval for the IP resource affects all IP resources. The value for MonitorInterval
would be placed in the types.cf file. In some cases, attributes can be placed in main.cf or
types.cf. For example, setting StartVolumes = 1 for the DiskGroup type in types.cf would
default StartVolumes to True for all DiskGroup resources. Setting the value in main.cf
would set StartVolumes on a per-resource basis.
In the example below, StartVolumes and StopVolumes are set in types.cf. This sets the
default for all DiskGroup resources to start all volumes contained in a disk group when
the disk group is brought online. If no value for StartVolumes or StopVolumes is set in
main.cf, they will default to True.
type DiskGroup (
    static int NumThreads = 1
    static int OnlineRetryLimit = 1
    static str ArgList[] = { DiskGroup, StartVolumes, StopVolumes,
        MonitorOnly, MonitorReservation, tempUseFence }
    str DiskGroup
    boolean StartVolumes = 1
    boolean StopVolumes = 1
    boolean MonitorReservation = 0
    temp str tempUseFence = INVALID
)
Adding the required lines in main.cf allows this value to be modified. In the next excerpt,
the main.cf is used to override the default type-specific attribute with a resource-specific
attribute:
DiskGroup shared_dg1 (
    DiskGroup = shared_dg1
    StartVolumes = 0
    StopVolumes = 0
    )
Static Attributes
Static attributes apply for every resource of a particular type. These attributes are prefixed
with the term static and are not included in the resource’s argument list. You can override
some static attributes and assign them resource-specific values. See “Overriding Resource
Type Static Attributes” on page 92 for more information.
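For example, a static attribute such as MonitorInterval could be overridden for a single
resource from the command line. A sketch of the form (see the referenced section for full
usage):
# hares -override resource_name MonitorInterval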
Temporary Attributes
You can define temporary attributes in the types.cf file. The values of temporary attributes
remain in memory as long as the VCS engine (HAD) is running. These attribute values are
not stored in the main.cf file.
The command haattr -add -temp adds the temporary attribute into memory. VCS does
not require the configuration to be in read/write mode to add or delete these attributes
using the command line. If temporary attributes are defined and the configuration is
dumped, all temporary attributes and their default values are saved to types.cf. When
HAD is restarted, the temporary attributes are defined and available. If HAD is stopped
completely within the cluster without an intervening dump, values of temporary
attributes are not available when HAD is restarted.
The scope of these attributes is local or global. If the scope is local on any node in the
cluster, the value remains in memory after the node fails. Also, local attributes can be
defined prior to HAD starting on the node. In the case when HAD is restarted and the
node rejoins the cluster, the value remains the same as when the node was running.
Modifying a temporary attribute requires the same permissions as modifying any other
attribute, whether you use the command line or the Cluster Manager Java and Web
Consoles. Some modifications require the configuration be opened; for
example, changing an attribute’s default value. See “Adding, Deleting, and Modifying
Resource Attributes” on page 93 for command-line instructions. You can define and
modify these attributes only while the VCS engine is running. Temporary attributes
cannot be converted to permanent, and vice-versa, but they can persist when dumped to
the types.cf file.
Note Duplicate names are not allowed for temporary attributes on a per-type basis. If a
temporary attribute cannot be created, verify that the name does not already exist
for that type in the types.cf file.
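The tempUseFence attribute in the DiskGroup definition shown earlier is one example. As
a generic sketch, a temporary attribute is declared in types.cf with the temp prefix (the
type and attribute names here are illustrative):
type MyApp (
    static str ArgList[] = { PathName }
    str PathName
    temp str LastKnownState = ""
)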
Keywords/Reserved Words
The following list includes the current keywords reserved for the VCS configuration
language. Note they are case-sensitive.
Verifying a Configuration
Use hacf to verify (check syntax of) the main.cf and the type definition file, types.cf. VCS
does not execute if hacf detects errors in the configuration. No error message and a return
value of zero indicate that the syntax is legal.
# hacf -verify config_directory
The variable config_directory refers to directories containing a main.cf file and any .cf files
included in main.cf.
Loading a Configuration
The hacf utility verifies the configuration before loading it into VCS. The configuration is
not loaded under the following conditions:
◆ If main.cf or include files are missing.
◆ If syntax errors appear in the .cf files.
◆ If the configuration file is marked “stale.” A .stale file is created in the configuration
directory when you indicate that you intend to change a running configuration. See
“Setting the Configuration to Read/Write” on page 67 for details.
VCS User Privileges
(Figure: user privilege hierarchy. Cluster Administrator privileges include those of Cluster Operator, which in turn include those of Cluster Guest. Group Administrator privileges include those of Group Operator.)
The following table lists the VCS user categories, with a summary of their associated
privileges.
Cluster Administrator: Users in this category are assigned full privileges, including making
configuration read-write, creating and deleting groups, setting group
dependencies, adding and deleting systems, and adding, modifying,
and deleting users. All group and resource operations are allowed.
Users with Cluster Administrator privileges can also change other
users’ privileges and passwords.
Note Cluster Administrators can change their own and other users’
passwords only after changing the configuration to read/write
mode.
Users in this category can create and delete resource types.
Cluster Operator: In this category, all cluster-, group-, and resource-level operations are
allowed, including modifying the user’s own password and bringing
service groups online.
Note Users in this category can change their own passwords only if
configuration is in read/write mode. Cluster Administrators can
change the configuration to the read/write mode.
Additionally, users in this category can be assigned Group
Administrator privileges for specific service groups.
Group Administrator: Users in this category can perform all service group operations on
specific groups, such as bringing groups and resources online, taking
them offline, and creating or deleting resources. Additionally, users can
establish resource dependencies and freeze or unfreeze service groups.
Note that users in this category cannot create or delete service groups.
Group Operator: Users in this category can bring service groups and resources online
and take them offline. Users can also temporarily freeze or unfreeze
service groups.
Cluster Guest: Users in this category have read-only access, meaning they can view the
configuration, but cannot change it. They can modify their own
passwords only if the configuration is in read/write mode. They cannot
add or update users. Additionally, users in this category can be
assigned Group Administrator or Group Operator privileges for
specific service groups.
Note By default, newly created users are assigned Cluster Guest
permissions.
User categories are set implicitly, as shown in the figure in “VCS User Privileges” on
page 53, but may also be set explicitly for specific service groups. For example, a user in
category Cluster Operator can be assigned the category Group Administrator for one or
more service groups. Likewise, a user in category Cluster Guest can be assigned Group
Administrator and Group Operator.
Review the following sample main.cf:
cluster vcs (
    UserNames = { sally = Y2hJtFnqctD76, tom = pJad09NWtXHlk,
        betty = kjheewoiueo, lou = T6jhjFYkie, don = gt3tgfdgttU,
        intern = EG67egdsak }
    Administrators = { tom }
    Operators = { sally }
    ...
    )

group finance_server (
    Administrators = { betty }
    Operators = { lou, don }
    ...
    )

group hr_application (
    Administrators = { sally }
    Operators = { lou, betty }
    ...
    )

group test_server (
    Administrators = { betty }
    Operators = { intern, don }
    ...
    )
This configuration gives the users the following privileges (one column per user):

                          tom  sally  betty  lou  don  intern
Cluster Administrator      ✔     –      –     –    –     –
Cluster Operator           ✔     ✔      –     –    –     –
finance_server Admin.      ✔     –      ✔     –    –     –
finance_server Operator    ✔     ✔      ✔     ✔    ✔     –
hr_application Admin.      ✔     ✔      –     –    –     –
hr_application Operator    ✔     ✔      ✔     ✔    –     –
test_server Admin.         ✔     –      ✔     –    –     –
test_server Operator       ✔     ✔      ✔     –    ✔     ✔
VCS_ENABLE_LDF: Designates whether or not log data files (LDFs) are generated. If set to
1, LDFs are generated. If set to 0, they are not.
VCS_HAD_RESTART_TIMEOUT: Set this variable to designate the amount of time the
hashadow process waits (sleep time) before restarting HAD.
Default: 0
To update a VCS license, install the new license on each node in the cluster using the
vxlicinst utility.
Starting VCS
The command to start VCS is invoked from the file /etc/rc3.d/S99vcs or
/sbin/rc3.d/S99vcs. When VCS is started, it checks the state of its local configuration
file and registers with GAB for cluster membership. If the local configuration is valid, and
if no other system is running VCS, it builds its state from the local configuration file and
enters the RUNNING state.
▼ To start VCS
# hastart [-stale|-force]
The option -stale instructs the engine to treat the local configuration as stale even if it is
valid. The option -force instructs the engine to treat a stale, but otherwise valid, local
configuration as valid.
◆ If all systems running VCS are in the state of STALE_ADMIN_WAIT, and if the local
configuration file of the system joining the cluster is invalid, then the joining system
also transitions to STALE_ADMIN_WAIT.
See the appendix “Cluster and System States” for a complete list of VCS system states.
Stopping VCS
The hastop command stops HAD and related processes. This command includes the
following options:
hastop -all [-force]
hastop [-help]
hastop -local [-force | -evacuate | -noautodisable]
hastop -local [-force | -evacuate -noautodisable]
hastop -sys system ... [-force | -evacuate | -noautodisable]
hastop -sys system ... [-force | -evacuate -noautodisable]
The option -all stops HAD on all systems in the cluster and takes all service groups
offline.
The option -help displays command usage.
The option -local stops HAD on the system on which you typed the command.
The option -force allows HAD to be stopped without taking service groups offline on
the system.
The option -evacuate, when combined with -local or -sys, migrates the system’s
active service groups to another system in the cluster, before the system is stopped.
The option -noautodisable ensures that service groups that can run on the node where
the hastop command was issued are not autodisabled. This option can be used with
-evacuate but not with -force.
The option -sys stops HAD on the system you specified.
Logging On to VCS
When non-root users execute haxxx commands, they are prompted for their VCS user
name and password to authenticate themselves. Use the halogin command to save the
authentication information so that you do not have to enter your credentials every time
you run a VCS command.
The command stores authentication information in the user’s home directory.
If you run the command for different hosts, VCS stores authentication information for
each host.
▼ To log on to a cluster
1. Define the node on which the VCS commands will be run. Set the VCS_HOST
environment variable to the name of the node.
2. Log on to VCS:
# halogin vcsusername password
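For example (sh syntax; the host name, user name, and password are placeholders):
# export VCS_HOST=node1
# halogin admin password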
Note You must add users to the VCS configuration to monitor and administer VCS from
the graphical user interface Cluster Manager.
The haconf -makerw command also designates the configuration stale by creating the
default file $VCS_CONF/conf/config/.stale on all systems running VCS.
Adding a User
1. Set the configuration to read/write mode:
# haconf -makerw
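2. Add the user; hauser prompts for the new user’s password (command form per the
hauser manual page):
# hauser -add user
3. Save and close the configuration:
# haconf -dump -makero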
Modifying a User
1. Set the configuration to read/write mode:
# haconf -makerw
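2. Update the user; for example, to change a user’s password (a sketch per the hauser
manual page):
# hauser -update user
3. Save and close the configuration:
# haconf -dump -makero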
Deleting a User
1. Set the configuration to read/write mode:
# haconf -makerw
2. For users with Administrator and Operator access, remove their privileges:
# hauser -delpriv user Administrator|Operator [-group service_groups]
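3. Delete the user from the configuration:
# hauser -delete user
4. Save and close the configuration:
# haconf -dump -makero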
Displaying a User
▼ To display a list of users
# hauser -list
Querying VCS
VCS enables you to query various cluster objects, including resources, service groups,
systems, resource types, agents, and clusters. You may enter query commands from any
system in the cluster. Commands to display information on the VCS configuration or
system states can be executed by all users: you do not need root privileges.
Querying Resources
▼ For a list of a resource’s dependencies
# hares -dep [resource]
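For example, to list the dependencies of a hypothetical resource named IP1, type:
# hares -dep IP1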
Querying Agents
▼ For an agent’s run-time status
# haagent -display [agent]
If agent is not specified, information regarding all agents is displayed.
Among the fields displayed, Faults indicates the number of agent faults and the time the
faults began.
Querying Systems
▼ For a list of systems in the cluster
# hasys -list
Querying Clusters
▼ For the value of a specific cluster attribute
# haclus -value attribute
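For example, to display the name of the cluster (ClusterName is a standard cluster
attribute), type:
# haclus -value ClusterName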
Querying Status
▼ For the status of all service groups in the cluster, including resources
# hastatus
▼ For the status of cluster faults, including faulted service groups, resources,
systems, links, and agents
# hastatus -summary
Note Unless executed with the -summary option, hastatus continues to produce
output of online state transitions until you interrupt it by pressing CTRL+C.
Querying Log Data Files (LDFs)
Use the hamsg command to display messages stored in VCS log data files (LDFs).
The option -any specifies hamsg return messages matching any of the specified
query options.
The option -tag specifies hamsg return messages matching the specified tag.
The option -otype specifies hamsg return messages matching the specified object
type:
VCS = general VCS messages
RES = resource
GRP = service group
SYS = system
AGT = agent
The option -oname specifies hamsg return messages matching the specified object
name.
The option -msgid specifies hamsg return messages matching the specified
message ID.
The option -path specifies where hamsg looks for the specified LDF. If not specified,
hamsg looks for files in the default directory /var/VRTSvcs/ldf.
The option -lang specifies the language in which to display messages. For example,
the value "en" specifies English and "ja" specifies Japanese.
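A minimal usage sketch follows; the LDF name engine_A and its placement as the final
argument are assumptions for illustration, not confirmed by this guide:
# hamsg -otype GRP -lang en engine_A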
Conditional Statements
Some query commands include an option for conditional statements. Conditional
statements take three forms:
Attribute=Value (the attribute equals the value)
Attribute!=Value (the attribute does not equal the value)
Attribute=~Value (the value is a prefix of the attribute; for example, a query for
State=~FAULTED returns all resources whose state begins with
FAULTED.)
Note You can only query attribute-value pairs displayed in the output of command
hagrp -display, described in section “Querying Service Groups” on page 71.
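For example, to list all service groups whose Frozen attribute equals 1, type:
# hagrp -list Frozen=1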
▼ To start a service group on a system and bring online only the resources already
online on another system
# hagrp -online service_group -sys system -checkpartial
other_system
If the service group does not have resources online on the other system, the service
group is brought online on the original system and the checkpartial option is
ignored.
Note that the -checkpartial option is used by the preonline trigger during failover.
When a service group configured with PreOnline = 1 fails over to another system
(system 2), the only resources brought online on system 2 are those that were
previously online on system 1 prior to failover.
▼ To stop a service group only if all resources are probed on the system
# hagrp -offline [-ifprobed] service_group -sys system
See “Clearing Resources in the ADMIN_WAIT State” on page 361 for more
information.
Administering Resources
▼ To bring a resource online
# hares -online resource -sys system
▼ To clear a resource
Initiate a state change from RESOURCE_FAULTED to RESOURCE_OFFLINE:
# hares -clear resource [-sys system]
Clearing a resource initiates the online process previously blocked while waiting for
the resource to become clear. If system is not specified, the fault is cleared on each
system in the service group’s SystemList attribute. (For instructions, see “To clear
faulted, non-persistent resources in a service group” on page 78.)
This command also clears the resource’s parents. Persistent resources whose static
attribute Operations is defined as None cannot be cleared with this command and
must be physically attended to (for example, by replacing a raw disk). The agent then
updates the status automatically.
Administering Systems
▼ To force a system to start while in ADMIN_WAIT
# hasys -force system
This command overwrites the configuration on systems running in the cluster. Before
using it, verify that the current VCS configuration is valid.
▼ To freeze a system (prevent groups from being brought online or switched on the
system)
# hasys -freeze [-persistent] [-evacuate] system
The option -persistent enables the freeze to be “remembered” when the cluster is
rebooted. Note that the cluster configuration must be in read/write mode and must
be saved to disk (dumped) to enable the freeze to be remembered.
The option -evacuate fails over the system’s active service groups to another
system in the cluster before the freeze is enabled.
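For example, to persistently freeze a hypothetical system named north after evacuating
its active service groups, type:
# hasys -freeze -persistent -evacuate north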
Administering Clusters
▼ To add a system to a cluster
This section provides an overview of tasks involved in adding a node to a cluster. For
detailed instructions, see the VERITAS Cluster Server Installation Guide.
1. Make sure the system meets the hardware and software requirements for VCS. See the
VERITAS Cluster Server Installation Guide for details.
2. Add the VCS license key. See “Installing a VCS License” on page 61 for instructions.
3. Configure LLT and GAB to include the new system in the cluster membership.
▼ To remove a node from a cluster
This section provides an overview of tasks involved in removing a node from a cluster.
For detailed instructions, see the VERITAS Cluster Server Installation Guide.
1. Switch or remove any VCS service groups from the node. The node cannot be
removed as long as it runs service groups on which other service groups depend.
2. Remove the entries for the node from the following files on each remaining node:
◆ /etc/gabtab
◆ /etc/llthosts
Encrypting Passwords
Use the vcsencrypt utility to encrypt passwords when editing the VCS configuration file
main.cf to add VCS users or when configuring agents that require user passwords.
Note Do not use the vcsencrypt utility when entering passwords from a configuration
wizard or from the Java and Web consoles.
▼ To encrypt a password
1. Run the utility from the command line. Use the -vcs option for VCS user passwords
and the -agent option for agent passwords:
# vcsencrypt -vcs
2. The utility prompts you to enter the password twice. Enter the password and press
Return.
# Enter New Password:
# Enter Again:
3. The utility encrypts the password and displays the encrypted password. Use the
displayed password to edit the VCS configuration file main.cf.
Note VCS must be in read/write mode before you can change the configuration.
For instructions, see “Setting the Configuration to Read/Write” on page 67.
Adding Resources
▼ To add a resource
# hares -add resource resource_type service_group
This command creates a new resource, resource, which must be a unique name throughout
the cluster, regardless of where it resides physically or in which service group it is placed.
The resource type is resource_type, which must be defined in the configuration language.
The resource belongs to the group service_group.
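For example, to add a hypothetical resource named IP1 of type IP to a service group
named websg, type:
# hares -add IP1 IP websg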
When new resources are created, all non-static attributes of the resource’s type, plus their
default values, are copied to the new resource. Three attributes are also created by the
system and added to the resource:
◆ Critical (default = 1). If the resource or any of its children faults while online, the
entire service group is marked faulted and failover occurs.
◆ AutoStart (default = 1). If the resource is set to AutoStart, it is brought online in
response to a service group command. All resources designated as AutoStart=1 must
be online for the service group to be considered online. (This attribute is unrelated to
AutoStart attributes for service groups.)
◆ Enabled. If the resource is set to Enabled, the agent for the resource’s type manages
the resource. The default is 1 for resources defined in the configuration file main.cf,
0 for resources added on the command line.
Note Adding resources on the command line requires several steps, and the agent must
be prevented from managing the resource until the steps are completed. For
resources defined in the configuration file, the steps are completed before the agent
is started.
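For example, after completing those steps for a hypothetical resource named IP1 that was
added on the command line, enable it so the agent begins managing it:
# hares -modify IP1 Enabled 1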
Linking Resources
▼ To specify a dependency relationship, or “link,” between two resources
# hares -link parent_resource child_resource
The variable parent_resource depends on child_resource being online before going
online itself. Conversely, parent_resource must go offline before child_resource goes
offline.
For example, a NIC resource must be available before an IP resource can go online, so
for resources IP1 of type IP and NIC1 of type NIC, specify the dependency as:
# hares -link IP1 NIC1
▼ To delete a resource
# hares -delete resource
Note that deleting a resource does not take the object being monitored by the
resource offline. The object remains online, outside the control and monitoring of VCS.
▼ To unlink resources
# hares -unlink parent_resource child_resource
Note You can unlink service groups and resources at any time. You cannot delete a
service group until all of its resources are deleted.
Administering Agents
Note Under normal conditions, VCS agents are started and stopped automatically.
▼ To start an agent
# haagent -start agent -sys system
▼ To stop an agent
# haagent -stop agent -sys system
After issuing the commands above, a message is displayed instructing the user to look for
messages in the log file. The agent log is located at $VCS_HOME/log/agent_A.log. See
“Logging” on page 569 for more information on log messages.
For example, to set the AgentClass attribute of the FileOnOff resource to RealTime, type:
# hatype -modify FileOnOff AgentClass "RT"
For example, to set the AgentPriority attribute of the FileOnOff resource to 10, type:
# hatype -modify FileOnOff AgentPriority "10"
For example, to set the ScriptClass of the FileOnOff resource to RealTime, type:
# hatype -modify FileOnOff ScriptClass "RT"
For example, to set the ScriptPriority of the FileOnOff resource to 40, type:
# hatype -modify FileOnOff ScriptPriority "40"
Note For attributes AgentClass and AgentPriority, changes are effective immediately. For
ScriptClass and ScriptPriority, changes become effective for scripts fired after the
execution of the hatype command.
Note For the attributes EngineClass and EnginePriority, changes are effective
immediately. For ProcessClass and ProcessPriority changes become effective only
for processes fired after the execution of the haclus command.
Backing Up and Restoring VCS Configuration Files
Use the hasnap command to back up and restore VCS configuration files. The command
includes the following options:
Option Action
hasnap -sdiff Displays files that were changed on the local system after a specific
snapshot was created.
hasnap -fdiff Displays the differences between a file in the cluster and its copy stored in a
snapshot.
hasnap -export Exports a snapshot from the local, predefined directory to the specified file.
hasnap -include Configures the list of files or directories to be included in new snapshots, in
addition to those included automatically by the -backup command.
hasnap -exclude Configures the list of files or directories to be excluded from new snapshots
when backing up the configuration using the -backup command.
hasnap -delete Deletes snapshots from the predefined local directory on each node.
Note With the exception of the -include, -exclude, and the -delete options, all
options can be combined with the -f option. This option indicates that all files be
backed up to or restored from the specified single file instead of a local, predefined
directory on each node. This option is useful when you want to store the
configuration data to an alternate location that is periodically backed up using
backup software like VERITAS NetBackup.
hasnap -backup
The hasnap -backup command backs up files in a snapshot format. A snapshot is a
collection of VCS configuration files backed up at a particular point in time, typically
before making changes to the existing configuration. A snapshot also contains
information such as the snapshot name, description, creation time, and file permissions.
The command backs up a predefined list of VCS configuration files as well as a
user-defined list. The predefined list includes all the *.cf files, custom agents, LLT and
GAB configuration files, triggers, custom heartbeats, and action scripts. See the
-include and -exclude commands to construct a user-defined list.
Syntax
hasnap -backup [-f filename] [-n] [-m description]
Options
-n: Runs the command in the non-interactive mode
-m: Specifies a description of the snapshot
Examples
The following command creates a backup of the configuration in the non-interactive
mode and adds “Test Backup” as the backup description.
# hasnap -backup -n -m "Test Backup"
The following command creates a backup of the configuration files and saves it as
/tmp/backup-2-2-2003 on the node where the command was run.
# hasnap -backup -f /tmp/backup-2-2-2003
hasnap -restore
The hasnap -restore command restores configuration files from a previously created
snapshot.
Syntax
hasnap -restore [-f filename] [-n] [-s snapid]
Options
-n: Runs command in the non-interactive mode
-s: Specifies the ID of the snapshot to be restored
If no snapshot ID is specified, -restore displays which snapshots are available for
restoration.
Examples
The following command restores the snapshot vcs-20030101-22232 in the non-interactive
mode.
# hasnap -restore -n -s vcs-20030101-22232
The following command restores the snapshot stored in the file /tmp/backup-2-2-2003.
# hasnap -restore -f /tmp/backup-2-2-2003
hasnap -display
The hasnap -display command displays details of previously created snapshots.
Syntax
hasnap -display [-f filename] [-list|-s snapid] [-m] [-l] [-t]
Options
-list: Displays the list of snapshots in the repository
-s: Identifies the snapshot ID
-m: Displays snapshot description
-l: Displays the list of files in the snapshot
-t: Displays the snapshot timestamp
If no options are specified, the command displays all information about the latest
snapshot.
Examples
The following command lists all snapshots.
# hasnap -display -list
The following command displays the description and the time of creation of the specified
snapshot.
# hasnap -display -s vcs-20030101-2232 -m -t
The following command displays the description, the timestamp, and the list of all files in
the snapshot file /tmp/backup-2-2-2003
# hasnap -display -f /tmp/backup-2-2-2003
hasnap -sdiff
The hasnap -sdiff command displays files that were changed on the local system after
a specific snapshot was created.
Syntax
hasnap -sdiff [-f filename] [-s snapid] [-sys hostname]
Options
-s: Identifies the snapshot ID of the comparison snapshot.
-sys: Indicates the host on which the snapshot is to be compared.
If no options are specified, -sdiff uses the latest snapshot to compare the files on each
node in the cluster.
Examples
The following command displays the differences between the current configuration and
the snapshot vcs-20030101-22232.
# hasnap -sdiff -s vcs-20030101-22232
The following command displays the differences between the configuration on system
host1 and the snapshot stored in the file /tmp/backup-2-2-2003.
# hasnap -sdiff -f /tmp/backup-2-2-2003 -sys host1
hasnap -fdiff
The hasnap -fdiff command displays the differences between a file currently on the
cluster and its copy stored in a previously created snapshot.
Syntax
hasnap -fdiff [-f filename] [-s snapid] [-sys hostname] file
Options
-s: Identifies the ID of the snapshot.
-sys: Indicates the host on which the specified file is to be compared.
file: Identifies the comparison file.
If no options are specified, -fdiff uses the latest snapshot to compare the file on
each node in the cluster.
Examples
The following command displays the differences between the file
/etc/VRTSvcs/conf/config/main.cf on host1 and its version in the last snapshot.
# hasnap -fdiff -sys host1 /etc/VRTSvcs/conf/config/main.cf
The following command displays the differences between the file /etc/llttab on each
node in the cluster and the version stored in the snapshot contained in the file
/tmp/backup-2-2-2003.
# hasnap -fdiff -f /tmp/backup-2-2-2003 /etc/llttab
hasnap -export
The hasnap -export command exports a snapshot from the local, predefined directory
on each node in the cluster to the specified file. This option is useful when you want to
store a previously created snapshot to an alternate location that is periodically backed up
using backup software like VERITAS NetBackup.
Syntax
hasnap -export -f filename [-s snapid]
Options
-s: Indicates the snapshot ID to be exported.
If the snapshot ID is not specified, the command exports the latest snapshot to the
specified file.
Example
The following command exports data from snapshot vcs-20030101-22232 from each node
in the cluster to the file /tmp/backup-2-2-2003 on the current node.
# hasnap -export -f /tmp/backup-2-2-2003 -s vcs-20030101-22232
hasnap -include
The hasnap -include command configures the list of files or directories to be included
in new snapshots, in addition to those included automatically by the -backup command.
See the section on the -backup command for the list of files automatically included for
VCS.
Syntax
hasnap -include -add|-del|-list [-sys hostname] files|directories
Options
-add: Adds the specified files or directories to the include file list.
-del: Deletes the specified files or directories from the include file list.
-list: Displays the files or directories in the include file list.
files/directories: Identifies the file or directory names to be added to or deleted from the
include list. Use this attribute with the -add or -del options only.
Examples
The following command displays the list of files or directories to be included in new
snapshots on each node of the cluster.
# hasnap -include -list
The following command adds the file /opt/VRTSweb/conf/vrtsweb.xml to the include
list on host1, which results in this file being included in the snapshot the next time the
hasnap -backup command is run.
# hasnap -include -add -sys host1 /opt/VRTSweb/conf/vrtsweb.xml
The following command removes the file /opt/VRTSweb/conf/vrtsweb.xml from the
include list on host1.
# hasnap -include -del -sys host1 /opt/VRTSweb/conf/vrtsweb.xml
hasnap -exclude
The hasnap -exclude command configures the list of files or directories that should
not be included in new snapshots when backing up the configuration using the -backup
command.
Syntax
hasnap -exclude -add|-del|-list [-sys hostname] files|directories
Options
-add: Adds the specified files or directories to the exclude file list.
-del: Deletes the specified files or directories from the exclude file list.
-list: Displays the files or directories in the exclude file list.
files/directories: Identifies the files or directories to be added to or deleted from the
exclude list. Use this attribute with the -add or -del options only.
Examples
The following command displays the exclude file list on each node in the cluster.
# hasnap -exclude -list
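The following command adds a hypothetical file, /etc/exports, to the exclude list on
host1, so that it is omitted from snapshots created by subsequent hasnap -backup runs:
# hasnap -exclude -add -sys host1 /etc/exports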
hasnap -delete
The hasnap -delete command deletes previously created snapshots from the
predefined local directory on each node.
Syntax
hasnap -delete [-s snapid]
Options
-s: Snapshot ID to be deleted.
If the snapshot ID is not specified, the command displays a list of snapshots available
for deletion.
Example
The following command deletes snapshot vcs-20030101-22232 from the cluster.
# hasnap -delete -s vcs-20030101-22232
Disability Compliance
Cluster Manager (Java Console) for VCS provides disabled individuals access to and use
of information and data that is comparable to the access and use provided to non-disabled
individuals, including:
◆ Alternate keyboard sequences for specific operations (see matrix in appendix
“Accessibility and VCS” on page 675).
◆ High-contrast display settings.
◆ Support of third-party accessibility tools. Note that VERITAS has not tested screen
readers for languages other than English.
◆ Text-only display of frequently viewed windows.
Getting Started
✔ Make sure you have the current version of Cluster Manager (Java Console) installed.
If you have a previous version installed, upgrade to the latest version. Cluster
Manager (Java Console) is compatible with earlier versions of VCS.
✔ Cluster Manager (Java Console) is supported on Windows 2000, Windows XP, and
Windows 2003 systems. If you are using a Solaris system, you must use Solaris 2.7 or
higher to support JRE 1.4.
✔ Verify the configuration has a user account. A user account is established during VCS
installation that provides immediate access to Cluster Manager. If a user account does
not exist, you must create one. For instructions, see “Adding a User” on page 146.
✔ On UNIX systems, you must set the display for Cluster Manager (“Setting the
Display” on page 110).
✔ Start Cluster Manager (“Starting Cluster Manager (Java Console)” on page 112).
✔ Add a cluster panel (“Configuring a New Cluster Panel” on page 142).
✔ Log on to a cluster (“Logging On to and Off of a Cluster” on page 144).
Note Certain cluster operations are enabled or restricted depending on the privileges
with which you log on to VCS. For information on specific privileges associated
with VCS users, see “VCS User Privileges” on page 53.
Setting the Display
1. Type the following command to grant the system permission to display on the
desktop:
# xhost +
2. Configure the shell environment variable DISPLAY on the system where Cluster
Manager will be launched. For example, if using Korn shell, type the following
command to display on the system myws:
# export DISPLAY=myws:0
2. Log on to the remote system and start an X clock program that you can use to test the
forward connection.
# xclock &
Note Do not set the DISPLAY variable on the client. X connections forwarded through a
secure shell use a special local display setting.
2. From the client system, forward a port (client_port) to port 14141 on the VCS server.
# ssh -L client_port:server_host:14141 server_host
You may not be able to set GatewayPorts in the configuration file if you use OpenSSH.
In this case, use the -g option in the command:
# ssh -g -L client_port:server_host:14141 server_host
3. Open another window on the client system and start the Java Console:
# /opt/VRTSvcs/bin/hagui
4. Add a cluster panel in the Cluster Monitor. When prompted, enter the name of the
client system as the host and the client_port as the port. Do not enter localhost.
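For example, with a hypothetical client port of 15000 and a VCS server named vcsserver,
the forwarding command would be:
# ssh -g -L 15000:vcsserver:14141 vcsserver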
Icons in the Java Console
The Java Console uses icons to represent the following objects and states:
◆ Cluster
◆ System
◆ Service Group
◆ Resource Type
◆ Resource
◆ OFFLINE
◆ PARTIAL
◆ UP AND IN JEOPARDY
◆ FROZEN
◆ AUTODISABLED
◆ UNKNOWN
◆ ADMIN_WAIT
About Cluster Monitor
Move Cluster Panel Up. Moves the selected cluster panel up.
Move Cluster Panel Down. Moves the selected cluster panel down.
a. Click Native (Windows or Motif) look & feel or Java (Metal) look & feel.
b. Click Apply.
d. Select the Remove Cluster Manager colors check box to alter the standard color
scheme.
e. Click Apply.
a. Select the Enable Sound check box to associate sound with specific events.
e. Click Apply.
About Cluster Explorer
The display is divided into three panes. The top pane includes a toolbar that enables you
to perform frequently used operations quickly. The left pane contains a configuration tree
with three tabs: Service Groups, Systems, and Resource Types. The right pane contains a
panel that displays various views relevant to the object selected in the configuration tree.
Save and Close Configuration. Writes the configuration to disk as a read-only file.
Add Service Group. Displays the Add Service Group dialog box.
Manage systems for a Service Group. Displays the System Manager dialog box.
Online Service Group. Displays the Online Service Group dialog box.
Offline Service Group. Displays the Offline Service Group dialog box.
Show Command Center. Enables you to perform many of the same VCS
operations available from the command line.
Show the Logs. Displays alerts and messages received from the VCS engine, VCS
agents, and commands issued from the console.
Launch Notifier Resource Configuration Wizard. Enables you to set up VCS event
notification.
Add/Delete Remote Clusters. Enables you to add and remove global clusters.
Configure Global Groups. Enables you to convert a local service group to a global
group, and vice versa.
Query. Enables you to search the cluster configuration according to filter criteria.
Status View
The Status View summarizes the state of the object selected in the configuration tree. Use
this view to monitor the overall status of a cluster, system, service group, resource type,
and resource.
For example, if a service group is selected in the configuration tree, the Status View
displays the state of the service group and its resources on member systems. It also
displays the last five critical or error logs. Point to an icon in the status table to open a
ScreenTip about the relevant VCS object.
For global clusters, this view displays the state of the remote clusters. For global groups,
this view shows the status of the groups on both local and remote clusters.
Properties View
The Properties View displays the attributes of VCS objects. These attributes describe the
scope and parameters of a cluster and its components.
To view information on an attribute, click the attribute name or the icon in the Help
column of the table. For a complete list of VCS attributes, including their type, dimension,
and definition, see the appendix “VCS Attributes.”
By default, this view displays key attributes of the object selected in the configuration tree.
The Properties View for a resource displays key attributes of the resource and attributes
specific to the resource types. It also displays attributes whose values have been
overridden. See “Overriding Resource Type Static Attributes” on page 179 for more
information.
To view all attributes associated with the selected VCS object, click Show all attributes.
Resource View
The Resource View displays the resources in a service group. Use the graph and
ScreenTips in this view to monitor the dependencies between resources and the status of
the service group on all or individual systems in a cluster.
In the graph, the line between two resources represents a dependency, or parent-child
relationship. Resource dependencies specify the order in which resources are brought
online and taken offline. During a failover process, the resources closest to the top of the
graph must be taken offline before the resources linked to them are taken offline.
Similarly, the resources that appear closest to the bottom of the graph must be brought
online before the resources linked to them can come online.
◆ A resource that depends on other resources is a parent resource. The graph links a
parent resource icon to a child resource icon below it. Root resources (resources
without parents) are displayed in the top row.
◆ A resource on which the other resources depend is a child resource. The graph links a
child resource icon to a parent resource icon above it.
◆ A resource can function as a parent and a child.
Point to a resource icon to display ScreenTips about the type, state, and key attributes of
the resource. The state of the resource reflects the state on a specified system (local).
In the bottom pane of the Resource View, point to the system and service group icons to
display ScreenTips about the service group status on all or individual systems in a cluster.
Click a system icon to view the resource graph of the service group on the system. Click
the service group icon to view the resource graph on all systems in the cluster.
1. From Cluster Explorer, click the Service Groups tab in the configuration tree.
Click Link to set or disable the link mode for the Service Group and Resource Views.
Note There are alternative ways to set up dependency links without using the Link
button.
The link mode enables you to create a dependency link by clicking on the parent icon,
dragging the yellow line to the icon that will serve as the child, and then clicking the child
icon. Use the Esc key to delete the yellow dependency line connecting the parent and
child during the process of linking the two icons.
If the Link mode is not activated, click and drag an icon along a horizontal plane to move
the icon. Click Auto Arrange to reset the appearance of the graph. The view resets the
arrangement of icons after the addition or deletion of a resource, service group, or
dependency link. Changes in the Resource and Service Group Views will be maintained
after the user logs off and logs on to the Java Console at a later time.
◆ To move the view to the left or right, click a distance (in pixels) from the drop-down
list box between the hand icons. Click the <- or -> hand icon to move the view in the
desired direction.
◆ To shrink or enlarge the view, click a size factor from the drop-down list box between
the magnifying glass icons. Click the - or + magnifying glass icon to modify the size of
the view.
◆ To view a segment of the graph, point to the box to the right of the + magnifying glass
icon. Use the red outline in this box to encompass the appropriate segment of the
graph. Click the newly outlined area to view the segment.
◆ To return to the original view, click the magnifying glass icon labeled 1.
VCS monitors systems and their services over a private network. The systems
communicate via heartbeats over an additional private network, which enables them to
recognize which systems are active members of the cluster, which are joining or leaving
the cluster, and which have failed.
VCS protects against network failure by requiring that all systems be connected by two or
more communication channels. When a system is down to a single heartbeat connection,
VCS can no longer discriminate between the loss of a system and the loss of a network
connection. This situation is referred to as jeopardy.
Point to a system icon to display a ScreenTip on the links and disk group heartbeats. If a
system in the cluster is experiencing a problem connecting to other systems, the system
icon changes its appearance to indicate the link or disk heartbeat is down. In this situation,
a jeopardy warning may appear in the ScreenTip for this system.
The Remote Cluster Status View provides an overview of the clusters and global groups
in a global cluster environment. Use this view to see the name, address, and status of a
cluster, and the type (Icmp or IcmpS) and state of a heartbeat.
This view enables you to declare a remote cluster fault as a disaster, disconnect, or outage.
Point to a table cell to view information about the VCS object.
Accessing Additional Features of the Java Console
Template View
The Template View displays the service group templates available in VCS. Templates are
predefined service groups that define the resources, resource attributes, and
dependencies within the service group. Use this view to add service groups to the cluster
configuration, and copy the resources within a service group template to existing service
groups.
In this window, the left pane displays the templates available on the system to which
Cluster Manager is connected. The right pane displays the selected template’s resource
dependency graph.
Template files conform to the VCS configuration language and contain the extension .tf.
These files reside in the VCS configuration directory.
System Manager
Use System Manager to add and remove systems in a service group’s system list.
User Manager
User Manager enables you to add and delete user profiles and to change user privileges.
You must be logged in as Cluster Administrator to access User Manager.
Command Center
Command Center enables you to build and execute VCS commands; most commands that
are executed from the command line can also be executed through this window. The left
pane of the window displays a Commands tree of all VCS operations. The right pane
displays a view panel that describes the selected command. The bottom pane displays the
commands being executed.
The commands tree is organized into Configuration and Operations folders. Click the
icon to the left of the Configuration or Operations folder to view its subfolders and
command information in the right pane. Point to an entry in the commands tree to display
information about the selected command.
Configuration Wizard
Use Configuration Wizard to create and assign service groups to systems in a cluster.
Cluster Query
Use Cluster Query to run SQL-like queries from Cluster Explorer. VCS objects that can be
queried include service groups, systems, resources, and resource types. Some queries can
be customized, including searching for the system’s online group count and specific
resource attributes.
Logs
The Logs dialog box displays the log messages generated by the VCS engine, VCS agents,
and commands issued from Cluster Manager to the cluster. Use this dialog box to monitor
and take actions on alerts on faulted global clusters and failed service group failover
attempts.
Note To ensure the time stamps for engine log messages are accurate, make sure to set the
time zone of the system running the Java Console to the same time zone as the
system running the VCS engine.
✔ Click the VCS Logs tab to view the log type, time, and details of an event. Each
message presents an icon in the first column of the table to indicate the message type.
Use this window to customize the display of messages by setting filter criteria.
✔ Click the Agent Logs tab to display logs according to system, resource type, and
resource filter criteria. Use this tab to view the log type, time, and details of an agent
event.
✔ Click the Command Logs tab to view the status (success or failure), time, command
ID, and details of a command. The Command Log only displays commands issued in
the current session.
✔ Click the Alerts tab to view situations that may require administrative action. Alerts
are generated when a local group cannot fail over to any system in the local cluster, a
global group cannot fail over, or a cluster fault takes place. A current alert will also
appear as a pop-up window when you log on to a cluster through the console.
Administering Cluster Monitor
b. If necessary, change the default port number of 14141; VCS Simulator uses a
default port number of 14153. Note that you must use a different port to connect
to each Simulator instance, even if these instances are running on the same
system.
c. Enter the number of failover retries. VCS sets the default failover retries number
to 12.
b. Enter the port number and the number of failover retries. VCS sets the default
port number to 14141 and failover retries number to 12; VCS Simulator uses a
default port number of 14153.
d. Click OK.
Logging on to a Cluster
2. Click the panel that represents the cluster you want to log on to and monitor.
or
If the appropriate panel is highlighted, click Login on the File menu.
b. Click OK.
The animated display shows various objects, such as service groups and resources, being
transferred from the server to the console.
Cluster Explorer is launched automatically upon initial logon, and the icons in the cluster
panel change color to indicate an active panel.
Administering User Profiles
Adding a User
1. From Cluster Explorer, click User Manager on the File menu.
c. Select the appropriate check boxes to grant privileges to the user. To grant Group
Administrator or Group Operator privileges, proceed to step 3d. Otherwise,
proceed to step 3f.
e. Click the groups for which you want to grant privileges to the user and click the
right arrow to move the groups to the Selected Groups box.
f. Click OK to exit the Add User dialog box, then click OK again to exit the Add
Group dialog box.
4. Click Close.
Deleting a User
1. From Cluster Explorer, click User Manager on the File menu.
4. Click Yes.
c. Click OK.
c. Click OK.
Note Before changing the password, make sure the configuration is in the read-write
mode. Cluster administrators can change the configuration to the read-write mode.
3. Click Change Privileges and enter the details for user privileges:
a. Select the appropriate check boxes to grant privileges to the user. To grant Group
Administrator or Group Operator privileges, proceed to step 4b. Otherwise,
proceed to step 4d.
c. Click the groups for which you want to grant privileges to the user, then click the
right arrow to move the groups to the Selected Groups box.
d. Click OK in the Change Privileges dialog box, then click Close in the User
Manager dialog box.
b. In the Available Systems box, click the systems on which the service group will
be added.
c. Click the right arrow to move the selected systems to the Systems for Service
Group box. The priority number (starting with 0) is automatically assigned to
indicate the order of systems on which the service group will start in case of a
failover. If necessary, double-click the entry in the Priority column to enter a new
value.
f. Click the appropriate service group type. A failover service group runs on only
one system at a time; a parallel service group runs concurrently on multiple
systems.
g. Click Show Command in the bottom left corner if you want to view the command
associated with the service group. Click Hide Command to close the view of the
command.
h. Click OK.
3. In the Available Systems box, click the systems on which the service group will be
added.
4. Click the right arrow to move the selected systems to the Systems for Service Group
box. The priority number (starting with 0) is automatically assigned to indicate the
order of systems on which the service group will start in case of a failover. If
necessary, double-click the entry in the Priority column to enter a new value.
5. To add a new service group based on a template, click Templates. Otherwise, proceed
to step 8.
7. Click OK.
8. Click the appropriate service group type. A failover service group runs on only one
system at a time; a parallel service group runs concurrently on multiple systems.
9. Click Apply.
2. Right-click the Template View panel, and click Add as Service Group from the
pop-up menu. This adds the service group template to the cluster configuration file
without associating it to a particular system.
3. Use System Manager to add the service group to systems in the cluster.
Note You cannot delete service groups with dependencies. To delete a linked service
group, you must first delete the link.
1. In the Service Groups tab of the configuration tree, right-click the service group.
or
Click a cluster in the configuration tree, click the Service Groups tab, and right-click
the service group icon in the view panel.
3. Click Yes.
3. Click Apply.
1. In the Service Groups tab of the configuration tree, right-click the service group.
or
Click a cluster in the configuration tree, click the Service Groups tab, and right-click
the service group icon in the view panel.
2. Click Online, and click the appropriate system from the menu. Click Any System if
you do not need to specify a system.
b. For global groups, select the cluster in which to bring the group online.
c. Click the system on which to bring the group online, or select the Any System
check box.
d. Select the No Preonline check box to bring the service group online without
invoking the preonline trigger.
e. Click Show Command in the bottom left corner to view the command associated
with the service group. Click Hide Command to close the view of the command.
f. Click OK.
3. For global groups, select the cluster in which to bring the group online.
4. Click the system on which to bring the group online, or select the Any System check
box.
5. Click Apply.
1. In the Service Groups tab of the configuration tree, right-click the service group.
or
Click a cluster in the configuration tree, click the Service Groups tab, and right-click
the service group icon in the view panel.
2. Click Offline, and click the appropriate system from the menu. Click All Systems to
take the group offline on all systems.
b. For global groups, select the cluster in which to take the group offline.
c. Click the system on which to take the group offline, or click All Systems.
d. Click Show Command in the bottom left corner if you want to view the command
associated with the service group. Click Hide Command to close the view of the
command.
e. Click OK.
3. For global groups, select the cluster in which to take the group offline.
4. Click the system on which to take the group offline, or click the All Systems check
box.
5. Click Apply.
1. In the Service Groups tab of the configuration tree, right-click the service group.
or
Click the cluster in the configuration tree, click the Service Groups tab, and right-click
the service group icon in the view panel.
2. Click Switch To, and click the appropriate system from the menu.
3. For global groups, select the cluster in which to switch the service group.
4. Click the system on which to bring the group online, or select the Any System check
box.
5. Click Apply.
1. In the Service Groups tab of the configuration tree, right-click the service group.
or
Click the cluster in the configuration tree, click the Service Groups tab, and right-click
the service group icon in the view panel.
2. Click Freeze, and click Temporary or Persistent from the menu. The persistent option
maintains the frozen state after a reboot if the user saves this change to the
configuration.
3. Select the persistent check box if necessary. The persistent option maintains the frozen
state after a reboot if the user saves this change to the configuration.
4. Click Apply.
1. In the Service Groups tab of the configuration tree, right-click the service group.
or
Click the cluster in the configuration tree, click the Service Groups tab, and right-click
the service group icon in the view panel.
2. Click Unfreeze.
3. Click Apply.
1. In the Service Groups tab of the configuration tree, right-click the service group.
or
Click the cluster in the configuration tree, click the Service Groups tab, and right-click
the service group icon in the view panel.
2. Click Enable, and click the appropriate system from the menu. Click all to enable the
group on all systems.
3. Select the Per System check box to enable the group on a specific system instead of all
systems.
4. Click Apply.
1. In the Service Groups tab of the configuration tree, right-click the service group.
or
Click the cluster in the configuration tree, click the Service Groups tab, and right-click
the service group icon in the view panel.
2. Click Disable, and click the appropriate system in the menu. Click all to disable the
group on all systems.
3. Select the Per System check box to disable the group on a specific system instead of all
systems.
4. Click Apply.
1. In the Service Groups tab of the configuration tree, right-click the service group.
or
Click the cluster in the configuration tree, click the Service Groups tab, and right-click
the service group icon in the view panel.
2. Click Autoenable, and click the appropriate system from the menu.
4. Click Apply.
1. In the Service Groups tab of the configuration tree, right-click the service group.
or
Click the cluster in the configuration tree, click the Service Groups tab, and right-click
the service group icon in the view panel.
2. Click Flush, and click the appropriate system from the menu.
4. Click Apply.
2. In the view panel, click the Service Groups tab. This opens the service group
dependency graph. To link a parent group with a child group:
a. Click Link.
c. Move the mouse toward the child group. The yellow line “snaps” to the child
group. If necessary, press Esc on the keyboard to delete the line between the
parent and the pointer before it snaps to the child.
e. In the Link Service Groups dialog box, click the group relationship and
dependency type. See “Categories of Service Group Dependencies” on page 387
for details on group dependencies.
f. Click OK.
or
Perform steps 1 and 2, right-click the parent group, and click Link from the menu.
g. Click the child group, relationship, and dependency type. See “Categories of
Service Group Dependencies” on page 387 for details on group dependencies.
h. Click OK.
2. Click the parent resource group in the Service Groups box. After selecting the parent
group, the potential groups that can serve as child groups are displayed in the Child
Service Groups box.
4. Click the group relationship and dependency type. See Chapter 13 for details on
group dependencies.
5. Click Apply.
3. In the Service Group View, right-click the link between the service groups.
5. Click Yes.
2. Click the parent resource group in the Service Groups box. After selecting the parent
group, the corresponding child groups are displayed in the Child Service Groups
box.
4. Click Apply.
1. In the System Manager dialog box, click the system in the Available Systems box.
2. Click the right arrow to move the available system to the Systems for Service Group
table.
3. The priority number (starting with 0) is assigned to indicate the order of systems on
which the service group will start in case of a failover. If necessary, double-click the
entry in the Priority column to enter a new value.
4. Click OK.
1. In the System Manager dialog box, click the system in the Systems for Service Group
table.
2. Click the left arrow to move the system to the Available Systems box.
3. Click OK.
1. Open the Configuration Wizard. From Cluster Explorer, click Configuration Wizard
on the Tools menu.
3. Specify the name and target systems for the service group:
c. Click the right arrow to move the systems to the Systems for Service Group table.
To remove a system from the table, click the system and click the left arrow.
4. Click Next.
5. Click Next again to configure the service group with a template and proceed to step 7.
Click Finish to add an empty service group to the selected cluster systems and
configure it at a later time.
6. Click the template on which to base the new service group. The Templates box lists
the templates available on the system to which Cluster Manager is connected. The
resource dependency graph of the templates, the number of resources, and the
resource types are also displayed.
7. Click Next. If a window notifies you that the name of the service group or resource
within the service group is already in use, proceed to step 8. Otherwise, proceed to
step 9.
8. Click Next to apply all of the new names listed in the table to resolve the name clash.
or
Modify the clashing names by entering text in the field next to the Apply button,
clicking the location of the text for each name from the Correction drop-down list box,
clicking Apply, and clicking Next.
9. Click Next to create the service group. A progress indicator displays the status.
10. After the service group is successfully created, click Next to edit attributes using the
wizard and proceed to step 11. Click Finish to edit attributes at a later time using
Cluster Explorer.
11. Review the attributes associated with the resources of the service group. If necessary,
proceed to step 12 to modify the default values of the attributes. Otherwise, proceed
to step 13 to accept the default values and complete the configuration.
e. Click OK.
Administering Resources
Use the Java Console to administer resources in the cluster. Use the console to add and
delete, bring online and take offline, probe, enable and disable, clear, and link and unlink
resources. You can also import resource types to the configuration.
Adding a Resource
The Java Console provides several ways to add a resource to a service group. Use Cluster
Explorer or Command Center to perform this task.
c. Edit resource attributes according to your configuration. The Java Console also
enables you to edit attributes after adding the resource.
d. Select the Critical and Enabled check boxes, if applicable. The Critical option is
selected by default.
A critical resource indicates the service group is faulted when the resource, or any
resource it depends on, faults. An enabled resource indicates agents monitor the
resource; you must specify the values of mandatory attributes before enabling a
resource. If a resource is created dynamically while VCS is running, you must
enable the resource before VCS monitors it. VCS does not bring a disabled resource
or its children online, even if the children are enabled.
e. Click Show Command in the bottom left corner to view the command associated
with the resource. Click Hide Command to close the view of the command.
f. Click OK.
5. Edit resource attributes according to your configuration. The Java Console also
enables you to edit attributes after adding the resource.
6. Select the Critical and Enabled check boxes, if applicable. The Critical option is
selected by default.
A critical resource indicates the service group is faulted when the resource, or any
resource it depends on, faults. An enabled resource indicates agents monitor the
resource; you must specify the values of mandatory attributes before enabling a
resource. If a resource is created dynamically while VCS is running, you must enable
the resource before VCS monitors it. VCS does not bring a disabled resource or its
children online, even if the children are enabled.
7. Click Apply.
2. In the left pane of the Template View, click the template from which to add resources
to your configuration.
4. Click Copy, and click Self from the menu to copy the resource. Click Copy, and click
Self and Child Nodes from the menu to copy the resource with its dependent
resources.
5. In the Service Groups tab of the Cluster Explorer configuration tree, click the service
group to which to add the resources.
7. Right-click the Resource View panel and click Paste from the menu. After the
resources are added to the service group, edit the attributes to configure the resources.
Deleting a Resource
▼ To delete a resource from Cluster Explorer
1. In the Service Groups tab of the configuration tree, right-click the resource.
or
Click a service group in the configuration tree, click the Resources tab, and right-click
the resource icon in the view panel.
3. Click Yes.
3. Click Apply.
1. In the Service Groups tab of the configuration tree, right-click the resource.
or
Click a service group in the configuration tree, click the Resources tab, and right-click
the resource icon in the view panel.
2. Click Online, and click the appropriate system from the menu.
2. Click a resource.
4. Click Apply.
1. In the Service Groups tab of the configuration tree, right-click the resource.
or
Click a service group in the configuration tree, click the Resources tab, and right-click
the resource icon in the view panel.
2. Click Offline, and click the appropriate system from the menu.
2. Click a resource.
4. If necessary, select the ignoreparent check box to take a selected child resource offline,
regardless of the state of the parent resource. This option is only available through
Command Center.
5. Click Apply.
▼ To take a parent resource and its child resources offline from Cluster Explorer
2. Click Offline Prop, and click the appropriate system from the menu.
▼ To take a parent resource and its child resources offline from Command Center
3. Click the system on which to offline the resource and its child resources.
4. Click Apply.
▼ To take child resources offline from Command Center while ignoring the state of the
parent resource
3. Click the system on which to offline the resource and its child resources.
5. Click Apply.
Probing a Resource
Probe a resource to check that it is configured and ready to bring online.
1. In the Service Groups tab of the configuration tree, right-click the resource.
2. Click Probe, and click the appropriate system from the menu.
4. Click Apply.
1. Right-click the resource in the Service Groups tab of the configuration tree or in the
Resources tab of the view panel.
4. Click OK.
The selected attributes appear in the Overridden Attributes table in the Properties
view for the resource.
5. To modify the default value of an overridden attribute, click the icon in the Edit
column of the attribute.
1. Right-click the resource in the Service Groups tab of the configuration tree or in the
Resources tab of the view panel.
4. Click OK.
1. From Cluster Explorer, click the Service Groups tab of the configuration tree.
2. Right-click a disabled resource in the configuration tree, and click Enabled from the
menu.
1. From Cluster Explorer, click the Service Groups tab in the configuration tree.
3. Click Apply.
1. From Cluster Explorer, click the Service Groups tab in the Cluster Explorer
configuration tree.
1. From Cluster Explorer, click the Service Groups tab in the configuration tree.
3. Click Apply.
Clearing a Resource
Clear a resource to remove a fault and make the resource available to go online. A
resource fault can occur in a variety of situations, such as a power failure or a faulty
configuration.
1. In the Service Groups tab of the configuration tree, right-click the resource.
2. Click Clear, and click the system from the menu. Click Auto instead of a specific
system to clear the fault on all systems where the fault occurred.
2. Click the resource. To clear the fault on all systems listed in the Systems box, proceed
to step 5. To clear the fault on a specific system, proceed to step 3.
5. Click Apply.
Linking Resources
Use Cluster Explorer or Command Center to link resources in a service group.
3. In the view panel, click the Resources tab. This opens the resource dependency graph.
To link a parent resource with a child resource:
a. Click Link.
c. Move the mouse towards the child resource. The yellow line “snaps” to the child
resource. If necessary, press Esc to delete the line between the parent and the
pointer before it snaps to the child.
f. Click OK.
3. Click the parent resource in the Service Group Resources box. After selecting the
parent resource, the potential resources that can serve as child resources are displayed
in the Child Resources box.
5. Click Apply.
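A dependency can also be created from the command line; a sketch, using the VCSweb and
webip resources from the sample configuration later in this guide (parent first, then
child):

    # Make VCSweb depend on webip.
    hares -link VCSweb webip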
Unlinking Resources
Use Cluster Explorer or Command Center to unlink resources in a service group.
3. Click the parent resource in the Service Group Resources box. After selecting the
parent resource, the corresponding child resources are displayed in the Child
Resources box.
5. Click Apply.
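The command-line equivalent is roughly:

    # Remove the dependency of VCSweb on webip.
    hares -unlink VCSweb webip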
1. In the Service Groups tab of the configuration tree, right-click the resource.
2. Click Actions.
c. To add an argument, click the Add icon (+) and enter the argument. Click the
Delete icon (-) to remove the argument.
d. Click OK.
1. In the Service Groups tab of the configuration tree, right-click the resource.
2. Click Refresh ResourceInfo, and click the system on which to refresh the attribute
value.
1. In the Service Groups tab of the configuration tree, right-click the resource.
2. Click Clear ResourceInfo, and click the system on which to reset the attribute value.
a. Click the file from which to import the resource type. The dialog box displays the
files on the system that Cluster Manager is connected to.
b. Click Import.
Administering Systems
Use the Java Console to administer systems in the cluster. Use the console to add, delete,
freeze, and unfreeze systems.
Adding a System
Cluster Explorer and Command Center enable you to add a system to the cluster. A
system must have an entry in the llttab configuration file before it can be added to the
cluster.
3. Click Show Command in the bottom left corner to view the command associated with
the system. Click Hide Command to close the view of the command.
4. Click OK.
3. Click Apply.
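For background, the llttab entry mentioned above names the node and its cluster
interconnect links. A sketch of /etc/llttab on a new node, assuming a node name of
vcslnx7, a cluster ID of 2, and private links on eth1 and eth2 (all values illustrative;
adjust them to your cluster):

    set-node vcslnx7
    set-cluster 2
    link eth1 eth1 - ether - -
    link eth2 eth2 - ether - -

The corresponding command-line operation to add the system to the VCS configuration is
hasys -add vcslnx7.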
Deleting a System
▼ To delete a system from Command Center
3. Click Apply.
Freezing a System
Freeze a system to prevent its components from failing over to another system. Use this
procedure during a system upgrade.
2. In the configuration tree, right-click the system, click Freeze, and click Temporary or
Persistent from the menu. The persistent option maintains the frozen state after a
reboot if the user saves this change to the configuration.
3. If necessary, select the persistent and evacuate check boxes. The evacuate option
moves all service groups to a different system before the freeze operation takes place.
The persistent option maintains the frozen state after a reboot if the user saves this
change to the configuration.
4. Click Apply.
Unfreezing a System
Unfreeze a frozen system to perform online and offline operations on the system.
3. Click Apply.
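Both operations are also available from the command line; a sketch, assuming a system
named vcslnx5:

    # Freeze persistently, evacuating service groups to other systems first.
    hasys -freeze -persistent -evacuate vcslnx5
    # Unfreeze later; -persistent is required to clear a persistent freeze.
    hasys -unfreeze -persistent vcslnx5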
Administering Clusters
Use the Java Console to specify the clusters you want to view from the console, and to
modify the VCS configuration. The configuration details the parameters of the entire
cluster. Use Cluster Explorer or Command Center to open, save, and “save and close” a
configuration. VCS Simulator enables you to administer the configuration on the local
system while VCS is offline.
2. Click Apply.
2. Click Apply.
2. Click Apply.
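The command-line equivalents of these configuration operations are the standard haconf
commands; a minimal sketch:

    # Open the configuration (make it read-write).
    haconf -makerw
    # Save and close it (write main.cf to disk and make it read-only again).
    haconf -dump -makero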
Executing Commands
Use Command Center to execute commands on a cluster. Command Center enables you
to run commands organized as “Configuration” and “Operation.”
1. From Command Center, click the command from the command tree. If necessary,
expand the tree to view the command.
2. In the corresponding command interface, click the VCS objects and appropriate
options (if necessary).
3. Click Apply.
Editing Attributes
Use the Java Console to edit attributes of VCS objects. By default, the Java Console
displays key attributes and type-specific attributes. To view all attributes associated with
an object, click Show all attributes.
1. From the Cluster Explorer configuration tree, click the object whose attributes you
want to edit.
2. In the view panel, click the Properties tab. If the attribute does not appear in the
Properties View, click Show all attributes. This opens the Attributes View.
3. In the Properties or Attributes View, click the icon in the Edit column of the Key
Attributes or Type Specific Attributes table. In the Attributes View, click the icon in
the Edit column of the attribute.
4. In the Edit Attribute dialog box, enter the changes to the attribute's values.
To edit a scalar value:
Enter or click the value.
To edit a non-scalar value:
Use the + button to add an element. Use the - button to delete an element.
To change the attribute’s scope:
Click the Global or Per System option.
To change the system for a local attribute:
Click the system from the menu.
5. Click OK.
3. In the attribute table, click the icon in the Edit column of the attribute.
4. In the Edit Attribute dialog box, enter the changes to the attribute's values.
To edit a scalar value:
Enter or click the value.
To edit a non-scalar value:
Use the + button to add an element. Use the - button to delete an element.
To change the attribute’s scope:
Click the Global or Per System option.
To change the system for a local attribute:
Click the system from the menu.
5. Click OK.
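Attributes can also be edited from the command line with the -modify option of hares,
hagrp, hasys, and related utilities; a sketch, assuming a resource named webip and a
system named vcslnx5 (values illustrative):

    # Edit a scalar attribute globally.
    hares -modify webip Address "162.39.9.85"
    # Make an attribute local (per-system), then set its value for one system.
    hares -local webip Device
    hares -modify webip Device eth0 -sys vcslnx5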
Querying the Cluster Configuration
c. Click the appropriate phrase or symbol between the search item and value.
d. Click the appropriate value for the specified query. Certain queries allow the user
to enter specific filter information:
Click System, click Online Group Count, click <, and type the required value in
the blank field.
or
Click Resource, click [provide attribute name] and type in the name of an
attribute, click = or contains, and type the appropriate value of the attribute in the
blank field. For example, click Resource, click [provide attribute name] and type
in pathname, click contains, and type c:\temp in the blank field.
g. Click Search. The results appear in tabular format at the bottom of the dialog box.
To search a new item, click Reset to reset the dialog box to its original blank state.
2. Click Next.
c. Click the right arrow to move the systems to the Systems for Service Group table.
To remove a system from the table, click the system and click the left arrow. The
priority number (starting with 0) is assigned to indicate the order of systems on
which the service group will start in case of a failover. If necessary, double-click
the entry in the Priority column to enter a new value.
Note This step assumes that you need to create both the ClusterService group and the
Notifier resource. If the ClusterService group exists but the Notifier resource is
configured under another group, you can modify the attributes of the existing
Notifier resource and system list for that group. If the ClusterService group is
configured but the Notifier resource is not configured, the Notifier resource will be
created and added to the ClusterService group.
4. Click Next.
Setting up VCS Event Notification Using Notifier Wizard
5. Choose the mode of notification to be configured. Select the check boxes to
configure SNMP and/or SMTP (if applicable).
a. Click + to create the appropriate number of fields for the SNMP consoles and
severity levels. Click - to remove a field.
b. Enter the console and click the severity level from the menu. For example,
"snmpserv" and "Information".
c. Enter the SNMP trap port. For example, "162" is the default value.
b. Click + to create the appropriate number of fields for recipients of the notification
and severity levels. Click - to remove a field.
c. Enter the recipient and click the severity level in the drop-down list box. For
example, "admin@yourcompany.com" and "Warning".
8. Click Next.
c. Click the icon (...) in the Discover column of the table to find the MACAddress for
each system.
11. Click the Bring the Notifier Resource Online checkbox, if desired.
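The wizard ultimately places a NotifierMngr resource in the ClusterService group. A
rough sketch of the result in main.cf, using the example console and recipient above (the
resource name ntfr and the SMTP server value are illustrative):

    NotifierMngr ntfr (
        SnmpConsoles = { snmpserv = Information }
        SnmpdTrapPort = 162
        SmtpServer = "smtp.yourcompany.com"
        SmtpRecipients = { "admin@yourcompany.com" = Warning }
        )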
Administering Logs
The Java Console enables you to customize the log display of messages generated by the
engine. In the Logs dialog box, you can set filter criteria to search and view messages, and
monitor and resolve alert messages.
To browse the logs for detailed views of each log message, double-click the event’s
description. Use the arrows in the VCS Log details pop-up window to navigate backward
and forward through the message list.
b. From the Logs of list, select the category of log messages to display.
c. From the Named menu, select the name of the selected object or component. To
view all the messages for the selected category, click All.
d. In the Logs from last field, enter the numerical value and select the time unit.
e. To search log messages, enter the search string. Select the Whole String check
box, if required.
f. Click OK.
d. Click the name of the resource. To view messages for all resources, click All.
Monitoring Alerts
The Java Console sends automatic alerts that require administrative action and are
displayed on the Alerts tab of the Logs dialog box. Use this tab to take action on the alert
or delete the alert.
1. In the Alert tab or dialog box, click the alert to take action on.
4. Click OK.
▼ To delete an alert
b. Click OK.
Disability Compliance
Cluster Manager (Web Console) for VCS provides disabled individuals access to and use
of information and data that is comparable to the access and use provided to non-disabled
individuals, including:
◆ Alternate keyboard sequences for specific operations (see matrix in appendix
“Accessibility and VCS”).
◆ High-contrast display settings.
◆ Support of third-party accessibility tools.
◆ Text-only display of frequently viewed windows.
Before Using the Web Console
2. Add a resource of type NIC to the service group. Name the resource csgnic. Set the
value of the Device attribute to the name of the NIC. Configure other attributes, if
desired.
3. Add a resource of type IP to the service group. Name the resource webip. Configure
the following attributes for the resource:
◆ Address: A virtual IP address to be assigned to VCS Cluster Manager (Web
Console). The GUI is accessed using this IP address.
◆ Device: The name of the public network card on the system from which the Web
GUI will run. Device is defined as a local attribute for each system in the cluster.
◆ NetMask: The netmask for the subnet to which the virtual IP address belongs.
◆ Critical: Set this attribute to True to make webip a critical resource.
4. Add a resource of type VRTSWebApp to the service group. Name the resource
VCSweb. Configure the following attributes for the resource:
◆ Appname: Set to vcs.
◆ InstallDir: Set to /opt/VRTSweb/VERITAS.
◆ TimeForOnline: Set to 5.
◆ Critical: Set to False.
7. Bring the ClusterService service group online. You can now access the GUI from
http://IP_alias:8181/vcs, where IP_alias is the virtual IP address configured in the
service group.
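From the command line, the group can be brought online and verified with standard
commands; a sketch, using the system names from the sample configuration below:

    # Bring the group online on one system, then check cluster status.
    hagrp -online ClusterService -sys vcslnx5
    hastatus -sum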
Sample Configuration

group ClusterService (
    SystemList = { vcslnx5, vcslnx6 }
    AutoStartList = { vcslnx5, vcslnx6 }
    OnlineRetryLimit = 3
    )

IP webip (
    Address = "162.39.9.85"
    NetMask = "255.255.255.0"
    Device = "eth0"
    )

NIC csgnic (
    Device = "eth0"
    NetworkHosts = { "162.39.1.1", "162.39.144.156" }
    )

VRTSWebApp VCSweb (
    AppName = "vcs"
    InstallDir = "/opt/VRTSweb/VERITAS"
    TimeForOnline = 5
    Critical = 0
    )

VCSweb requires webip
2. In the Advanced tab, verify the JIT compiler for virtual machine enabled check box
is selected under Microsoft VM.
VERITAS recommends using Microsoft VM on IE. If the IE 6.0 browser does not provide a
Microsoft VM option on the Advanced tab, you must download the Java Plug-in from
http://java.sun.com/products/plugin/index-1.4.html. You can install any version prior
to 1.4.2 on the client. VERITAS recommends using version 1.4.1_x.
If the IE 5.5 or 6.0 browser stops responding or “hangs” after logging on to a cluster
through the Web Console, verify the type of Java Plug-in used by the browser on the
Advanced tab described above.
1. Clear the Use Java 2 v1.4.2_x for <applet> check box under Java (Sun) on the
Advanced tab.
2. Select the JIT compiler for virtual machine enabled check box under Microsoft VM
and click OK. If Microsoft VM is not an available option, check that the system is
running only one Java Runtime Environment. If multiple JREs are installed on the
client, uninstall the earlier versions and keep the latest version. VERITAS
recommends using JRE 1.4.1.
If the Java Plug-in is not enabled on Netscape 6.2 or 7.0, you must download the
Plug-in from the Netscape Web site (http://wp.netscape.com/plugins/jvm.html) and
configure it according to the instructions on the site. Use any Java Plug-in prior to
version 1.4.2_x.
Note Certain cluster operations are enabled or restricted depending on the privileges
with which you log on to VCS. For information about the specific privileges
associated with VCS users, see the appendix “VCS User Privileges—Administration
Matrices” on page 595.
4. Enter the information for the host system and port number:
c. Click OK.
b. Click OK.
b. Click OK.
1. Click the appropriate cluster name listed on the Management Host page.
3. Click Login.
1. Click Logout in the top right corner of any view if you are viewing a particular cluster.
or
Click Logout next to the cluster name on the Management Host page.
2. Clear the Save Configuration check box if you do not want to save the latest
configuration.
3. Click Yes.
1. Click Logout All in the top right corner of the Management Host page.
2. Clear the Save Configuration check box if you do not want to save the latest
configurations for all clusters you are logged on to through the Web Console.
3. Click Yes.
Using Help
The Web Console provides context-sensitive online help in a separate browser for the
various views in the console. Use the Contents, Index, and Search features in the help
system to locate the information you need.
Reviewing Web Console Views
Note Review the Java Plug-in requirements for the browser to ensure you can view the
console properly after logging in to a cluster.
◆ Use the VCS Users table to change passwords and privileges, and delete users.
◆ Use the links in the left pane to add users, change passwords, open and save the
configuration, and access information on resource types.
Preferences Page
The Preferences page enables you to select the appropriate refresh mode to refresh the
data automatically or manually, or to disable the update notification. The refresh mode
icon in the top left corner of most views alters its appearance according to the mode
selected; you can also click the icon to change the modes.
myVCS Page
The myVCS page enables you to view consolidated information on specific service
groups, resources, systems, and logs without viewing the entire cluster. This page is
particularly useful in large configurations where searching for specific information can be
difficult and time-consuming. Using the myVCS wizard, you can select the contents and
define the format of the HTML page to create a personalized view of the cluster.
◆ Click a service group name in the left table column for details about the group. Use
the links in the left pane to add and delete service groups, and open and save the
cluster configuration. You can also access information on group dependencies, user
privileges, and resource types.
◆ For global clusters, use the Global Groups Wizard link to configure a global service
group.
From the left pane of the page, use the Configuration links to add and delete service
groups. Use the Related links to monitor users and resource types. For global clusters, you
can access the Global Groups Wizard from this pane.
Systems Page
The Systems page displays the status of systems in the cluster and lists the online, faulted,
and partial service groups on the systems. The value of the UpDownState attribute is
displayed in brackets when the system is UP BUT NOT IN CLUSTER MEMBERSHIP.
◆ Click a service group or system name link for details about the group or system.
◆ Use the links in the left pane to access information on system heartbeats, user
privileges, and resource types.
◆ Click the name of a resource type for details about the type.
◆ Use the active link in the left pane to manage user information.
◆ Click Show Details for information on the scope and dimension of each attribute.
Click Hide Details to return to the default view.
◆ Use the links in the left pane to manage users and resource types.
This page enables you to edit some attributes. Refer to the appendix “VCS Attributes”
on page 613 for descriptions of VCS attributes.
Logs Page
The Logs page displays log messages generated by the VCS engine HAD. An Alert
notification appears when the cluster has pending alert messages, such as those for
faulted global clusters or failed service group failover attempts, that may require
administrative action.
By default, each log view displays 10 messages that include the log type, ID, time, and
details of an event. The icon in the first column of the table indicates the severity level of
the message.
◆ Click Hide IDs and Show IDs to alter the view of the message ID numbers.
◆ Use the log type and search filters to customize this page.
◆ Use the links in the left pane to monitor alerts, users, and resource types.
Note To ensure the time stamp for an engine log message is accurate, set the time zone
of the system running the Web Console to the same time zone as the system
running the VCS engine.
Alerts Page
The Web Console sends automatic alerts that require administrative action and are
displayed on the Alerts page. Use this page to monitor alerts, take action on a cluster
fault, or delete the alert.
◆ If the alert warns that a local group cannot fail over to any system in the local cluster,
the user cannot take action.
◆ If the alert warns that a global group cannot fail over, the action involves bringing the
group online on another system in the global cluster environment. (Note: This
requires the Global Cluster Option.)
◆ If the alert warns that a global cluster is faulted, the action involves declaring the
cluster as a disaster, disconnect, or outage, and determining the service groups to fail
over to another cluster. Use the Alerts page to complete this operation. (Note: This
requires the Global Cluster Option.)
◆ Click a system name to access details on that system. For the entire selection of
attributes associated with the service group, click All Attributes above the Important
Attributes table.
◆ For users running the VERITAS Cluster Server Traffic Director Web Console, click
Traffic Director in the top right corner of the content pane to access that console.
◆ Use the links in the left pane to execute group operations, add and delete resources,
open and save configurations, and manage users, resource types, and group
dependencies.
◆ For global clusters, use the Global Groups Wizard link to configure a global service
group.
System Page
The System page displays the system state and major attributes of a specific system. Use
this page to review the status of service groups configured on the system and relevant log
messages.
◆ Click a service group name to access details on that group. For the entire selection of
attributes associated with the system, click All Attributes.
◆ Use the links in the left pane to freeze and unfreeze systems, and to manage users and
resource types.
Use the links in the left pane to manage users and resource types.
◆ For the entire selection of attributes associated with the resource type, click All
Attributes.
◆ Use the Configuration links in the left pane to open and save the cluster
configuration.
◆ Use the Related Links to manage users and resource types.
Resource Page
The Resource page displays information about the status of a resource on the cluster and
on a specified system. Use this page to view attributes, including overridden attributes,
and log messages for a resource. A resource is a hardware or software component. VCS
controls resources by starting (bringing them online), stopping (taking them offline), and
monitoring the state of the resources.
◆ To view a resource dependency graph and status for a particular system, click the
system name in the Systems list.
◆ To access a resource page, click the appropriate resource icon in the dependency
graph.
◆ Use the links in the left pane to view the dependencies between resources in tabular
format, and to manage users and resource types.
◆ To access a resource page, click the appropriate resource name in the dependency
table.
◆ Use the links in the left pane to view the dependencies between resources in graphical
format, and to manage users and resource types.
The Web Console enables you to view information on heartbeats for global clusters. You
can view a heartbeat after adding a remote cluster to the heartbeat cluster list.
The heartbeat summary displays the heartbeat type (Icmp and IcmpS), heartbeat state
with respect to the local cluster, and status of the Icmp or IcmpS agents. ICMP heartbeats
send ICMP packets simultaneously to all IP addresses; ICMPS heartbeats send individual
ICMP packets to IP addresses in serial order.
◆ Use the Global Clusters links in the left pane to add, delete, and modify global
heartbeats.
◆ Use the Related Links to manage users and resource types.
Administering Users
The Web Console enables a user with Cluster Administrator privileges to add, modify,
and delete user profiles. Administrator and Operator privileges are separated into the
cluster and group levels.
Adding a User
1. From the VCS Users page, click Add User in the left pane.
b. Enter the password and confirm it.
c. Select the check box next to the appropriate privilege.
d. Click OK.
Deleting a User
1. From the VCS Users page, select the X check box in the Delete User column.
2. Click Yes.
Changing a Password
▼ To change a password
1. From the VCS Users page, click Change Password in the left pane.
or
From the VCS Users page, select the Edit (...) icon in the Change Password column.
d. Click OK.
Modifying a Privilege
1. From the VCS Users page, select the Edit (...) icon in the Modify Privileges column.
a. Select the check box next to the appropriate privileges. If you select Group
Administrator or Group Operator, proceed to step 2b. Otherwise, proceed to step
2c.
b. From the active Available Groups box, select the applicable group, and click the
right arrow key to move it to the Selected Groups box. Repeat this for every
group that applies to the specified privilege.
c. Click OK.
▼ To open a configuration
1. On a page (such as the Cluster Summary page) that includes Configuration links in
the left pane, click Open Configuration.
2. Click OK.
1. On a page (such as the Cluster Summary page) that includes Configuration links in
the left pane, click Save Configuration.
2. Select the check box to prevent any write operations to the configuration file.
3. Click OK.
Administering Service Groups
b. Select the Add Membership check box next to the systems that you want to add
to the service group’s system list.
c. Click the Startup check box if you want the service group to start automatically
on the system.
d. Enter the priority number (starting with 0) to indicate the order of systems on
which the service group will start in case of a failover.
e. Click Next to add resources to the service group and proceed to step 3.
or
Click Finish to add the service group but configure resources at a later time.
Click Manually define resources to manually add resources and attributes to the
service group configuration. Proceed to step 4.
or
Click Use templates to load templates for service group configurations. Proceed to
step 5.
c. If necessary, clear the Critical or Enabled check boxes; these options are selected
by default. A critical resource indicates the service group is faulted when the
resource, or any resource it depends on, faults. An enabled resource indicates
agents monitor the resource. If a resource is created dynamically while VCS is
running, you must enable the resource before VCS monitors it. VCS will not bring
a disabled resource or its children online, even if the children are enabled.
d. Click the edit icon (...) to edit an attribute for the selected resource type. After
editing the attribute, click Save in the Edit Attribute dialog box to return to the
Add Resource dialog box.
e. Click New Resource to save the resource to the resource list in the left pane of the
dialog box. Make changes to the attributes of the other resources in the group by
clicking the resource name in the resource list.
f. Click Finish.
b. Click Next.
a. If necessary, clear the Critical or Enabled check boxes; these options are selected
by default. A critical resource indicates the service group is faulted when the
resource, or any resource it depends on, faults. An enabled resource indicates
agents monitor the resource. If a resource is created dynamically while VCS is
running, you must enable the resource before VCS monitors it. VCS will not bring
a disabled resource or its children online, even if the children are enabled.
b. Click the edit icon (...) to edit an attribute for the selected resource type. After
editing the attribute, click Save in the Edit Attribute dialog box to return to the
Resource List dialog box.
c. If necessary, view and change the attributes for the other resources in the group
by clicking the resource name in the left pane of the dialog box.
d. Click Finish.
b. Click OK.
a. Select the system on which to bring the service group online, or click Anywhere.
b. To run a PreOnline script, select the Run preonline script check box. This
user-defined script checks for external conditions before bringing a group online.
c. Click OK.
a. For parallel groups, select the system on which to take the service group offline,
or click All Systems.
b. Click OK.
1. From the Service Group page, click Switch in the left pane.
b. Click OK.
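These group operations map to the hagrp utility on the command line; a sketch, assuming
a service group named websg and the systems from the sample configuration earlier in
this chapter (the group name is illustrative):

    hagrp -online websg -sys vcslnx5   # bring the group online on one system
    hagrp -offline websg -sys vcslnx5  # take the group offline
    hagrp -switch websg -to vcslnx6    # switch the group to another system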
1. From the Service Group page, click Freeze in the left pane.
a. If necessary, choose Persistent to enable the service group to retain its frozen state
when the cluster is rebooted.
b. Click OK.
1. From the Service Group page, click Unfreeze in the left pane.
1. From the Service Group page, click Flush in the left pane.
b. Click OK.
1. From the Service Group page, click Enable in the left pane.
a. Select the system on which to enable the service group. To enable the service
group on all systems, click All Systems.
b. Click OK.
1. From the Service Group page, click Disable in the left pane.
a. Select the system on which to disable the service group. To disable the service
group on all systems, click All Systems.
b. Click OK.
1. From the Service Group page, click Autoenable in the left pane.
b. Click OK.
a. Select the service group that will serve as the “child” group.
c. Select the relationship type and location. See “Categories of Service Group
Dependencies” on page 387 for information on service group dependencies.
In an Online group dependency, the parent group must wait for the child group
to be brought online before it can start. In an Offline group dependency, the
parent group can be started only if the child group is offline on the system, and
vice versa.
d. Click OK.
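Group dependencies can also be created from the command line; a sketch, assuming
parent and child groups named parentsg and childsg and an online-local-firm
dependency (group names illustrative; the categories, locations, and types available
depend on your VCS version):

    # hagrp -link <parent group> <child group> <category> <location> <type>
    hagrp -link parentsg childsg online local firm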
a. Select the name of the service group to disconnect from the dependency.
b. Click OK.
1. From the Service Group page, click Modify SystemList in the left pane.
b. Select the Startup check box if you want the service group to automatically start
on the system.
d. Click OK.
1. From the Service Group page, click Clear Fault in the left pane.
a. Select the system on which to clear the service group. To clear the group on all
systems, click All Systems.
b. Click OK.
Administering Resources
The Web Console enables you to perform several operations through the Resource page.
Use this page to bring resources online and take them offline, take parent and child
resources offline, clear or probe resources, refresh the ResourceInfo attribute, invoke the
action entry point, enable and disable individual resources, and create and remove
resource dependencies.
Links for enabling and disabling all resources in a group, and adding and deleting them,
are available from the Service Group page.
b. Click OK.
b. Click OK.
1. From the Resource page, click Offline Propagate in the left pane.
a. Select the system on which to take the resource and all of its child resources
offline.
b. Click OK.
1. From the Resource page, click Override Attribute in the left pane.
3. Click OK.
The selected attributes appear in the Overridden Attributes table.
4. To modify the default value of an overridden attribute, click the icon in the Edit
column of the attribute.
1. From the Resource page, click Remove Attribute Overrides in the left pane.
3. Click OK.
1. From the Resource page, click Clear Fault in the left pane.
a. Select the system on which to clear the resource. To clear the resource on all
systems, click All Systems.
b. Click OK.
Probing a Resource
Probe a resource to check that it is configured and ready to bring online.
b. Click OK.
Enabling a Resource
Enable a resource in a service group to bring a disabled resource online. A resource may
have been manually disabled to temporarily stop VCS from monitoring the resource.
▼ To enable a resource
Disabling a Resource
Disable a resource in a service group to prevent it from coming online. This disabling
process is useful when you want VCS to temporarily “ignore” a resource (rather than
delete it) while the service group is still online.
▼ To disable a resource
1. From the Service Group page, click Enable Resources in the left pane.
1. From the Service Group page, click Disable Resources in the left pane.
c. If necessary, clear the Critical or Enabled check boxes; these options are selected
by default.
A critical resource indicates the service group is faulted when the resource, or any
resource it depends on, faults. An enabled resource indicates agents monitor the
resource. If a resource is created dynamically while VCS is running, you must
enable the resource before VCS monitors it. VCS will not bring a disabled resource
or its children online, even if the children are enabled.
d. Click the edit icon (...) to edit an attribute for the selected resource type. After
editing the attribute, click Save in the Edit Attribute dialog box to return to the
Add Resource dialog box.
e. Click OK.
b. Click OK.
Linking Resources
Link resources from the Resource Dependency page or the Resource page.
c. Click OK.
b. Click OK.
Unlinking Resources
Unlink resources from the Resource Dependency page or the Resource page.
c. Click OK.
b. Click OK.
a. Select the predefined action to execute. Examples of preset actions are displayed
on the menu.
c. Enter an action argument and click Add. Click the Delete icon (x) to delete the
argument.
d. Click OK.
1. From the Resource page, click Refresh ResourceInfo in the left pane.
b. Click OK.
3. From the Resource page, click All Attributes above the Important Attributes table to
view the latest information on the ResourceInfo attribute.
1. From the Resource page, click Clear ResourceInfo in the left pane.
a. Select the system on which to reset the parameters of the ResourceInfo attribute.
b. Click OK.
3. From the Resource page, click All Attributes above the Important Attributes table to
verify the information on the ResourceInfo attribute.
Administering Systems
The Web Console enables you to freeze and unfreeze systems. From the System page,
freeze a system to stop all online and offline operations on the system.
Freezing a System
Freeze a system to prevent its components from failing over to another system. Use this
procedure during a system upgrade.
▼ To freeze a system
a. If necessary, choose Persistent to enable the system to retain its frozen state when
the cluster is rebooted.
b. Select the Evacuate check box to fail over the system’s active service groups to
another system in the cluster before the freezing operation takes place.
c. Click OK.
Unfreezing a System
Unfreeze a frozen system to perform online or offline operations on the system.
▼ To unfreeze a system
Editing Attributes
The Web Console enables you to edit attributes of certain cluster objects, including service
groups, systems, resources, and resource types. Make sure the configuration is open (in
read/write mode) before editing attributes. By default, the console displays key
attributes. To view the entire list of attributes associated with a cluster object, click All
Attributes.
Changes to certain attributes, such as a webip attribute, may involve taking the service
group offline, modifying the configuration file, and bringing the group online. (VERITAS
recommends using the command line to edit attributes that are specific to the Web
Console.)
Note VERITAS does not recommend editing the value of the UserStrGlobal attribute for
the ClusterService group. The VCS Web Console uses this attribute for
cross-product navigation.
▼ To edit an attribute
1. Navigate to the page containing the attributes you want to edit. For example, to edit
system attributes, go to the System page.
2. In the Important Attributes table, click the edit icon (...) for the attribute.
or
Click All Attributes above the Important Attributes table, and click the edit icon (...)
for the attribute you want to modify.
or
Click All Attributes on the Cluster Summary page, and click the edit icon (...) for the
attribute you want to modify.
b. Click OK.
1. After logging on to a cluster, click Query in the top right corner of any view.
b. Click the appropriate filters from the menus to query the object. Certain queries
allow the user to enter specific information.
c. If necessary, click + to add a subquery. Click “and” or “or” for each subquery. To
remove the last subquery, click –.
d. Click Search. Results are displayed in tabular format, including the date and time
the query was run.
Customizing the Web Console with myVCS
Creating myVCS
1. After logging into a cluster, click myVCS in the top right corner of any page.
b. Click Next.
b. Click Next.
b. Click Next.
a. Select the check box to make the “myVCS” view the default page in the Web
Console instead of the Cluster Summary page.
b. Click Next.
7. Click Close Window. The customized myVCS view is displayed in the console.
Modifying myVCS
From the myVCS page, click Modify myVCS and follow the instructions in “Creating
myVCS” on page 280.
Customizing the Log Display
2. Click Search.
1. From the left pane, select the check box next to each log type that you want to view on
the page.
2. Enter the amount of time (hours, days, or months) that you want the logs to span.
3. Click Apply.
Monitoring Alerts
Alerts are generated when a local group cannot fail over to any system in the local cluster,
a global group cannot fail over, or a cluster fault takes place. A current alert will also
appear as a pop-up window when you log on to a cluster through the console.
c. Click No if you do not want to switch all the global groups from the selected
cluster to the local cluster.
d. Click OK.
▼ To delete an alert
1. From the Alerts page, click the X icon in the Delete column of the Alerts table.
b. Click OK.
Configuring Application Service Groups Using the Wizard
Prerequisites
✔ Make sure that the applications are not configured in any other service group.
✔ Verify the directories on which the applications depend reside on shared disks and are
mounted.
✔ Verify the mount points on which the applications depend are not configured in any
other service group.
✔ Verify the virtual IP addresses on which applications depend are up. Verify the IP
addresses are not configured in other service groups.
✔ Make sure the executable files required to start, stop, monitor, and clean (optional) the
application reside on all nodes participating in the service group (a sample resource
definition follows this list):
◆ StartProgram: The executable, created locally on each node, that starts the
application.
◆ StopProgram: The executable, created locally on each node, that stops the
application.
◆ CleanProgram: The executable, created locally on each node, that forcibly stops
the application.
◆ You can monitor the application in the following ways:
◆ Specify the program that will monitor the application.
◆ Specify a list of processes to be monitored and cleaned.
◆ Specify a list of PID files that contain the process IDs of the processes to be
monitored and cleaned. These are application-generated files. Each PID file
contains one PID, which will be monitored.
◆ All or some of the above.
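For orientation, these attributes correspond to an Application resource in main.cf. A
rough sketch under illustrative assumptions (all paths, process names, and the user are
hypothetical):

    Application app1 (
        User = appuser
        StartProgram = "/opt/app/bin/start"
        StopProgram = "/opt/app/bin/stop"
        CleanProgram = "/opt/app/bin/forcestop"
        MonitorProcesses = { "appserver" }
        PidFiles = { "/var/run/app1.pid" }
        )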
3. In the Wizard Options dialog box, select to create a new service group or modify an
existing group.
If you chose to modify an existing service group, select the service group.
In the Modify Application Service Group mode, you can add, modify, or delete
applications in the service group. If the service group is offline, you can also modify
the Mount, IP and NIC resources.
Click Next.
4. In the Service Group Configuration dialog box, specify the service group name and
the system list.
b. In the Available Cluster Systems box, select the systems on which to configure
the service group and click the right-arrow icon to move the systems to the service
group’s system list. Note that the agent must be configured on the local system.
To remove a system from the service group’s system list, select the system in the
Systems in Priority Order box and click the button with the left-arrow icon.
c. To change a system’s priority in the service group’s system list, select the system
in the Systems in Priority Order box and click the buttons with the up and down
arrow icons. The system at the top of the list has the highest priority while the
system at the bottom of the list has the lowest priority.
d. Click Next.
To create an application:
c. Click Next.
To modify an application:
Note Choose the Configure Application Dependency option only after you have
finished adding, modifying, or deleting applications.
6. In the Application Details dialog box, specify information about the executables used
to start, stop, and clean the application.
a. Specify the locations of the Start, Stop, and Clean (optional) programs along with
their parameters. You must specify values for the Start and Stop programs.
b. Select the user in whose context the programs will run. Click Discover Users if
some users were added after starting the wizard.
c. Click Next.
7. In the Monitor Options dialog box, specify information about how the application will
be monitored.
Specify at least one of the MonitorProgram, PidFiles, or MonitorProcesses attributes;
you can specify any combination of them.
a. Specify the complete path of the monitor program with parameters, if any. You
can browse to locate files.
b. Click the corresponding (+) or (-) buttons to add or remove Pid files or monitor
processes.
d. Click Next.
8. In the Mount Configuration dialog box, configure the Mount resources for the
applications.
a. Select the check boxes next to the mount points to be configured in the
Application service group. Click Discover Mounts to discover mounts created
after the wizard was started.
Note that the wizard discovers all Mount points that existed when you started the
wizard. For example, if you delete a Mount point after starting the wizard and
click Discover Mounts, the wizard displays the deleted Mount points on the page.
b. Specify the Mount and Fsck options, if applicable. The agent uses these options
when bringing the resource online.
c. If using the vxfs file system, select the SnapUmount check box to take the
MountPoint snapshot offline when the resource is taken offline.
d. Select the Create mount points on all systems if they do not exist check box, if
desired.
e. Click Next.
9. In the Network Configuration dialog box, configure the IP and NIC resources for the
application.
a. Select the Does application require virtual IP? check box, if required.
b. From the Virtual IP Address list, select the virtual IP for the service group. Click
Discover IP to discover IP addresses configured after the wizard was started.
Note that the wizard discovers all IP addresses that existed when you started the
wizard. For example, if you delete an IP address after starting the wizard and
click Discover IP, the wizard displays the deleted IP addresses in the Virtual IP
Address list.
c. For each system, specify the associated Ethernet interface. Click Discover NIC, if required.
d. Click Next.
10. In the Completing the Application Configuration dialog box, select the Configure
more applications check box to configure more applications in the service group.
Click Next.
Note If you choose to configure more applications, the wizard displays the Application
Options dialog box. See step 5 on page 290 for instructions on how to configure
applications. If you do not choose to configure more applications, the wizard
displays the Application Dependency dialog box.
11. The Application Dependency dialog box is displayed if you chose to configure
application dependencies.
a. From the Select Application list, select the application to be the parent.
b. From the Available Applications box, click on the application to be the child.
Note Make sure that there is no circular dependency between the applications.
c. Click the button with the right-arrow icon to move the selected application to the
Child Applications box. To remove an application dependency, select the
application in the Child Applications box and click the button with the left-arrow
icon.
d. Click Next.
12. In the Service Group Summary dialog box, review your configuration and change
resource names, if desired.
The left pane lists the configured resources. Click on a resource to view its attributes
and their configured values in the Attributes box.
To edit a resource name, select the resource name and click on it. Press Enter after
editing each name. Note that when modifying service groups, you can change names
of newly created resources only, which appear in black.
Click Next. The wizard starts running commands to create (or modify) the service
group.
13. In the Completing the Application Configuration Wizard dialog box, select the Bring
the service group online check box to bring the group online on the local system.
Click Close.
Prerequisites
✔ Verify the paths to be shared are exported.
✔ Verify the paths to be shared are mounted and are not configured in any other service
group.
✔ Verify the virtual IP to be configured is up and is not configured in any other service
group.
✔ To configure lock recovery, make sure the path where the NFS server stores lock
information (/var/lib/nfs) is on a shared disk and the path is mounted.
3. In the Wizard Options dialog box, select to create a new NFS service group or modify
an existing group.
The wizard supports only one NFS service group in the configuration. If you have an
NFS service group in your configuration, the wizard disables the Create NFS Service
Group option and enables the Modify NFS Service Group option.
If you choose to modify a service group, you can add and remove shares from the
service group. If the service group is offline, you can also modify the IP and NIC
resources.
Click Next.
4. In the Service Group Configuration dialog box, specify the service group name and
the system list.
b. In the Available Cluster Systems box, select the systems on which to configure
the service group and click the right-arrow icon to move the systems to the service
group’s system list.
To remove a system from the service group’s system list, select the system in the
Systems in Priority Order box and click the button with the left-arrow icon.
c. To change a system’s priority in the service group’s system list, select the system
in the Systems in Priority Order box and click the buttons with the up and down
arrow icons. The system at the top of the list has the highest priority while the
system at the bottom of the list has the lowest priority.
d. Click Next.
The shared disk path where the NFS server stores lock information (/var/lib/nfs)
must be mounted. If you configured the NFS server after starting the wizard, click
Discover NFS.
a. To configure lock recovery, select Yes from the Lock Recovery list.
b. Select a value for the Grace Period. The GracePeriod attribute defines the number
of seconds the NFS server waits for its clients to reclaim their locks.
c. Click Next.
6. In the Share Paths dialog box, select the shares to be configured in the service group
and click Next.
The wizard displays the shared paths whose mount points do not appear in the file
/etc/fstab. If the path to be configured does not appear in the list, make sure the
path is shared and click Discover Shares.
7. In the Mount Configuration dialog box, configure Mount resources for the shares.
a. Specify the Mount and Fsck options, if applicable. The agent uses these options
when bringing the resource online.
b. If using the vxfs file system, select the SnapUmount check box to take the
MountPoint snapshot offline when the resource is taken offline.
c. Select the Create mount points on all systems if they do not exist check box, if
desired.
d. Click Next.
8. In the Network Configuration dialog box, configure the IP and NIC resources for the
shares.
c. Click Next.
9. In the Service Group Summary dialog box, review your configuration and change
resource names, if desired.
The left pane lists the configured resources. Click on a resource to view its attributes
and their configured values in the Attributes box.
To edit a resource name, select the resource name and click on it. Press Enter after
editing each name. Note that when modifying service groups, you can change names
of newly created resources only, which appear in black.
Click Next. The wizard starts running commands to create (or modify) the service
group.
10. In the Completing the NFS Configuration Wizard dialog box, select the Bring the
service group online check box to bring the service group online on the local system.
Click Close.
Prerequisites
✔ Make sure the Apache servers to be added to the configuration are running and are
bound to the virtual IP address.
✔ Verify the virtual IP to be configured is up and is not configured in any other service
group.
✔ Make sure the DocumentRoot for the Apache server resides on a shared disk, which is
not configured in another service group.
✔ Change the directory structure for ServerRoot as instructed in the next section.
▼ To change the directory structure for httpd installed with Red Hat distributions
3. In the Wizard Options dialog box, select to create a new service group or modify an
existing group.
If you choose to modify a service group, you can add and remove Apache servers
from the service group. If the service group is offline, you can also modify the IP and
NIC resources.
Click Next.
4. In the Service Group Configuration dialog box, specify the service group name and
the system list.
b. In the Available Cluster Systems box, select the systems on which to configure
the service group and click the right-arrow icon to move the systems to the service
group’s system list.
To remove a system from the service group’s system list, select the system in the
Systems in Priority Order box and click the button with the left-arrow icon.
c. To change a system’s priority in the service group’s system list, select the system
in the Systems in Priority Order box and click the buttons with the up and down
arrow icons. The system at the top of the list has the highest priority while the
system at the bottom of the list has the lowest priority.
d. Click Next.
5. In the Apache Configuration dialog box, select the servers to be configured in the
service group and click Next.
a. Select the check box next to the Apache server to be configured in the service
group.
b. To enable detail monitoring, select the virtual IP address and port number to be
used for detailed monitoring and select the Detail Monitor check box for the
Apache server.
Note that all Apache servers must listen to one virtual IP and the virtual IP
address must not be configured in another service group.
c. Click Next.
6. In the Network Configuration dialog box, configure the IP and NIC resources for the
service group.
b. For each system, specify the associated Ethernet interface. If the Ethernet card for
a system does not appear in the list, click Discover NIC.
c. Click Next.
7. In the Mount Configuration dialog box, configure Mount resources for the service
group.
a. Specify the Mount and Fsck options, if applicable. The agent uses these options
when bringing the resource online.
b. If using the vxfs file system, select the SnapUmount check box to take the
MountPoint snapshot offline when the resource is taken offline.
c. Select the Create mount points on all systems if they do not exist check box, if
desired.
d. Click Next.
8. In the Service Group Summary dialog box, review your configuration and change
resource names, if desired.
The left pane lists the configured resources. Click on a resource to view its attributes
and their configured values in the Attributes box.
To edit a resource name, select the resource name and click on it. Press Enter after
editing each name. Note that when modifying service groups, you can change names
of newly created resources only, which appear in black.
Click Next. The wizard starts running commands to create (or modify) the service
group.
9. In the Completing the Apache Configuration Wizard dialog box, select the Bring the
service group online check box to bring the service group online on the local system.
Click Finish.
◆ Chapter 10. “VCS Communications, Membership, and I/O Fencing” on page 311
Intra-Node Communication
Within a node, the VCS engine (HAD) uses a VCS-specific communication protocol
known as Inter Process Messaging (IPM) to communicate with the GUI, the command
line, and the agents. The following illustration shows basic communication on a single
VCS node. Note that agents only communicate with HAD and never communicate with
each other.
[Illustration: HAD exchanges status and control messages with each agent (agent-specific
code built on the agent framework), as well as with the GUI and the command-line
utilities.]
The agent uses the Agent framework, which is compiled into the agent itself. For each
resource type configured in a cluster, an agent runs on each cluster node. The agent
handles all resources of that type. The engine passes commands to the agent and the agent
returns the status of command execution. For example, an agent is commanded to bring a
resource online. The agent responds with the success (or failure) of the operation.
Once the resource is online, the agent communicates with the engine only if this status
changes.
Inter-Node Communication
VCS uses the cluster interconnect for network communications between cluster nodes.
The nodes communicate using the capabilities provided by LLT and GAB.
The LLT module is designed to function as a high performance, low latency replacement
for the IP stack and is used for all cluster communications. LLT provides the
communications backbone for GAB. LLT distributes, or load balances, inter-node
communication across up to eight interconnect links. When a link fails, traffic is redirected
to remaining links.
The Group Membership Services/Atomic Broadcast (GAB) module is responsible for reliable
cluster communications. GAB provides guaranteed delivery of point-to-point and
broadcast messages to all nodes. The Atomic Broadcast functionality is used by HAD to
ensure that all systems within the cluster receive all configuration change messages, or are
rolled back to the previous state, much like a database atomic commit. If a failure occurs
while transmitting a broadcast message, GAB's atomicity ensures that, upon recovery, all
systems have the same information. The VCS engine uses a private IOCTL (provided by
GAB) to tell GAB that it is alive.
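To observe these layers on a running node, the LLT and GAB utilities can be used; a
minimal sketch (run as root on a cluster node):

    # Show LLT node states and the status of each configured link.
    lltstat -nvv
    # Show GAB port memberships (port a is GAB itself, port h is HAD).
    gabconfig -a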
The following diagram illustrates the overall communications paths.
[Diagram: on each node, GAB and LLT run in kernel space; inter-node communication
flows from HAD through GAB and over the LLT links.]
Cluster Membership
Cluster membership implies that the cluster must accurately determine which nodes are
active in the cluster at any given time. In order to take corrective action on node failure,
surviving nodes must agree on when a node has departed. This membership needs to be
accurate and must be coordinated among active members. This becomes critical
considering nodes can be added, rebooted, powered off, faulted, and so on. VCS uses its
cluster membership capability to dynamically track the overall cluster topology. Cluster
membership is maintained through the use of heartbeats.
LLT is responsible for sending and receiving heartbeat traffic over network links. Each
node sends heartbeat packets on all configured LLT interfaces. By using an LLT ARP
response, each node sends a single packet that tells all other nodes it is alive, as well as the
communications information necessary for other nodes to send unicast messages back to
the broadcaster.
LLT can be configured to designate specific links as high priority and others as low
priority. High priority links are used for cluster communications (GAB) as well as
heartbeat. Low priority links only carry heartbeat unless there is a failure of all configured
high priority links. At this time, LLT switches cluster communications to the first available
low priority link. Traffic reverts to high priority links as soon as they are available.
LLT passes the status of the heartbeat to the Group Membership Services function of GAB.
When LLT on a system no longer receives heartbeat messages from a peer on any
configured LLT interface for a predefined time, LLT informs of the heartbeat loss for that
system. GAB receives input on the status of heartbeat from all nodes and makes
membership determination based on this information. When LLT informs GAB of a
heartbeat loss, GAB marks the peer as DOWN and excludes the peer from the cluster. In
most configurations, the I/O fencing module is utilized to ensure there was not a partition
or split of the cluster interconnect. Once the new membership is determined, GAB informs
processes on the remaining nodes that the cluster membership has changed. VCS then
carries out failover actions to recover.
[Figure: node failure and loss of the cluster interconnect both result in no heartbeats being received from node 4.]
This situation can also arise in other scenarios. If a system were so busy as to appear hung, it would be declared dead. The same can happen if the hardware supports a break and resume function: dropping the system to PROM (system controller) level with a break and subsequently resuming it means the system could be declared dead and the cluster reformed. The system could then return and start writing to the shared storage.
Preventing Split-brain
This section describes the strategies that could be used to prevent split-brain.
Coordinator Disks
VCS uses special-purpose disks, called coordinator disks, for I/O fencing during cluster
membership change. These are three standard disks or LUNs, which together act as a
global lock device during a cluster reconfiguration. VCS uses this lock mechanism to
determine which nodes remain in a cluster and which node gets to fence off data drives
from other nodes.
Coordinator disks cannot be used for any other purpose in the VCS configuration. You
cannot store data on these disks or include the disks in a disk group used by user data.
Any disks that support SCSI-III Persistent Reservation can serve as coordinator disks.
VERITAS recommends the smallest possible LUNs for coordinator use.
Fencing Module
Each system in the cluster runs a kernel module called vxfen, or the fencing module. This
module works to maintain tight control on cluster membership. It is responsible for the
following actions:
◆ Registering with the coordinator disks during normal operation.
◆ Racing for control of the coordinator disks during membership changes.
Data Disks
Data disks are standard disk devices used for data storage. These can be physical disks or
RAID Logical Units (LUNs). These disks must support SCSI-III PR. Data disks are
incorporated in standard disk groups managed using VERITAS Volume Manager.
The VCS DiskGroup agent is responsible for fencing failover disk groups and Cluster
Volume Manager (CVM) handles any shared CVM disk groups.
Membership Arbitration
I/O fencing uses the fencing module and coordinator disks for membership control in a
VCS cluster. With fencing, when a membership change occurs, members of any surviving
cluster race for exclusive control of the coordinator disks to lock out any other potential
cluster. This ensures that only one cluster is allowed to survive a membership arbitration
in the event of an interconnect failure.
Let us take the example of a two-node cluster. If node 0 loses the heartbeat from node 1, it makes no assumption that node 1 is down and races to gain control of the coordinator disks. Node 1, if still running, does the same. Each node attempts to eject the opposite cluster from membership on the coordinator disks. The node that ejects the opposite member and gains control over a majority of the coordinator disks wins the race. The other node loses and must shut down.
The following illustration depicts the sequence in which these operations take place.
First, on node 0, LLT times out the heartbeat from node 1 (16 seconds by default) and informs GAB of the heartbeat failure. GAB then determines that a membership change is occurring. After the “GAB Stable Timeout” (5 seconds), GAB delivers the membership change to all registered clients, in this case HAD and the I/O fencing module.
HAD receives the membership change and requests the fencing module to arbitrate in
case of a split-brain scenario and waits for the race to complete.
The registration function of SCSI-III PR handles races. During normal startup, every
cluster member registers a unique key with the coordinator disks. To win a race for the
coordinator disks, a node has to eject the registration key of the node in question from a
majority of the coordinator disks.
If the I/O fencing module gains control of the coordinator disks, it informs HAD of
success. If the fencing module is unsuccessful, the node panics and reboots.
Data Protection
Simple membership arbitration does not guarantee data protection. If a node is hung or
suspended and comes back to life, it could cause data corruption before GAB and the
fencing module determine the node was supposed to be dead. VCS takes care of this
situation by providing full SCSI-III PR based data protection at the data disk level.
When the I/O fencing module successfully completes the race for the coordinator disks,
HAD can carry out recovery actions with assurance the node is down.
Because the fencing module operates identically on each system, each node assumes the other has failed, and carries out fencing operations to verify this.
The GAB module on each node determines the peer has failed due to loss of heartbeat and
passes the membership change to the fencing module.
Each side races to gain control of the coordinator disks. Only a registered node can eject
the registration of another node, so only one side successfully completes the
preempt/abort command on each disk.
The fence driver is designed to delay if it loses a race for any coordinator disk. Since node
0 wins the first race, unless another failure occurs, it also wins the next two races.
The side that successfully ejects the peer from a majority of the coordinator disks wins.
The fencing module on the winning side then passes the membership change up to VCS
and other higher level packages registered with the fencing module. VCS can then take
recovery actions. The losing side calls kernel panic and reboots.
[Figure: 3A. Node 0 has won the race and broadcasts success to its peers. 3B. The losing node panics.]
3. Unless node 0 fails mid-race, it wins and gains control over the coordinator disks. The three-node sub-cluster remains running and node 3 shuts down.
1. The interconnect failure leads to nodes 0 and 1 being separated from nodes 2 and 3.
The cluster splits into two sub-clusters of the same size.
2. Both clusters wait for the same amount of time and begin racing. In this situation,
either side can win control of the first coordinator disk. In this example, node 0 wins
the first disk. Node 2 then delays by rereading the coordinator disks after losing the
first race. Consequently, node 0 gains control over all three coordinator disks.
3. After winning the race, node 0 broadcasts its success to its peers. On the losing side, node 2 panics because it has lost the race. The remaining members of the losing side time out waiting for a success message and panic.
1. All nodes lose heartbeats to all other nodes. Each LLT declares heartbeat loss to GAB,
and all GAB modules declare a membership change.
2. Each node is the lowest member of its own sub-cluster; each node races to acquire
control over the coordinator disks.
3. Node 0 acquires control over the first disk. Other nodes lose the race for the first disk
and reread the coordinator disks to pause before participating in the next race.
4. Node 0 acquires control over all three coordinator disks. Other nodes lose the race and
panic.
Note In the example, node 0 wins the race, and all other nodes panic. If no node gets a
majority of the coordinator disks, all nodes panic.
During startup, the vxfen startup script on each node performs the following actions:
a. Reads the file /etc/vxfendg to determine the name of the VERITAS Volume Manager disk group containing the coordinator disks. See the VCS Installation Guide for information on creating the coordinator disk group.
b. Uses Volume Manager tools to determine the disks in the disk group and the paths available to these disks, and records this information in the file /etc/vxfentab.
The fencing driver then performs the following actions:
a. Reads the serial numbers of the coordinator disks from the file /etc/vxfentab and builds an in-memory list of these drives.
b. Determines whether it is the first node to start fencing. If other members are up and operating on GAB port B, it asks for a configuration snapshot from a running member. This is done to verify that members of the cluster see the same coordinator disks. Otherwise, the fencing driver enters an error state.
Next, the driver checks the registration keys on the coordinator disks:
a. Reads any keys currently registered on the coordinator disks.
b. If any keys are present, verifies that the corresponding member can be seen in the current GAB membership. If the member cannot be seen, the fencing driver assumes the node starting up has been fenced out of the cluster due to a network partition. The fencing driver prints a warning to the console and the system log about a pre-existing network partition and does not start.
c. If the owners of the coordinator disks can be seen, or if no keys are seen on disk, the fencing driver proceeds.
[Figure: the cluster interconnect is severed and node 1 reboots; node 0 wins the coordinator race. 3A. Node 0 has keys registered on all coordinator disks. 3B. Node 1 boots up, finds keys registered for a non-member, prints an error message, and exits.]
3. Node 0 has keys registered on the coordinator disks. When node 1 boots up, it sees the
node 0 keys, but cannot see node 0 in the current GAB membership. It senses a
potential preexisting split brain and causes the vxfen module to print an error
message to the console. The vxfen module prevents fencing from starting, which, in
turn, prevents VCS from coming online.
To recover from this situation, shut down node 1, reconnect the private interconnect,
and restart node 1.
[Figure: node 1 fails and is fenced out; node 0 subsequently fails.]
1. In the first failure, node 1 fails, and is fenced out by ejecting the node from the coordinator disks.
2. Node 0 subsequently fails before node 1 rejoins the cluster, leaving its registration keys on the coordinator disks.
3. When node 1 restarts, it sees the keys left behind by node 0, but cannot see node 0 in the GAB membership. The fencing driver prints an error message.
4. The operator runs the vxfenclearpre utility to clear the keys left by node 0 after physically verifying that node 0 is down. The operator then reboots node 1, which comes up normally.
Jeopardy
VCS without I/O fencing requires a minimum of two heartbeat-capable channels between
cluster nodes to provide adequate protection against network failure. When a node is
down to a single heartbeat connection, VCS can no longer reliably discriminate between
loss of a system and loss of the last network connection. It must then handle communication loss on a single network differently from loss on multiple networks. This handling is called jeopardy.
GAB makes intelligent choices on cluster membership based on information about reliable
and unreliable links provided by LLT. If a system's heartbeats are lost simultaneously
across all channels, VCS determines that the system has failed. The services running on
that system are then restarted on another system. However, if the node had only one
heartbeat (that is, the node was in jeopardy), VCS does not restart the applications on a
new node. This action of disabling failover is a safety mechanism to prevent data
corruption. A system is placed in a jeopardy membership if it has only one functional network heartbeat. In this situation, the node is a member of both the regular membership and the jeopardy membership. VCS continues to operate as a single cluster, except that failover due to system failure is disabled. If the last network connection is subsequently lost, VCS operates as partitioned clusters on each side of the failure.
[Figure: four-node cluster with regular membership 0, 1, 2, 3; node 2 loses all but one network heartbeat link.]
A new cluster membership is issued with nodes 0, 1, 2, and 3 in the regular membership
and node 2 in a jeopardy membership. All normal cluster operations continue, including
normal failover of service groups due to resource faults.
[Figure: node 2 faults.]
All other systems recognize that node 2 has faulted. A new membership is issued for nodes 0, 1, and 3 as regular members. Since node 2 was in a jeopardy membership, service groups running on node 2 are autodisabled, so no other node can assume ownership of these service groups. If the node has actually failed, the system administrator can clear the AutoDisabled flag on the service groups and bring them online on other nodes in the cluster.
[Figure: node 2 loses its last network connection and forms a single-node sub-cluster (Cluster 2) with regular membership: 2.]
In this situation, a new membership is issued for nodes 0, 1, and 3 as regular members.
Since node 2 was in a jeopardy membership, service groups running on node 2 are
autodisabled, so no other node can assume ownership of these service groups. Nodes 0, 1,
and 3 form a sub-cluster. Node 2 forms another single-node sub-cluster. All service groups
that were present on nodes 0, 1, and 3 are autodisabled on node 2.
[Figure: cluster configured with an additional low-priority heartbeat link on the public network.]
The low priority link continues with heartbeat only. No jeopardy condition exists because
there are two links to determine system failure.
If you reconnect a private network, all cluster status reverts to the private link and the low
priority link returns to heartbeat only. At this point, node 2 is placed back in normal
regular membership.
VCS Seeding
To protect your cluster from a pre-existing network partition, VCS employs the concept of
a seed. Systems can be seeded automatically or manually. Note that only systems that
have been seeded can run VCS.
By default, when a system comes up, it is not seeded. When the last system in a cluster is
booted, the cluster seeds and starts VCS on all systems. Systems can then be brought
down and restarted in any combination. Seeding is automatic as long as at least one
instance of VCS is running in the cluster.
Automatic seeding occurs in one of two ways:
◆ When an unseeded system communicates with a seeded system.
◆ When all systems in the cluster are unseeded and able to communicate with each
other.
VCS requires that you declare the number of systems that will participate in the cluster. Seeding control is established through the /etc/gabtab file, in which GAB is started with the command /sbin/gabconfig -c -n X. The variable X represents the number of nodes in the cluster.
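For example, in a four-node cluster, the /etc/gabtab file would contain a line resembling the following, causing GAB to wait for all four nodes before seeding the cluster:
/sbin/gabconfig -c -n 4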
To start a cluster with fewer than all nodes, first verify that the nodes that will not join the cluster are down, then start GAB using the command /sbin/gabconfig -c -x. This manually seeds the cluster and allows VCS to start on all connected systems.
During initial startup, VCS autodisables a service group until all resources are probed for
the group on all systems in the SystemList that have GAB running. This protects against a
situation where enough systems are running LLT and GAB to seed the cluster, but not all
systems have HAD running.
Verify that the -q flag does not appear in the file /etc/gabtab.
Cleaning Resources
When a resource faults, VCS takes automated actions to “clean up” the faulted resource.
The Clean function makes sure the resource is completely shut down before bringing it
online on another node. This prevents concurrency violations.
Fault Propagation
When a resource faults, VCS takes all resources dependent on the faulted resource offline. The fault is thus propagated through the service group.
[Figure: resource R2 faults in a service group where R1 is marked critical (C). The fault propagates up to R1, and resources R3, R4, and R5 are taken offline in dependency order.]
When resource R2 faults, the fault is propagated up the dependency tree to resource R1.
When the critical resource R1 goes offline, VCS must fault the service group and fail it
over elsewhere in the cluster. VCS takes other resources in the service group offline in the
order of their dependencies. After taking resources R3, R4, and R5 offline, VCS fails over
the service group to another node.
[Figure: resource R2 faults in a service group with no critical resources; the remaining resources stay online.]
When resource R2 faults, the engine propagates the failure up the dependency tree.
Neither resource R1 nor resource R2 are critical, so the fault does not result in offlining the
tree or in service group failover.
[Figure: resource R2 faults in a service group where R1 is marked critical (C); Clean is called for R2 and the fault propagates up the tree.]
VCS calls the Clean function for resource R2 and propagates the fault up the dependency
tree. Resource R1 is set to critical, so the service group is taken offline and failed over to
another node in the cluster.
[Figure: resource R2 fails with ManageFaults set to NONE; R2 is marked with the ADMIN_WAIT (W) flag and no other resources are affected.]
If resource R2 fails, the resource is marked as ONLINE|ADMIN_WAIT. The Clean entry point
is not called for the resource. VCS does not take any other resource offline.
[Figure: non-critical resource R2 faults; Clean is called, R2 is marked faulted, and the group stays online.]
When resource R2 faults, the Clean entry point is called and the resource is marked as
faulted. The fault is not propagated up the tree, and the group is not taken offline.
RestartLimit Attribute
The RestartLimit attribute defines whether VCS attempts to restart a failed resource before
informing the engine of the fault.
If the RestartLimit attribute is set to a non-zero value, the agent attempts to restart the
resource before declaring the resource as faulted. When restarting a failed resource, the
agent framework calls the Clean entry point before calling the Online entry point.
However, setting the ManageFaults attribute to NONE prevents the Clean entry point
from being called and prevents the Online entry point from being retried.
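For example, to allow one restart attempt for all resources of a given type before a fault is declared, you could set the attribute at the type level (the Mount type is used here only for illustration):
# hatype -modify Mount RestartLimit 1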
OnlineRetryLimit Attribute
The OnlineRetryLimit attribute specifies the number of times the Online entry point is
retried if the initial attempt to bring a resource online is unsuccessful.
When the OnlineRetryLimit attribute is set to a non-zero value, the agent framework calls the Clean
entry point before rerunning the Online entry point. Setting the ManageFaults attribute to
NONE prevents the Clean entry point from being called and also prevents the Online
operation from being retried.
ConfInterval Attribute
The ConfInterval attribute defines how long a resource must remain online without
encountering problems before previous problem counters are cleared. The attribute
controls when VCS clears the RestartCount, ToleranceCount and
CurrentMonitorTimeoutCount values.
ToleranceLimit Attribute
The ToleranceLimit attribute defines the number of times the Monitor routine should
return an offline status before declaring a resource offline. This attribute is typically used
when a resource is busy and appears to be offline. Setting the attribute to a non-zero value
instructs VCS to allow multiple failing monitor cycles with the expectation that the
resource will eventually respond. Setting a non-zero ToleranceLimit also extends the time
required to respond to an actual fault.
FaultOnMonitorTimeouts Attribute
The FaultOnMonitorTimeouts attribute defines whether VCS interprets a Monitor entry
point timeout as a resource fault.
If the attribute is set to 0, VCS does not treat Monitor timeouts as resource faults. If the attribute is set to 1, VCS interprets the timeout as a resource fault, and the agent calls the Clean entry point to shut the resource down.
By default, the FaultOnMonitorTimeouts attribute is set to 4. This means that the Monitor
entry point must time out four times in a row before the resource is marked faulted.
◆ If the Monitor routine returns online (exit code = 110) during a monitor cycle, the
agent takes no further action. The ToleranceCount attribute is reset to 0 when the
resource is online for a period of time specified by the ConfInterval attribute.
If the resource is detected as offline the number of times specified by the ToleranceLimit attribute before the ToleranceCount is reset (TC = TL), the resource is considered failed.
◆ After the agent determines the resource is not online, VCS checks the Frozen attribute
for the service group. If the service group is frozen, VCS declares the resource faulted
and calls the resfault trigger. No further action is taken.
◆ If the service group is not frozen, VCS checks the ManageFaults attribute. If
ManageFaults=NONE, VCS marks the resource state as ONLINE|ADMIN_WAIT and
calls the resadminwait trigger. If ManageFaults=ALL, VCS calls the Clean entry point
with the CleanReason set to Unexpected Offline.
◆ If the Clean entry point fails (exit code = 1), the resource remains online with the state UNABLE TO OFFLINE. VCS fires the resnotoff trigger and monitors the resource again.
The resource enters a cycle of alternating Monitor and Clean entry points until the
Clean entry point succeeds or a user intervenes.
◆ If the Clean entry point is successful, VCS examines the value of the RestartLimit (RL)
attribute. If the attribute is set to a non-zero value, VCS increments the RestartCount
(RC) attribute and invokes the Online entry point. This continues until the value of the
RestartLimit equals that of the RestartCount. At this point, VCS attempts to monitor
the resource.
◆ If the Monitor returns an online status, VCS considers the resource online and
resumes periodic monitoring. If the monitor returns an offline status, the resource is
faulted and VCS takes actions based on the service group configuration.
Flowchart
[Flowchart: VCS behavior when an online resource faults. A Monitor timeout increments the CurrentMonitorTimeoutCount (CMTC); when the count exceeds FaultOnMonitorTimeouts (FOMT), the Clean entry point is called with the reason “Monitor Hung,” or, if ManageFaults is NONE, the resource is marked ONLINE|ADMIN_WAIT and the resadminwait trigger is called. A Monitor exit code of 100 (offline) increments the ToleranceCount (TC); while TL (ToleranceLimit) exceeds TC, the resource remains online with TC not cleared. When TC reaches TL: if the group is frozen, the resource simply faults and no further action is taken; if ManageFaults is NONE, the resource is marked ONLINE|ADMIN_WAIT and the resadminwait trigger is called; if ManageFaults is ALL, Clean is called with the reason “Unexpected Offline.” If Clean fails, the resource is marked unable to offline, the resnotoff trigger is called, and monitoring continues. If Clean succeeds and RL (RestartLimit) exceeds RC (RestartCount), RC is incremented and the resource is brought online and monitored again; a Monitor exit code of 110 (online) returns the resource to the online state, while 100 leads to fault handling.]
Flowchart
[Flowchart: VCS behavior when a resource fails to come online. While the resource is waiting to come online, an Online timeout leads to Clean with the reason “Online Hung”; an ineffective online increments the OnlineWaitCount (OWC) until OWL (OnlineWaitLimit) is exceeded, leading to Clean with the reason “Online Ineffective.” If ManageFaults is NONE, the resource is instead marked OFFLINE|ADMIN_WAIT and the resadminwait trigger is called. After a successful Clean, OWC is reset and ORC (OnlineRetryCount) is incremented; the Online entry point is retried while ORL (OnlineRetryLimit) exceeds ORC, after which the resource faults.]
Flowchart
[Flowchart: the resource faults and the resfault trigger is called; all resources in the dependent path are taken offline; the group then fails over based on FailOverPolicy.]
Disabling Resources
Disabling a resource means that the resource is no longer monitored by a VCS agent, and
that the resource cannot be brought online or taken offline. The agent starts monitoring
the resource after the resource is enabled. The resource attribute Enabled determines
whether a resource is enabled or disabled. (See “Resource Attributes” on page 614 for
details.) A persistent resource can be disabled when all its parents are offline. A
non-persistent resource can be disabled when the resource is in an OFFLINE state.
Note Disabling a resource is not an option when the entire service group requires
disabling. In that case, set the service group attribute Enabled to 0.
▼ To disable a resource
To disable the resource when VCS is running:
# hares -modify resource_name Enabled 0
To have the resource disabled initially when VCS is started, set the resource’s Enabled
attribute to 0 in main.cf.
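For example, a Mount resource could be defined as initially disabled in main.cf as follows (the resource name and attribute values are illustrative only):
Mount export_mnt (
MountPoint = "/export"
BlockDevice = "/dev/vx/dsk/exportdg/exportvol"
FSType = vxfs
Enabled = 0
)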
Limitations
When VCS is running, there are certain prerequisites to be met before the resource is
disabled successfully.
✔ An online non-persistent resource cannot be disabled. It must be in a clean OFFLINE
state. (The state must be OFFLINE and IState must be NOT WAITING.)
✔ If it is a persistent resource and the state is ONLINE on some of the systems, all dependent resources (parents) must be in a clean OFFLINE state. (The state must be OFFLINE and IState must be NOT WAITING.)
Therefore, before disabling the resource you may be required to take it offline (if it is
non-persistent) and take other resources offline in the service group.
Additional Considerations
◆ When a group containing disabled resources is brought online, the online transaction
is not propagated to the disabled resources. Children of the disabled resource are
brought online by VCS only if they are required by another enabled resource.
◆ You can bring children of disabled resources online if necessary.
◆ When a group containing disabled resources is taken offline, the offline transaction is
propagated to the disabled resources.
The following figures show how a service group containing disabled resources is brought
online.
[Figure: a service group going online. Resource_1 depends on Resource_2 and Resource_3. Resource_3 is disabled; Resource_4 and Resource_5, below Resource_3, are offline.]
In the figure above, Resource_3 is disabled. When the service group is brought online, the
only resources brought online by VCS are Resource_1 and Resource_2 (Resource_2 is
brought online first) because VCS recognizes Resource_3 is disabled. In accordance with
online logic, the transaction is not propagated to the disabled resource.
In the figure below, Resource_2 is disabled. When the service group is brought online, resources 1, 3, and 4 are also brought online (Resource_4 is brought online first). Note that Resource_3, the child of the disabled resource, is brought online because Resource_1 is enabled and depends on it.
[Figure: a service group going online with Resource_2 disabled. Resource_1 depends on Resource_2 and Resource_3; Resource_4 is below Resource_3.]
▼ To clear a resource
1. Take the necessary actions outside VCS to bring all resources into the required state.
2. Verify that resources are in the required state by issuing the command:
# hagrp -clearadminwait group -sys system
This command clears the ADMIN_WAIT state for all resources. If VCS continues to
detect resources that are not in the required state, it resets the resources to the
ADMIN_WAIT state.
3. If resources continue in the ADMIN_WAIT state, repeat step 1 and step 2, or issue the
following command to stop VCS from setting the resource to the ADMIN_WAIT state:
# hagrp -clearadminwait -fault group -sys system
This command has the following results:
◆ If the resadminwait trigger was called for reasons 0 or 1, the resource state is set as
ONLINE|UNABLE_TO_OFFLINE.
◆ If the resadminwait trigger was called for reasons 2, 3, or 4, the resource state is set as FAULTED. Note that when resources are set as FAULTED for these reasons, the Clean entry point is not called. Verify that resources in the ADMIN_WAIT state are in a clean, OFFLINE state before invoking this command.
Note When a service group has a resource in the ADMIN_WAIT state, the following operations cannot be performed on the service group: online, offline, switch, and flush. Also, you cannot use the hastop command when resources are in the ADMIN_WAIT state; in this situation, you must issue the hastop command with the -force option.
[Flowchart: the Monitor entry point detects that the disk group state is disabled. If I/O fencing is not enabled, VCS logs a disk group error in the engine log and no failover occurs. If I/O fencing is enabled, the Clean entry point is called and the system panics (dump, crash, and halt processes).]
Overload Warning
Overload warning provides the notification component of the Load policy. When a server
sustains the preset load level (set by the attribute LoadWarningLevel) for a preset time (set
by the attribute LoadTimeThreshold), the loadwarning trigger is invoked. For a full
description of event management with triggers, see “VCS Event Triggers” on page 431.
For details on the attributes cited above, see “System Attributes” on page 638.
Additional Considerations
VCS provides the option of creating zones for systems in a cluster to further fine-tune
application failover decisions. It also provides options to identify a suitable system to host
a service group when the cluster starts.
System Zones
The SystemZones attribute enables you to create a subset of systems to use in an initial
failover decision. This feature allows fine-tuning of application failover decisions, and yet
retains the flexibility to fail over anywhere in the cluster.
If the attribute is configured, a service group tries to stay within its zone before choosing a
host in another zone. For example, in a three-tier application infrastructure with Web,
application, and database servers, you could create two system zones: one each for the
application and the database. In the event of a failover, a service group in the application
zone will try to fail over to another node within the zone. If no nodes are available in the
application zone, the group will fail over to the database zone, based on the configured
load and limits.
In this configuration, excess capacity and limits on the database backend are kept in
reserve to handle the larger load of a database failover. The application servers handle the
load of service groups in the application zone. During a cascading failure, the excess
capacity in the cluster is available to all service groups.
Load-Based AutoStart
VCS provides a method to determine where a service group comes online when the
cluster starts. Setting the AutoStartPolicy to Load instructs the VCS engine, HAD, to
determine the best system on which to start the groups. VCS places service groups in an
AutoStart queue for load-based startup as soon as the groups probe all running systems.
VCS creates a subset of systems that meet all prerequisites and then chooses the system
with the highest AvailableCapacity.
You can use AutoStartPolicy = Load and SystemZones to establish a list of preferred
systems on which to initially run a group.
system LargeServer1 (
Capacity = 200
Limits = { ShrMemSeg=20, Semaphores=10, Processors=12 }
LoadWarningLevel = 90
LoadTimeThreshold = 600
)
group G1 (
SystemList = { LargeServer1, LargeServer2, MedServer1,
MedServer2 }
SystemZones = { LargeServer1=0, LargeServer2=0,
MedServer1=1, MedServer2=1 }
AutoStartPolicy = Load
AutoStartList = { MedServer1, MedServer2 }
FailOverPolicy = Load
Load = 100
Prerequisites = { ShrMemSeg=10, Semaphores=5, Processors=6 }
)
system Server1 (
Capacity = 100
)
system Server2 (
Capacity = 100
)
system Server3 (
Capacity = 100
)
system Server4 (
Capacity = 100
)
group G1 (
SystemList = { Server1, Server2, Server3, Server4 }
AutoStartPolicy = Load
AutoStartList = { Server1, Server2, Server3, Server4 }
FailOverPolicy = Load
Load = 20
)
group G2 (
SystemList = { Server1, Server2, Server3, Server4 }
AutoStartPolicy = Load
AutoStartList = { Server1, Server2, Server3, Server4 }
FailOverPolicy = Load
Load = 40
)
group G3 (
SystemList = { Server1, Server2, Server3, Server4 }
AutoStartPolicy = Load
AutoStartList = { Server1, Server2, Server3, Server4 }
FailOverPolicy = Load
Load = 30
)
group G4 (
SystemList = { Server1, Server2, Server3, Server4 }
AutoStartPolicy = Load
AutoStartList = { Server1, Server2, Server3, Server4 }
FailOverPolicy = Load
Load = 10
)
group G5 (
SystemList = { Server1, Server2, Server3, Server4 }
AutoStartPolicy = Load
AutoStartList = { Server1, Server2, Server3, Server4 }
FailOverPolicy = Load
Load = 50
)
group G6 (
SystemList = { Server1, Server2, Server3, Server4 }
AutoStartPolicy = Load
AutoStartList = { Server1, Server2, Server3, Server4 }
FailOverPolicy = Load
Load = 30
)
group G7 (
SystemList = { Server1, Server2, Server3, Server4 }
AutoStartPolicy = Load
AutoStartList = { Server1, Server2, Server3, Server4 }
FailOverPolicy = Load
Load = 20
)
group G8 (
SystemList = { Server1, Server2, Server3, Server4 }
AutoStartPolicy = Load
AutoStartList = { Server1, Server2, Server3, Server4 }
FailOverPolicy = Load
Load = 40
)
AutoStart Operation
In this configuration, assume that groups probe in the same order they are described, G1
through G8. Group G1 chooses the system with the highest AvailableCapacity value. All
systems have the same available capacity, so G1 starts on Server1 because this server is
lexically first. Groups G2 through G4 follow on Server2 through Server4. With the startup
decisions made for the initial four groups, the cluster configuration resembles:
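Server1: AvailableCapacity=80 (running G1)
Server2: AvailableCapacity=60 (running G2)
Server3: AvailableCapacity=70 (running G3)
Server4: AvailableCapacity=90 (running G4)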
As the next groups come online, group G5 starts on Server4 because this server has the
highest AvailableCapacity value. Group G6 then starts on Server1 with AvailableCapacity
of 80. Group G7 comes online on Server3 with AvailableCapacity of 70 and G8 comes
online on Server2 with AvailableCapacity of 60.
The cluster configuration now resembles:
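Server1: AvailableCapacity=50 (running G1 and G6)
Server2: AvailableCapacity=20 (running G2 and G8)
Server3: AvailableCapacity=50 (running G3 and G7)
Server4: AvailableCapacity=40 (running G4 and G5)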
In this configuration, Server2 fires the loadwarning trigger after 600 seconds because it is
at the default LoadWarningLevel of 80 percent.
Failure Scenario
In the first failure scenario, Server4 fails. Group G4 chooses Server1 because Server1 and
Server3 have AvailableCapacity of 50 and Server1 is lexically first. Group G5 then comes
online on Server3. Serializing the failover choice allows complete load-based control and
adds less than one second to the total failover time.
Following the first failure, the configuration now resembles:
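Server1: AvailableCapacity=40 (running G1, G4, and G6)
Server2: AvailableCapacity=20 (running G2 and G8)
Server3: AvailableCapacity=0 (running G3, G5, and G7)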
In this configuration, Server3 fires the loadwarning trigger to notify that the server is
overloaded. An administrator can then switch group G7 to Server1 to balance the load
across groups G1 and G3. When Server4 is repaired, it rejoins the cluster with an
AvailableCapacity value of 100, making it the most eligible target for a failover group.
system LargeServer1 (
Capacity = 200
Limits = { ShrMemSeg=20, Semaphores=10, Processors=12 }
LoadWarningLevel = 90
LoadTimeThreshold = 600
)
system LargeServer2 (
Capacity = 200
Limits = { ShrMemSeg=20, Semaphores=10, Processors=12 }
LoadWarningLevel=70
LoadTimeThreshold=300
)
system MedServer1 (
Capacity = 100
Limits = { ShrMemSeg=10, Semaphores=5, Processors=6 }
)
system MedServer2 (
Capacity = 100
Limits = { ShrMemSeg=10, Semaphores=5, Processors=6 }
)
group G1 (
SystemList = { LargeServer1, LargeServer2, MedServer1, MedServer2 }
SystemZones = { LargeServer1=0, LargeServer2=0, MedServer1=1,
MedServer2=1 }
AutoStartPolicy = Load
AutoStartList = { LargeServer1, LargeServer2 }
FailOverPolicy = Load
Load = 100
Prerequisites = { ShrMemSeg=10, Semaphores=5, Processors=6 }
)
group G2 (
SystemList = { LargeServer1, LargeServer2, MedServer1, MedServer2 }
SystemZones = { LargeServer1=0, LargeServer2=0, MedServer1=1,
MedServer2=1 }
AutoStartPolicy = Load
AutoStartList = { LargeServer1, LargeServer2 }
FailOverPolicy = Load
Load = 100
Prerequisites = { ShrMemSeg=10, Semaphores=5, Processors=6 }
)
group G3 (
SystemList = { LargeServer1, LargeServer2, MedServer1, MedServer2 }
SystemZones = { LargeServer1=0, LargeServer2=0, MedServer1=1,
MedServer2=1 }
AutoStartPolicy = Load
AutoStartList = { MedServer1, MedServer2 }
FailOverPolicy = Load
Load = 30
)
group G4 (
SystemList = { LargeServer1, LargeServer2, MedServer1, MedServer2 }
SystemZones = { LargeServer1=0, LargeServer2=0, MedServer1=1,
MedServer2=1 }
AutoStartPolicy = Load
AutoStartList = { MedServer1, MedServer2 }
FailOverPolicy = Load
Load = 20
)
AutoStart Operation
In this configuration, the AutoStart sequence resembles:
G1—LargeServer1
G2—LargeServer2
G3—MedServer1
G4—MedServer2
All groups begin a probe sequence when the cluster starts. Groups G1 and G2 have an
AutoStartList of LargeServer1 and LargeServer2. When these groups probe, they are
queued to go online on one of these servers, based on highest AvailableCapacity value. If
G1 probes first, it chooses LargeServer1 because LargeServer1 and LargeServer2 both
have an AvailableCapacity of 200, but LargeServer1 is lexically first. Groups G3 and G4
use the same algorithm to determine their servers.
Normal Operation
The configuration resembles:
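LargeServer1: AvailableCapacity=100, CurrentLimits={ ShrMemSeg=10, Semaphores=5, Processors=6 } (running G1)
LargeServer2: AvailableCapacity=100, CurrentLimits={ ShrMemSeg=10, Semaphores=5, Processors=6 } (running G2)
MedServer1: AvailableCapacity=70 (running G3)
MedServer2: AvailableCapacity=80 (running G4)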
Failure Scenario
In this scenario, if LargeServer2 fails, VCS scans all available systems in group G2’s
SystemList that are in the same SystemZone and creates a subset of systems that meet the
group’s prerequisites. In this case, LargeServer1 meets all required Limits. Group G2 is
brought online on LargeServer1. This results in the following configuration:
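LargeServer1: AvailableCapacity=0, CurrentLimits={ ShrMemSeg=0, Semaphores=0, Processors=0 } (running G1 and G2)
MedServer1: AvailableCapacity=70 (running G3)
MedServer2: AvailableCapacity=80 (running G4)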
system LargeServer1 (
Capacity = 200
Limits = { ShrMemSeg=15, Semaphores=30, Processors=18 }
LoadWarningLevel = 80
LoadTimeThreshold = 900
)
system LargeServer2 (
Capacity = 200
Limits = { ShrMemSeg=15, Semaphores=30, Processors=18 }
LoadWarningLevel=80
LoadTimeThreshold=900
)
system LargeServer3 (
Capacity = 200
Limits = { ShrMemSeg=15, Semaphores=30, Processors=18 }
LoadWarningLevel=80
LoadTimeThreshold=900
)
system MedServer1 (
Capacity = 100
Limits = { ShrMemSeg=5, Semaphores=10, Processors=6 }
)
system MedServer2 (
Capacity = 100
Limits = { ShrMemSeg=5, Semaphores=10, Processors=6 }
)
system MedServer3 (
Capacity = 100
Limits = { ShrMemSeg=5, Semaphores=10, Processors=6 }
)
system MedServer4 (
Capacity = 100
Limits = { ShrMemSeg=5, Semaphores=10, Processors=6 }
)
system MedServer5 (
Capacity = 100
Limits = { ShrMemSeg=5, Semaphores=10, Processors=6 }
)
group Database1 (
SystemList = { LargeServer1, LargeServer2, LargeServer3,
MedServer1, MedServer2, MedServer3, MedServer4, MedServer5 }
SystemZones = { LargeServer1=0, LargeServer2=0, LargeServer3=0,
MedServer1=1, MedServer2=1, MedServer3=1, MedServer4=1,
MedServer5=1 }
AutoStartPolicy = Load
AutoStartList = { LargeServer1, LargeServer2, LargeServer3 }
FailOverPolicy = Load
Load = 100
Prerequisites = { ShrMemSeg=5, Semaphores=10, Processors=6 }
)
group Database2 (
SystemList = { LargeServer1, LargeServer2, LargeServer3,
MedServer1, MedServer2, MedServer3, MedServer4, MedServer5 }
SystemZones = { LargeServer1=0, LargeServer2=0, LargeServer3=0,
MedServer1=1, MedServer2=1, MedServer3=1, MedServer4=1,
MedServer5=1 }
AutoStartPolicy = Load
AutoStartList = { LargeServer1, LargeServer2, LargeServer3 }
FailOverPolicy = Load
Load = 100
Prerequisites = { ShrMemSeg=5, Semaphores=10, Processors=6 }
)
group Database3 (
SystemList = { LargeServer1, LargeServer2, LargeServer3,
MedServer1, MedServer2, MedServer3, MedServer4, MedServer5 }
SystemZones = { LargeServer1=0, LargeServer2=0, LargeServer3=0,
MedServer1=1, MedServer2=1, MedServer3=1, MedServer4=1,
MedServer5=1 }
AutoStartPolicy = Load
AutoStartList = { LargeServer1, LargeServer2, LargeServer3 }
FailOverPolicy = Load
Load = 100
Prerequisites = { ShrMemSeg=5, Semaphores=10, Processors=6 }
)
group Application1 (
SystemList = { LargeServer1, LargeServer2, LargeServer3,
MedServer1, MedServer2, MedServer3, MedServer4, MedServer5 }
SystemZones = { LargeServer1=0, LargeServer2=0, LargeServer3=0,
MedServer1=1, MedServer2=1, MedServer3=1, MedServer4=1,
MedServer5=1 }
AutoStartPolicy = Load
AutoStartList = { MedServer1, MedServer2, MedServer3, MedServer4,
MedServer5 }
FailOverPolicy = Load
Load = 50
)
group Application2 (
SystemList = { LargeServer1, LargeServer2, LargeServer3,
MedServer1, MedServer2, MedServer3, MedServer4, MedServer5 }
SystemZones = { LargeServer1=0, LargeServer2=0, LargeServer3=0,
MedServer1=1, MedServer2=1, MedServer3=1, MedServer4=1,
MedServer5=1 }
AutoStartPolicy = Load
AutoStartList = { MedServer1, MedServer2, MedServer3, MedServer4,
MedServer5 }
FailOverPolicy = Load
Load = 50
)
group Application3 (
SystemList = { LargeServer1, LargeServer2, LargeServer3,
MedServer1, MedServer2, MedServer3, MedServer4, MedServer5 }
SystemZones = { LargeServer1=0, LargeServer2=0, LargeServer3=0,
MedServer1=1, MedServer2=1, MedServer3=1, MedServer4=1,
MedServer5=1 }
AutoStartPolicy = Load
AutoStartList = { MedServer1, MedServer2, MedServer3, MedServer4,
MedServer5 }
FailOverPolicy = Load
Load = 50
)
group Application4 (
SystemList = { LargeServer1, LargeServer2, LargeServer3,
MedServer1, MedServer2, MedServer3, MedServer4, MedServer5 }
SystemZones = { LargeServer1=0, LargeServer2=0, LargeServer3=0,
MedServer1=1, MedServer2=1, MedServer3=1, MedServer4=1,
MedServer5=1 }
AutoStartPolicy = Load
AutoStartList = { MedServer1, MedServer2, MedServer3, MedServer4,
MedServer5 }
FailOverPolicy = Load
Load = 50
)
group Application5 (
SystemList = { LargeServer1, LargeServer2, LargeServer3,
MedServer1, MedServer2, MedServer3, MedServer4, MedServer5 }
SystemZones = { LargeServer1=0, LargeServer2=0, LargeServer3=0,
MedServer1=1, MedServer2=1, MedServer3=1, MedServer4=1,
MedServer5=1 }
AutoStartPolicy = Load
AutoStartList = { MedServer1, MedServer2, MedServer3, MedServer4,
MedServer5 }
FailOverPolicy = Load
Load = 50
)
AutoStart Operation
Based on the preceding main.cf example, the AutoStart sequence resembles:
Database1 LargeServer1
Database2 LargeServer2
Database3 LargeServer3
Application1 MedServer1
Application2 MedServer2
Application3 MedServer3
Application4 MedServer4
Application5 MedServer5
Normal Operation
The configuration resembles:
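LargeServer1: AvailableCapacity=100, CurrentLimits={ ShrMemSeg=10, Semaphores=20, Processors=12 } (running Database1)
LargeServer2: AvailableCapacity=100, CurrentLimits={ ShrMemSeg=10, Semaphores=20, Processors=12 } (running Database2)
LargeServer3: AvailableCapacity=100, CurrentLimits={ ShrMemSeg=10, Semaphores=20, Processors=12 } (running Database3)
MedServer1 through MedServer5: AvailableCapacity=50 (each running one Application group)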
Failure Scenario
In the following example, LargeServer3 fails. VCS scans all available systems in the
SystemList for the Database3 group for systems in the same SystemZone and identifies
systems that meet the group’s prerequisites. In this case, LargeServer1 and LargeServer2
meet the required Limits. Database3 is brought online on LargeServer1. This results in the
following configuration:
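LargeServer1: AvailableCapacity=0, CurrentLimits={ ShrMemSeg=5, Semaphores=10, Processors=6 } (running Database1 and Database3)
LargeServer2: AvailableCapacity=100, CurrentLimits={ ShrMemSeg=10, Semaphores=20, Processors=12 } (running Database2)
MedServer1 through MedServer5: AvailableCapacity=50 (each running one Application group)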
In this scenario, further failure of either system can be tolerated because each has
sufficient Limits available to accommodate the additional service group.
Parent and child service groups are linked by a rule. This link defines the behavior of the
groups when one of them faults. A link can be configured according to the following
criteria:
◆ The category of the dependency, such as online or offline (described in “Categories of
Service Group Dependencies” on page 387).
◆ The location of the dependency, such as local, global, or remote (described in
“Location of Dependency” on page 388).
◆ The type of dependency, such as soft, firm, or hard (described in “Type of
Dependency” on page 389).
Each service group dependency can be associated with a category, location, and type. For
example, you could have an online local soft dependency. Note that all combinations of
category, location, and dependency type are not supported.
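For example, an online local firm dependency between a parent group and a child group can be configured from the command line with a command of the following form (the group names app_grp and db_grp are illustrative only):
# hagrp -link app_grp db_grp online local firm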
Based on the type of link, VCS brings the parent or child service group online or takes it offline when one of the linked service groups faults. The link also controls where VCS brings a group online following events such as a resource fault, automatic group start, or system shutdown.
Why Configure a Service Group Dependency?
[Figure: an online local soft dependency, in which an Application parent group depends on a Database child group being online on the same system.]
[Figure: an offline local dependency between a production Application parent group and a test Application child group on System A and System B, connected by private networks.]
Location of Dependency
The location of the dependency determines the relative location of parent and child
groups. In the following examples, parent and child groups can be failover or parallel, as
described in “Service Groups” on page 11.
Local Dependency
In a local dependency, an instance of the parent group depends on an instance of the child
group being online or offline on the same system. For example, in an online local
dependency, a child group must be online on a system before the parent group can come
online on the same system.
Global Dependency
In a global dependency an instance of the parent group depends on one or more instances
of the child group being online on any system. In an online global dependency, the child
group must be online somewhere in the cluster before the parent group can come online.
Remote Dependency
In a remote dependency an instance of parent group depends on one or more instances of
the child group being online on any system other than the system on which the parent is
online. For example, for the parent to come online on System A, the child must be online
on any system in the cluster except System A.
Type of Dependency
The type of dependency defines the rigidity of the link between parent and child groups.
There are three dependency types: soft, firm and hard.
Soft Dependency
In a soft dependency, VCS imposes minimal constraints while onlining parent/child
groups. The only constraint is that the child group must be online prior to the parent group
being brought online; the location of the dependency determines where the child group
must be online. For example, in an online local soft dependency, an instance of the child
group must be online on the same system before the parent group can come online.
Soft dependency provides the following enhanced flexibility:
◆ If the child group faults, VCS does not immediately take the parent offline. If the child
group cannot fail over, the parent remains online.
◆ When both groups are online, the child group can be taken offline while the parent is
online and vice versa (the parent group can be taken offline while the child is online).
◆ To link a parent and child group, the child group is not required to be online if the parent is online. However, if the child group is online, the parent and child cannot be linked in such a way that their online states conflict with the type of link between parent and child.
The location of the link (local, global, or remote) designates whether or not a parent group
will fail over after a fault and failover of the child group.
Firm Dependency
Firm dependency means VCS imposes more constraints when onlining parent/child
groups. The child group must be online prior to the parent group being brought online; the
location of the dependency determines where the child group must be online. In addition
to the constraints imposed by soft dependency, firm dependency also includes the
following constraints:
◆ If the child group faults, the parent is taken offline. If the child cannot fail over, the
parent remains offline. However, if the child group faults and the parent group is
frozen, the parent remains in its original state.
◆ The child group cannot be taken offline while the parent group is online. However,
the parent group can be taken offline while the child is online.
◆ To link a parent and child group with firm dependency, the parent group must be
offline or the parent and child group must be online in such a way that their online
states do not conflict with the type of link between parent and child.
With both soft and firm dependencies, if the parent group faults, the child group is not taken offline. The parent group may or may not fail over, depending on the link constraints and
locations (such as online local versus online global). See “Service Group Dependency
Configurations” on page 391 for more information.
Hard Dependency
A hard dependency imposes maximum constraints on linked service groups and provides
a closer relationship between parent and child groups. In a hard dependency, the child
and the parent groups fail over to the same system together when either the child or the
parent faults.
[Table: failover behavior for each dependency location (online local, online global, online remote, and offline local), showing the actions taken for child-group and parent-group faults when a failover system is available and when none is available.]
Automatic Online
If a service group is configured to start automatically on a system, it is brought online only
if the group’s dependency requirements are met. This implies that in an online local
dependency, parent groups are brought online only after all child groups are brought
online.
AutoRestart
If a persistent resource on a service group (GROUP_1 in this example) faults, the service
group is automatically failed over to another system in the cluster under the following
conditions:
◆ The AutoFailOver attribute is set.
◆ There is another system in the cluster to which GROUP_1 can fail over.
If neither of the above conditions is met (the AutoFailOver attribute is not set or other
systems in the cluster are unavailable), GROUP_1 remains offline and faulted, even after
the faulted resource becomes online.
Setting the AutoRestart attribute enables a service group to be brought back online
without manual intervention. In the above example, setting the AutoRestart attribute for
GROUP_1 would enable VCS to bring the group back online, after the resource came
online on the system where the resource faulted.
Or, if GROUP_1 could not fail over to another system because none was available, setting
the AutoRestart attribute would enable VCS to bring the group back online on the first
available system after the group’s faulted resource came online on that system.
For example, NIC is a persistent resource. In some cases, when a system boots and VCS
starts, VCS probes all resources on the system. It is possible that when VCS probes the
NIC resource, the resource may not yet be online because the networking is not up and
fully operational. When this occurs, VCS will mark the NIC resource as faulted, and will
not bring the service group online. However, when the NIC resource becomes online and
if AutoRestart is enabled, the service group is brought online.
Automatic Failover
A failover occurs when a service group faults and is migrated to another system, or when a system crashes. For service groups with dependencies, the following actions occur during failover:
1. The service group is taken offline along with any of its parent service groups that
have an online firm or hard dependency (online local firm, online global firm, online
remote firm, or online local hard).
2. A failover target is chosen from the SystemList of the service group based on the
failover policy and the restrictions brought by the service group dependencies.
If the faulted service group is also the parent service group in a service group
dependency relationship, the service group dependency has an impact on the choice
of a target system. For example, if the faulted service group has an online local (firm
or soft) dependency with a child service group that is online only on that system, no
failover targets are available.
3. If there are no other systems the service group can fail over to, the child service group and the parents that were taken offline remain offline. Note that for soft dependencies, when a child group faults and cannot fail over, the parent group remains online.
4. If there is a failover target, then VCS takes any child service group with an online local
hard dependency offline.
5. VCS then checks if there are any conflicting parent service groups that are already
online on the target system. These service groups can be parent service groups that
are linked with an offline local dependency or online remote soft dependency. In
either case, the parent service group is taken offline to enable the child service group
to start on that system.
6. If there is any child service group with an online local hard dependency, first the child
service group and then the service group that initiated the failover are brought online.
7. After the service group is brought online successfully on the target system, VCS takes
any parent service groups offline that have an online local soft dependency to the
failed-over child.
8. Finally, VCS selects a failover target for any parent service groups that may have been
taken offline during steps 1, 5, or 7 and brings the parent service group online on an
available system.
9. If there are no target systems available to fail over the parent service group that has
been taken offline, the parent service group remains offline.
Manual Online
Basic rules governing how to manually bring a service group online also apply to service
groups with dependencies. Additionally, the following rules apply for service groups
configured with dependencies. For example:
◆ For online dependencies, a parent group cannot be brought online manually if the
child is not online.
◆ For online local dependencies, a parent group cannot be brought online manually on
any system other than the system on which the child is online.
◆ For online remote dependencies, a parent group cannot be brought online manually
on the system on which the child is online.
◆ For offline local dependencies, a parent group cannot be brought online manually on
the system on which the child is online.
Bringing a child group online manually is typically not rejected, except under the following circumstances:
◆ For online local dependencies, if parent is online, a child group online is rejected for
any system other than the system where parent is online.
◆ For online remote dependencies, if parent is online, a child group online is rejected for
the system where parent is online.
◆ For offline local dependencies, if parent is online, a child group online is rejected for
the system where parent is online.
The following examples describe situations where bringing a parallel child group online
is accepted:
◆ For a parallel child group linked online local with failover/parallel parent, multiple
instances of child group online are acceptable.
◆ For a parallel child group linked online remote with failover parent, multiple
instances of child group online are acceptable, as long as child group does not go
online on the system where parent is online.
◆ For a parallel child group linked offline local with failover/parallel parent, multiple
instances of child group online are acceptable, as long as child group does not go
online on the system where parent is online.
Manual Offline
Basic rules governing how to manually take a service group offline also apply to service
groups with dependencies. Additionally, VCS rejects manual offlining if the procedure
violates existing group dependencies. Typically, firm dependencies place more restrictions on taking a child group offline while the parent group is online. Rules for manual offlining include:
◆ Parent group offline is never rejected.
◆ For all soft dependencies, child group can be offlined regardless of the state of parent
group.
◆ For all firm dependencies, if parent group is online, child group offline is rejected.
◆ For the online local hard dependency, if parent group is online, child group offline is
rejected.
Manual Switch
Switching a service group implies manually taking a service group offline on one system,
and manually bringing it back online on another system. Basic rules governing how to
manually switch a service group also apply to service group dependencies. Additionally,
VCS rejects manual switch if the group does not comply with manual offline or manual
online rules described above.
Dependency Limitations
◆ Each parent group can link with only one child group; however, a child group can
have multiple parents.
◆ A service group dependency tree can have three levels, maximum.
◆ You cannot link two service groups whose current states violate the relationship.
◆ All link requests are accepted if all instances of parent group are offline.
◆ All online local link requests are rejected if for an instance of parent group, an
instance of child group is not online on the same system.
◆ All online remote link requests are rejected when an instance of parent group and an
instance of child group are running on the same system.
◆ All offline local link requests are rejected when an instance of parent group and an
instance of child group are running on the same system.
◆ All link requests are rejected, if parent group is online and child group is offline.
◆ All online global/online remote link requests to link two parallel groups are rejected.
◆ All online local link requests to link a parallel parent group to a failover child group
are rejected.
online local firm Failover parent group firm depends on failover child group being
online on the same system.
Parent can be brought online on a system, for example, System A, only if the child is
online on System A.
✔ If the child faults, the parent is taken offline on System A. When a child successfully
fails over to another system, for example System B, VCS migrates the parent to System
B. If child cannot fail over, parent remains offline.
✔ If parent faults on System A, child remains online on System A. Parent cannot fail
over anywhere.
online local hard: Failover parent group hard depends on failover child group being online on the same system.
Parent can be brought online on a system, for example, System A, only if the child is
online on System A.
✔ If the child faults, the parent is taken offline on System A. When a child successfully
fails over to another system, for example System B, VCS migrates the parent to System
B. If child cannot fail over, parent remains offline.
✔ If parent faults on System A, child is taken offline on System A. When child
successfully fails over to System B, VCS migrates the parent to System B. If child
cannot fail over, child continues to run on System A.
online global soft: Failover parent group soft depends on failover child group being online anywhere in the cluster. Parent can be brought online as long as a child group is running somewhere in the cluster.
✔ If the child faults, the parent remains online, whether or not the child is able to fail over.
✔ If parent faults on System A, child remains online on System A. Parent fails over to
next-available system. If no system is available, the parent remains offline.
online global firm: Failover parent group firm depends on failover child group being online anywhere in the cluster.
Parent can be brought online as long as a child group is running somewhere in the cluster.
For example, the parent group is online on System A, and the child group is online on
System B.
✔ If the child faults on System B, the parent group on System A is taken offline. When
the child successfully fails over to another system, for example, System C, the parent
group is brought online on a suitable system. If child group cannot fail over, parent
group remains offline.
✔ If parent faults on System A, child remains online on System A. Parent fails over to
next-available system. If no system is available, the parent remains offline.
online remote soft: Failover parent group soft depends on failover child group being online on any other system in the cluster.
Parent can be brought online on any system other than the system on which the child is
online. For example if child group is online on System B, the parent group can be online
on System A.
✔ If the child faults on System B, the parent remains online on System A unless VCS
selects System A as the target system on which to bring the child group online. In that
case, the parent is taken offline. After the child successfully fails over to System A,
VCS brings the parent online on another system, for example System B. If the child
faults on System A, the parent remains online on System B unless VCS selects System
B as the target system.
online remote firm: Failover parent group firm depends on failover child group being online on any other system in the cluster.
Parent can be brought online on any system other than the system on which the child is
online. For example if child group is online on System A, the parent group can be online
on System B.
✔ If the child faults on System A, the parent is taken offline on System B. After the child successfully fails over to another system, VCS brings the parent online on a system where the child is offline. If no other system is available and the child is not online on System B, the parent is restarted on System B.
✔ If the parent faults on System B, the child remains online on System A. The parent fails over to a system other than System A. If no system is available, the parent remains offline.
offline local: Failover parent group depends on failover child group being offline on the same system and vice versa.
Parent can be brought online on any system as long as the child is not online on the
system, and vice versa. For example, if child group is online on System B, the parent can
be brought online on System A.
✔ If the child faults on System B, and if VCS selects System A as the target on which to
bring the child online, the parent on System A is taken offline and the child is brought
online. However, if child selects System C as the target, parent remains online on
System A.
✔ If parent faults, child remains online. If there is no other system to which parent can
fail over, parent remains offline.
Failover Parent / Parallel Child
online global soft: Failover parent group soft depends on all online instances of the child group remaining online.
Failover group can be brought online anywhere as long as one or more instances of the
child group are online somewhere in the cluster.
✔ If one or more instances of the child group fault, the parent remains online.
Consider that multiple instances of the child group are online on Systems A and B, and
the parent group is online on System A.
✔ If parent faults, it fails over to System B. Both instances of the child group remain
online, and the parent group maintains its dependency on the instances.
online global firm: Failover parent group firm depends on all instances of the child group being online anywhere in the cluster.
Failover group can be brought online anywhere as long as all instances of the child group are online somewhere in the cluster. For example, suppose two instances of the child are online on Systems A and B, and the parent is online on System A. If an instance of the child group faults, the parent is taken offline on System A. After the child has successfully failed over to System C, VCS fails over the parent group to another system. If the instance of the child group cannot fail over, the parent may not be brought online.
Consider that multiple instances of the child group are online on Systems A and B, and
the parent group is online on System A.
✔ If parent faults, it fails over to System B. Both instances of the child group remain
online, and the parent group maintains its dependency on the instances.
online remote soft: Failover parent group soft depends on all instances of the child group being online on another system in the cluster.
Parent can be brought online on any system other than the system on which the child is
online. For example if child group is online on Systems A and C, the parent group can be
online on System B.
✔ If the child faults on System A, the parent remains online on System B unless VCS
selects System B as the target system. After the child successfully fails over to System
B, VCS brings the parent online on another system, for example, System D.
✔ If parent group faults on System B, both instances of the child group remain online.
The parent group fails over to System D and maintains its dependency on both
instances of the child group.
online remote firm: Failover parent group firm depends on all instances of the child group being online on another system in the cluster.
Failover group can be brought online anywhere as long as all instances of the child group
are online on another system. For example, if a child group is online on System A and
System C, the parent group can be online on System B. When the child group on System A
faults, the parent is taken offline. After the child has successfully failed over to System B,
VCS brings the parent online on another system, for example, System D. If the child group
fails over to System D, the parent group is restarted on System B.
Note System D is selected as an example only. The parent may be restarted on Systems A,
B, or D, depending on the value of the FailOverPolicy attribute for the parent group
and the system on which the child group is online.
✔ If parent group faults on System B, both instances of the child group remain online.
The parent group fails over to System D and maintains its dependency on both
instances of the child group.
offline local: Failover parent group depends on no instances of the child group being online on the same system, and vice versa.
Failover group can be brought online anywhere as long as no instance of the child group is online on that system, and vice versa. For example, if the child group is online on Systems B and C, the parent group can be brought online on System A. If the child group faults on System C, and if VCS selects System A as the target on which to bring the child group online, the parent group on System A is taken offline and the child is brought online. However, if the child group selects System D as the target, the parent group remains online on System A.
✔ If the parent group faults, the child group remains online. If there is no other system
to which the parent can fail over, the parent remains offline.
Parallel Parent / Failover Child
online remote soft: All instances of the parent group soft depend on the failover child group being online on any other system.
An instance of the parent group can be online anywhere as long as the child is online on another system. For example, if the child group is online on System A, the parent group can be online on System B and System C.
✔ If the child group faults and VCS selects System B as the target on which to bring the child online, the instance of the parent group running on System B is taken offline. After the child has successfully failed over to System B, VCS brings the failed parent instance online on another system, for example, System D. However, if the child group fails over to System D instead, the parent remains online. (If the parent group on System B faults, it fails over to System D. The child group remains online on System A.)
online remote firm (default): All instances of the parent group firm depend on the failover child group being online on any other system.
An instance of the parent group can be online anywhere as long as the child is online on
another system. For example, if the child group is online on System A, the parent group
can be online on System B and System C.
✔ If the child faults on System A, the instances of the parent group on System B and System C are taken offline. After the child has successfully failed over to System C, VCS brings the instances of the parent group online on other systems where the child is offline. If no other system is available and the child is offline on the system on which a parent instance was taken offline, that instance is restarted on the same system.
offline local: All instances of the parent group depend on the child group being offline on that system and vice versa.
An instance of the parent group can be brought online anywhere as long as the child is not
online on the system, and vice versa. For example, if the child group is online on System
A, the parent group can be online on System B and System C.
✔ If the child faults on System A, and if VCS selects System B as the target on which to
bring the child online, the parent on System B is taken offline first. However, if the
child fails over to System D, the parent group remains online on Systems B and C.
✔ If the parent group faults on System B, the child group remains online on System A
and the parent group fails over to System D.
Parallel Parent / Parallel Child
online local firm: An instance of the parent group firm depends on an instance of the child group on the same system.
An instance of a parent group can be brought online on a system, for example, System A,
only if an instance of a child group is online on System A. For example, two instances of
the parent are online on System A and System B, and each instance depends on an
instance of the child being online on the same system.
✔ If an instance of the child group on System A faults, the instance of the parent group
on System A is taken offline. After the child fails over to another system, for example,
System C, VCS brings an instance of the parent group online on System C. Other
instances of the parent group are unaffected.
✔ If an instance of the parent group on System B faults, it can fail over to System C only
if an instance of the child group is running on System C and no instance of the parent
group is running on System C.
offline local: An instance of a parent group depends on an instance of a child group being offline on the same system and vice versa.
An instance of a parent group can be brought online provided that an instance of the child
is not online on the same system and vice versa. For example, if the child group is online
on System C and System D, the parent can be online on System A and System B.
✔ If the child on System C faults and VCS selects System A as the target on which to
bring the child group online, the instance of the parent on System A is taken offline
first.
✔ When an instance of a child group or parent group faults, it has no effect on the other
running instances.
[Figure: VCS notification architecture. The notifier process receives messages from HAD on System A and System B and forwards SNMP and SMTP notifications with severities Information, Warning, Error, and SevereError.]
How Notification Works
SNMP traps sent by VCS are forwarded to the SNMP console. Typically, traps are
predefined for events such as service group or resource faults. You can use the hanotify
utility to send additional traps.
Notification Components
This section describes the notifier process and the hanotify utility.
[Figure: Notification components. Messages from had on System A and System B are queued for delivery to the notifier process; the hanotify utility can inject additional messages.]
In this example, the number 1.3.6.1.4.1.1302.3.8.10.2.8.0.10 is the OID for the message
being sent. Because it is a user-defined message, VCS has no way of knowing the OID
associated with the SNMP trap corresponding to this message so the user must provide it.
The other parameters to hanotify specify the message is severity level Warning. The
systems affected are sys1 and sys2. Running this command sends a custom message for
the resource agentres from the agent MyAgent.
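For illustration, an invocation of this kind might look like the following; the option letters shown are reconstructed assumptions and should be verified against the hanotify usage message:
# hanotify -i 1.3.6.1.4.1.1302.3.8.10.2.8.0.10 -l Warning -n agentres -S sys1 -p sys2 -c MyAgent -m "user-defined message"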
Clusters
◆ Remote cluster has faulted (Global Cluster Option). Severity: Error. The trap for this event includes information on how to take over the global service groups running on the remote cluster before the cluster faulted.
◆ Heartbeat is down. Severity: Error. The connector on the local cluster lost its heartbeat connection to the remote cluster.
◆ Remote cluster is in RUNNING state (Global Cluster Option). Severity: Information. The local cluster has a complete snapshot of the remote cluster, indicating the remote cluster is in the RUNNING state.
◆ User has logged on to VCS. Severity: Information. A user logon has been recognized, either because a user logged on via Cluster Manager or because a haxxx command was invoked.
Agents
◆ Agent is faulted. Severity: Warning. The agent has faulted on one node in the cluster.
Resources
◆ Resource state is unknown. Severity: Warning. VCS cannot identify the state of the resource.
◆ Resource monitoring has timed out. Severity: Warning. The monitoring mechanism for the resource has timed out.
◆ Resource is not going offline. Severity: Warning. VCS cannot take the resource offline.
◆ Resource went online by itself. Severity: Warning (not for first probe). The resource was brought online on its own.
◆ Resource is being restarted by agent. Severity: Information. The resource is being restarted by its agent.
◆ The health of cluster resource improved. Severity: Information. Used by agents to give extra information about the state of the resource; the health of the resource improved while it was online.
◆ Resource monitor time has changed. Severity: Warning. This trap is generated when statistics analysis for the time taken by the monitor entry point of an agent is enabled for the agent, and the agent framework detects a sudden or gradual increase or decrease in the time taken to run the monitor entry point for a resource. The trap information contains details of the change in time required to run the monitor entry point and the actual times that were compared to deduce this change. See “VCS Agent Statistics” on page 566 for more information.
◆ Resource is in ADMIN_WAIT state. Severity: Error. The resource is in the admin_wait state. See “Controlling Clean Behavior on Resource Faults” on page 346 for more information.
Systems
◆ VCS has exited manually. Severity: Information. VCS has exited gracefully from one node on which it was previously running.
◆ CPU usage exceeded threshold on the system. Severity: Warning. The system’s CPU usage continuously exceeded the value set in the Notify threshold for a duration greater than the Notify time limit. See “Bringing a Resource Online” on page 557 for more information.
Service Groups
◆ Service group concurrency violation. Severity: SevereError. A failover service group has become online on more than one node in the cluster.
◆ Service group has faulted and cannot be failed over anywhere. Severity: SevereError. The specified service group has faulted on all nodes where the group could be brought online, and there are no nodes to which the group can fail over.
◆ Service group is autodisabled. Severity: Information. VCS has autodisabled the specified group because one node exited the cluster.
◆ Service group is being switched. Severity: Information. The service group is being taken offline on one node and brought online on another.
◆ The global service group is online/partial on multiple clusters (Global Cluster Option). Severity: SevereError. A concurrency violation occurred for the global service group.
SNMP-Specific Files
VCS includes two SNMP-specific files: vcs.mib and vcs_trapd, which are created in
/etc/VRTSvcs/snmp. The file vcs.mib is the textual MIB for built-in traps supported
by VCS. Load this MIB into your SNMP console to add it to the list of recognized traps.
The file vcs_trapd is specific to the HP OpenView Network Node Manager (NNM)
SNMP console, and includes sample events configured for the built-in SNMP traps
supported by VCS. To merge these events with those configured for SNMP traps:
# xnmevents -merge vcs_trapd
When you merge events, the SNMP traps sent by VCS by way of notifier are displayed in the HP OpenView NNM SNMP console.
severityId
This variable indicates the severity of the trap being sent. It can take the following values:
◆ Information (0): Important events exhibiting normal behavior
◆ Warning (1): Deviation from normal behavior
◆ Error (2): A fault
◆ SevereError (3): Critical error that can lead to data loss or corruption
entityState
This variable describes the state of the entity:
Resources States
◆ Resource state is unknown
◆ Resource monitoring has timed out
◆ Resource is not going offline
◆ Resource is being restarted by agent
◆ Resource went online by itself
◆ Resource has faulted
◆ Resource is in admin wait state
◆ Resource monitor time has changed
System States
◆ VCS is up on the first node in the Cluster
◆ VCS is being restarted by hashadow
◆ VCS is in jeopardy
◆ VCS has faulted
◆ A node running VCS has joined cluster
◆ VCS has exited manually
◆ CPU Usage exceeded the threshold on the system
VCS States
◆ User has logged into VCS
◆ Cluster has faulted
◆ Cluster is in RUNNING state
Agent States
◆ Agent is restarting
◆ Agent has faulted
Note You must configure appropriate severity for the notifier to receive these
notifications. Specifically, to receive the notifications described above, the minimum
acceptable severity level is Information.
Configuring Notification
Configuring notification involves creating a resource for the Notifier Manager
(NotifierMgr) agent in the ClusterService group. See the VERITAS Cluster Server Bundled
Agents Reference Guide for more information about the agent.
VCS provides several methods for configuring notification:
◆ Manually editing the main.cf file.
◆ Using the Notifier wizard. See “Setting up VCS Event Notification Using Notifier
Wizard” on page 201 for instructions.
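If you edit main.cf manually, a minimal sketch of a Notifier Manager resource might look like the following; the type name (shown here as NotifierMngr), attribute names, and all host names and addresses are assumptions to verify against the Bundled Agents Reference Guide:
NotifierMngr ntfr (
SnmpConsoles = { snmpserv = Information }
SmtpServer = "smtp.example.com"
SmtpRecipients = { "admin@example.com" = SevereError }
)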
located at $VCS_HOME/bin/hatrigger. VCS also passes the name of the event trigger and the parameters specific to the event, for example: hatrigger -postonline system service_group. Note that VCS does not wait for hatrigger or the event trigger to complete execution; after calling the triggers, VCS continues normal operations. Event triggers are identified by event names; for example, violation denotes a concurrency violation.
Event triggers are invoked on the system where the event occurred, with the following
exceptions:
◆ The sysoffline and nofailover event triggers are invoked from the lowest-numbered
system in RUNNING state.
◆ The violation event trigger is invoked from all systems on which the service group
was brought partially or fully online.
How VCS Global Clusters Work
[Figure: Global cluster failover. Public clients are network-redirected from Cluster A to Cluster B; the Oracle application group fails over between the clusters, each of which has separate storage kept consistent through replicated data.]
[Figure: The wac process. A wac process runs in each cluster (Cluster 1 and Cluster 2) and communicates with its peer in the remote cluster.]
The wac process runs on one system in each cluster and connects with peers in remote
clusters. It receives and transmits information about the status of the cluster, service
groups, and systems. This communication enables VCS to create a consolidated view of
the status of all the clusters configured as part of the global cluster. The process also manages wide-area heartbeating to determine the health of remote clusters, and transmits commands between clusters, returning the results to the originating cluster.
Wide-Area Heartbeats
The wide-area Heartbeat agent manages the inter-cluster heartbeat. Heartbeats are used
to monitor the health of remote clusters. For a list of attributes associated with the agent,
see “Heartbeat Attributes” on page 651. You can change the default values of the
heartbeat agents using the hahb -modify command.
Sample Configuration
Heartbeat Icmp (
ClusterList = {C1, C2}
AYAInterval@C1 = 20
AYAInterval@C2 = 30
Arguments@C1 = "X.X.X.X XX.XX.XX.XX"
Arguments@C2 = "Y.Y.Y.Y YY.YY.YY.YY"
)
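For example, to change such a value on a running cluster, you can use the hahb -modify command described later in this guide (the interval shown is illustrative):
# hahb -modify Icmp AYAInterval 15 -clus C1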
Note A cluster assuming authority for a group does not guarantee the group will be
brought online on the cluster. The attribute merely specifies the right to attempt
bringing the service group online in the cluster. The presence of Authority does not
override group settings like frozen, autodisabled, non-probed, and so on, that
prevent service groups from going online.
VCS Framework
VCS agents manage external objects that are part of wide-area failover, such as replication and DNS updates. These agents provide a robust framework for specifying attributes and restarts, and can be brought online upon failover.
DNS Agent
The DNS agent updates the canonical name-mapping in the domain name server after a
wide-area failover. See the VERITAS Cluster Server Bundled Agents Reference Guide for
more information about the agent.
RVG Agent
The RVG agent manages the Replicated Volume Group (RVG). Specifically, it brings the
RVG online, monitors read-write access to the RVG, and takes the RVG offline. Use this
agent when using VVR for replication.
RVGPrimary Agent
The RVGPrimary agent attempts to migrate or take over the Primary role, converting a Secondary RVG to a Primary, following an application failover. The agent has no actions associated with the offline and monitor routines.
RVGSnapshot Agent
The RVGSnapshot agent, used in fire drill service groups, takes space-optimized
snapshots so that applications can be mounted at secondary sites during a fire drill
operation.
Note See the VERITAS Cluster Server Agents for VERITAS Volume Replicator Configuration
Guide for more information about the RVG, RVGPrimary, and RVGSnapshot
agents.
Cluster A Cluster B
Steward
When all communication links between any two clusters are lost, each cluster contacts the Steward with an inquiry message. The Steward sends an ICMP ping to the cluster in question and responds with a negative inquiry if the cluster is running, or with a positive inquiry if the cluster is down. The Steward can also be used in configurations with more than two clusters. See “Configuring the Steward Process (Optional)” on page 462 for more information.
A Steward is effective only if there are independent paths from each cluster to the host
running the Steward. If there is only one path between the two clusters, you must prevent
split-brain by confirming manually via telephone or some messaging system with
administrators at the remote site if a failure has occurred. By default, VCS global clusters
fail over an application across cluster boundaries with administrator confirmation. You
can configure automatic failover by setting the ClusterFailOverPolicy attribute to Auto.
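For example, assuming a global service group named appgroup (the name is hypothetical), the policy can be set with hagrp:
# hagrp -modify appgroup ClusterFailOverPolicy Auto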
If you start a service group on a remote cluster while the service group is running on the
primary cluster, data corruption does not occur because the clusters use replicated data.
Instead, divergent data sets result, which must be merged manually once the split-brain is
resolved. VCS does not automatically take a service group offline after an inter-cluster
split-brain is reconnected.
Cluster Setup
You must have at least two clusters to set up a global cluster. Every cluster must have the
VCS Global Cluster Option license installed. A cluster can be part of only one global cluster. VCS supports a maximum of four clusters participating in a global cluster.
Clusters must be running on the same platform, though operating system versions can differ. All clusters must be running the same VCS version.
Cluster names must be unique within each global cluster; system and resource names
need not be unique across clusters. Service group names need not be unique across clusters; however, a global service group must have the same name in every cluster in which it is configured.
Every cluster must have a valid virtual IP address, which is tied to the cluster. Define this
IP address in the cluster’s ClusterAddress attribute. This address is normally configured
as part of the initial VCS installation. The IP address must have a DNS entry.
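If the address must be set or changed after installation, the haclus command can be used (the address shown is the example value used elsewhere in this guide):
# haclus -modify ClusterAddress "10.182.147.19"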
For remote cluster operations, you must configure a VCS user with the same name and
privileges in each cluster. See “User Privileges in Global Clusters” on page 57 for more
information.
Configured Applications
Applications to be configured as global groups must be configured to represent each other in their respective clusters. The application groups that make up a global group must have the same name in each cluster. The individual resources of the groups can differ; for example, one group might have a MultiNIC resource or additional Mount-type resources.
The resources that make up a global group must represent the same application from the point of view of the client as its peer global group in the other cluster. Clients redirected to the remote cluster in case of a wide-area failover must be presented with the same application they saw in the primary cluster, and should not be aware that a cross-cluster failover occurred, except for some downtime while the administrator initiates or confirms the failover.
Wide-Area Heartbeats
There must be at least one wide-area heartbeat going from each cluster to every other
cluster. VCS starts communicating with a cluster only after the heartbeat reports that the
cluster is alive. VCS uses the ICMP ping by default, the infrastructure for which is bundled
with the product. VCS configures the ICMP heartbeat if you use Cluster Manager (Java
Console) to set up your global cluster. Other heartbeats must be configured manually.
ClusterService Group
The ClusterService group must be configured with the wac, NIC, and IP resources. It is
configured automatically when VCS is installed or upgraded, or by the GCO
configuration wizard. The service group may contain additional resources for Cluster
Manager (Web Console) and notification, if these components are configured.
If you entered a Global Cluster Option license during the VCS install or upgrade, the
ClusterService group, including the wide-area connector process, is automatically
configured.
If you add the license after VCS is operational, you must run the GCO Configuration
wizard. For instructions, see “Running the GCO Configuration Wizard” on page 456.
Replication Setup
VCS global clusters are typically used for disaster recovery, so you must set up real-time data replication between clusters. You can use VCS agents for supported replication solutions to manage the replication. If your configuration uses VERITAS Volume Replicator, you must add the VRTSvcsvr package to all systems.
Note Before beginning the process, review the prerequisites listed in the section “Before
Configuring Global Clusters” on page 453 and make sure your configuration is
ready for a global cluster application.
2. The wizard discovers the NIC devices on the local system and prompts you to enter
the device to be used for the global cluster. Specify the name of the device and press
Enter.
3. If you do not have NIC resources in your configuration, the wizard asks you whether
the specified NIC will be the public NIC used by all systems. Enter y if it is the public
NIC; otherwise enter n. If you entered n, the wizard prompts you to enter the names
of NICs on all systems.
5. If you do not have IP resources in your configuration, the wizard prompts you for the
netmask associated with the virtual IP. The wizard detects the netmask; you can
accept the suggested value or enter another value.
6. The wizard starts running commands to create or update the ClusterService group.
Various messages indicate the status of these commands. After running these
commands, the wizard brings the ClusterService group online.
Configuring Replication
VCS supports several replication solutions for global clustering. Contact your VERITAS sales representative for the solutions supported by VCS. This section describes how to set up replication using VERITAS Volume Replicator (VVR).
2. Copy the DiskGroup resource from the appgroup to the new group.
3. Configure new resources of type IP and NIC in the appgroup_rep service group. The
IP resource monitors the virtual IP that VVR uses for replication.
4. Configure a new resource of type RVG in the new (appgroup_rep) service group.
The RVG agent ships with the VVR software. If the RVG resource type is not defined
in your configuration, import it, as instructed below.
b. In the Import Types dialog box, click the file from which to import the resource
type. By default, the RVG resource type is located at the path
/etc/VRTSvcs/conf/VVRTypes.cf.
c. Click Import.
Note The RVG resource starts, stops, and monitors the RVG in its current state and does not promote or demote VVR when you want to change the direction of replication. That task is managed by the RVGPrimary agent.
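As an illustration of step 4, a minimal RVG resource definition might look like the following sketch; the resource, RVG, and disk group names are hypothetical, and the attribute names should be verified against the VVR agent documentation:
RVG rvg_res (
RVG = app_rvg
DiskGroup = appdg
)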
8. In the appgroup service group, add a resource of type RVGPrimary and configure its
attributes:
◆ RVGResourceName—The name of the RVG resource that this agent will promote.
◆ AutoTakeover—A flag that indicates whether the agent should perform a
takeover in promoting a Secondary RVG if the original Primary is down. Default
is 1, meaning a takeover will be performed.
◆ AutoResync—A flag that indicates whether the agent should configure the RVG
to perform an automatic resynchronization after a takeover and once the original
Primary is restored. Default is 0, meaning automatic resynchronization will not
occur.
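Putting these attributes together, a minimal RVGPrimary definition in main.cf might look like the following sketch (the resource names are hypothetical and carried over from the RVG sketch above):
RVGPrimary rvg_prim (
RVGResourceName = rvg_res
AutoTakeover = 1
AutoResync = 0
)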
9. Set resource dependencies such that the Mount resources depend on the RVGPrimary resource.
10. If your setup uses BIND DNS, add a resource of type DNS to the appgroup service
group and configure its attributes:
◆ Domain—Domain name. For example, veritas.com.
◆ Alias—Alias to the canonical name. For example, www.
◆ HostName—Canonical name of a system or its IP address. For example,
mtv.veritas.com.
◆ TTL—Time To Live (in seconds) for the DNS entries in the zone being updated.
Default value: 86400.
◆ StealthMasters—List of primary master name servers in the domain. This
attribute is optional if the primary master name server is listed in the zone's NS
record. If the primary master name server is a stealth server, the attribute must be
defined.
Note that a stealth server is a name server that is authoritative for a zone but is not
listed in the zone's NS records.
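Using the example values above, a DNS resource definition might look like the following sketch (the resource name webdns is hypothetical):
DNS webdns (
Domain = "veritas.com"
Alias = www
HostName = "mtv.veritas.com"
TTL = 86400
)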
2. In the view panel, click the Service Groups tab. This opens the service group
dependency graph.
3. Click Link.
4. Click the parent group, appgroup, and move the mouse toward the child group,
appgroup_rep.
6. In the Link Service Groups dialog box, click the online local relationship and the firm
dependency type and click OK.
2. Create a configuration that is similar to the one in the first cluster. You can do this by
either using Cluster Manager (Java Console) to copy and paste resources from the
primary cluster, or by copying the configuration of the appgroup and appgroup_rep
groups from the main.cf file in the primary cluster to the secondary cluster.
3. To assign remote administration privileges to users, configure users with the same
name and privileges on both clusters. See “User Privileges in Global Clusters” on
page 57 for more information.
4. Make appropriate changes to the configuration. For example, you must modify the
SystemList attribute to reflect the systems in the secondary cluster.
Note Make sure that the name of the service group (appgroup) is identical in both
clusters.
It is a VVR best practice to use the same disk group and RVG name on both sites. This
means that just the RLinks attribute needs to be modified to reflect the name of the
secondary’s RLink.
If the volume names are the same on both sides, the Mount resources will mount the
same block devices, and the same Oracle instance will start at the secondary in case of
a failover.
Linking Clusters
Once the VCS and VVR infrastructure has been set up at both sites, you must link the two
clusters. The Remote Cluster Configuration wizard provides an easy interface to link
clusters.
Before linking clusters, verify that the ClusterAddress attribute is set to the virtual IP address for each cluster. Use the same IP address as the one assigned to the IP resource in the ClusterService group.
If you are adding a stand-alone cluster to an existing global cluster environment, run the
wizard from a cluster in the global cluster environment. Otherwise, run the wizard from
any cluster. From Cluster Explorer, click Edit>Add/Delete Remote Cluster. For
instructions on running the wizard, see “Adding a Remote Cluster” on page 484.
2. In the Heartbeat configuration dialog box, enter the name of the heartbeat and select
the check box next to the name of the cluster.
3. Click the icon in the Configure column to open the Heartbeat Settings dialog box.
4. Specify the value of the Arguments attribute and various timeout and interval fields.
Click + to add an argument value; click - to delete it.
Note If you specify IP addresses in the Arguments attribute, make sure the IP addresses
have DNS entries.
5. Click OK.
1. Identify a system that will host the Steward process. Make sure both clusters can
connect to the system through a ping command.
2. Copy the file steward from a node in the cluster to the Steward system. The file
resides at the path /opt/VRTSvcs/bin/.
3. In both clusters, set the Stewards attribute to the IP address of the system running the
Steward process. For example:
cluster cluster1938 (
UserNames = { admin = gNOgNInKOjOOmWOiNL }
ClusterAddress = "10.182.147.19"
Administrators = { admin }
CredRenewFrequency = 0
CounterInterval = 5
Stewards = "10.212.100.165"
)
4. On the system designated to host the Steward, start the Steward process:
# steward -start
To stop the Steward process, use the following command:
# steward -stop
1. From Cluster Explorer, click Configure Global Groups on the Edit menu.
2. Review the information required for the Global Group Configuration Wizard and
click Next.
b. From the Available Clusters box, click the clusters on which the group can come
online. The local cluster is not listed as it is implicitly defined to be part of the
ClusterList. Click the right arrow to move the cluster name to the ClusterList box.
d. Click Next.
a. Click the Configure icon to review the remote cluster information for each
cluster.
b. Enter the IP address of the remote cluster, the IP address of a cluster system, or
the host name of a cluster system.
c. Enter the user name and the password for the remote cluster.
d. Click OK.
e. Click Next.
5. Click Finish.
Note For remote cluster operations, you must configure a VCS user with the same name
and privileges in each cluster. See “User Privileges in Global Clusters” on page 57
for more information.
1. In the Service Groups tab of the configuration tree, right-click the resource.
2. Click Actions.
c. Click OK.
This begins a fast-failback of the replicated data set. You can monitor the value of the
ResourceInfo attribute for the RVG resource to determine when the resynchronization
has completed.
4. Once the resynchronization completes, switch the service group to the primary
cluster.
a. In the Service Groups tab of the Cluster Explorer configuration tree, right-click
the service group.
c. In the Switch global group dialog box, click the cluster to switch the group. Click
the specific system, or click Any System, and click OK.
Note You can conduct fire drills only on regular VxVM volumes; volume sets (vset) are
not supported.
1. Start the RVG Secondary Fire Drill wizard on the VVR secondary site, where the
service group is not online:
# /opt/VRTSvcs/bin/fdsetup
2. Read the information on the Welcome screen and press the Enter key.
3. The wizard identifies the global service groups. Enter the name of the service group
for the fire drill.
4. The wizard lists the volumes in the disk group that could be used for a space-optimized snapshot. Enter the volumes to be selected for the snapshot. Typically, all volumes used by the application, whether replicated or not, should be prepared; otherwise, a snapshot might not succeed.
5. Enter the cache size to store writes when the snapshot exists. The size of the cache
must be large enough to store the expected number of changed blocks during the fire
drill. However, the cache is configured to grow automatically if it fills up. Enter disks
on which to create the cache.
6. The wizard starts running commands to create the fire drill setup. Press the Enter key
when prompted.
The wizard creates the application group with its associated resources. It also creates
a fire drill group with resources for the application (Oracle, for example), the Mount,
and the RVGSnapshot types.
The application resources in both service groups define the same application, the
same database in this example. The wizard sets the FireDrill attribute for the
application resource to 1 to prevent the agent from reporting a concurrency violation
when the actual application instance and the fire drill service group are online at the
same time.
Caution Remember to take the fire drill offline once its functioning has been validated.
Failing to take the fire drill offline could cause failures in your environment. For
example, if the application service group were to fail over to the node hosting
the fire drill service group, there would be resource conflicts, resulting in both
service groups faulting.
Global Querying
VCS enables you to query global cluster objects, including service groups, resources,
systems, resource types, agents, and clusters. You may enter query commands from any
system in the cluster. Commands to display information on the global cluster
configuration or system states can be executed by all users; you do not need root
privileges.
Querying Resources
▼ To display resource attribute values across clusters
# hares -value resource attribute [system] [-clus cluster |
-localclus]
The option -clus displays the attribute value on the cluster designated by the
variable cluster; the option -localclus specifies the local cluster.
If the attribute has local scope, you must specify the system name, except when
querying the attribute on the system from which you run the command.
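For example, to display the State of a resource on a particular system in a remote cluster (all names are hypothetical; State has local scope, so the system is specified):
# hares -value dbres State sysa -clus cluster2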
Querying Systems
▼ To display system attribute values across clusters
# hasys -value system attribute [-clus cluster | -localclus]
The option -clus displays the values of a system attribute in the cluster as
designated by the variable cluster; the option -localclus specifies the local cluster.
Querying Clusters
▼ For the value of a specific cluster attribute on a specific cluster
# haclus -value attribute [cluster] [-localclus]
The attribute must be specified in this command. If you do not specify the cluster
name, the command displays the attribute value on the local cluster.
▼ To display the state of a local or remote cluster as seen from the local cluster
# haclus -state [cluster] [-localclus]
The variable cluster represents the cluster. If a cluster is not specified, the state of the
local cluster and the state of all remote cluster objects as seen by the local cluster are
displayed.
▼ For information on the state of a local or remote cluster as seen from the local
cluster
# haclus -display [cluster] [-localclus]
If a cluster is not specified, information on the local cluster is displayed.
Querying Status
▼ For the status of local and remote clusters
# hastatus
Querying Heartbeats
The hahb command is used to manage WAN heartbeats that emanate from the local
cluster. Administrators can monitor the “health” of the remote cluster via heartbeat
commands and mechanisms such as Internet, satellites, or storage replication
technologies. Heartbeat commands are applicable only on the cluster from which they are
issued.
Note You must have Cluster Administrator privileges to add, delete, and modify
heartbeats.
▼ To display usage for the heartbeat command
# hahb -help [-modify]
If the -modify option is specified, the usage for the hahb -modify option is displayed.
▼ To bring a service group online across clusters for the first time
# hagrp -online -force service_group -any [-clus cluster | -localclus]
Administering Resources
▼ To take action on a resource across clusters
# hares -action resource token [-actionargs arg1 ...] [-sys system]
[-clus cluster |-localclus]
The option -clus designates the cluster on which the resource resides. If the designated system is not part of that cluster, an error is displayed. If the -sys option is not used, the action applies to the resource on the local node.
Administering Clusters
▼ To add a remote cluster object
# haclus -add cluster ip
The variable cluster represents the cluster. This command does not apply to the local
cluster.
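For example, to add a remote cluster object for a cluster named newyork (the name and address are hypothetical):
# haclus -add newyork 192.168.10.10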
Administering Heartbeats
▼ To create a heartbeat
# hahb -add heartbeat
For example, type the following command to add a new heartbeat called ICMP1. This
represents a heartbeat sent from the local cluster and immediately forks off the
specified agent process on the local cluster.
# hahb -add ICMP1
▼ To modify a heartbeat
# hahb -modify heartbeat attribute value ... [-clus cluster]
If the attribute is local, that is, it has a separate value for each remote cluster in the
ClusterList attribute, the option -clus cluster must be specified. Use -delete -keys
to clear the value of any list attributes.
For example, type the following command to modify the ClusterList attribute and specify targets “phoenix” and “houston” for the newly created heartbeat:
# hahb -modify ICMP1 ClusterList phoenix houston
▼ To delete a heartbeat
# hahb -delete heartbeat
Note Cluster Manager (Java Console) provides disabled individuals access to and use of
information and data that is comparable to the access and use provided to
non-disabled individuals. Refer to the appendix “Accessibility and VCS” for more
information.
Adding a Remote Cluster
Note VERITAS does not support adding a cluster that is already part of a global cluster
environment. To merge the clusters of one global cluster environment (for example,
cluster A and cluster B) with the clusters of another global environment (for
example, cluster C and cluster D), separate cluster C and cluster D into standalone
clusters and add them one by one to the environment containing cluster A and
cluster B.
1. From Cluster Explorer, click Add/Delete Remote Cluster on the Edit menu.
or
From the Cluster Explorer configuration tree, right-click the cluster name, and click
Add/Delete Remote Clusters.
2. Review the required information for the Remote Cluster Configuration Wizard and
click Next.
b. Click Next.
a. Enter the host name of a cluster system, an IP address of a cluster system, or the
IP address of the cluster that will join the global environment.
d. Click Next.
5. Enter the details of the existing remote clusters; this information on administrator
rights enables the wizard to connect to all the clusters and make changes to the
configuration:
6. Click the Configure icon. The Remote cluster information dialog box is displayed.
a. Enter the host name of a cluster system, an IP address of a cluster system, or the
IP address of the cluster that will join the global environment.
e. Click OK.
7. Click Next.
8. Click Finish. After running the wizard, the configurations on all the relevant clusters
are opened and changed; the wizard does not close the configurations.
Note Command Center enables you to perform operations on the local cluster; this does
not affect the overall global cluster configuration.
4. Click Apply.
Deleting a Remote Cluster
Note You cannot delete a remote cluster if the cluster is part of a cluster list for global
service groups or global heartbeats, or if the cluster is in the RUNNING, BUILD,
INQUIRY, EXITING, or TRANSITIONING states.
1. From Cluster Monitor, log on to the cluster that will be deleted from the global cluster
environment.
2. In the Service Groups tab of the Cluster Explorer configuration tree, right-click the
wac resource under the Application type in the ClusterService group.
or
Click a service group in the configuration tree, click the Resources tab, and right-click
the wac resource in the view panel.
3. Click Offline, and click the appropriate system from the menu.
1. From Cluster Explorer, click Configure Global Groups on the Edit menu.
2. Click Next.
b. For global to local cluster conversion, click the left arrow to move the cluster
name from the cluster list back to the Available Clusters box.
c. Click Next.
4. Enter or review the connection details for each cluster. Click the Configure icon to
review the remote cluster information for each cluster.
a. Enter the IP address of the remote cluster, the IP address of a cluster system, or the
host name of a cluster system.
e. Click OK.
5. Click Next.
6. Click Finish.
1. From Cluster Explorer, click Add/Delete Remote Cluster on the Edit menu.
or
From the Cluster Explorer configuration tree, right-click the cluster name, and click
Add/Delete Remote Clusters.
2. Review the required information for the Remote Cluster Configuration Wizard and
click Next.
b. Click Next.
5. Review the connection details for each cluster. Click the Configure icon to review the
remote cluster information for each cluster.
a. Enter the IP address of the remote cluster, the IP address of a cluster system, or the
host name of a cluster system.
e. Click OK.
6. Click Finish.
Administering Global Service Groups
2. Review the information required for the Global Group Configuration Wizard and
click Next.
a. Click the name of the service group that will be converted from a local group to a
global group, or vice versa.
b. From the Available Clusters box, click the clusters on which the group can come
online. Click the right arrow to move the cluster name to the Clusters for Service
Group box; for global to local cluster conversion, click the left arrow to move the
cluster name back to the Available Clusters box. A priority number (starting with
0) indicates the cluster in which the group will attempt to come online. If
necessary, double-click the entry in the Priority column to enter a new value.
d. Click Next.
Click the Configure icon to review the remote cluster information for each cluster.
a. Enter the IP address of the remote cluster, the IP address of a cluster system, or the
host name of a cluster system.
d. Click OK.
Repeat these steps for each cluster in the global environment.
6. Click Finish.
b. Click the specific system, or click Any System, to bring the group online.
c. Click OK.
b. Click the specific system, or click All Systems, to take the group offline.
c. Click OK.
b. Click the specific system, or click Any System, to take the group offline.
c. Click OK.
Administering Global Heartbeats
b. Select the check box next to the name of the cluster to add it to the cluster list for
the heartbeat.
c. Click the icon in the Configure column to open the Heartbeat Settings dialog
box.
d. Specify the value of the Arguments attribute and various timeout and interval
fields. Click + to add an argument value; click - to delete it.
e. Click OK.
3. Click Apply.
c. Select or clear the check box next to the name of a cluster to add or remove it from
the cluster list for the heartbeat.
d. If necessary, click the icon in the Configure column to open the Heartbeat
Settings dialog box. Otherwise, proceed to step 2g.
e. Change the values of the Arguments attribute and various timeout and interval
fields. Click + to add an argument value; click - to delete it.
f. Click OK.
3. Click Apply.
Note Cluster Manager (Web Console) provides disabled individuals access to and use of
information and data that is comparable to the access and use provided to
non-disabled individuals. Refer to the appendix “Accessibility and VCS” for more
information.
Adding a Remote Cluster
1. From the Cluster Summary page, click Add Remote Cluster in the left pane.
a. Enter the IP address of the cluster, the IP address of a cluster system, or the name
of a cluster system.
d. Click Next.
3. Click Finish.
Note You cannot delete a remote cluster if the cluster is part of a cluster list for global
service groups or global heartbeats, or if the cluster is in the RUNNING, BUILD,
INQUIRY, EXITING, or TRANSITIONING states.
b. Click OK.
Deleting a Remote Cluster
b. For global to local cluster conversion, select the cluster to delete in the Current
ClusterList box.
c. Click the left arrow to move the cluster name from the current cluster list back to
the Available Clusters box.
e. Click Next.
3. Click the edit icon (...) in the Settings column to specify information about each
cluster.
a. Enter the IP address of the remote cluster, the IP address of a cluster system, or
the host name of a cluster system.
d. Click Next.
5. Click No if you want the operation to be completed only if the wizard can connect to
all selected clusters.
Click Next.
6. Click Finish.
2. In the Remove Cluster dialog box, select the cluster to delete and click Next.
3. Click the edit icon (...) in the Settings column to specify information about each
cluster.
a. Enter the IP address of the remote cluster, the IP address of a cluster system, or the
host name of a cluster system.
d. Click Next.
5. Click No if you want the operation to be completed only if the wizard can connect to
all selected clusters.
6. Click Next.
7. Click Finish.
Administering Global Service Groups
a. Select the service group that will serve as the global group.
b. From the Available Clusters box, select the clusters on which the global group
can come online. Click the right arrow to move the cluster name to the Current
ClusterList box.
d. Click Next.
3. Click the edit icon (...) in the Settings column to specify information about the remote
cluster.
a. Enter the IP address of the remote cluster, the IP address of a cluster system, or the
host name of a cluster system.
c. Enter the priority number (starting with 0) for the cluster on which the global
group will attempt to come online.
e. Click Next.
5. Click No if you want the operation to be completed only if the wizard can connect to
all selected clusters.
6. Click Next.
7. Click Finish.
a. Select the cluster in which to bring the service group online, or click Anywhere.
b. Select the system on which to bring the service group online, or click Anywhere.
c. To run a PreOnline script, select the Run preonline script check box. This
user-defined script checks for external conditions before bringing a group online.
d. Click OK.
a. Select the cluster from which to take the service group offline.
b. Select the system from which to take the service group offline.
c. Click OK.
1. From the Service Group page, click Switch in the left pane.
a. Select the cluster to switch the service group to, or click Anywhere.
b. Select the system to switch the service group to, or click Anywhere.
c. Click OK.
Administering Global Heartbeats
b. Clear the check box next to the cluster name if you do not want that cluster added
to the cluster list for the heartbeat.
c. Click the edit icon (...) in the Settings column to specify the value for the
Arguments attribute and various timeout and interval fields.
d. After entering the necessary values in the Advanced Heartbeat Attributes dialog
box, click Save.
e. Click OK.
1. From the Cluster Heartbeats page, click Delete Heartbeat in the left pane.
2. In the Delete Heartbeat dialog box, select the heartbeat and click OK.
b. If necessary, alter the cluster list for the heartbeat by clearing the appropriate
check boxes.
c. Click the edit icon (...) in the Settings column to alter the values of the Arguments
attribute and various timeout and interval fields.
d. After changing the necessary values in the Advanced Heartbeat Attributes dialog
box, click Save.
e. Click OK.
Note VVR supports multiple replication secondary targets for any given primary.
However, RDC for VCS supports only one replication secondary for a primary.
An RDC configuration is appropriate in situations where dual dedicated LLT links are available between the primary site and the disaster recovery secondary site, but shared storage or a SAN interconnect between the primary and secondary data centers is not. In an RDC, data replication technology is employed to provide node access to data at the remote site.
Note You must use dual dedicated LLT links between the replicated nodes.
How VCS Replicated Data Clusters Work
[Figure: Replicated data cluster. Public clients are network-redirected from Zone 0 to Zone 1; the Oracle application group fails over between the zones, which are connected by a private network and have separate storage kept consistent through replicated data.]
In the event of a system or application failure, VCS attempts to fail over the Oracle service
group to another system within the same RDC zone. However, in the event that VCS fails
to find a failover target node within the primary RDC zone, VCS switches the service
group to a node in the current secondary RDC zone (zone 1). VCS also redirects clients once the application is online at the new location.
[Figure: RDC service group dependencies. The application group (Application, MountV, Lanman, RVGPrimary, IP, NIC resources) depends on the replication group (RVG, DiskGroup, IP, NIC resources).]
Setting Up Replication
VERITAS Volume Replicator (VVR) technology is a license-enabled feature of VERITAS
Volume Manager (VxVM), so you can convert VxVM-managed volumes into replicated
volumes managed using VVR. In this example, the process involves grouping the Oracle
data volumes into a Replicated Volume Group (RVG), and creating the VVR Secondary on
hosts in another VCS cluster, located in your DR site.
When setting up VVR, it is a best practice to use the same DiskGroup and RVG name on
both sites. This means that just the RLinks attribute needs to be modified to reflect the
name of the secondary RLink. If the volume names are the same on both zones, the Mount
resources will mount the same block devices, and the same Oracle instance will start on
the secondary in case of a failover.
1. Create a hybrid service group (oragrp_rep) for replication. You can use the
VvrRvgGroup template to create the service group. For more information about
hybrid service groups, see “Types of Service Groups” on page 12.
2. Copy the DiskGroup resource from the application to the new group. Configure the
resource to point to the disk group that contains the RVG.
4. Configure a new resource of type RVG in the service group. The RVG agent ships
with the VVR software. If the RVG resource type is not defined in your configuration,
import it, as instructed below.
b. In the Import Types dialog box, click the file from which to import the resource
type. By default, the RVG resource type is located at the path
/etc/VRTSvcs/conf/VVRTypes.cf.
c. Click Import.
Note The RVG resource starts, stops, and monitors the RVG in its current state and does
not promote or demote VVR when you want to change the direction of replication.
The RVGPrimary agent manages that task.
7. Set the SystemZones attribute of the child group, oragrp_rep, such that all nodes in
the primary RDC zone are in system zone 0 and all nodes in the secondary RDC zone
are in system zone 1.
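For example, the following commands set the attribute from the command line. This is a
sketch only; the node names sysA through sysD are placeholders for the nodes in your
primary (zone 0) and secondary (zone 1) RDC zones:
# haconf -makerw
# hagrp -modify oragrp_rep SystemZones sysA 0 sysB 0 sysC 1 sysD 1
# haconf -dump -makero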
1. In the original Oracle service group (oragroup), delete the DiskGroup resource.
3. Set resource dependencies such that all Mount resources depend on the RVGPrimary
resource. If there are many Mount resources, you can set the TypeDependencies
attribute for the group to denote that the Mount resource type depends on the
RVGPrimary resource type.
4. Set the SystemZones attribute of the Oracle service group such that all nodes in the
primary RDC zone are in system zone 0 and all nodes in the secondary RDC zone are
in zone 1. The SystemZones attribute of both the parent and the child group must be
identical.
5. If your setup uses BIND DNS, add a resource of type DNS to the oragroup service
group. Set the Hostname attribute to the canonical name of the host or virtual IP
address that the application uses on that cluster. This ensures DNS updates to the site
when the group is brought online. A DNS resource would be necessary only if the
nodes in the primary and the secondary RDC zones are in different IP subnets.
2. In the view panel, click the Service Groups tab. This opens the service group
dependency graph.
3. Click Link.
4. Click the parent group oragroup and move the mouse toward the child group,
oragrp_rep.
6. In the Link Service Groups dialog box, click the online local relationship and the
hard dependency type and click OK.
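The resulting dependency appears in main.cf as a single line in the oragroup definition.
A minimal sketch, assuming the group names used in this example:
// main.cf (sketch): oragroup requires its replication child group
requires group oragrp_rep online local hard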
1. In the Service Groups tab of the configuration tree, right-click the resource.
2. Click Actions.
c. Click OK.
This begins a fast-failback of the replicated data set. You can monitor the value of the
ResourceInfo attribute for the RVG resource to determine when the resynchronization
has completed.
4. Once the resynchronization completes, switch the service group to the primary
cluster.
a. In the Service Groups tab of the Cluster Explorer configuration tree, right-click
the service group.
b. Click Switch To and select the system in the primary RDC zone to switch to and
click OK.
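The same check and switch can be performed from the command line. A sketch,
assuming an RVG resource named oradg_rvg and a primary-zone node named sysA
(both names are placeholders):
# hares -value oradg_rvg ResourceInfo
# hagrp -switch oragroup -to sysA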
How VCS Campus Clusters Work
Campus Cluster
The accompanying figure shows a campus cluster with Node 1 at Site A and Node 2 at
Site B, with storage at both sites.
The disk group is configured in VCS as a resource of type DiskGroup and is mounted
using the Volume resource type. A resource of type CampusCluster monitors the paths to
the disk group.
VCS continuously monitors and communicates events between cluster nodes. In the event
of a system or application failure, VCS attempts to fail over the Oracle service group to
another system in the cluster. VCS ensures that the disk group is imported by the node
hosting the Oracle service group. If the original system comes up again, the VCS
CampusCluster agent initiates a fast mirror resync (FMR) to synchronize data at both
sites.
Takeover
In case of an outage at site A, VCS imports the disk group at site B and fails the service
group over to the node at site B. The disk group is imported with all devices at the failed
site marked as NODEVICE.
Fast-failback
Fast-failback provides the ability to resynchronize changed regions after a takeover if the
original side returns in its original form, with minimal downtime.
When site A comes up again, the Volume Manager Dynamic Multi-Pathing daemon
(DMP) detects the disks at site A and adds them to the disk group.
In this scenario, the CampusCluster agent performs a fast resynchronization of the
original disks.
Link Failure
If a link between a node and its shared storage breaks, the node loses access to the remote
disks but no takeover occurs. A power outage at the remote site could cause this situation.
Because the host has its ID stamped on the disks, when the disks return, the
CampusCluster agent initiates a fast mirror resync.
Split-brain
Split-brain occurs when all heartbeat links between the hosts are cut and each side
mistakenly thinks the other side is down. To minimize the effects of split-brain, make sure
the LLT and heartbeat links are robust and do not fail at the same time.
Minimize risk by running heartbeat traffic and I/O traffic over the same physical
medium, using technologies such as DWDM. If the heartbeats are disrupted, the I/O
communication is disrupted too, and each site interprets the situation as a link failure
rather than a takeover.
If you use SCSI-3 fencing in a two-site campus cluster, you must distribute coordinator
disks such that you have two disks at one site and one disk at the other site. If the site with
the two coordinator disks goes down, the other site panics to protect the data and must be
restarted with the vxfenconfig command. VERITAS recommends having a third site
with a coordinator disk. See “Coordinator Disks” on page 318 for more information.
Prerequisites
✔ Verify VERITAS Volume Manager 4.1 is installed with the FMR license.
✔ You must have a single VCS cluster with at least one node in each of two sites, where
the sites are separated by a physical distance of no more than 80 kilometers. The sites
must share data and have a private heartbeat network.
✔ All volumes that have data required by the application must be evenly mirrored. Each
site must have at least one plex of all volumes hosting application data, including the
FMR Log volume.
✔ VERITAS recommends that you disable the Volume Manager Relocation Daemon to
prevent plex relocation when the remote site suffers an outage.
✔ VERITAS recommends that you distinguish the physical location of each disk either
by controller number or enclosure name. For example, controller 2 manages local
devices and controller 3 manages remote devices. Differentiation by controller
number avoids the need to scan all disks; differentiation by enclosure might help in
disk placement.
Instructions
1. Set up the physical infrastructure. Verify that each node has access to the local
storage arrays and to the remote storage arrays.
4. Create one or more mirrored volumes in the disk group; do not create a mirror
between two disks at the same physical location, such as in the same array.
5. Verify the disk group can be manually deported and imported on each node in the
cluster.
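For example, assuming a disk group named campusdg (the name is a placeholder):
# vxdg deport campusdg
# vxdg import campusdg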
◆ Chapter 21. “Predicting VCS Behavior Using VCS Simulator” on page 539
Installing VCS Simulator
2. From Windows Explorer, navigate to the path of the Simulator installer file, located at
windows\WindowsInstallers\WindowsSimulator\EN\.
5. In the Destination Folders dialog box, click Next to accept the suggested
installation path or click Change to choose a different location.
6. In the Ready to Install the Program dialog box, click Back to make changes to your
selections or click Install to proceed with the installation.
Directory Contents
Simulator Ports
VCS Simulator uses the following ports:
◆ Ports 15550 through 15558 to connect to the various cluster configurations.
◆ Ports 15560 through 15563 for the wide area connector (WAC) process.
Set the WAC port to -1 to disable WAC simulation.
1. Type the following command to grant the system permission to display on the
desktop:
# xhost +
2. Configure the shell environment variable DISPLAY on the system where Cluster
Manager will be launched. For example, if using Korn shell, type the following
command to display on the system myws:
# export DISPLAY=myws:0
b. Accept the suggested system name or enter a new name for a system in the
cluster.
e. If the cluster will be part of a global cluster configuration, select the Enable
Global Cluster Option check box and enter a unique port number for the
wide-area connector (WAC) process.
f. Click OK.
VCS creates a simulated one-node cluster and creates a new directory for the cluster’s
configuration files. VCS also creates a user called admin with Cluster Administrator
privileges. You can start the simulated cluster and administer it by launching the Java
Console.
3. After the cluster starts, click Launch Console to administer the cluster.
Note VCS Simulator does not validate passwords; you can log on to a simulated cluster
by just entering a valid VCS user name. If you use the default configuration, enter
admin for the user name.
The animated display shows various objects, such as service groups and resources, being
transferred from the server to the console.
Cluster Explorer is launched upon initial logon, and the icons in the cluster panel change
color to indicate an active panel.
Note Select the Enable Global Cluster Option check box and enter a unique port number
for the wide-area connector (WAC) process.
a. Select an existing global cluster or enter the name for a new global cluster.
b. From the Available Clusters list, select the clusters to add to the global cluster
and click the right arrow. The clusters move to the Configured Clusters list.
c. Click OK.
3. Click OK.
2. Right-click an online resource, click Fault Resource, and click the system name.
2. Right-click the cluster, click Fault Cluster, and click the cluster name.
2. Right-click the cluster, click Clear Cluster Fault, and click the cluster name.
1. To simulate a cluster running a particular operating system, copy the types.cf file for
the operating system from the types directory to
/opt/VRTSsim/default_clus/conf/config/.
For example, if the cluster to be simulated runs on the AIX platform, copy the file
types.cf.aix.
2. Add custom type definitions to the file, if required, and rename the file to types.cf.
Use the command line or the Java Console to manage the simulated cluster.
Note For instructions on simulating a global cluster environment, see “To simulate global
clusters from the command line” on page 550.
1. To simulate a cluster running a particular operating system, copy the types.cf file for
the operating system from the types directory to
%VCS_SIMULATOR_HOME%\default_clus\conf\config\.
For example, if the cluster to be simulated runs on the AIX platform, copy the file
types.cf.aix.
2. Add custom type definitions to the file, if required, and rename the file to types.cf.
Note For instructions on simulating a global cluster environment, see “To simulate global
clusters from the command line” on page 550.
1. Install VCS Simulator in a directory (sim_dir) on your system. For instructions, see
“Installing VCS Simulator” on page 539.
2. Set up the clusters on your system. Run the following command to add a cluster:
# sim_dir/hasim -setupclus clustername -simport port_no -wacport port_no
Note Do not use default_clus as the cluster name when simulating a global cluster.
Note To create multiple clusters without simulating a global cluster environment, specify
-1 for the wacport.
4. Set the following environment variables to access VCS Simulator from the command
line:
◆ VCS_SIM_PORT=port_number
◆ VCS_SIM_WAC_PORT=wacport
Note that you must set these variables for each simulated cluster; otherwise, Simulator
always connects to default_clus, the default cluster.
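For example, the following sketch sets up two clusters for a simulated global
environment and points the command line at the first one. The cluster names are
placeholders, and the ports are chosen from the ranges listed in "Simulator Ports":
# sim_dir/hasim -setupclus clus_east -simport 15552 -wacport 15562
# sim_dir/hasim -setupclus clus_west -simport 15553 -wacport 15563
# export VCS_SIM_PORT=15552
# export VCS_SIM_WAC_PORT=15562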
You can use the Java Console to link the clusters and to configure global service groups.
See “Administering the Cluster from Cluster Manager (Java Console)” on page 109 for
more information.
You can also edit the configuration file main.cf manually to create the global cluster
configuration.
hasim -start system_name
Starts VCS Simulator. The variable system_name represents the system that will
transition from the LOCAL_BUILD state to RUNNING.

hasim -setupclus clustername -simport port_no [-wacport port_no] [-sys systemname]
Creates a simulated cluster and associates the specified ports with the cluster.

hasim -fault system_name resource_name
Faults the specified resource on the specified system.

hasim -online system_name resource_name
Brings the specified resource online. This command is useful if you have simulated a
fault of a persistent resource and want to simulate the fix.

hasim -disablel10n
Disables localized inputs for attribute values. Use this option when simulating UNIX
configurations on Windows systems.
How Cluster Components Affect Performance
The VCS agent framework uses multithreading to allow multiple resource operations to
run in parallel for the same type of resources. For example, a single Mount agent handles
all mount resources. The number of agent threads for most resource types is 10 by default.
To change the default, modify the NumThreads attribute for the resource type. The
maximum value of the NumThreads attribute is 20.
Continuing with this example, the Mount agent schedules the monitor entry point for all
Mount resources, based on the MonitorInterval or OfflineMonitorInterval attributes. If
the number of Mount resources exceeds NumThreads, the monitor operation for some
Mount resources must wait for an agent thread to become free before the monitor entry
point can execute.
Additional considerations for modifying the NumThreads attribute include:
◆ If you have only one or two resources of a given type, you can set NumThreads to a
lower value.
◆ If you have many resources of a given type, evaluate the time it takes for the monitor
entry point to execute and the available CPU power for monitoring. For example, if
you have 50 mount points, you may want to increase NumThreads to get the ideal
performance for the Mount agent without affecting overall system performance.
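For example, to raise the number of threads for the Mount agent to the maximum of 20,
make the configuration writable, modify the type-level attribute, and save:
# haconf -makerw
# hatype -modify Mount NumThreads 20
# haconf -dump -makero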
You can also adjust how often VCS monitors various entry points by modifying their
associated attributes. The attributes MonitorTimeout, OnlineTimeout, and
OfflineTimeout indicate the maximum time (in seconds) within which the monitor,
online, and offline entry points must complete or else be terminated. The default for the
MonitorTimeout attribute is 60 seconds. The defaults for the OnlineTimeout and
OfflineTimeout attributes are 300 seconds. For best results, VERITAS recommends
measuring the time it takes to bring a resource online, take it offline, and monitor it
before modifying the defaults. Issue an online or offline command to measure the time it
takes for each action. To measure how long it takes to monitor a resource, fault the
resource and issue a probe, or bring the resource online outside of VCS control and issue
a probe.
Agents typically run with normal priority. When you develop agents, consider the
following:
◆ If you write a custom agent, write the monitor entry point in C or C++. If you write a
script-based monitor, VCS must invoke a new process each time the monitor runs.
This can be costly if you have many resources of that type.
◆ If monitoring the resources proves costly, you can divide it into cursory, or shallow,
monitoring and more extensive deep (or in-depth) monitoring. Whether to use
shallow or deep monitoring depends on your configuration requirements.
Note Onlining service groups as part of AutoStart occurs after VCS transitions to
RUNNING mode.
If a resource is hung and causes the monitor entry point to hang, the time to detect it
depends on MonitorTimeout, FaultOnMonitorTimeouts, and the efficiency of the monitor
and clean entry points (if implemented).
Note After modifying the peer inactive timeout, you must unconfigure, then restart LLT
before the change is implemented. To unconfigure LLT, type lltconfig -u. To
restart LLT, type lltconfig -c.
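For example:
# lltconfig -u
# lltconfig -c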
Network Failure
If a network partition occurs, a cluster can “split” into two or more separate sub-clusters.
When two clusters join as one, VCS designates that one system be ejected. GAB prints
diagnostic messages and sends iofence messages to the system being ejected. The system
receiving the iofence messages tries to kill the client process; the gabconfig -k option applies here.
If the -j option is used in gabconfig, the system is halted when the iofence message is
received.
Quick Reopen
If a system leaves the cluster and tries to rejoin before the new cluster is configured (the
default is five seconds), the system is sent an iofence message with the reason set to
"quick reopen." When the system receives the message, it tries to kill the client process.
The time it takes to fail over a service group when a system faults equals the sum of:
◆ the time it takes to detect the system fault
◆ the time it takes to take the service group offline on the source system
◆ the time it takes for the VCS policy module to select the target system
◆ the time it takes to bring the service group online on the target system
The time it takes the VCS policy module to determine the target system is negligible in
comparison to the other factors.
If you have a firm group dependency and the child group faults, VCS offlines all
immediate and non-immediate parent groups before bringing the child group online on
the target system. Therefore, the time it takes a parent group to be brought online also
depends on the time it takes the child group to be brought online.
Priority Ranges
The following table displays the platform-specific priority range for RealTime,
TimeSharing, and SRM scheduling (SHR) processes.
Component  Class  AIX                 HP-UX               Linux            Solaris
Engine     RT     52 (Strongest + 2)  2 (Strongest + 2)   Min: 0, Max: 99  57 (Strongest - 2)
Agent      TS     0                   N/A                 0                0
Script     TS     0                   N/A                 0                0
On Linux, the RealTime priority range available to the engine runs from a minimum of 0
to a maximum of 99; the agent and script processes run in the TimeSharing (TS) class.
Note For standard configurations, VERITAS recommends using the default values for
scheduling unless specific configuration requirements dictate otherwise.
Note that the default priority value is platform-specific. When priority is set to ""
(empty string), VCS converts the priority to a value specific to the platform on which the
system is running. For TS, the default priority equals the strongest priority supported by
the TimeSharing class. For RT, the default priority equals two less than the strongest
priority supported by the RealTime class. So, if the strongest priority supported by the
RealTime class is 59, the default priority for the RT class is 57. For SHR (on Solaris only),
the default priority is the strongest priority supported by the SHR class.
Whenever such an event occurs, VCS resets the internally maintained benchmark
average to this new average. VCS sends notifications regardless of whether the
deviation is an increase or decrease in the monitor cycle time.
For example, a value of 25 means that if the actual average monitor time is 25% more
than the benchmark monitor time average, VCS sends a notification.
MonitorStatsParam
MonitorStatsParam is a resource type-level attribute, which stores the required parameter
values for calculating monitor time statistics.
static str MonitorStatsParam = { Frequency = 10, ExpectedValue = 3000, ValueThreshold = 100, AvgThreshold = 40 }
◆ Frequency: Defines the number of monitor cycles after which the average monitor
cycle time should be computed and sent to the engine. If configured, the value for this
attribute must be between 1 and 30. It is set to 0 by default.
◆ ExpectedValue: The expected monitor time in milliseconds for all resources of this type.
Default=3000.
◆ ValueThreshold: The acceptable percentage difference between the expected monitor
cycle time (ExpectedValue) and the actual monitor cycle time. Default=100.
◆ AvgThreshold: The acceptable percentage difference between the benchmark average
and the moving average of monitor cycle times. Default=40.
MonitorTimeStats
Stores the average time taken by a number of monitor cycles specified by the Frequency
attribute along with a timestamp value of when the average was computed.
str MonitorTimeStats{} = { Avg = "0", TS = "" }
This attribute is updated periodically after a number of monitor cycles specified by the
Frequency attribute. If Frequency is set to 10, the attribute stores the average of 10 monitor
cycle times and is updated after every 10 monitor cycles.
The default value for this attribute is 0.
ComputeStats
A flag that specifies whether VCS keeps track of the monitor times for the resource.
bool ComputeStats = 0
The value 0 indicates that VCS will not keep track of the time taken by the monitor routine
for the resource. The value 1 indicates that VCS keeps track of the monitor time for the
resource.
The default value for this attribute is 0.
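For example, to enable monitor-time tracking for a single resource (the resource name
ora_db is a placeholder):
# haconf -makerw
# hares -modify ora_db ComputeStats 1
# haconf -dump -makero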
Logging
VCS generates two error message logs: the engine log and the agent log. Log file names
carry a letter suffix: A indicates the first log file, B the second, C the third, and so on.
The engine log is located at /var/VRTSvcs/log/engine_A.log. The format of engine
log messages is:
Timestamp (Year/MM/DD) | Mnemonic | Severity | UMI | Message Text
Message Catalogs
VCS includes multilingual support for message catalogs. These binary message catalogs
(BMCs) are stored in the following default locations. The variable language represents a
two-letter abbreviation.
/opt/VRTSvcs/messages/language/module_name
/opt/VRTSgab/messages/language/module_name
/opt/VRTSllt/messages/language/module_name
Preonline IP Check
You can enable a preonline check of a failover IP address to protect against network
partitioning. The check pings a service group’s configured IP address to verify it is not
already in use. If it is, the service group is not brought online. A second check verifies the
system is connected to its public and private networks. If the system receives no response
from a broadcast ping to the public network and a check of the private networks, it
determines the system is isolated and does not bring the service group online.
1. Copy the preonline trigger script from the sample triggers directory into the triggers
directory:
# cp /opt/VRTSvcs/bin/sample_triggers/preonline_ipc
/opt/VRTSvcs/bin/triggers/preonline
HAD Diagnostics
When the VCS engine HAD core dumps, the core is written to the directory
/var/VRTSvcs/diag/had. When HAD starts, it renames this directory to
had.timestamp, where timestamp represents the time at which the directory was
renamed.
When HAD core dumps, review the contents of the /var/VRTSvcs/diag/had
directory. See the following logs for more information:
◆ Operating system console log
◆ Engine log
◆ hashadow log
VCS runs the script /opt/VRTSvcs/bin/vcs_diag to collect diagnostic information
when HAD and GAB encounter heartbeat problems. The diagnostic information is stored
in the /var/VRTSvcs/diag/had directory.
Troubleshooting Resources
This section cites the most common problems associated with bringing resources online
and taking them offline. Bold text provides a description of the problem. Recommended
action is also included, where applicable.
The Monitor entry point of the disk group agent returns ONLINE even if the disk group is
disabled.
This is expected agent behavior. VCS assumes that data is being read from or written
to the volumes and does not declare the resource as offline. This prevents potential
data corruption that could be caused by the disk group being imported on two hosts.
You can deport a disabled disk group when all I/O operations are completed or when
all volumes are closed. You can then reimport the disk group to the same system.
Note A disk group is disabled if data in the private region of a significant number of
disks (including the kernel log, configuration copies, or disk headers) is invalid.
Volumes can perform read-write operations if no changes are required to the private
regions of the disks.
Troubleshooting Notification
Occasionally you may encounter problems when using VCS notification. This section cites
the most common problems and the recommended actions. Bold text provides a
description of the problem.
Unable to view Cluster Manager on a browser using the Virtual IP/port number in URL
(http://[virtual_ip:port_number]/vcs).
Recommended Action: Verify that the ClusterService service group, which has the IP
and VRTSWebApp resources configured on it, is not offline or faulted on any node. If
it is, use the command line to bring the group back online on at least one node.
✔ Web server port unavailable: By default, the Web server binds itself to ports 8181,
8443, and 14300. If these ports are being used by another application, the Web server
will fail to start.
To determine if this is the reason, review the last few lines of the log file
/var/VRTSweb/log/_start0.0.log. If the output resembles the example below,
the Web server port is already taken by another application:
5/28/03 8:13:35 PM PDT VRTSWEB INFO V-12-1-1041 Exception encountered
LifecycleException: Protocol handler initialization failed:
java.net.BindException: Address already in use: JVM_Bind:8181
at org.apache.coyote.tomcat4.CoyoteConnector.initialize(CoyoteConnector.java:1119)
at org.apache.catalina.startup.Embedded.start(Embedded.java:999)
at vrts.tomcat.server.VRTSweb.initServer(VRTSweb.java:2567)
at vrts.tomcat.server.VRTSweb.commandStartServer(VRTSweb.java:385)
at vrts.tomcat.server.command.start.StartCommand.execute(StartCommand.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at vrts.tomcat.bootstrap.Main.main(Main.java:243)
Recommended Action: If you cannot make this port available for VRTSweb, see
“Configuring Ports for VRTSweb” on page 656 for instructions on how to change the
value of the Web server port.
✔ Web server IP address unavailable: By default, the Web server binds itself to all IP
addresses on the machine for the default ports 8181 and 8443. If you configure a
specific IP address for the port, verify this IP address is available on the machine
before the Web server starts. The Web server will fail to start if this IP address is not
present on the machine.
To determine if this is the reason, review the last few lines of the two log files
/var/VRTSweb/log/_start0.0.log and
/var/VRTSweb/log/_command0.0.log. If the output resembles the example
below, the IP address is not available:
5/28/03 8:20:16 PM PDT VRTSWEB INFO V-12-1-1041 Exception encountered
LifecycleException: Protocol handler initialization failed:
java.net.BindException: Cannot assign requested address: JVM_Bind:8181
at org.apache.coyote.tomcat4.CoyoteConnector.initialize(CoyoteConnector.java:1119)
at org.apache.catalina.startup.Embedded.start(Embedded.java:999)
at vrts.tomcat.server.VRTSweb.initServer(VRTSweb.java:2567)
at vrts.tomcat.server.VRTSweb.commandStartServer(VRTSweb.java:385)
at vrts.tomcat.server.command.start.StartCommand.execute(StartCommand.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at vrts.tomcat.bootstrap.Main.main(Main.java:243)
Recommended Action: Make this IP address available on the machine and try to bring
the VCSweb resource online again.
After reconfiguring virtual IP address, cannot access the Web Console using the new IP
address.
Recommended Action: If the ClusterService service group is online, changes in resource
attributes do not take effect until you take the service group offline and bring it online
again. Until then, you cannot access the Web Console using the new IP address, although
you can from the previous address. To put the new virtual IP address into effect, take the
ClusterService group offline, modify the IP resource, and bring the group online.
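A command-line sketch of the sequence, assuming the virtual IP resource is named
webip, the new address is 10.182.1.10, and the node is sysA (all three are placeholders):
# hagrp -offline ClusterService -sys sysA
# haconf -makerw
# hares -modify webip Address 10.182.1.10
# haconf -dump -makero
# hagrp -online ClusterService -sys sysA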
Flashing colors appear on Netscape while switching between Cluster Manager and other
open windows.
Recommended Action: If there are flashes of color while viewing Cluster Manager on
Netscape Navigator 4.7 or later, it is most likely a color-mapping issue. Set the
display to 256 colors or higher on the host machine where the GUI is being viewed
to ensure the best color and clarity.
“The object type specified is invalid. It should be one of cluster, group, type, resource, or
system.”
Recommended Action: This error (#W10002) occurs if the page URL points to a VCS
object that does not exist or was deleted. If you typed the URL, verify the URL is
correct. Names of VCS objects are case-sensitive: the object name in the URL must be
entered in the correct case. If you clicked a link and got this error, refresh the page and
retry. If you are still unsuccessful, contact VERITAS Technical Support.
“The specified resource type does not exist or has been deleted.”
Recommended Action: This error (#W10005) indicates the resource type whose
information you tried to access does not exist, or was deleted. If you typed the URL,
verify the URL is correct. If you clicked a link to get information about the resource
type, verify the resource type exists. Refresh the display to get current information.
“Retrieving data from the VCS engine. Please try after some time.”
Recommended Action: This error (#R10001) indicates a “snapshot” of the VCS engine,
HAD, is being taken. Wait a few moments then retry the operation.
“The user could not be authenticated at this time. This could be because a snapshot of the
VCS Server is being taken currently.”
Recommended Action: This error (#H10001) indicates a snapshot of the VCS engine is
being taken. Wait a few moments then retry the operation.
“The URL you specified can be accessed only if you are logged on.”
Recommended Action: This error (#G10001) indicates you tried to access a page that
requires authentication. Log on to VCS and retry the operation.
Disaster Declaration
When a cluster in a global cluster transitions to the FAULTED state because it can no longer
be contacted, the failover action depends on whether the cause was a split-brain, a
temporary outage, or a permanent disaster at the remote cluster.
If you choose to take action on the failure of a cluster in a global cluster, VCS prompts you
to declare the type of failure.
◆ Disaster, implying permanent loss of the primary data center
◆ Outage, implying the primary may return to its current form in some time
◆ Disconnect, implying a split-brain condition; both clusters are up, but the link between
them is broken
◆ Replica, implying that data on the takeover target has been made consistent from a
backup source and that the RVGPrimary can initiate a takeover when the service
group is brought online. This option applies to VVR environments only.
You can select the groups to be failed over to the local cluster, in which case VCS brings
the selected groups online on a node based on the group’s FailOverPolicy attribute. It also
marks the groups as being offline in the other cluster. If you do not select any service
groups to fail over, VCS takes no action except implicitly marking the service groups as
offline on the downed cluster.
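The declaration can also be made from the command line with haclus. A sketch,
assuming the failed remote cluster is named clus_east; verify the exact options against
the haclus manual page for your release:
# haclus -declare outage -clus clus_east -failover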
VCS Alerts
VCS alerts are identified by the alert ID, which comprises the following elements:
◆ alert_type—The type of the alert, described in “Types of Alerts.”
◆ cluster—The cluster on which the alert was generated
◆ system—The system on which this alert was generated
◆ object—The name of the VCS object for which this alert was generated. This could
be a cluster or a service group.
Alerts are generated in the following format:
alert_type-cluster-system-object
For example:
GNOFAILA-Cluster1-oracle_grp
This is an alert of type GNOFAILA generated on cluster Cluster1 for the service group
oracle_grp.
Types of Alerts
VCS generates the following types of alerts.
◆ CFAULT—Indicates that a cluster has faulted
◆ GNOFAILA—Indicates that a global group is unable to fail over within the cluster
where it was online. This alert is displayed if the ClusterFailOverPolicy attribute is set
to Manual and the wide-area connector (wac) is properly configured and running at
the time of the fault.
◆ GNOFAIL—Indicates that a global group is unable to fail over to any system within
the cluster or in a remote cluster.
Some reasons why a global group may not be able to fail over to a remote cluster:
◆ The ClusterFailOverPolicy is set to either Auto or Connected and VCS is unable to
determine a valid remote cluster to which to automatically fail the group over.
◆ The ClusterFailOverPolicy attribute is set to Connected and the cluster in which
the group has faulted cannot communicate with one or more remote clusters in
the group's ClusterList.
◆ The wide-area connector (wac) is not online or is incorrectly configured in the
cluster in which the group has faulted
Managing Alerts
Alerts require user intervention. You can respond to an alert in the following ways:
◆ If the reason for the alert can be ignored, use the Alerts dialog box in the Java or Web
consoles or the haalert command to delete the alert. You must provide a comment
as to why you are deleting the alert; VCS logs the comment to the engine log.
◆ Take an action on administrative alerts that have actions associated with them. You
can do so using the Java or Web consoles. See "Actions Associated with Alerts" for
more information.
◆ VCS deletes or negates some alerts when a negating event for the alert occurs. See
“Negating Events” for more information.
An administrative alert will continue to live if none of the above actions are performed
and the VCS engine (HAD) is running on at least one node in the cluster. If HAD is not
running on any node in the cluster, the administrative alert is lost.
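A sketch of handling an alert from the command line, using the alert ID format shown
earlier; verify the exact haalert options against the manual page for your release:
# haalert -display
# haalert -delete GNOFAILA-Cluster1-oracle_grp -comment "group failed over manually"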
Negating Events
VCS deletes a CFAULT alert when the faulted cluster goes back to the RUNNING state.
VCS deletes the GNOFAILA and GNOFAIL alerts in response to the following events:
◆ The faulted group's state changes from FAULTED to ONLINE.
◆ The group's fault is cleared.
◆ The group is deleted from the cluster where the alert was generated.
Troubleshooting Licensing
This section cites problems you may encounter with VCS licensing. It provides
instructions on how to validate license keys and lists the error messages associated with
licensing.
2. Run the vxlicrep command to make sure a VCS key is installed on the system. The
output of the command resembles:
VERITAS License Manager vxlicrep utility version 3.02.003
Copyright (C) VERITAS Software Corp 2002. All Rights reserved.
Features :=
Platform = Linux
Version = 4.1
Tier = Unused
Reserved = 0
Mode = VCS
Global Cluster Option= Enabled
If the output shows the License Type for a VCS key as DEMO, ensure that the Demo
End Date does not display a past date.
Make sure the Mode attribute displays the correct value.
If you have purchased a license key for the Global Cluster Option, make sure its status
is Enabled.
4. Start VCS. If HAD rejects a license key, see the licensing error message at the end of
the engine_A.log file.
[Licensing] Your evaluation period for the %s feature has expired. This feature will not be
enabled the next time VCS starts
The evaluation period for the specified VCS feature has expired.
Administration Matrices
Review the matrices in the following section to determine which command options can be
executed within a specific user category. A checkmark (✔) denotes that the command
option can be executed; a dash (–) indicates that it cannot. In each matrix, the five
columns represent, from left to right, the Guest, Group Operator, Group Administrator,
Cluster Operator, and Cluster Administrator user categories.
haagent
haattr
hacli
Do not use hacli to invoke a command on a remote system that requires user input. The
process can hang and consume resources.
haclus
-display ✔ ✔ ✔ ✔ ✔
-value ✔ ✔ ✔ ✔ ✔
-modify – – – – ✔
Note Only users with root privileges can execute the command haclus -modify HacliUserLevel.
-add – – – – ✔
-delete – – – – ✔
-declare – – – ✔ ✔
-state ✔ ✔ ✔ ✔ ✔
-list ✔ ✔ ✔ ✔ ✔
-status ✔ ✔ ✔ ✔ ✔
-updatelic – – – – ✔
haconf
hadebug
hagrp
-add – – – – ✔
-delete – – – – ✔
-link – – – – ✔
-unlink – – – – ✔
-clear – ✔ ✔ ✔ ✔
-online – ✔ ✔ ✔ ✔
-offline – ✔ ✔ ✔ ✔
-state ✔ ✔ ✔ ✔ ✔
-switch – ✔ ✔ ✔ ✔
-freeze – ✔ ✔ ✔ ✔
-freeze -persistent – – ✔ – ✔
-unfreeze – ✔ ✔ ✔ ✔
-unfreeze -persistent – – ✔ – ✔
-enable – – ✔ – ✔
-disable – – ✔ – ✔
-modify – – ✔ – ✔
-display ✔ ✔ ✔ ✔ ✔
-dep ✔ ✔ ✔ ✔ ✔
-resources ✔ ✔ ✔ ✔ ✔
-list ✔ ✔ ✔ ✔ ✔
-value ✔ ✔ ✔ ✔ ✔
-enableresources – – ✔ – ✔
-disableresources – – ✔ – ✔
-flush – ✔ ✔ ✔ ✔
-autoenable – ✔ ✔ ✔ ✔
-ignore – ✔ ✔ ✔ ✔
hahb
halog
hareg
hares
-add – – ✔ – ✔
-delete – – ✔ – ✔
-local – – ✔ – ✔
-global – – ✔ – ✔
-link – – ✔ – ✔
-unlink – – ✔ – ✔
-clear – ✔ ✔ ✔ ✔
-online – ✔ ✔ ✔ ✔
-offline – ✔ ✔ ✔ ✔
-offprop – ✔ ✔ ✔ ✔
-modify – – ✔ – ✔
-state ✔ ✔ ✔ ✔ ✔
-display ✔ ✔ ✔ ✔ ✔
-dep ✔ ✔ ✔ ✔ ✔
-list ✔ ✔ ✔ ✔ ✔
-value ✔ ✔ ✔ ✔ ✔
-probe – ✔ ✔ ✔ ✔
-override – – ✔ – ✔
-undo_override – – ✔ – ✔
-action – ✔ ✔ ✔ ✔
-refreshinfo – ✔ ✔ ✔ ✔
-flushinfo – ✔ ✔ ✔ ✔
hastatus
-sound ✔ ✔ ✔ ✔ ✔
-summary ✔ ✔ ✔ ✔ ✔
-sound -group ✔ ✔ ✔ ✔ ✔
hasys
-add – – – – ✔
-delete – – – – ✔
-freeze – – – ✔ ✔
-freeze -persistent – – – – ✔
-freeze -evacuate – – – – ✔
-freeze -persistent -evacuate – – – – ✔
-unfreeze – – – ✔ ✔
-unfreeze -persistent – – – – ✔
-display ✔ ✔ ✔ ✔ ✔
-force – – – – ✔
-load – – – – ✔
-modify – – – – ✔
-state ✔ ✔ ✔ ✔ ✔
-list ✔ ✔ ✔ ✔ ✔
-value ✔ ✔ ✔ ✔ ✔
-nodeid ✔ ✔ ✔ ✔ ✔
-updatelic -sys – – – – ✔
-updatelic -all – – – – ✔
hatype
hauser
-add – – – – ✔
-delete – – – – ✔
-update – ✔ ✔ ✔ ✔
-display ✔ ✔ ✔ ✔ ✔
-list ✔ ✔ ✔ ✔ ✔
-addpriv – – ✔ – ✔
-delpriv – – ✔ – ✔
Remote Cluster States
The following table provides a list of VCS remote cluster states and their descriptions. See
“Examples of System State Transitions” on page 612 for more information.
State Definition
INIT The initial state of the cluster. This is the default state.
BUILD The local cluster is receiving the initial snapshot from the remote cluster.
RUNNING Indicates the remote cluster is running and connected to the local cluster.
LOST_HB The connector process on the local cluster is not receiving heartbeats from
the remote cluster.
LOST_CONN The connector process on the local cluster has lost the TCP/IP connection to
the remote cluster.
UNKNOWN The connector process on the local cluster determines the remote cluster is
down, but another remote cluster sends a response indicating otherwise.
INQUIRY The connector process on the local cluster is querying other clusters on
which heartbeats were lost.
TRANSITIONING The connector process on the remote cluster is failing over to another node
in the cluster.
◆ If a cluster loses all heartbeats to a remote cluster in the RUNNING state, inquiries are
sent. If all inquiry responses indicate the remote cluster is actually down, the cluster
transitions the remote cluster state to FAULTED:
RUNNING -> LOST_HB -> INQUIRY -> FAULTED
◆ If at least one response does not indicate the cluster is down, the cluster transitions the
remote cluster state to UNKNOWN:
RUNNING -> LOST_HB -> INQUIRY -> UNKNOWN
◆ When the ClusterService service group, which maintains the connector process as
highly available, fails over to another system in the cluster, the remote clusters
transition their view of that cluster to TRANSITIONING, then back to RUNNING after the
failover is successful:
RUNNING -> TRANSITIONING -> BUILD -> RUNNING
◆ When a remote cluster in a RUNNING state is stopped (by taking the ClusterService
service group offline), the remote cluster transitions to EXITED:
RUNNING -> EXITING -> EXITED
System States
Whenever the VCS engine is running on a system, it is in one of the states described in the
table below. States indicate a system’s current mode of operation. When the engine is
started on a new system, it identifies the other systems available in the cluster and their
states of operation. If a cluster system is in the state of RUNNING, the new system retrieves
the configuration information from that system. Changes made to the configuration while
it is being retrieved are applied to the new system before it enters the RUNNING state.
If no other systems are up and in the state of RUNNING or ADMIN_WAIT, and the new
system has a configuration that is not marked “stale,” the engine transitions to the state
LOCAL_BUILD, and builds the configuration from disk. If the configuration is marked
“stale,” the system transitions to the state of STALE_ADMIN_WAIT.
The following table provides a list of VCS system states and their descriptions. See
“Examples of System State Transitions” on page 612 for more information.
State Definition
CURRENT_DISCOVER_WAIT The system has joined the cluster and its configuration file is
valid. The system is waiting for information from other
systems before it determines how to transition to another
state.
CURRENT_PEER_WAIT The system has a valid configuration file and another system
is doing a build from disk (LOCAL_BUILD). When its peer
finishes the build, this system transitions to the state
REMOTE_BUILD.
INITING The system has joined the cluster. This is the initial state for
all systems.
LEAVING The system is leaving the cluster gracefully. When the agents
have been stopped, and when the current configuration is
written to disk, the system transitions to EXITING.
STALE_DISCOVER_WAIT The system has joined the cluster with a stale configuration
file. It is waiting for information from any of its peers before
determining how to transition to another state.
STALE_PEER_WAIT The system has a stale configuration file and another system
is doing a build from disk (LOCAL_BUILD). When its peer
finishes the build, this system transitions to the state
REMOTE_BUILD.
UNKNOWN The system has not joined the cluster because it does not have
a system entry in the configuration.
◆ If VCS is started on a system with a valid configuration file, and if at least one other
system is already in the RUNNING state, the new system transitions to the RUNNING
state:
INITING -> CURRENT_DISCOVER_WAIT -> REMOTE_BUILD -> RUNNING
◆ If VCS is started on a system with a stale configuration file, and if at least one other
system is already in the RUNNING state, the new system transitions to the RUNNING
state:
INITING -> STALE_DISCOVER_WAIT -> REMOTE_BUILD -> RUNNING
◆ If VCS is started on a system with a stale configuration file, and if all other systems are
in STALE_ADMIN_WAIT state, the system transitions to the STALE_ADMIN_WAIT state as
shown below. A system stays in this state until another system with a valid
configuration file is started, or when the command hasys -force is issued.
INITING -> STALE_DISCOVER_WAIT -> STALE_ADMIN_WAIT
◆ If VCS is started on a system with a valid configuration file, and if other systems are in
the ADMIN_WAIT state, the new system transitions to the ADMIN_WAIT state.
INITING -> CURRENT_DISCOVER_WAIT -> ADMIN_WAIT
◆ If VCS is started on a system with a stale configuration file, and if other systems are in
the ADMIN_WAIT state, the new system transitions to the ADMIN_WAIT state.
INITING -> STALE_DISCOVER_WAIT -> ADMIN_WAIT
◆ When a system in RUNNING state is stopped with the hastop command, it transitions
to the EXITED state as shown below. During the LEAVING state, any online system
resources are taken offline. When all of the system’s resources are taken offline and
the agents are stopped, the system transitions to the EXITING state, then EXITED.
RUNNING -> LEAVING -> EXITING -> EXITED
You can modify the values of attributes labeled “user-defined” from the command line or
graphical user interface, or by manually modifying the main.cf configuration file. The
default values of VCS attributes are suitable for most environments; however, you can
change the attribute values to better suit your environment and enhance performance.
Caution When changing the values of attributes, be aware that VCS attributes interact
with each other. After changing the value of an attribute, observe the cluster
systems to confirm that unexpected behavior does not impair performance.
The values of attributes labeled “system use only” are set by VCS and are read-only. They
contain important information about the state of the cluster.
The values labeled “agent-defined” are set by the corresponding agent and are also
read-only.
In addition to the attributes listed in this appendix, see the VERITAS Cluster Server Agent
Developer’s Guide.
Resource Attributes
MonitorTimeStats (system use only)
string-association. Valid keys are Average and TS. Average is the average time taken by
the monitor entry point over the last Frequency number of monitor cycles. TS is the
timestamp indicating when the engine updated the resource's Average value.
Defaults: Average = 0, TS = ""
AgentClass (user-defined)
string-scalar. Indicates the scheduling class for the VCS agent process.
Default = TS

InfoInterval (user-defined)
integer-scalar. Duration (in seconds) after which the info entry point is invoked by the
agent framework for ONLINE resources of the particular resource type. If set to 0, the
agent framework does not periodically invoke the info entry point. To manually invoke
the info entry point, use the command hares -refreshinfo. If the value you designate
is 30, for example, the entry point is invoked every 30 seconds for all ONLINE resources
of the particular resource type.
Default = 0
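For example, to invoke the info entry point on demand for one resource (the resource and
system names are placeholders):
# hares -refreshinfo ora_mnt -sys sysA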
LogFileSize (user-defined)
integer-scalar. Specifies the size (in bytes) of the agent log file. Minimum value is 65536
bytes. Maximum value is 134217728 bytes (128MB).
Default = 33554432 (32MB)
Service Group Attributes

AutoDisabled (system use only)
boolean-scalar. Indicates that VCS does not know the status of a service group (or of a
specified system for parallel service groups). This is due to:
◆ The group not being probed (on the specified system for parallel groups) on a system
in the SystemList attribute.
◆ The VCS engine not running on a node designated in the SystemList attribute,
although the node is visible.

FromQ (system use only)
string-association. Indicates the system name from which the service group is failing
over. This attribute is specified when service group failover is a direct consequence of a
group event, such as a resource fault within the group or a group switch.

LastSuccess (system use only)
integer-scalar. Indicates the time when the service group was last brought online.

PreOnline (user-defined)
boolean-scalar. Indicates that the VCS engine should not bring a service group online in
response to a manual group online, group autostart, or group failover. The engine should
instead call a user-defined script that checks for external conditions before bringing the
group online.
Default = 0
System Attributes
LicenseType (system use only)
integer-scalar. Indicates the license type of the base VCS key used by the system.
Possible values are:
◆ 0—DEMO
◆ 1—PERMANENT
◆ 2—PERMANENT_NODE_LOCK
◆ 3—DEMO_NODE_LOCK
◆ 4—NFR
◆ 5—DEMO_EXTENSION
◆ 6—NFR_NODE_LOCK
◆ 7—DEMO_EXTENSION_NODE_LOCK
Cluster Attributes
Heartbeat Attributes
AYATimeout (user-defined)
integer-scalar. The maximum time (in seconds) that the agent will wait for a heartbeat
AYA entry point to return ALIVE or DOWN before being cancelled.
Default = 300 seconds
Note The Web server is installed at the path /opt/VRTSweb/ on UNIX systems. On
Windows systems, the default installation path is C:\Program
Files\VERITAS\VRTSweb.
Getting Started
1. Access the Web server using an existing port number, for example,
http://hostname:8181/.
To view and select the available VERITAS Web consoles, click the Home tab in the top
left corner of the page.
To view and configure ports, SMTP recipients, SMTP servers, and logging, click
Configuration in the top left corner of the page.
The Configured Ports table lists information about the configured ports.
The SMTP Recipients table displays information about configured SMTP recipients
and the SMTP server.
The Logging table lists the log levels for various Web server components.
1. Access the Web server using an existing port number; for example,
http://hostname:8181/.
Adding Ports
▼ From the command line
Run the following command on the system where VRTSweb is installed:
# $VRTSWEB_HOME/bin/webgui addport portno protocol bind_ip_address
The variable portno represents the port number to be added. The variable protocol
represents the protocol for the port. HTTP specifies a normal HTTP port, HTTPS
specifies a secure SSL port.
Web servers using the HTTP port can be accessed at http://hostname:portno/.
Web servers using the HTTPS port can be accessed at https://hostname:portno/.
The optional variable bind_ip_address specifies that the new port be bound to a
particular IP address instead of each IP address on the system. Use this option to
restrict Web server access to specific administrative subnets. If specified, the IP
address must be available on the system before the Web server is started. Otherwise,
the Web server fails to start.
For example:
# /opt/VRTSweb/bin/webgui addport 443 HTTPS 101.1.1.2
# /opt/VRTSweb/bin/webgui addport 80 HTTP
1. Access the Web server using an existing port number; for example,
http://hostname:8181/.
b. Choose the HTTP option to add a normal port; choose the HTTPS option to add a
secure SSL port.
Web servers using the HTTP port can be accessed at http://hostname:portno/.
Web servers using the HTTPS port can be accessed at https://hostname:portno/.
c. Enter an IP address to bind the new port to a specific IP address instead of each IP
address on the system. Ensure the IP address is available on the system before
starting the Web server. Use this attribute to restrict Web server access to specific
administrative subnets.
d. Enter the name and password for a user having superuser (administrative)
privileges on the Web server system.
e. Click OK.
Deleting Ports
▼ From the command line
Run the following command on the system where VRTSweb is installed:
# $VRTSWEB_HOME/bin/webgui delport portno [bind_ip_address]
The variable portno represents the port number to be deleted. If the port was bound to
a particular IP address, use the bind_ip_address option.
You must ensure that at least one port remains configured for the Web server.
For example:
# /opt/VRTSweb/bin/webgui delport 443 101.1.1.2
# /opt/VRTSweb/bin/webgui delport 80
1. Access the Web server using an existing port number; for example,
http://hostname:8181/.
a. Enter the port number to be deleted. You cannot delete the port being used to
access the Web page.
c. Enter the name and password for a user having superuser (administrative)
privileges on the Web server system.
d. Click OK.
Note Certificate management commands are available only via the command line
interface. Commands that modify the certificate require a server restart. You can use
the webgui restart command to restart the Web server.
Note You must restart the server for the new certificate to take effect.
1. If you do not have a self-signed certificate with information that can be verified by the
CA, create one.
# $VRTSWEB_HOME/bin/webgui cert create
See “Creating a Self-Signed SSL Certificate” on page 661 for more information.
2. Generate a Certificate Signing Request (CSR) for the certificate. Run the following
command on the system where VRTSweb is installed:
# $VRTSWEB_HOME/bin/webgui cert certreq certreq_file
The variable certreq_file specifies the file to which the CSR will be written. The file is
written using the Public-Key Cryptography Standard PKCS#10.
For example:
# /opt/VRTSweb/bin/webgui cert certreq /myapp/vrtsweb.csr
3. Submit the CSR to a certification authority, who will issue a CA-signed certificate.
4. Import the CA-issued certificate to VRTSweb. Run the following command on the
system where VRTSweb is installed:
# $VRTSWEB_HOME/bin/webgui cert import cert_file
The variable cert_file represents the certificate issued to you by the certification
authority.
For example:
# /opt/VRTSweb/bin/webgui cert import /myapp/vrtsweb.cer
Note that the import command fails if the CA root certificate is not a part of the trust
store associated with VRTSweb. If the command fails, add the CA root certificate to
the VRTSweb trust store:
# $VRTSWEB_HOME/bin/webgui cert trust ca_root_cert_file
For example:
# /opt/VRTSweb/bin/webgui cert trust /myapp/caroot.cer
Once the certificate used to sign the CSR is added to the VRTSweb trust store, you can
import the CA-assigned certificate into VRTSweb.
5. Restart VRTSweb:
# $VRTSWEB_HOME/bin/webgui restart
1. Access the Web server using an existing port number. For example,
http://hostname:8181/
3. The SMTP Recipients table on the right side of the page displays the configured
SMTP server.
1. Access the Web server using an existing port number. For example,
http://hostname:8181/
3. Click Configure SMTP Server on the left side of the Configuration page.
a. Enter the IP address or hostname of the SMTP server to be used for notification.
An empty string will disable notification.
b. Enter the name and password for a user having superuser (administrative)
privileges on the Web server system.
c. Click OK.
1. Access the Web server using an existing port number. For example,
http://hostname:8181/
3. The SMTP Recipients table on the right side of the Configuration page lists the
configured SMTP recipients.
1. Access the Web server using an existing port number. For example,
http://hostname:8181/
3. Click Add SMTP Recipient on the left side of the Configuration page.
b. From the Severity list, select the threshold for receiving Web server events. You
can select one of the following values: INFO|WARN|ERROR|SEVERE.
c. From the Locale list, select the locale in which notification is to be sent.
d. Enter the name and password for a user having superuser (administrative)
privileges on the Web server system.
e. Click OK.
1. Access the Web server using an existing port number. For example,
http://hostname:8181/
3. Click Delete SMTP Recipient on the left side of the Configuration page.
b. Enter the name and password for a user having superuser (administrative)
privileges on the Web server system.
c. Click OK.
1. Access the Web server using an existing port number. For example,
http://hostname:8181/
3. The Logging table on the right side of the Configuration page lists the log levels for
various components of the Web server. Note that the table does not display the limit
and rollover count of various log files; you must use the command line to retrieve this
information.
1. Access the Web server using an existing port number. For example,
http://hostname:8181/
a. Select the logging levels for the Web server, Web applications, and for other
components.
b. Enter the name and password for a user having superuser privileges on the Web
server system.
c. Click OK.
To modify the size limit or rollover count of the VRTSweb log files, pass one or more
name=value pairs to the webgui log command on the system where VRTSweb is
installed. For example:
# /opt/VRTSweb/bin/webgui log vrtsweb_size=100000 vrtsweb_count=4
# /opt/VRTSweb/bin/webgui log err_size=200000
# /opt/VRTSweb/bin/webgui log webapps_count=4
The following table describes the available parameters.
Parameter              Description
vrtsweb_size           The size of the file _vrtsweb.log, which contains the Web
                       server logs and the Tomcat container-related logs.
vrtsweb_count          The rollover count for the file _vrtsweb.log.
command_size           The size of the file _command.log, which contains the logs
                       related to administrative commands.
command_count          The rollover count for the file _command.log.
binary_size            The size of the file _binary.log, which contains the binary
                       representation of other log files.
binary_count           The rollover count for the file _binary.log.
jvm_size               The size of the file _jvm.log, which contains JVM-related
                       measurements, such as the memory consumed by the JVM at
                       various times.
jvm_count              The rollover count for the file _jvm.log.
protocol_client_size   The size of the file _protocol_client.log, which contains the
                       communication sent (and received) by various utilities to the
                       server.
protocol_client_count  The rollover count for the file _protocol_client.log.
protocol_server_size   The size of the file _protocol_server.log, which contains the
                       communication sent (and received) by the running server to
                       various utilities.
protocol_server_count  The rollover count for the file _protocol_server.log.
out_size               The size of the file _out.log, which contains messages logged
                       to the standard output stream of the JVM.
out_count              The rollover count for the file _out.log.
err_size               The size of the file _err.log, which contains messages logged
                       to the standard error stream of the JVM, including any stack
                       traces.
err_count              The rollover count for the file _err.log.
webapps_size           The default size for the log files of all Web applications
                       running on VRTSweb. Individual Web applications can override
                       this default value.
webapps_count          The default rollover count for the log files of all Web
                       applications running on VRTSweb. Individual Web applications
                       can override this default value.
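Size and rollover parameters can be combined in a single command. For instance, the
following hypothetical invocation (the values are arbitrary) sets the error log size limit to
200000 and its rollover count to 3:
# /opt/VRTSweb/bin/webgui log err_size=200000 err_count=3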
Glossary
Active/Active Configuration
A failover configuration where each system runs a service group. If either system fails, the
other one takes over and runs both service groups. Also known as a symmetric
configuration.
Active/Passive Configuration
A failover configuration consisting of one service group on a primary system and one
dedicated backup system. Also known as an asymmetric configuration.
Cluster
One or more computers linked together for the purpose of multiprocessing and high
availability. The term is used synonymously with VCS cluster, meaning one or more
computers that are part of the same GAB membership.
Disaster Recovery
A solution that supports failover to a cluster in a remote location in the event that the
local cluster becomes unavailable. Disaster recovery requires global clustering,
heartbeating, and replication.
Failover
A failover occurs when a service group faults and is migrated to another system.
GAB
Group Atomic Broadcast (GAB) is a communication mechanism of the VCS engine that
manages cluster membership, monitors heartbeat communication, and distributes
information throughout the cluster.
Global Service Group
A VCS service group that spans two or more clusters. The ClusterList attribute
for the group contains the list of clusters over which the group spans.
hashadow Process
A process that monitors and, when required, restarts HAD.
Jeopardy
A node is in jeopardy when it is missing one of the two required heartbeat connections.
When a node is running with only one heartbeat (in jeopardy), VCS does not restart its
applications on a new node if that node fails. This action of disabling failover is a safety
mechanism that prevents data corruption.
LLT
Low Latency Transport (LLT) is a communication mechanism of the VCS engine that
provides kernel-to-kernel communications and monitors network communications.
main.cf
The file in which the cluster configuration is stored.
Network Partition
If all network connections between any two groups of systems fail simultaneously, a
network partition occurs. When this happens, systems on both sides of the partition can
restart applications from the other side, resulting in duplicate services, or “split-brain.” A
split-brain occurs when two independent systems configured in a cluster assume they
have exclusive access to a given resource (usually a file system or volume). The most
serious problem caused by a network partition is that it affects the data on shared disks.
See “Jeopardy” and “Seeding”.
Node
The physical host or system on which applications and service groups reside. When
systems are linked by VCS, they become nodes in a cluster.
N-to-N
N-to-N refers to multiple service groups running on multiple servers, with each service
group capable of being failed over to different servers in the cluster. For example, consider
a four-node cluster with each node supporting three critical database instances. If any
node fails, each instance is started on a different node, ensuring no single node becomes
overloaded.
N-to-M
N-to-M (or Any-to-Any) refers to multiple service groups running on multiple servers,
with each service group capable of being failed over to different servers in the same
cluster, and also to different servers in a linked cluster. For example, consider a four-node
cluster with each node supporting three critical database instances and a linked two-node
back-up cluster. If all nodes in the four-node cluster fail, each instance is started on a node
in the linked back-up cluster.
Replication
Replication is the synchronization of data between systems where shared storage is not
feasible. The systems that are copied may be in local backup clusters or remote failover
sites. The major advantage of replication, when compared to traditional backup methods,
is that current data is continuously available.
Resources
Individual components that work together to provide application services to the public
network. A resource may be a physical component such as a disk or network interface
card, a software component such as Oracle8i or a Web server, or a configuration
component such as an IP address or mounted file system.
Resource Dependency
A dependency between resources is indicated by the keyword “requires” between two
resource names. This indicates the second resource (the child) must be online before the
first resource (the parent) can be brought online. Conversely, the parent must be offline
before the child can be taken offline. Also, faults of the children are propagated to the
parent.
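For illustration, a minimal main.cf fragment (the group and resource names are
hypothetical) in which a virtual IP address resource is brought online only after the
network interface resource it requires:
group ExampleGroup (
	SystemList = { sysa = 0, sysb = 1 }
	AutoStartList = { sysa }
	)

	IP example_ip (
		Device = eth0
		Address = "192.168.1.10"
		)

	NIC example_nic (
		Device = eth0
		)

	example_ip requires example_nic
Here example_nic is the child and example_ip the parent: the interface must be online
before the address can be configured, and a fault of the interface propagates to the
address.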
Resource Types
Each resource in a cluster is identified by a unique name and classified according to its
type. VCS includes a set of predefined resource types for storage, networking, and
application services.
Seeding
Seeding is used to protect a cluster from a pre-existing network partition. By default,
when a system comes up, it is not seeded. Systems can be seeded automatically or
manually. Only systems that have been seeded can run VCS. Systems are seeded
automatically only when an unseeded system communicates with a seeded system, or
when all systems in the cluster are unseeded and able to communicate with each other.
See “Network Partition”.
Service Group
A service group is a collection of resources working together to provide application
services to clients. It typically includes multiple resources, hardware- and software-based,
working together to provide a single service.
Shared Storage
Storage devices that are connected to and used by two or more systems.
SNMP Notification
Notification sent using the Simple Network Management Protocol (SNMP), a protocol
developed to manage nodes on an IP network.
State
The current activity status of a resource, service group, or system.
types.cf
The types.cf file describes standard resource types to the VCS engine; specifically, the data
required to control a specific resource.
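As a sketch of the form such a definition takes, the entry for the bundled FileOnOff
resource type, which manages a file identified by its single PathName attribute, looks
similar to the following:
type FileOnOff (
	static str ArgList[] = { PathName }
	str PathName
	)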
Virtual IP Address
A unique IP address associated with the cluster. It may be brought up on any system
in the cluster, along with the other resources of the service group. This address, also
known as the IP alias, should not be confused with the base IP address, which is the IP
address that corresponds to the host name of a system.