Storage Foundation HA 5.1 SP1 Gives Organizations New Flexibility to Control and Manage Network Splits

Arbitrating a network split in stretch cluster configurations, such as campus clusters, is a continuing challenge in managing failover configurations. One specific challenge is controlling how a cluster recovers when a network split occurs. The flexibility to favor a specific site in a campus cluster, so that it emerges as the winner, is what the new preferred fencing feature in Symantec Veritas Storage Foundation HA 5.1 Service Pack 1 (SP1) provides.

Coordination Point Server (CPS) was introduced as a new feature in the Storage Foundation HA 5.1 release. CPS is a software solution that runs on a remote system or cluster and provides arbitration between two or more nodes that are part of a larger campus cluster.

CPS is based on the same principle as the coordinator disks that Veritas Cluster Server (VCS) clusters use today. Coordinator disks are an odd number (three or more) of disks or LUNs set aside for I/O fencing. They are used to keep a network split from leaving two parts of a cluster running the same application at once.

A network split may occur when the communication or heartbeat between the nodes in a cluster is lost even though the node hosting the application is still functioning. Losing this heartbeat results in a failover of the application to another node in the cluster such that two nodes could potentially run the same application at the same time.

This is what coordinator disks prevent, as each of them is accessible by every node in the cluster. When the heartbeat of a node is lost, the nodes in the cluster "race" to grab control of these coordinator disks. The "winner" is the first node to gain control of a majority of the coordinator disks, which is why an odd number of them is used.
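The race described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how VCS itself is implemented: the function name `race_winner`, the timestamps and the node and disk names are all made up for illustration, and the sketch assumes that whoever reaches a disk first keeps it and that the survivor is whoever holds a majority of the disks.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (arrival time in seconds, sub-cluster name, disk name); all values are hypothetical
Attempt = Tuple[float, str, str]


def race_winner(attempts: List[Attempt], disk_count: int) -> Optional[str]:
    """Return the sub-cluster holding a majority of disks, or None if nobody does."""
    if disk_count < 3 or disk_count % 2 == 0:
        raise ValueError("use an odd number of coordinator disks, three or more")
    owner = {}
    # Process attempts in arrival order; the first arrival at a disk keeps it.
    for _, subcluster, disk in sorted(attempts):
        owner.setdefault(disk, subcluster)
    tally = Counter(owner.values())
    if not tally:
        return None
    name, disks_won = tally.most_common(1)[0]
    # An odd disk count means two racers can never tie on a majority.
    return name if disks_won > disk_count // 2 else None


attempts = [
    (0.010, "node1", "disk1"),
    (0.012, "node2", "disk1"),
    (0.011, "node2", "disk2"),
    (0.015, "node1", "disk2"),
    (0.013, "node1", "disk3"),
    (0.020, "node2", "disk3"),
]
print(race_winner(attempts, disk_count=3))  # node1 holds disk1 and disk3
```

In this run node1 reaches two of the three disks first, so it survives even though node2 won one disk.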

At that point, the sub-cluster to which the winning node belongs assumes responsibility for running that application until the heartbeat between the two sub-clusters is restored. So in a failover configuration (only two nodes in a cluster), the application continues to run only on the node on which it normally resides, while in a Storage Foundation CFS or RAC cluster the responsibility for application processing is shared equally between the nodes.

CPS takes this concept of coordinator disks and applies it to campus clusters by arbitrating network splits across multiple geographic locations. In a campus cluster, should the heartbeat between the two sites be lost, CPS assists in the network arbitration and chooses the winning cluster that will survive and continue hosting the application.

(Note: CPS may also be used to provide arbitration between nodes in a cluster though for the sake of simplicity in this blog entry I am only going to discuss how CPS functions as part of a campus cluster. All of the principles discussed in this blog entry carry over if CPS is used for arbitration in a cluster.)

The difference between coordinator disks and CPS is that CPS is accessed over IP, not via a SAN, to coordinate the arbitration in the event of a network split. Using CPS, if a set of nodes loses communication with the other nodes that are part of its campus cluster, the remaining nodes "race" to get exclusive control of that cluster by communicating with the CPS. CPS then determines the "winning" cluster.
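To make the idea of an arbiter reached over the network more concrete, here is a toy, in-memory stand-in. It is a sketch only: the class `ToyCoordinationPoint`, its `race` method and the node names are hypothetical and do not reflect the real CPS protocol or API. It assumes the arbiter keeps a list of registered nodes and that the first racer to arrive ejects the nodes it can no longer see.

```python
from typing import Set


class ToyCoordinationPoint:
    """Hypothetical in-memory arbiter; a real one would be reached over IP."""

    def __init__(self, members: Set[str]) -> None:
        # Nodes currently allowed to remain in the cluster.
        self.registered = set(members)

    def race(self, racer: str, partners: Set[str]) -> bool:
        """Racer asks to keep itself and its partners and eject everyone else."""
        if racer not in self.registered:
            return False  # an earlier racer already ejected this node
        # Keep only the racer's side of the split.
        self.registered = ({racer} | set(partners)) & self.registered
        return True


cps = ToyCoordinationPoint({"a1", "a2", "b1", "b2"})
site_a_won = cps.race("a1", {"a2"})  # site A reaches the arbiter first
site_b_won = cps.race("b1", {"b2"})  # site B arrives too late
print(site_a_won, site_b_won, sorted(cps.registered))
```

Because site A's racer arrives first, site B's nodes are no longer registered when its own racer shows up, so only one side continues to host the application.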

CPS, as released in Storage Foundation HA 5.1, addressed the problem that enterprise organizations were encountering in arbitrating and resolving network splits in campus clusters. However, as organizations implemented CPS, they began to want more choices in recovery, since not all sites that are part of a campus cluster are created equal.

For example, one site may be optimally configured from a hardware perspective to support the applications running on it. So while all of the sites that are part of the campus cluster may satisfy the minimum configuration requirements to support applications running on other sites, certain sites may be better suited to run a given application.

In some cases, it may be that the hardware at one site is better able to handle the performance load associated with a cluster than another. In other situations it may be that individuals at a site have more expertise in managing the applications hosted by a cluster.

Regardless, the point is that organizations, if given a choice, likely have a preference as to which site they want a specific application to run at in the event of a network split. So if the cluster is stretched across two sites, A and B, and the preference is to have the application online at A in the event of a network split, preferred fencing allows a greater weight (and hence a preference) to be associated with site A.
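One way to picture how a weight turns into a preference is to have the lighter side wait before it joins the race. The sketch below assumes exactly that; the delay mechanism, the `penalty` value, the weights, the latencies and all function names are illustrative assumptions, not a description of the product's internals.

```python
from typing import Dict, List


def subcluster_weight(nodes: List[str], weights: Dict[str, int]) -> int:
    """Total weight of the nodes on one side of the split."""
    return sum(weights[n] for n in nodes)


def preferred_winner(
    sites: Dict[str, List[str]],
    weights: Dict[str, int],
    latency: Dict[str, float],
    penalty: float = 2.0,  # hypothetical head start, in seconds, for the heavier side
) -> str:
    """Return the site that reaches the arbiter first (needs at least two sites)."""
    totals = {site: subcluster_weight(nodes, weights) for site, nodes in sites.items()}
    arrival = {}
    for site in sites:
        heaviest_other = max(t for s, t in totals.items() if s != site)
        # A lighter sub-cluster delays its race; the heaviest starts immediately.
        delay = penalty if totals[site] < heaviest_other else 0.0
        arrival[site] = latency[site] + delay
    return min(arrival, key=arrival.get)


sites = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
weights = {"a1": 100, "a2": 100, "b1": 10, "b2": 10}

# Site A is slower to reach the arbiter but still wins thanks to its weight.
usual = preferred_winner(sites, weights, latency={"A": 0.30, "B": 0.05})
# If site A is badly delayed, the lighter site B can still win.
degraded = preferred_winner(sites, weights, latency={"A": 3.00, "B": 0.05})
print(usual, degraded)
```

The second call shows why the weight makes the preferred site more likely to win rather than certain to win: if site A cannot reach the arbiter in time, site B still takes over.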

This ability to associate a higher weight with a specific site in a campus cluster, so that it is more likely to win in the event of a network split, is what the new preferred I/O fencing feature in Storage Foundation HA 5.1 SP1 enables. It should become a powerful new software tool that enterprise organizations find extremely appealing. By being able to show preference to a site or a specific set of nodes within a campus cluster, they gain the added degree of control and flexibility that they increasingly need in today's highly available environments.

Entry Sponsorship

This entry is sponsored by Symantec Corp.

About Symantec Corp.

    Symantec is a global leader in infrastructure software, enabling businesses and consumers to have confidence in a connected world. The company helps customers protect their infrastructure, information and interactions by delivering software and services that address risks to security, availability, compliance and performance. Headquartered in Cupertino, Calif., Symantec has operations in more than 40 countries. More information is available at www.symantec.com.

    DCIG is paid a fee by Symantec Corp. in connection with this blog. Symantec undertakes no obligation to update, correct or modify any statements contained in this blog; these statements represent the views and opinions of DCIG only.