Saturday, May 18, 2013

Replicated database - NetApp cluster-mode

The RDB (replicated database) is stored on the 'vol0' root volume that exists on every node of the cluster, and it holds the configuration data used to manage cluster operations. It contains no user data and is not visible to clients. The RDB is made up of four units, namely VLDB, VIFMGR, management, and BCOM.

Key features of RDB:
-A cluster with 'n' nodes has 'n' copies of the RDB, of which one is the master and the rest are secondaries
-RDB copies are kept synchronized and operations are transactional, meaning each transaction is either committed in full or rolled back
-RDB reads are performed locally on each node, but writes go only to the master copy, and the changes are then replicated to the other copies in the cluster
-The master is elected by the members of each RDB unit; a secondary can become the master, and vice versa, when the master has communication issues
-One node in the cluster has a special tie-breaking vote called 'epsilon'; unlike the master, which may be a different node for each RDB unit, epsilon is held by a single node and applies to all units
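The per-unit masters and replication state can be inspected with 'cluster ring show', which is available at the advanced privilege level (exact output columns vary by ONTAP release):

set -privilege advanced
cluster ring show

The 'Master' column shows which node currently holds the master role for each unit (mgmt, vldb, vifmgr, bcomd), and the 'Online' column shows whether each copy is in quorum.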

VLDB (volume location database):
-One of the RDB units; runs as 'vldb'
-Contains the index of which D-blade currently owns each volume and currently serves each aggregate
-The content of the VLDB is cached in memory on each node for fast access by the N-blade and SCSI-blade
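The volume-to-node and volume-to-aggregate mapping that the VLDB tracks can be viewed from the CLI; the field names below are from clustered ONTAP and may differ slightly by release:

volume show -fields node,aggregate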

Management:
-The M-host unit, which enables management of the cluster from any node
-Runs as the management gateway daemon 'mgwd' on every node
-The information it holds is used by the CLI and element manager
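Because mgwd runs on every node, cluster-wide commands can be issued from any node's CLI, for example:

cluster show

This lists each node together with its health and eligibility.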

VIFMGR (virtual interface manager):
-Runs as 'vifmgr'
-Contains LIF configuration and failover policies
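The LIF configuration and failover targets maintained by VIFMGR can be displayed from the CLI (output format varies by release):

network interface show
network interface show -failover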

BCOM (blocks configuration and operations manager):
-Runs as 'bcomd' and manages the configuration of SAN data access
-Contains LUN maps and initiator groups
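The LUN maps and initiator groups held by BCOM can be listed with the following commands (command names are from clustered ONTAP; verify against your release):

lun mapping show
lun igroup show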

Quorum:
-Quorum is a simple majority of connected, healthy nodes
-There is a cluster-wide quorum, and each individual unit can also be in or out of quorum
-When a unit goes out of quorum, reads from that unit can still occur, but writes cannot
-Voting determines which node becomes the master; if a node will be down for an extended period, mark it as ineligible so quorum is not affected:

system node modify -node <node> -eligibility false
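To confirm the change, and to see which node currently holds epsilon, the following can be used ('-epsilon' is a field of 'cluster show' in recent clustered ONTAP releases; on older releases epsilon is visible via 'cluster ring show' at the advanced privilege level):

cluster show
cluster show -epsilon true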
