Wednesday, April 15, 2009

How To: Set Up Multi-Master/Advanced Replication with Oracle 10g, 11g

As per the Oracle documentation, there are three main roles involved in replication between masters:
1.) Replication Administration: Creates replication groups, defines replication objects and performs other replication administrative functions.
2.) Replication Propagation: Makes sure that changes are sent from the database in which they happened to all the other masters.
3.) Replication Receiver: Applies all the received changes at the destination master databases.

These three functions can be assigned to three different schemas in the Oracle database, or can be performed by a single schema.
I am going to use the same schema to perform all of the above functions. The name most commonly used for this schema is REPADMIN.

The init parameter JOB_QUEUE_PROCESSES should have a non-zero value on all the masters, since the scheduled push and purge jobs run through the job queue; see the quick check below.
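
A quick way to check and set it (a sketch; the value 10 is just an example, size it for your environment):

SQL> SHOW PARAMETER job_queue_processes
SQL> ALTER SYSTEM SET job_queue_processes = 10;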

Enable GLOBAL_NAMES (it must be TRUE on all the master sites):
SQL> show parameter global_names
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
global_names boolean FALSE
SQL> alter system set global_names=TRUE;

(Master 1)
SQL> select * from global_name;
GLOBAL_NAME
-----------------
TEST.ORACLE.COM

(Master 2)
SQL> select * from global_name;
GLOBAL_NAME
----------------
ORCL.ORACLE.COM

In case you want to change the global name:
ALTER DATABASE RENAME GLOBAL_NAME TO <new_global_name>;

In my case the names of the databases are TEST (on Linux) and ORCL (on Windows), and the setup is replication between these two databases, one on a Linux installation and the other on Windows.

In a multi-master replication setup, one DB acts as the master definition database. This database is used for running administration and maintenance tasks. In our case we will use TEST DB as the master definition database.

Step 1:
Create the REPADMIN schema in all the master databases.

SQL> CREATE USER REPADMIN IDENTIFIED BY REPADMIN DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP;

User created.

Step 2:
Grant admin permissions to the REPADMIN schema (done at each master site). The DBMS_REPCAT_ADMIN package is used to perform administrative activities related to replication.

SQL> EXEC DBMS_REPCAT_ADMIN.GRANT_ADMIN_ANY_SCHEMA('REPADMIN');

PL/SQL procedure successfully completed.

Grant permission to allow REPADMIN to propagate changes.

SQL> EXEC DBMS_DEFER_SYS.REGISTER_PROPAGATOR('REPADMIN');

PL/SQL procedure successfully completed.

Grant permission to allow REPADMIN to be a receiver.

SQL> EXEC DBMS_REPCAT_ADMIN.REGISTER_USER_REPGROUP (USERNAME => 'REPADMIN', PRIVILEGE_TYPE => 'RECEIVER', LIST_OF_GNAMES => NULL);

PL/SQL procedure successfully completed.

Step 3:
Create database links between the various master sites. The name of each DB link should be the same as the global name of the target database (which is why GLOBAL_NAMES must be TRUE).
A public database link needs to be created first:

(Master 1)
SQL> CREATE PUBLIC DATABASE LINK ORCL.ORACLE.COM USING 'ORCL';

Database link created.
(Master 2)
SQL> CREATE PUBLIC DATABASE LINK TEST.ORACLE.COM USING 'TEST';

Database link created.

Now create a private database link in the REPADMIN schema on each master.

(Master 1)
SQL> conn repadmin/repadmin
Connected.
SQL> CREATE DATABASE LINK ORCL.ORACLE.COM CONNECT TO REPADMIN IDENTIFIED BY repadmin USING 'ORCL';

Database link created.

(Master 2)
SQL> conn repadmin/repadmin
Connected.
SQL> CREATE DATABASE LINK TEST.ORACLE.COM CONNECT TO REPADMIN IDENTIFIED BY repadmin USING 'TEST';

Database link created.
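
It is worth verifying that the private links actually resolve and connect before moving on (a simple sanity check; run each as REPADMIN on the respective master):

SQL> SELECT * FROM DUAL@ORCL.ORACLE.COM;   -- on Master 1
SQL> SELECT * FROM DUAL@TEST.ORACLE.COM;   -- on Master 2

Each should return a single row with the value 'X'.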

Step 4:
Define or create the replication objects.
I am using the HR schema and the tables present under it for replication purposes. Connected as HR, list the tables:
SQL> SELECT TABLE_NAME FROM USER_TABLES;

TABLE_NAME
------------------------------
LOCATIONS
BLOB_TEST
EMPLOYEES
REGIONS
JOBS
JOB_HISTORY
DEPARTMENTS

7 rows selected.
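
Each replicated table should have a primary key (or you must tell Oracle which columns identify a row using DBMS_REPCAT.SET_COLUMNS) so that changes can be matched to rows at the other sites. A quick check, as a sketch, while connected as HR:

SQL> SELECT TABLE_NAME, CONSTRAINT_NAME FROM USER_CONSTRAINTS WHERE CONSTRAINT_TYPE = 'P';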

Step 5:
Create the replication group.
This group needs to be created in the master definition DB (TEST in our case) as REPADMIN.

SQL> CONN REPADMIN/REPADMIN
Connected.
SQL> EXEC DBMS_REPCAT.CREATE_MASTER_REPGROUP('REP2');

PL/SQL procedure successfully completed.

You can create multiple replication groups depending on your requirements, for different objects or sets of objects.

Step 6:
Add the objects/tables to the replication group REP2. This needs to be executed only at the master definition DB (TEST).

SQL> EXEC DBMS_REPCAT.CREATE_MASTER_REPOBJECT('HR', 'BLOB_TEST', 'TABLE', GNAME=>'REP2');
EXEC DBMS_REPCAT.CREATE_MASTER_REPOBJECT('HR', 'DEPARTMENTS', 'TABLE', GNAME=>'REP2');
EXEC DBMS_REPCAT.CREATE_MASTER_REPOBJECT('HR', 'EMPLOYEES', 'TABLE', GNAME=>'REP2');
EXEC DBMS_REPCAT.CREATE_MASTER_REPOBJECT('HR', 'JOBS', 'TABLE', GNAME=>'REP2');
EXEC DBMS_REPCAT.CREATE_MASTER_REPOBJECT('HR', 'JOB_HISTORY', 'TABLE', GNAME=>'REP2');
EXEC DBMS_REPCAT.CREATE_MASTER_REPOBJECT('HR', 'LOCATIONS', 'TABLE', GNAME=>'REP2');
EXEC DBMS_REPCAT.CREATE_MASTER_REPOBJECT('HR', 'REGIONS', 'TABLE', GNAME=>'REP2');

PL/SQL procedure successfully completed.
SQL>
PL/SQL procedure successfully completed.
SQL>
PL/SQL procedure successfully completed.
SQL>
PL/SQL procedure successfully completed.
SQL>
PL/SQL procedure successfully completed.
SQL>
PL/SQL procedure successfully completed.
SQL>
PL/SQL procedure successfully completed.
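
To confirm that all seven tables made it into the group, the same dictionary view used later in the checks section can be queried now as well (STATUS should show VALID once the admin requests have been processed):

SQL> SELECT SNAME, ONAME, STATUS, GENERATION_STATUS FROM ALL_REPOBJECT;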

Step 7:
Define the other master database in the replication setup at the master definition database, i.e. add the ORCL database as the other master in the replication.
This needs to be executed only at the master definition DB (TEST). Run it as REPADMIN. Note that with USE_EXISTING_OBJECTS => TRUE and COPY_ROWS => FALSE, the tables are expected to already exist at ORCL with identical data; no objects or rows are copied over by this call.

SQL> EXEC DBMS_REPCAT.ADD_MASTER_DATABASE (GNAME=> 'REP2', MASTER=>'ORCL.ORACLE.COM',USE_EXISTING_OBJECTS => TRUE,COPY_ROWS=> FALSE, PROPAGATION_MODE=> 'ASYNCHRONOUS');

PL/SQL procedure successfully completed.
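
At this point both sites should show up as masters for the group. A quick check from REPADMIN at the master definition site (a sketch; the MASTERDEF column flags the master definition site):

SQL> SELECT GNAME, DBLINK, MASTERDEF FROM DBA_REPSITES WHERE GNAME = 'REP2';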


Step 8:
Set up replication support for the tables marked for replication in Step 6.
This needs to be executed only at the master definition DB (TEST). Run it as REPADMIN.

SQL> show user
USER is "REPADMIN"
SQL> exec dbms_repcat.generate_replication_support('HR','BLOB_TEST','TABLE');
exec dbms_repcat.generate_replication_support('HR','DEPARTMENTS','TABLE');
exec dbms_repcat.generate_replication_support('HR','EMPLOYEES','TABLE');
exec dbms_repcat.generate_replication_support('HR','JOBS','TABLE');
exec dbms_repcat.generate_replication_support('HR','JOB_HISTORY','TABLE');
exec dbms_repcat.generate_replication_support('HR','LOCATIONS','TABLE');
exec dbms_repcat.generate_replication_support('HR','REGIONS','TABLE');


PL/SQL procedure successfully completed.
SQL>
PL/SQL procedure successfully completed.
SQL>
PL/SQL procedure successfully completed.
SQL>
PL/SQL procedure successfully completed.
SQL>
PL/SQL procedure successfully completed.
SQL>
PL/SQL procedure successfully completed.
SQL>
PL/SQL procedure successfully completed.
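
Before resuming replication, make sure all the administrative requests generated so far have been processed for the group. A sketch of the check, plus a way to push the admin queue manually instead of waiting for the scheduled job seen later in Step 12:

SQL> SELECT COUNT(*) FROM DBA_REPCATLOG WHERE GNAME = 'REP2';
SQL> EXEC DBMS_REPCAT.DO_DEFERRED_REPCAT_ADMIN('REP2', FALSE);

Once the count drops to zero, the group can safely be resumed.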

Step 9:

Enable replication between these databases.
This needs to be executed only at the master definition DB (TEST). Run it as REPADMIN.

SQL> EXEC DBMS_REPCAT.RESUME_MASTER_ACTIVITY('REP2');
PL/SQL procedure successfully completed.
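
After this call the replication group should move from QUIESCED to NORMAL status, which can be confirmed with a quick query (a sketch, using one of the views listed in the checks below):

SQL> SELECT GNAME, STATUS FROM DBA_REPGROUP WHERE GNAME = 'REP2';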

Step 10:
Start/create the data push job from all the master databases.
The parameters define the push interval (here every minute), whether Oracle should stop when it encounters an error, whether trace files are written on errors, and the degree of parallelism.

(Master 1)
SQL> EXEC DBMS_DEFER_SYS.SCHEDULE_PUSH(DESTINATION =>'ORCL.ORACLE.COM', INTERVAL=>'SYSDATE + 1/60/24',NEXT_DATE=> SYSDATE,STOP_ON_ERROR => FALSE, DELAY_SECONDS => 0, WRITE_TRACE=>TRUE, PARALLELISM => 2);
PL/SQL procedure successfully completed.

(Master 2)
SQL> EXEC DBMS_DEFER_SYS.SCHEDULE_PUSH(DESTINATION =>'TEST.ORACLE.COM', INTERVAL=>'SYSDATE + 1/60/24',NEXT_DATE=> SYSDATE,STOP_ON_ERROR => FALSE, DELAY_SECONDS => 0, WRITE_TRACE=>TRUE, PARALLELISM => 2);
PL/SQL procedure successfully completed.
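
While testing, you do not have to wait for the scheduled interval; the deferred transaction queue can be checked and pushed manually. A sketch (run at the site where the change originated; DBMS_DEFER_SYS.PUSH is a function, so its return value is captured in a small PL/SQL block):

SQL> SELECT COUNT(*) FROM DEFTRAN;

DECLARE
  rc BINARY_INTEGER;
BEGIN
  -- push the queued deferred transactions to the other master now
  rc := DBMS_DEFER_SYS.PUSH(destination => 'ORCL.ORACLE.COM');
END;
/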

Step 11:
Define the purge job; this purges the deferred transactions that have already been successfully replicated to all destinations. It needs to be run on all the masters.

SQL> EXEC DBMS_DEFER_SYS.SCHEDULE_PURGE(NEXT_DATE => SYSDATE,INTERVAL=>'SYSDATE + 1/24',DELAY_SECONDS=>0, ROLLBACK_SEGMENT => '');
PL/SQL procedure successfully completed.

Step 12:

Do some checks:
Check the replication jobs (DBMS_JOB) on both masters:
SQL> @jobs
Job User Schema Last Date Next Date B F What
----------- ---------- ---------- ---------------- ---------------- ---- - -----------------------------------------------------------------
0: 66 REPADMIN REPADMIN 15.04.2009 09:18 15.04.2009 09:28 N 0 dbms_repcat.do_deferred_repcat_admin('"REP2"', FALSE);
0: 67 REPADMIN REPADMIN 15.04.2009 09:25 15.04.2009 09:26 N 0 declare rc binary_integer; begin rc := sys.dbms_defer_sys.pu
0: 68 REPADMIN REPADMIN 15.04.2009 09:25 15.04.2009 09:26 N 0 declare rc binary_integer; begin rc := sys.dbms_defer_sys.pu
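
The @jobs script above is just a local formatting helper; a minimal equivalent query against DBA_JOBS (column formatting omitted) would be:

SQL> SELECT JOB, LOG_USER, SCHEMA_USER, LAST_DATE, NEXT_DATE, BROKEN, FAILURES, WHAT FROM DBA_JOBS;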

Queries that can be used to check the replication setup:

SELECT SNAME,ONAME,STATUS,GENERATION_STATUS FROM ALL_REPOBJECT; --> Status and other details of the replicating objects.
COL SOURCE FORMAT A10
COL SNAME FORMAT A10
COL ONAME FORMAT A10
SELECT ID,SOURCE,STATUS,TIMESTAMP,SNAME,ONAME,ERRNUM,MESSAGE FROM DBA_REPCATLOG; --> Status, any error details of the replication objects.
SELECT * FROM DBA_REPSITES; --> Details of the master sites in each replication group.
SELECT * FROM DBA_REPSCHEMA; --> Schemas being replicated.
SELECT * FROM DBA_REPGROUP; --> The details of the replication groups.
SELECT * FROM DBA_REPGENERATED;
SELECT * FROM DBA_REPDDL; --> DDL of the changes that need to be sent to the other masters and slaves.
SELECT * FROM DBA_REPCONFLICT; --> Details of any replication conflicts.
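
The deferred transaction queue itself can also be monitored; DEFERROR in particular shows transactions that failed to apply at a destination:

SELECT * FROM DEFTRAN; --> Deferred transactions waiting to be pushed.
SELECT * FROM DEFCALL; --> The individual deferred calls that make up those transactions.
SELECT * FROM DEFERROR; --> Deferred transactions that errored out at a destination.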


Test to see if the replication is working fine:
(Master 1)
SQL> SELECT * FROM REGIONS;
REGION_ID REGION_NAME
------------ -------------------------
5 Antartica
6 Artic
1 Europe
2 Americas
3 Asia
4 Middle East and Africa

SQL> INSERT INTO REGIONS VALUES (7,'CHINA');
1 row created.

SQL> commit;
Commit complete.

SQL> SELECT * FROM REGIONS@ORCL.ORACLE.COM;
REGION_ID REGION_NAME
------------ -------------------------
5 Antartica
6 Artic
7 China
1 Europe
2 Americas
3 Asia
4 Middle East and Africa

You can see that the replication works fine from one master to the other. Let's try the other way around as well.
(Master 2)
SQL> SELECT * FROM REGIONS;
REGION_ID REGION_NAME
---------- -------------------------
5 Antartica
6 Artic
7 China
1 Europe
2 Americas
3 Asia
4 Middle East and Africa
7 rows selected.

SQL> INSERT INTO REGIONS VALUES (8,'US');
1 row created.

SQL> commit;
Commit complete.

SQL> SELECT * FROM REGIONS@TEST.ORACLE.COM;
REGION_ID REGION_NAME
---------- -------------------------
5 Antartica
6 Artic
7 China
8 US
1 Europe
2 Americas
3 Asia
4 Middle East and Africa
8 rows selected.

While trying to set up this replication, I came across some other web links as well. Please refer to them too, as they have a lot of great information.
Setup Multi-master Replication with Oracle

Multimaster replication in Oracle

Advanced Replication FAQ

Wednesday, April 1, 2009

What is Voting Disk & Split Brain Syndrome in RAC

Voting Disk


Oracle Clusterware uses the voting disk to determine which instances are members of a cluster. The voting disk must reside on shared storage. Basically, all nodes in the RAC cluster register their heartbeat information on these voting disks, and this information is used to determine which nodes are currently active members of the cluster. The voting disks are also used for checking the availability of instances in RAC and for evicting unavailable nodes from the cluster. This helps in preventing the split-brain condition and keeps the database information intact. The split-brain syndrome, its effects and how it is handled in Oracle are described below.
For high availability, Oracle recommends that you have a minimum of three voting disks. If you configure a single voting disk, then you should use external mirroring to provide redundancy. You can have up to 32 voting disks in your cluster. The reason for an odd number of voting disks is that a node must be able to access more than half of the voting disks to continue to function; with two disks, a node that can see only one of them does not have a majority. I am still trying to read more on this concept.

Split Brain Syndrome:


In an Oracle RAC environment all the instances/servers communicate with each other using a high-speed interconnect on the private network. This private network interface or interconnect is usually redundant and is used primarily for inter-instance Oracle data block transfers. Split brain, with respect to Oracle RAC systems, occurs when the instance members in a RAC cluster fail to ping/connect to each other via this private interconnect while the servers are all physically up and running and the database instance on each of these servers is also running. These individual nodes are running fine and could conceptually accept user connections and work independently. Due to the lack of communication, each instance thinks that the other instance it cannot reach is down and that it needs to do something about the situation. The problem is that if we leave these instances running, the same block might get read and updated in these individual instances, and there would be a data integrity issue, as the blocks changed in one instance would not be locked and could be overwritten by another instance. Oracle has implemented an efficient check for the split-brain syndrome.
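
As an illustration, the interconnect registered with an instance and the current cluster members can be seen from SQL (a sketch using the standard RAC views V$CLUSTER_INTERCONNECTS and GV$INSTANCE):

SQL> SELECT NAME, IP_ADDRESS, IS_PUBLIC FROM V$CLUSTER_INTERCONNECTS;
SQL> SELECT INST_ID, INSTANCE_NAME, HOST_NAME, STATUS FROM GV$INSTANCE;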

What does RAC do in case a node becomes inactive:


In RAC, if any node becomes inactive, or if other nodes are unable to ping/connect to a node in the cluster, then the node which first detects that one of the nodes is not accessible will evict that node from the RAC group. For example, if there are 4 nodes in a RAC cluster and node 3 becomes unavailable, then when node 1 tries to connect to node 3 and finds it not responding, node 1 will evict node 3 out of the RAC group, leaving Node 1, Node 2 and Node 4 in the RAC group to continue functioning.
The split-brain handling can become more complicated in large RAC setups. For example, say there are 10 RAC nodes in a cluster and 4 of them are not able to communicate with the other 6, so two sub-clusters are formed in this 10-node cluster (one group of 4 nodes and another of 6 nodes). The nodes will quickly try to affirm their membership by locking the controlfile, and the node that locks the controlfile will then check the votes of the other nodes. The sub-cluster with the larger number of active nodes gets preference and the nodes in the other sub-cluster are evicted. In practice I have only seen evictions where a single node is evicted and the rest keep functioning, so I cannot really testify to this behaviour from experience, but this is the theory behind it.
When a node is evicted, Oracle RAC will usually reboot that node and perform a cluster reconfiguration to bring the evicted node back into the cluster.
You will see the Oracle error ORA-29740 when there is a node eviction in RAC. There are many reasons for a node eviction, such as the heartbeat not being received via the controlfile, or the instance being unable to communicate with the clusterware, etc.
A good Metalink note on understanding node eviction and how to address it is Note ID 219361.1.

The CSS (Cluster Synchronization Services) daemon in the Clusterware maintains the heartbeat to the voting disks.