Saturday, September 19, 2009

How To: Configure Data Guard Fast-Start Failover Step By Step

Note: Physcial Standby should be working fine and Data Guard Broker has already been configured and working fine.

How to Setup Data Guard
How To: Configure Data Guard Broker

1.) Set Local_Listener (Primary):

For the database to register with the data guard listener we need to get the local_listener parameter.
SQL> show parameter local_listener
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
local_listener string

SQL> alter system set local_listener='(ADDRESS = (PROTOCOL = TCP)(HOST = 10.224.56.38)(PORT = 1521))';
System altered.

SQL> alter system register;
System altered.

SQL> show parameter local_listener
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
local_listener string (ADDRESS = (PROTOCOL = TCP)(HOST = 10.224.56.38)(PORT = 1521))

2.) Verify force logging:

SQL> select force_logging from v$database;
FOR
---
YES

Should be YES.

3.) Should use spfile:
SQL> show parameter spfile;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string /opt/app/oracle/10.2/dbs/spfileorcl.ora

4.) Should use a password file:

SQL> select * from v$pwfile_users;
USERNAME SYSDB SYSOP
----------------------------------------
SYS TRUE TRUE

5.) Remote login should be enabled:

SQL> show parameter remote_login_passwordfile
NAME TYPE VALUE
-------------------------------------------------
remote_login_passwordfile string EXCLUSIVE

6.) DB_UNIQUE_NAME should be set:

SQL> show parameter db_unique_name
NAME TYPE VALUE
-------------------------------------------------
db_unique_name string orcl

7.) Flash Recovery Ares should be set:

SQL> show parameter db_recovery_file
NAME TYPE VALUE
-----------------------------------------------------------------------------
db_recovery_file_dest string /opt/app/oracle/flash_recovery_area
db_recovery_file_dest_size big integer 2G

To set it, use the following commands:

alter system set db_recovery_file_dest_size = 2g scope=both;
alter system set db_recovery_file_dest = '' scope=both;

8.) Standby file management should be AUTO:

SQL> show parameter standby_file_management
NAME TYPE VALUE
--------------------------------------------------
standby_file_management string MANUAL

SQL> alter system set standby_file_management=auto;
System altered.

SQL> show parameter standby_file_management
NAME TYPE VALUE
-------------------------------------------------------
standby_file_management string AUTO

9.) log_archive_config should be set:

SQL> show parameter log_archive_config
NAME TYPE VALUE
-----------------------------------------------------------------------------
log_archive_config string DG_CONFIG=(orcl,orcl1)

10.) Standby Redo logs should be created:

SQL> select group#, type, member from v$logfile where type = 'STANDBY';
GROUP# TYPE MEMBER
-------------------------------------------------------------------------------------
3 STANDBY /opt/app/oracle/flash_recovery_area/orcl/onlinelog/o1_mf_3_5bvzkzgs_.log
4 STANDBY /opt/app/oracle/flash_recovery_area/orcl/onlinelog/o1_mf_4_5bvzl8hf_.log

10.) Enable flash back database:

alter database flashback on
*
ERROR at line 1:
ORA-38759: Database must be mounted by only one instance and not open.

Well need to mount the database to run this command.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 2684354560 bytes
Fixed Size 2086352 bytes
Variable Size 570427952 bytes
Database Buffers 2097152000 bytes
Redo Buffers 14688256 bytes
Database mounted.
SQL> alter database flashback on;
Database altered.

SQL> alter system set db_flashback_retention_target = 60 scope=both;
System altered.

SQL> select flashback_on from v$database;
FLASHBACK_ON
------------------
YES

SQL> show parameter db_flashback_retention_target
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_flashback_retention_target integer 60

SQL> alter database set standby database to maximize availability;
Database altered.

SQL> select protection_mode from v$database;
PROTECTION_MODE
--------------------
MAXIMUM AVAILABILITY

Enable Flashback DB on Standby also.
SQL> alter database recover managed standby database cancel;
Database altered.

SQL> alter database flashback on;
Database altered.

SQL> alter system set db_flashback_retention_target = 60 scope=both;
System altered.

11.) Muliplex the data broker configuration files in 2 different disk, for protection.

SQL> show parameter dg_broker_config_file
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
dg_broker_config_file1 string /opt/app/oracle/10.2/dbs/dr1orcl.dat
dg_broker_config_file2 string /opt/app/oracle/10.2/dbs/dr2orcl.dat

SQL> alter system set dg_broker_config_file2='/opt/app/oracle/oradata/orcl/dr2orcl.dat';
alter system set dg_broker_config_file2='/opt/app/oracle/oradata/orcl/dr2orcl.dat'
*
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-16573: attempt to change configuration file for an enabled broker configuration

Well looks like the above problem is not easy to resolve. We need to disable the configuration, and also stop DMON process by doing alter system set dg_broker_start='false';

And then we can change the location of these files. Well I am not doing that right now. As I saw a bug filed on metalink that there are errors in restarting DMON after change in the file location. (BUG:5955795).

Well now doing the change of location.
SQL> ALTER SYSTEM SET DG_BROKER_START=FALSE;

System altered.
[oracle@APP3 dbs]$ mv dr2orcl.dat old.dr2orcl.dat
[oracle@APP3 dbs]$ mv dr1orcl.dat old.dr1orcl.dat

SQL> alter system set dg_broker_config_file2='/opt/app/oracle/oradata/orcl/dr2orcl.dat';
System altered.

SQL> ALTER SYSTEM SET DG_BROKER_START=TRUE;
System altered.

SQL> show parameter dg_broker_config_file
NAME TYPE VALUE
-----------------------------------------------------------------------------
dg_broker_config_file1 string /opt/app/oracle/10.2/dbs/dr1regdb.dat
dg_broker_config_file2 string /opt/app/oracle/oradata/orcl/dr2orcl.dat


DGMGRL> create configuration 'DBTEST' as primary database is 'orcl' connect identifier is 'orcl';
Configuration "DBTEST" created with primary database "orcl"
DGMGRL> show configuration

Configuration
Name: DBTEST
Enabled: NO
Protection Mode: MaxPerformance
Fast-Start Failover: DISABLED
Databases:
orcl - Primary database

Current status for "DBTEST":
DISABLED

DGMGRL> ALTER CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY ;
Succeeded.
DGMGRL> show configuration

Configuration
Name: DBTEST
Enabled: NO
Protection Mode: MaxAvailability
Fast-Start Failover: DISABLED
Databases:
orcl - Primary database

Current status for "DBTEST":
DISABLED

FYI very useful metalink 808783.1 (Step By Step How to Recreate Dataguard Broker Configuration)

12.) Edit database setting in data guard like

DGMGRL> edit database orcl set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database orcl set property NetTimeout=10;
Property "nettimeout" updated
DGMGRL> edit database orcl1 set property NetTimeout=10;
Property "nettimeout" updated
DGMGRL> enable configuration
Error: ORA-16627: operation disallowed since no standby databases would remain to support protection mode

Enable the LogXptMode to SYNC for standby also.
DGMGRL> edit database orcl1 set property LogXptMode='SYNC';
Property "logxptmode" updated

DGMGRL> enable configuration
Operation requires shutdown of instance "orcl" on database "orcl"
Shutting down instance "orcl"...
Database closed.
Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "orcl" on database "orcl"

Startup the DB manually.

We I had some more errors:
Warning: ORA-16608: one or more databases have warnings
Error: ORA-16795: database resource guard detects that database re-creation is required

i did
DGMGRL> enable resource orcl1
Enabled.

Well did not help. I problem was grave, I needed standby redo log files in the standby DB also.
SQL> alter database recover managed standby database cancel;
Database altered.

SQL> alter database add standby logfile group 3 size 50M;
Database altered.

SQL> alter database add standby logfile group 4 size 50M;
Database altered.

GROUP# STATUS TYPE MEMBER IS_
--------------------------------------------------------------------------------- ---
1 ONLINE /home/oracle/oradata/orcl/redo01.log NO
2 ONLINE /home/oracle/oradata/orcl/redo02.log NO
3 STANDBY /home/oracle/oradata/flash_recovery_area/orcl/orcl1/onlinelog/o1_mf_3_5c3kdj01_.log YES
4 STANDBY /home/oracle/oradata/flash_recovery_area/orcl/orcl1/onlinelog/o1_mf_4_5c3kdw63_.log YES

I decide to start redo apply !!
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE disconnect;

Database altered.

Well some more errors: in data guard log.
RSM0: HEALTH CHECK WARNING: ORA-16826: apply service state is inconsistent with the DelayMins property
DGMGRL> show site verbose orcl
Site
Name: 'orcl'
Hostname: 'APP3'
Instance name: 'orcl'
Service Name: 'orcl'
Standby Type: 'physical'
Enabled: 'yes'
Required: 'yes'
Default state: 'PRIMARY'
Intended state: 'PRIMARY'
PFILE: ''
Number of resources: 1
Resources:
Name: orcl (default) (verbose name='orcl')
Current status for "orcl":
Warning: ORA-16570: operation requires restart of database "orcl"

Need to restart the primary DB.
Whenever you see any error in the "show configuration" output. Check "show site verbose " to find which site has what error.
Well after the restart of the primary database:
DGMGRL> show configuration

Configuration
Name: DBTEST
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
orcl - Primary database
orcl1 - Physical standby database
- Fast-Start Failover target

Current status for "DBTEST":
SUCCESS

13.) Enable Fast-start failover:

DGMGRL> enable fast_start failover;
Enabled.

14.) Start Observer:
Idealy observer should be started on another host, to observe the primary, as if the host goes down it can monitor the outage and do specify actions.
DGMGRL> start observer;
Observer started
The session will seem to hang. You can do a control C. Or create and run the script as nohup.
eg.
#!/bin/sh

dgmgrl sys/oracle <start observer;
!
Another Way:
CentOS4.4-64:wrpws:oracle > nohup dgmgrl sys/oracle@orcl "start observer file='$ORACLE_HOME/dbs/fsfo.dat'" -logfile $HOME/observer.log &

And another way to start:
[oracle@APP3 scripts]$ dgmgrl -logfile $HOME/observer.log sys/oracle@orcl "start observer" &

15. Lets test the configuration:
SQL> select FS_FAILOVER_STATUS,FS_FAILOVER_CURRENT_TARGET,FS_FAILOVER_THRESHOLD,FS_FAILOVER_OBSERVER_PRESENT,FS_FAILOVER_OBSERVER_HOST from v$database;

FS_FAILOVER_STATUS FS_FAILOVER_CURRENT_TARGET FS_FAILOVER_THRESHOLD FS_FAIL FS_FAILOVER_OBSERVER_HOST
--------------------- ------------------------------ --------------------- ------- ------------------------------
SYNCHRONIZED orcl1 30 YES wrpws


Switchover in both directions:
DGMGRL for Linux: Version 10.2.0.4.0 - 64bit Production

Copyright (c) 2000, 2005, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> show configuration

Configuration
Name: DBTEST
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
orcl - Primary database
orcl1 - Physical standby database
- Fast-Start Failover target

Current status for "DBTEST":
SUCCESS

DGMGRL> switchover to orcl1;
Performing switchover NOW, please wait...
Operation requires shutdown of instance "orcl" on database "orcl"
Shutting down instance "orcl"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires shutdown of instance "orcl" on database "orcl1"
Shutting down instance "orcl"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "orcl" on database "orcl"
Starting instance "orcl"...
Unable to connect to database
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

Failed.
You are no longer connected to ORACLE
Please connect again.
Unable to start instance "orcl"
You must start instance "orcl" manually
Operation requires startup of instance "orcl" on database "orcl1"
You must start instance "orcl" manually
Switchover succeeded, new primary is "orcl1"

I had to manually start up the databases to see if they switched well. They did reverse the roles both sides.
Checked metalink note: 308943.1 to resolve the above problem.
The listener.ora entries:
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = orcl)
(ORACLE_HOME = /home/oracle/10.2)
(SID_NAME = orcl)
(SERVICE_NAME = orcl1)
)
(SID_DESC =
(GLOBAL_DBNAME = orcl1_DGMGRL) ##--->Should be db_unique_name_DGMGRL.db_domain
(ORACLE_HOME = /home/oracle/10.2)
(SID_NAME = orcl)
(SERVICE_NAME = orcl1)
)
)

After fixing the listener issue, the error was rectified and the switchover was working seemlessly.
DGMGRL> switchover to orcl1;
Performing switchover NOW, please wait...
Operation requires shutdown of instance "orcl" on database "orcl"
Shutting down instance "orcl"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires shutdown of instance "orcl" on database "orcl1"
Shutting down instance "orcl"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "orcl" on database "orcl"
Starting instance "orcl"...
ORACLE instance started.
Database mounted.
Operation requires startup of instance "orcl" on database "orcl1"
Starting instance "orcl"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "orcl1"
DGMGRL> show configuration

Configuration
Name: DBTEST
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
orcl - Physical standby database
- Fast-Start Failover target
orcl1 - Primary database

Current status for "DBTEST":
SUCCESS

Switchover Back to the Primary:

DGMGRL> switchover to orcl;
Performing switchover NOW, please wait...
Operation requires shutdown of instance "orcl" on database "orcl1"
Shutting down instance "orcl"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires shutdown of instance "orcl" on database "orcl"
Shutting down instance "orcl"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "orcl" on database "orcl1"
Starting instance "orcl"...
ORACLE instance started.
Database mounted.
Operation requires startup of instance "orcl" on database "orcl"
Starting instance "orcl"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "orcl"
DGMGRL>

This completes the testing for switchover.
Its takes about 2.5 mins to switchover.

Now the testing for Failover:

Check that there is enough flashback data available in the primary database, atleast 30-40 mins.

SQL> select (sysdate - oldest_flashback_time)*24*60 as history from v$flashback_database_log;

HISTORY
----------
486.133333

Lets create a table and put some data to see if there is any data loss in the failover.

SQL> create table x as select * from all_objects;

Table created.

SQL> select count(*) from x;

COUNT(*)
----------
9915

SQL> shutdown abort;
ORACLE instance shut down.
SQL> exit

Observer Log:

[W000 09/17 09:53:56.79] Failed to start the Observer.
Error: ORA-16647: could not start more than one observer

Failed.

[W000 09/17 10:02:22.38] Failed to start the Observer.
Observer started
[W000 09/17 10:02:53.82] Observer started.
Observer stopped
Observer started
[W000 09/17 10:15:42.88] Observer started.

15:14:04.26 Friday, September 18, 2009
Initiating fast-start failover to database "orcl1"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "orcl1"
15:14:13.79 Friday, September 18, 2009

Status from dgmgrl on the new primary:
CentOS4.4-64:wrpws:oracle > dgmgrl sys/oracle
DGMGRL for Linux: Version 10.2.0.4.0 - 64bit Production

Copyright (c) 2000, 2005, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> show configuration

Configuration
Name: DBTEST
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
orcl - Physical standby database (disabled)
- Fast-Start Failover target
orcl1 - Primary database

Current status for "DBTEST":
Warning: ORA-16608: one or more databases have warnings

Record count from the new primary:
CentOS4.4-64:wrpws:oracle > sqlplus sys as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 18 15:17:32 2009

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Enter password:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select count(*) from x;

COUNT(*)
----------
9915

Now to reinstate the old primary as standby:
I started the old primary in mount mode, after about 30 sec I see activity in the observer.log
15:25:27.74 Friday, September 18, 2009
Initiating reinstatement for database "orcl"...
Reinstating database "orcl", please wait...
Operation requires shutdown of instance "orcl" on database "orcl"
Shutting down instance "orcl"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "orcl" on database "orcl"
Starting instance "orcl"...
ORACLE instance started.
Database mounted.
Continuing to reinstate database "orcl" ...
Reinstatement of database "orcl" succeeded
15:26:27.32 Friday, September 18, 2009

When you do show configuration in dgmgrl you will get the following error:
Warning: ORA-16817: unsynchronized Fast-Start Failover configuration

Just switch a couple fo logfile (alter system switch logfile;) in sqlplus and then check the output of dgmgrl, it should be fine:
CentOS4.4-64:wrpws:oracle > dgmgrl sys/oracle
DGMGRL for Linux: Version 10.2.0.4.0 - 64bit Production

Copyright (c) 2000, 2005, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> show configuration

Configuration
Name: DBTEST
Enabled: YES
Protection Mode: MaxAvailability
Fast-Start Failover: ENABLED
Databases:
orcl - Physical standby database
- Fast-Start Failover target
orcl1 - Primary database

Current status for "DBTEST":
SUCCESS

DGMGRL>

Well this completes my testing and implementation of Fast-Start Failover!!!
I hope this helps.

4 Comments:

Bharat Damarla said...

Excellent Stuff :)

Unknown said...

Apun,
Awesome document. It really helps.

Thanks,
Perminder (Ex-PATH infotech. Hope you still remember me)

MIR ALI said...

Very cool. Would be helpful if you can distinguish which steps are for Primary and standby site?

MIR ALI said...

Very cool. Would be great if you can separate the Primary/standby steps.
Thanks