Oracle Data Guard In A Cloud - Part VI

Look out, I am watching you!

Ritesh Patel

Hi, I am Ritesh Patel. I live in a beautiful town surrounded by the mountains. C&O Canal is few miles away. State parks are only a distance away & bike trails galore. It is home sweet home Frederick, MD. A passionate developer. Love to cook. Enjoy playing "Bollywood Tunes" on my harmonica. Apart from that just a normal guy.

In previous article we configured Oracle Data Guard Broker. We also learned how to switchover databases using the broker. It was a great test and it assured us that our active data guard environment is configured correctly.

But wait a second: we did switchover manually using the command line. Now think, what happens if the primary server goes down in the middle of the night. In usual scenario you have to wake up one of your DBA(s) to log in and failover primary to standby server so business is back to normal, right? In other words your DBA will wake up, login and make physical standby database as a new primary database.

Now the scenario I gave above is very generic. I mean depending on your requirements and service level agreements may be you are willing to tolerate some down time or wake up your DBA in the middle of the night. I just don't know, but there has to be a better way to handle this situation, right? And there is one.

Oracle active data guard has one more built in component called: Observer. It does exactly that: It Observes. Observer when configured observes your active data guard environment. Once the observer is started, no further user interaction is required.

If observer and the standby database lose connectivity to the primary database for longer than the threshold specified by FastStartFailoverThreshold configuration property, the observer will initiate a fast start failover to the standby database. Another tongue twister.

We must use Oracle Data Guard Observer for the "Fast Start Failover(s)" and run our active data guard environment in a maximum performance or maximum availability mode for Fast-Start Failover(s).

Below is how the observer works in a fast start failover scenario.

Before Fast Start Failover: Data Guard is operating in a steady state, with the primary database transmitting redo data to the target standby database and the observer monitoring the state of the entire configuration.

Fast Start Failover Ensues: Disaster strikes the primary database and its network connections to both the observer and the target standby database are lost. Upon detecting the break in communication, the observer attempts to reestablish a connection with the primary database for the amount of time defined by the FastStartFailoverThreshold property before initiating a fast-start failover. If the observer is unable to regain a connection to the primary database within the specified time, and the target standby database is ready for fast-start failover, then fast-start failover ensues.

After Fast-Start Failover: The fast-start failover has completed and the target standby database is running in the primary database role. After the former primary database has been repaired, the observer reestablishes its connection to that database and reinstates it as a new standby database. The new primary database starts transmitting redo data to the new standby database.

Note: Observer must run on a separate machine. Additionally, this server must run same OS and Oracle binaries as primary and standby servers.

Configure Observer for Fast-Start Failover

Remember, Observer runs on a separate server and it doesn't know anything about primary or standby server yet. We have to configure service names on the server that will run the observer. Let's do that first.

Did you copy our useful scripts on to the server that will run the observer? If not, then do so.

Now locate the tnsnames file under oracle home and open it in a vi editor.

[oracle@ob_server ora_scripts] cd $ORACLE_HOME/dbhome_1/network/admin
[oracle@ob_server admin] vi tnsnames.ora

Cut and paste the content below in the tnsnames.ora file.

# tnsnames.ora Network Configuration File: /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

CSSDEV =
  (DESCRIPTION =
     (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = primary_server)(PORT = 1521))
     )
     (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = cssdev)
    )
  )

CSSDG1 =
 (DESCRIPTION =
  (ADDRESS_LIST =
   (ADDRESS = (PROTOCOL = TCP)(HOST = standby_server)(PORT = 1521))
  )
  (CONNECT_DATA =
    (SERVER = DEDICATED)
    (SERVICE_NAME = cssdg1)(UR=A)
  )
 )

Run tnsping tests to ensure these changes are working appropriately.

Standby Redo Logs

Ensure standby redo logs are configured on primary and standby server(s). Go to SQL prompt and run these SQL commands on primary server.

[oracle@primary_server ora_scripts] ./sqlplus.sh

/*set col formatting*/
SQL > col member format a50;

/*check the log members and type*/
SQL > SELECT member, type from v$logfile;

You should see an output below on the primary server.

MEMBER                                              TYPE
-------------------------------------------------- -------
+DATA/cssdev/onlinelog/group_3.266.864573377       ONLINE
+DATA/cssdev/onlinelog/group_3.267.864573377       ONLINE
+DATA/cssdev/onlinelog/group_2.264.864573373       ONLINE
+DATA/cssdev/onlinelog/group_2.265.864573375       ONLINE
+DATA/cssdev/onlinelog/group_1.262.864573371       ONLINE
+DATA/cssdev/onlinelog/group_1.263.864573373       ONLINE
+DATA/cssdev/onlinelog/group_4.295.867856577       STANDBY
+DATA/cssdev/onlinelog/group_5.298.867856591       STANDBY
+DATA/cssdev/onlinelog/group_6.292.867856603       STANDBY

Perform same check on the standby server(s). If standby redo logs are configured for both servers then you should see them in the resultset.

Here is the syntax for adding the standby redo logs. Perform these steps on each standby.

[oracle@primary_server ora_scripts] ./sqlplus.sh
SQL > alter database add standby logfile group 'group#' '+DATA' size 50M;

Add groups as per your requirements.

SYNC LogXptMode

LogXptMode must be set to SYNC on primary and standby server(s).

Go to DGMGRL command line interface on the standby server.

[oracle@standby_server ora_scripts] ./dgmgrl.sh

/*connect to primary database*/
DGMGRL > connect sys/yourpassword@cssdev;

/*set LogXptMode for primary database*/
DGMGRL > edit database 'cssdev' set property LogXptMode = 'SYNC';

/*set LogXptMode for standby database */
DGMGRL > edit database 'cssdg1' set property LogXptMode = 'SYNC';

/*confirm primary database LogXptMode*/
DGMGRL > show database verbose 'cssdev';

/*confirm standby database LogXptMode*/
DGMGRL > show database verbose 'cssdg1';

Protection Mode

You could be running active data guard in a Max Performance Mode. Let's upgrade the protection mode to Max Availability Mode. There are 3 protection modes offered by active data guard. Here is the explanation on each mode straight from Oracle Docs.

Max Availability mode
This protection mode provides the highest level of data protection that is possible without compromising the availability of a primary database. Transactions do not commit until all redo data needed to recover those transactions has been written to the online redo log and to at least one synchronized standby database. If the primary database cannot write its redo stream to at least one synchronized standby database, it operates as if it were in maximum performance mode to preserve primary database availability until it is again able to write its redo stream to a synchronized standby database.

This mode ensures that no data loss will occur if the primary database fails, but only if a second fault does not prevent a complete set of redo data from being sent from the primary database to at least one standby database.

Max Performance mode
This protection mode provides the highest level of data protection that is possible without affecting the performance of a primary database. This is accomplished by allowing transactions to commit as soon as all redo data generated by those transactions has been written to the online log. Redo data is also written to one or more standby databases, but this is done asynchronously with respect to transaction commitment, so primary database performance is unaffected by delays in writing redo data to the standby database(s).

This protection mode offers slightly less data protection than maximum availability mode and has minimal impact on primary database performance.

This is the default protection mode.

Max Protection mode
This protection mode ensures that zero data loss occurs if a primary database fails. To provide this level of protection, the redo data needed to recover a transaction must be written to both the online redo log and to at least one synchronized standby database before the transaction commits. To ensure that data loss cannot occur, the primary database will shut down, rather than continue processing transactions, if it cannot write its redo stream to at least one synchronized standby database.

Because this data protection mode prioritizes data protection over primary database availability, Oracle recommends that a minimum of two standby databases be used to protect a primary database that runs in maximum protection mode to prevent a single standby database failure from causing the primary database to shut down.

While on the DGMGRL prompt, run following command.

DGMGRL > edit configuration set protection mode as ‘MaxAvailability’;

/*check the configuration again*/
DGMGRL > show configuration;

Your configuration should look something like below.

Configuration - DRSolution

  Protection Mode: MaxAvailability
  Databases:
    cssdev - Primary database
    cssdg1 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

As you notice "Fast-Start Failover" still says DISABLED. Why? Because we haven't enabled it yet :-)

Enable Fast-Start Failover

While at the DGMGRL prompt run the command below.

DGMGRL > enable fast_start failover;

Note: If you run into error Ora-16651: requirements not met for enabling fast_start failover, then look for the options below to fix the error.

Ensure standby redo logs are configured on the primary and target standby databases.
Ensure the LogXptMode Property is set to SYNC.
Set the FastStartFailoverTarget configuration property.

[oracle@primary_server ora_scripts] ./dgmgrl
DGMGRL > connect sys/password@cssdev;
DGMGRL > edit database 'cssdev' set property FastStartFailoverTarget='cssdg1';
DGMGRL > edit database 'cssdg1' set property FastStartFailoverTarget='cssdev';
/* confirm changes for cssdev */
DGMGRL > show database verbose 'cssdev';
/*confirm changes for cssdg1 */
DGMGRL > show database verbose 'cssdg1';

Upgrade the protection mode to MAXAVAILABILITY, if necessary.
Enable Flashback Database on the primary and target standby databases.

[oracle@primary_server ora_scripts] ./sqlplus.sh
/* disable managed recovery on standby */
SQL > alter database managed recover standby database cancel;
/* turn flashback on */
SQL > alter database flashback on;

After enabling the Fast-Start Failover let's check the DGMGRL configuration one more time.

Configuration - DRSolution

  Protection Mode: MaxAvailability
  Databases:
    cssdev - Primary database
      Warning: ORA-16819: fast-start failover observer not started

    cssdg1 - (*) Physical standby database
      Warning: ORA-16819: fast-start failover observer not started

Fast-Start Failover: ENABLED

Configuration Status:
WARNING

Ah, I see some Warnings. Not to worry. It's perfectly normal. We have not started the Observer yet. But did you notice that Fast-Start Failover is enabled? Perfect.

Start Observer

Now we are ready to turn on the Observer. Remember we have always used DGMGRL on the standby server. To turn on the Observer we have to use DGMGRL on the server that will be running the Observer. Additionally, I recommend running Observer process in the background. Why?

Let's say you are connected to your cloud servers via SSH. Now you turn on the Observer via SSH client. All great, except as soon as you disconnect from your SSH client the Observer process is killed and now we don't have anyone observing the data guard environment.

So how do you start a process in the background? Use nohup to start the observer. Below is the script to turn on the Observer as a background process.

#!/bin/bash

"Setting up Oracle environment"
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
echo "ORACLE_HOME set to: " $ORACLE_HOME
cd $ORACLE_HOME/bin
nohup ./dgmgrl -silent sys/yourpassword@cssdev "start observer" &

Verify Observer
So we configured and started the Observer. How do we know it is really watching our active data guard environment? Well go to the standby server and start DGMGRL. I know we are a bit bouncing around, but stay with me. We are almost done here.

[oracle@standby_server ora_scripts] ./dgmgrl.sh

/*connect to primary database*/
DGMGRL > connect sys/yourpassword@cssdev;

/*show configuration*/
DGMGRL > show configuration verbose;

Your configuration should look like below.

Configuration - DRSolution

  Protection Mode: MaxAvailability
  Databases:
    cssdev - Primary database
    cssdg1 - (*) Physical standby database

  (*) Fast-Start Failover target

Fast-Start Failover: ENABLED

  Threshold:        30 seconds
  Target:           cssdg1
  Observer:         Ob_Server
  Lag Limit:        30 seconds (not in use)
  Shutdown Primary: TRUE
  Auto-reinstate:   TRUE

Configuration Status:
SUCCESS

And voila! What do we see here?

Fast-Start Failover is Enabled
We see name of the server where Observer is running
Threshold is set to 30 seconds. Which means if Observer is not able to connect to primary for 30 seconds it will initiate a failover.I went with the defaults. You can adjust the threshold as per your requirements.
Auto-reinstate: TRUE. Yes, Observer will start the reinstate process of the failed primary database once it is able to reconnect with the primary database.

Cool stuff, right?

So let's recap what we have so far:

Active Data Guard
Data Guard Broker
Data Guard Observer

So far we did all configurations on the servers. But what about clients? How will a client connect to our environment? What happens if a client is connected via an application and for some reason we switchover the databases. What happens to an existing connection? In short we need to test our active data guard from the client side.

Good news again. In the next article we will make few more changes for any application to work seamlessly with switchover(s) and failover(s).

You can say: this will be an icing on the cake.

See you then!

Ritesh Patel

January 17, 2015

Subscribe to this blog

You can also find Ritesh Patel on Twitter and LinkedIn. Get in touch by Email.

Oracle Data Guard In A Cloud - Part VI

Look out, I am watching you!

Share this article with friends