Oracle Data Guard 19c Deployment Series: From Zero to Fast Failover, Part 7: Automatic Fast Failover

Most companies follow the same path up to this point in the series.
Only some organizations continue beyond this stage.

There are several reasons for this. One of the most important is the performance impact on the primary database: zero-data-loss Fast-Start Failover uses SYNC redo transport with the Maximum Availability protection mode, so commits on the primary wait until the standby acknowledges the redo.
In such a configuration, commit latency depends directly on the network between the two sites, which must be stable and reliable, along with other operational considerations.

Now let’s move to the most important goal of this article: Fast-Start Failover (FSFO).


1. Checking Overall Data Guard Status

On one of the nodes:

dgmgrl sys/oracle@vahiddbdc2
show configuration;

At this stage, we verify that:

  • vahiddbdc2 → Primary

  • vahiddb → Physical Standby

  • Configuration status → SUCCESS

  • Protection Mode → initially MaxPerformance
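
If we want to automate this check instead of reading the output by eye, a small parser is enough. The sketch below is only an illustration: the sample text imitates the general layout of show configuration output, the configuration name dg_config is a made-up placeholder, and the real output on your system may differ in spacing or wording.

```python
import re

# Illustrative sample only; "dg_config" is a hypothetical configuration name.
SAMPLE_OUTPUT = """\
Configuration - dg_config

  Protection Mode: MaxPerformance
  Members:
  vahiddbdc2 - Primary database
    vahiddb    - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 12 seconds ago)
"""

def parse_show_configuration(text):
    """Extract protection mode, member roles, and overall status from the text."""
    mode = re.search(r"Protection Mode:\s*(\S+)", text)
    # Map each member name to its role, e.g. {"vahiddb": "Physical standby"}
    members = dict(
        re.findall(r"^\s*(\S+)\s+-\s+(Primary|Physical standby) database", text, re.M)
    )
    status = re.search(r"Configuration Status:\s*(\S+)", text)
    return {
        "protection_mode": mode.group(1) if mode else None,
        "members": members,
        "status": status.group(1) if status else None,
    }

info = parse_show_configuration(SAMPLE_OUTPUT)
print(info)
```

In practice the text would come from running dgmgrl and capturing its output; here it is hard-coded so the sketch runs anywhere.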


2. Creating an Application Service on Both Databases (on the PDB, and ONLY for PRIMARY)

On both nodes (using each database’s own name):

On vahiddbdc2 (current Primary):

srvctl add service -db vahiddbdc2 -service APP_SERVICE -pdb vahidpdb -role PRIMARY -policy AUTOMATIC
srvctl start service -db vahiddbdc2 -service APP_SERVICE

On vahiddb (current Standby):

srvctl add service -db vahiddb -service APP_SERVICE -pdb vahidpdb -role PRIMARY -policy AUTOMATIC

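On vahiddb, we do not start the service manually, because this service must come up automatically via Clusterware/Broker only when this database becomes PRIMARY. Note that srvctl-managed, role-based services assume the databases are registered with Oracle Clusterware or Oracle Restart.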


3. Verifying Services in the Listener

On each node:

lsnrctl status

On the Primary node, you should see:

  • Service "APP_SERVICE" ... status READY

On the Standby node, APP_SERVICE must not be READY (unless it was mistakenly started manually).
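
The same check can be scripted. The following is a rough sketch that scans listener status text for services with at least one READY instance. The sample text only imitates the usual layout of lsnrctl status output and is not captured from a real system, and the helper names are my own.

```python
import re

# Illustrative fragment in the style of "lsnrctl status" output (not real output).
SAMPLE_LSNRCTL = '''\
Services Summary...
Service "APP_SERVICE" has 1 instance(s).
  Instance "vahiddbdc2", status READY, has 1 handler(s) for this service...
Service "vahiddbdc2" has 1 instance(s).
  Instance "vahiddbdc2", status READY, has 1 handler(s) for this service...
The command completed successfully
'''

def ready_services(lsnrctl_output):
    """Return the set of service names that have at least one READY instance."""
    ready = set()
    current = None
    for line in lsnrctl_output.splitlines():
        svc = re.match(r'\s*Service "([^"]+)" has \d+ instance\(s\)', line)
        if svc:
            current = svc.group(1)
            continue
        inst = re.match(r'\s*Instance "[^"]+", status (\w+)', line)
        if inst and current and inst.group(1) == "READY":
            ready.add(current)
    return ready

def is_ready(lsnrctl_output, service):
    """Case-insensitive check for one service name."""
    return service.lower() in {s.lower() for s in ready_services(lsnrctl_output)}

print(is_ready(SAMPLE_LSNRCTL, "APP_SERVICE"))
```

On the Primary node we expect True for APP_SERVICE; on the Standby node we expect False.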


4. Setting Redo Transport to SYNC for Zero Data Loss

In dgmgrl:

EDIT DATABASE vahiddbdc2 SET PROPERTY LogXptMode='SYNC';
EDIT DATABASE vahiddb    SET PROPERTY LogXptMode='SYNC';

5. Changing Protection Mode to MaxAvailability

In the same dgmgrl session:

EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;

At this point, the Zero Data Loss requirement for FSFO is satisfied, because:

  • Protection Mode = MAXAVAILABILITY

  • Redo Transport = SYNC


6. Enabling Fast-Start Failover (FSFO)

In dgmgrl:

ENABLE FAST_START FAILOVER;

Important output:

Enabled in Zero Data Loss Mode.

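Setting the Threshold

FastStartFailoverThreshold is a configuration-level property, so it is set with EDIT CONFIGURATION rather than EDIT DATABASE:

EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 30;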

7. Configuring tnsnames.ora on Windows (Oracle Client)

For Fast Failover to work reliably, the Observer should run on a third machine, separate from the two database servers, so that it does not go down together with either of them.
I chose the Windows host that runs the virtual machines. Oracle Database was not installed on that machine, so I installed Oracle Client as Administrator.

On Windows, in the tnsnames.ora file under the following directory:

c:\app\client\Vahid\product\19.0.0\client_1\network\admin\

we create three entries:

vahiddb =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.21)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = vahiddb)
    )
  )

vahiddbdc2 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.22)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = vahiddbdc2)
    )
  )

APP_SERVICE =
 (DESCRIPTION =
   (ADDRESS_LIST =
     (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.21)(PORT = 1521))
     (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.22)(PORT = 1521))
   )
   (CONNECT_DATA =
     (SERVICE_NAME = APP_SERVICE)
   )
   (CONNECT_TIMEOUT = 5)
   (RETRY_COUNT = 3)
   (RETRY_DELAY = 5)
   (TRANSPORT_CONNECT_TIMEOUT = 3)
 )

  • vahiddb and vahiddbdc2 → for direct administrative access to each database (sqlplus, rman, dgmgrl, etc.)

  • APP_SERVICE → for applications (the application connection string)
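
Applications that do not read tnsnames.ora can carry the same settings in the connect string itself. As a minimal sketch, the helper below (build_descriptor is a name I made up for this example) assembles an equivalent connect descriptor from the values used in this article.

```python
def build_descriptor(hosts, service_name, port=1521, connect_timeout=5,
                     transport_connect_timeout=3, retry_count=3, retry_delay=5):
    """Build an Oracle Net connect descriptor equivalent to the APP_SERVICE entry."""
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host in hosts
    )
    return (
        "(DESCRIPTION="
        f"(CONNECT_TIMEOUT={connect_timeout})"
        f"(TRANSPORT_CONNECT_TIMEOUT={transport_connect_timeout})"
        f"(RETRY_COUNT={retry_count})(RETRY_DELAY={retry_delay})"
        f"(ADDRESS_LIST={addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name}))"
        ")"
    )

dsn = build_descriptor(["192.168.56.21", "192.168.56.22"], "APP_SERVICE")
print(dsn)
```

The resulting string can be passed wherever a driver accepts a full connect descriptor, for example as the dsn argument of a Python Oracle driver. Check your driver's documentation for which of these parameters it honours.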


8. Starting the Observer on Windows

On Windows:

 
dgmgrl
connect sys@vahiddb
start observer;

From this point forward:

  • The Observer continuously monitors the Primary

  • If neither the Observer nor the Standby can reach the Primary for the configured threshold time (here, 30 seconds), FSFO is triggered

  • The Standby (vahiddb) automatically becomes the new Primary

  • Later, when the former Primary (vahiddbdc2) comes back, it will be automatically reinstated (Auto-Reinstate) and rejoin as Standby; reinstatement relies on Flashback Database being enabled on that database


Why Do We Create a Separate Service Named APP_SERVICE?

1) Separating Application Traffic from Administrative Traffic

Default services such as vahiddb, vahiddbdc2, and vahidpdb are intended for DBAs and management tools.

By creating an independent service (APP_SERVICE), we are effectively saying:

“Anything that belongs to the application must enter only through this door.”


2) Ensuring Connections Go Only to the Primary

By using the -role PRIMARY option in srvctl add service, Oracle Clusterware ensures that:

  • APP_SERVICE is started only when the database role is PRIMARY

After a switchover or failover, the service automatically moves to the new Primary.

Result: the application does not accidentally connect to the Standby, because APP_SERVICE is not running there.


3) Integration with Data Guard Broker and FSFO

When FSFO happens, the database role changes and the services must move accordingly as well.

Using role-based services ensures that without changing the connection string, the application always connects to the current Primary.


4) Readiness for More Complex Scenarios

In the future, you can define different services for read/write, read-only, reporting, batch operations, and more.

The same pattern is extensible; only the roles and preferred/available instances change.


Why Is the APP_SERVICE Connection String Designed This Way, and What MUST Be Included in Applications?

1) Using ADDRESS_LIST with Both Servers

In APP_SERVICE:

(ADDRESS_LIST =
  (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.21)(PORT = 1521))
  (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.22)(PORT = 1521))
)

This tells the client that there are two possible connection endpoints.

If one node is completely down, its listener is not reachable, or the listener does not offer APP_SERVICE (as on the Standby), Oracle Client can try the next address.

This is the foundation of client-side failover.
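
To make the idea concrete, here is a minimal sketch of the "try the next address" behaviour at plain TCP level. It is not how Oracle Net is implemented: it only checks whether a TCP connection can be opened and knows nothing about listeners or services. The function name first_reachable is my own.

```python
import socket

def first_reachable(addresses, timeout_s=3.0):
    """Return the first (host, port) that accepts a TCP connection, else None.

    Addresses are tried in order; each attempt gives up after timeout_s seconds,
    which plays a role similar to a per-address transport timeout.
    """
    for host, port in addresses:
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                return (host, port)
        except OSError:
            # Refused, unreachable, or timed out: move on to the next address.
            continue
    return None

# Example call with the two listener addresses from this article:
# first_reachable([("192.168.56.21", 1521), ("192.168.56.22", 1521)])
```

Without the timeout, an attempt against a host that silently drops packets could block for a long time before the second address is even tried.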


2) Timeouts (These Are the Two Critical Parameters for Applications)

The minimum required settings for reliable connections in an FSFO environment are:

(CONNECT_TIMEOUT = 5)
(TRANSPORT_CONNECT_TIMEOUT = 3)

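  • CONNECT_TIMEOUT: Maximum time the client waits to establish an Oracle Net connection through a specific address; it includes the TCP connect time.

  • TRANSPORT_CONNECT_TIMEOUT: Maximum time to establish the TCP connection to an address, before a database session is even established.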

If you do not set these, then when the old Primary goes down the client may hang for a long time on the dead address. Even if FSFO has already completed and the new Primary is ready, users will continue to see errors/timeouts for an extended period.

That is why, in real applications, at least these two timeouts must be configured, either in the TNS file or inside the connection string in code.


3) Helper Parameters: RETRY_COUNT and RETRY_DELAY

(RETRY_COUNT = 3)
(RETRY_DELAY = 5)

  • RETRY_COUNT: number of additional passes through the address list after the first attempt fails

  • RETRY_DELAY: delay in seconds between those passes

These help when a node is temporarily unreachable or during failover, allowing the client to retry automatically before returning an error to the application.
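
It is useful to estimate how long a client keeps trying before it gives up, and to compare that with the failover threshold. The sketch below is a simplified model, not Oracle's exact algorithm. It assumes every address times out fully on every attempt, that the address list is walked once plus RETRY_COUNT more times, and that RETRY_DELAY is applied only between passes.

```python
def worst_case_connect_seconds(n_addresses, per_address_timeout,
                               retry_count, retry_delay):
    """Rough upper bound on how long a client keeps trying before an error.

    Simplified model: (1 + retry_count) passes over the address list, each
    address consuming its full timeout, with retry_delay between passes.
    """
    passes = 1 + retry_count
    return passes * n_addresses * per_address_timeout + retry_count * retry_delay

# Values from the APP_SERVICE entry in this article.
tcp_bound = worst_case_connect_seconds(2, 3, 3, 5)   # TRANSPORT_CONNECT_TIMEOUT = 3
net_bound = worst_case_connect_seconds(2, 5, 3, 5)   # CONNECT_TIMEOUT = 5
print(tcp_bound, net_bound)  # prints: 39 55
```

Under these assumptions the client keeps retrying for roughly 39 to 55 seconds. That is longer than the 30-second FastStartFailoverThreshold, but the failover itself also takes time after the threshold expires, so it is worth testing whether this window is wide enough in your environment and raising RETRY_COUNT if it is not.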


4) Important Recommendation

For applications, always use APP_SERVICE, not vahiddb or vahiddbdc2.

  • vahiddb and vahiddbdc2 are suitable only for administration, monitoring, and manual testing.

  • In environments where FSFO and role-based services are enabled, configuring timeouts (at least CONNECT_TIMEOUT and TRANSPORT_CONNECT_TIMEOUT) is an essential part of connection string design, not a decorative option.