guide to using sample in expdp for extracting random data in oracle

One of the common challenges in operational and testing environments is extracting a portion of database data for analysis, migration, or testing. Oracle Data Pump (expdp) provides a feature called SAMPLE, which helps extract a subset of data from one or multiple tables without copying the entire dataset.

In this article, we will explore the correct use of SAMPLE in expdp in two different scenarios and address issues related to log tables, foreign keys, and constraints.


What is SAMPLE and How Does It Work?

The SAMPLE parameter in expdp enables random extraction of a percentage of data from each table. This method is efficient, but there are important considerations:

  • SAMPLE works at the data block level, not at the row level.
  • It does not guarantee that exactly the specified percentage of data will be retrieved.
  • It applies to all tables by default but can be limited to specific tables.
  • It works with BLOB and CLOB data types, but some values may not be retained.

To better understand its functionality, we will create a large table in the HR schema, simulating log tables in operational schemas. Typically, schemas contain both core tables and log tables, where log tables occupy most of the database storage.


Creating a Large Table for Testing

The following commands create a large table in the HR schema, simulating a log table with a high volume of data:

sys@orcl(41)>  CREATE TABLE hr.log_table AS SELECT * FROM dba_extents;

Table created.

sys@orcl(41)> INSERT INTO hr.log_table SELECT * FROM hr.log_table;

7888 rows created.

sys@orcl(41)> /
15776 rows created.

sys@orcl(41)> /
31552 rows created.

sys@orcl(41)> /
63104 rows created.

sys@orcl(41)> /
126208 rows created.

sys@orcl(41)> /
252416 rows created.

sys@orcl(42)> SELECT COUNT(*) FROM "HR"."LOG_TABLE";

  COUNT(*)
----------
   1009664

Now, let's explore two different scenarios where SAMPLE can be useful.


Scenario 1: Exporting 10% of All Tables in HR Schema

The following expdp command exports 10% of all tables in the HR schema:

 

expdp directory=oracle_base dumpfile=sample_10_percent.dmp logfile=sample_10_percent.log schemas=hr sample=10

 

Export Process Output:

 

. . exported "HR"."LOG_TABLE"                            5.908 MB  101087 rows
. . exported "HR"."EMPLOYEES"                            10.17 KB       9 rows
. . exported "HR"."LOCATIONS"                            7.320 KB       2 rows
. . exported "HR"."JOB_HISTORY"                          6.843 KB       0 rows
. . exported "HR"."JOBS"                                 6.445 KB       1 rows
. . exported "HR"."DEPARTMENTS"                          6.523 KB       2 rows
. . exported "HR"."COUNTRIES"                            5.937 KB       2 rows
. . exported "HR"."REGIONS"                              5.476 KB       0 rows

 

Result: Only about 10% of the data from each table is exported.


Handling Foreign Key Dependencies During Import (impdp)

During the import process, foreign key constraints may fail due to missing parent keys.

To demonstrate, we rename the schema to hrtest2 before import:

 

impdp directory=oracle_base dumpfile=sample_10_percent.dmp logfile=imp_sample_10_percent.log remap_schema=hr:hrtest2

 

Import Process Output with Errors:

 

ORA-02298: cannot validate (HRTEST2.DEPT_LOC_FK) - parent keys not found
ORA-02298: cannot validate (HRTEST2.EMP_MANAGER_FK) - parent keys not found
ORA-02298: cannot validate (HRTEST2.EMP_JOB_FK) - parent keys not found

 

Problem: Many constraints fail because SAMPLE=10 only exports a portion of each table, which can break parent-child relationships in foreign keys.

✅ Solution: Disable Constraints During Import

To avoid these issues, exclude constraints during import:

 

impdp directory=oracle_base dumpfile=sample_10_percent.dmp logfile=imp_sample_10_percent.log remap_schema=hr:hrtest2 exclude=CONSTRAINT

 

Note: This solution works, but it may not be ideal for production-like environments that require constraints to be enforced.


Scenario 2: Exporting 10% Data from Only Log Tables

A better approach is to export only log tables with SAMPLE, while keeping core tables intact.

 

expdp directory=oracle_base dumpfile=sample_one_table.dmp logfile=sample_one_table.log schemas=hr sample=HR.LOG_TABLE:10

 

Export Process Output:

 

. . exported "HR"."LOG_TABLE"                            5.885 MB  100707 rows
. . exported "HR"."EMPLOYEES"                            17.08 KB     107 rows
. . exported "HR"."LOCATIONS"                            8.437 KB      23 rows
. . exported "HR"."JOB_HISTORY"                          7.195 KB      10 rows
. . exported "HR"."JOBS"                                 7.109 KB      19 rows
. . exported "HR"."DEPARTMENTS"                          7.125 KB      27 rows
. . exported "HR"."COUNTRIES"                            6.367 KB      25 rows
. . exported "HR"."REGIONS"                              5.546 KB       4 rows

 

Result:

  • The log table (LOG_TABLE) is exported with only 10% of its data.
  • All other core tables are exported completely.
  • This ensures that foreign keys remain intact while keeping log tables small.

Importing Data Without Issues

 

impdp directory=oracle_base dumpfile=sample_one_table.dmp logfile=imp_sample_one_table.log remap_schema=hr:hrtest

 

Success! The import completes without any constraint issues.