 We may want to slim down a database schema, pruning old data from all tables, for a number of reasons. Let’s say to set up the test environment for a data migration project, a replica of the legacy database schema has been created. The legacy database schema has millions of records spanning multiple years, but for testing migration scripts in the replica schema, just the most recent records – those of the current year – will do.
What is the simplest way to set up the schema so that it contains only the current year’s rows, and the rest of the rows are deleted? Consider that each table in the schema has a standard audit column CREATED_DATETIME to store when the row was first inserted into the table.
We tried out two approaches to delete old data from the schema: one a direct method, the other a slightly roundabout but better performing method. The two approaches are:
We may want to slim down a database schema, pruning old data from all tables, for a number of reasons. Let’s say to set up the test environment for a data migration project, a replica of the legacy database schema has been created. The legacy database schema has millions of records spanning multiple years, but for testing migration scripts in the replica schema, just the most recent records – those of the current year – will do.
What is the simplest way to set up the schema so that it contains only the current year’s rows, and the rest of the rows are deleted? Consider that each table in the schema has a standard audit column CREATED_DATETIME to store when the row was first inserted into the table.
We tried out two approaches to delete old data from the schema: one a direct method, the other a slightly roundabout but better performing method. The two approaches are:
- The DELETE script approach, which is easier to write but less efficient to execute.
- The CTAS-Drop-Rename script approach, which involves a few more steps to write but performs better for large data volumes.
The steps, scripts used and their execution is given below.
The Direct DELETE Script Approach
Steps:
1. Using an SQL script generator, generate script delete_statements.sql to delete rows where CREATED_DATETIME does not fall in the current year. 
2. Run the auto-generated script delete_statements.sql of step 1, to delete rows older than the current year.
2. Run the auto-generated script delete_statements.sql of step 1, to delete rows older than the current year.
Script Generators:
Here is the SQL that generates the script delete_statements.sql as output:
1
2
3
4
5
6
7
8
9
10
 
-- This SQL creates an SQL script to 
-- delete old rows from all tables
-- in the schema.
select  'delete from '
     ||  object_name
     ||' where created_datetime <='
     ||' TRUNC(SYSDATE,'||''''||'YEAR'||''''||');'
     delete_old_rows
from user_objects
where object_type = 'TABLE';
 
| 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 | -- This SQL creates an SQL script to -- delete old rows from all tables-- in the schema.select'delete from '     ||  object_name     ||' where created_datetime <='     ||' TRUNC(SYSDATE,'||''''||'YEAR'||''''||');'     delete_old_rowsfromuser_objectswhereobject_type = 'TABLE'; | 
Execution:
The script in action:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
 
SQL> -- This SQL creates an SQL script to
SQL> -- delete old rows from all tables
SQL> -- in the schema.
SQL> select  'delete from '
  2       ||  object_name
  3       ||' where created_datetime <='
  4       ||'TRUNC(SYSDATE,'||''''||'YEAR'||''''||');'
  5       delete_old_rows
  6  from user_objects
  7  where object_type = 'TABLE';
 
DELETE_OLD_ROWS
--------------------------------------------------------------------------
 
delete from REGIONS where created_datetime <= TRUNC(SYSDATE,'YEAR');
delete from COUNTRIES where created_datetime <= TRUNC(SYSDATE,'YEAR');
delete from LOCATIONS where created_datetime <= TRUNC(SYSDATE,'YEAR');
delete from DEPARTMENTS where created_datetime <= TRUNC(SYSDATE,'YEAR');
delete from JOBS where created_datetime <= TRUNC(SYSDATE,'YEAR');
delete from EMPLOYEES where created_datetime <= TRUNC(SYSDATE,'YEAR');
delete from JOB_HISTORY where created_datetime <= TRUNC(SYSDATE,'YEAR');
 
Run delete_old_rows.sql i.e. the output received from the SQL above. Verify that the result is fine after deletion, then commit.
This approach is perhaps the most intuitive way to delete old data from the entire schema, but massive DELETEs would generate large redo and undo data. If that is a concern, approach 2 would work better.
| 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 | SQL> -- This SQL creates an SQL script toSQL> -- delete old rows from all tablesSQL> -- in the schema.SQL> select'delete from '  2       ||  object_name  3       ||' where created_datetime <='  4       ||'TRUNC(SYSDATE,'||''''||'YEAR'||''''||');'  5       delete_old_rows  6  fromuser_objects  7  whereobject_type = 'TABLE';DELETE_OLD_ROWS--------------------------------------------------------------------------deletefromREGIONS wherecreated_datetime <= TRUNC(SYSDATE,'YEAR');deletefromCOUNTRIES wherecreated_datetime <= TRUNC(SYSDATE,'YEAR');deletefromLOCATIONS wherecreated_datetime <= TRUNC(SYSDATE,'YEAR');deletefromDEPARTMENTS wherecreated_datetime <= TRUNC(SYSDATE,'YEAR');deletefromJOBS wherecreated_datetime <= TRUNC(SYSDATE,'YEAR');deletefromEMPLOYEES wherecreated_datetime <= TRUNC(SYSDATE,'YEAR');deletefromJOB_HISTORY wherecreated_datetime <= TRUNC(SYSDATE,'YEAR'); | 
The CTAS-Drop-Rename Script Approach
Steps:
1. Using an SQL script generator, generate script create_slim_tables.sql to create tables via CTAS including only the rows in which CREATED_DATETIME falls in the current year. In the names of all the newly created tables, append a common suffix that does not clash with any of the old table names, say _TMP. 
2. Using an SQL script generator, generate script drop_legacy_tables.sql to drop tables that do not have the suffix _TMP. 
3. Using an SQL script generator, generate script rename_slim_tables.sql to rename _TMP tables as the original table. Recreate the indexes, constraints and grants on the tables.
2. Using an SQL script generator, generate script drop_legacy_tables.sql to drop tables that do not have the suffix _TMP.
3. Using an SQL script generator, generate script rename_slim_tables.sql to rename _TMP tables as the original table. Recreate the indexes, constraints and grants on the tables.
Script generators:
The script generator for create_slim_tables.sql:
1
2
3
4
5
6
7
8
9
10
11
12
13
 
-- This SQL generates an SQL script to 
-- create _TMP tables with only the
-- latest rows from corresponding tables
-- in the schema.
select  'create table '
     ||  object_name || '_TMP'
     ||' as select * from '
     ||  object_name
     ||' where created_datetime >='
     ||' TRUNC(SYSDATE,'||''''||'YEAR'||''''||');'
     create_slim_tables
from user_objects
where object_type = 'TABLE';
 
The script generator for drop_legacy_tables.sql:
1
2
3
4
5
6
7
8
9
10
11
 
-- This SQL generates an SQL script to 
-- drop the legacy tables from the schema.
-- It excludes the newly created 
-- tables ie those that end in _TMP.
select  'drop table '
     ||  object_name 
     || ';'
     drop_legacy_tables
from user_objects
where object_type = 'TABLE'
and object_name not like '%_TMP';
 
The script generator for rename_slim_tables.sql :
1
2
3
4
5
6
7
8
9
10
11
12
 
-- This SQL generates an SQL script 
-- to remove the _TMP suffix from the 
-- new  tables in the schema.
select  'rename '
     ||  object_name 
     ||' to '
     || substr(object_name,0,length(object_name)-4)
     || ';'
     rename_slim_tables
from user_objects
where object_type = 'TABLE'
and object_name like '%_TMP';
 
| 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 | -- This SQL generates an SQL script to -- create _TMP tables with only the-- latest rows from corresponding tables-- in the schema.select'create table '     ||  object_name || '_TMP'     ||' as select * from '     ||  object_name     ||' where created_datetime >='     ||' TRUNC(SYSDATE,'||''''||'YEAR'||''''||');'     create_slim_tablesfromuser_objectswhereobject_type = 'TABLE'; | 
| 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 | -- This SQL generates an SQL script to -- drop the legacy tables from the schema.-- It excludes the newly created -- tables ie those that end in _TMP.select'drop table '     ||  object_name      || ';'     drop_legacy_tablesfromuser_objectswhereobject_type = 'TABLE'andobject_name notlike'%_TMP'; | 
| 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 | -- This SQL generates an SQL script -- to remove the _TMP suffix from the -- new  tables in the schema.select'rename '     ||  object_name      ||' to '     || substr(object_name,0,length(object_name)-4)     || ';'     rename_slim_tablesfromuser_objectswhereobject_type = 'TABLE'andobject_name like'%_TMP'; | 
Execution:
The scripts in action – the one to create _TMP tables first. Spool or export the output of the SQL below into file create_slim_tables.sql.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
 
SQL> -- This SQL generates an SQL script to
SQL> -- create _TMP tables with only the
SQL> -- latest rows from corresponding tables
SQL> -- in the schema.
SQL> select  'create table '
  2       ||  object_name || '_TMP'
  3       ||' as select * from '
  4       ||  object_name
  5       ||' where created_datetime >='
  6       ||' TRUNC(SYSDATE,'||''''||'YEAR'||''''||');'
  7       create_slim_tables
  8  from user_objects
  9  where object_type = 'TABLE';
 
CREATE_SLIM_TABLES
--------------------------------------------------------------------------------
 
create table REGIONS_TMP as select * from REGIONS where created_datetime >= TRUNC(SYSDATE,'YEAR');
create table COUNTRIES_TMP as select * from COUNTRIES where created_datetime >= TRUNC(SYSDATE,'YEAR');
create table LOCATIONS_TMP as select * from LOCATIONS where created_datetime >= TRUNC(SYSDATE,'YEAR');
create table DEPARTMENTS_TMP as select * from DEPARTMENTS where created_datetime >= TRUNC(SYSDATE,'YEAR');
create table JOBS_TMP as select * from JOBS where created_datetime >= TRUNC(SYSDATE,'YEAR');
create table EMPLOYEES_TMP as select * from EMPLOYEES where created_datetime >= TRUNC(SYSDATE,'YEAR');
create table JOB_HISTORY_TMP as select * from JOB_HISTORY where created_datetime >= TRUNC(SYSDATE,'YEAR');
 
Run create_slim_tables.sql i.e. the output received from the SQL above. After this step, for each table [X] in the schema, a new table [X]_TMP will get created with the latest records from table [X]. Next, we get rid of the old tables from the legacy database schema i.e. those tables that do not have the suffix _TMP. Spool or export the output of the SQL below into drop_legacy_tables.sql.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
 
SQL> -- This SQL generates an SQL script to
SQL> -- drop the legacy tables from the schema.
SQL> -- It excludes the newly created
SQL> -- tables ie those that end in _TMP.
SQL> select  'drop table '
  2       ||  object_name
  3       || ';'
  4       drop_legacy_tables
  5  from user_objects
  6  where object_type = 'TABLE'
  7  and object_name not like '%_TMP';
 
DROP_LEGACY_TABLES
-----------------------------------------------
 
drop table REGIONS;
drop table COUNTRIES;
drop table LOCATIONS;
drop table DEPARTMENTS;
drop table JOBS;
drop table EMPLOYEES;
drop table JOB_HISTORY;
 
Run drop_legacy_tables.sql i.e. the output received from the SQL above. After this step, all the old tables will be dropped from the schema. Make sure that you have got a copy of the index/constraint/grant creation scripts on the original tables before dropping them. Next, we reset the names of all the _TMP tables to replace their originals. Spool or export the output of the SQL below into rename_slim_tables.sql.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
 
SQL> -- This SQL generates an SQL script
SQL> -- to remove the _TMP suffix from the
SQL> -- new  tables in the schema.
SQL> select  'rename '
  2       ||  object_name
  3       ||' to '
  4       || substr(object_name,0,length(object_name) - 4)
  5       || ';'
  6       rename_slim_tables
  7  from user_objects
  8  where object_type = 'TABLE'
  9  and object_name like '%_TMP';
 
RENAME_SLIM_TABLES
---------------------------------------
 
rename REGIONS_TMP to REGIONS;
rename COUNTRIES_TMP to COUNTRIES;
rename LOCATIONS_TMP to LOCATIONS;
rename DEPARTMENTS_TMP to DEPARTMENTS;
rename JOBS_TMP to JOBS;
rename EMPLOYEES_TMP to EMPLOYEES;
rename JOB_HISTORY_TMP to JOB_HISTORY;
 
Run rename_slim_tables.sql i.e. the output of the SQL above. This will rename every table of the form [X]_TMP to [X]. Finally, run the script to recreate the indexes, constraints and grants.
At this stage we have got what we wanted – we have now replaced the legacy tables with their slimmer versions.
| 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 | SQL> -- This SQL generates an SQL script toSQL> -- create _TMP tables with only theSQL> -- latest rows from corresponding tablesSQL> -- in the schema.SQL> select'create table '  2       ||  object_name || '_TMP'  3       ||' as select * from '  4       ||  object_name  5       ||' where created_datetime >='  6       ||' TRUNC(SYSDATE,'||''''||'YEAR'||''''||');'  7       create_slim_tables  8  fromuser_objects  9  whereobject_type = 'TABLE';CREATE_SLIM_TABLES--------------------------------------------------------------------------------createtableREGIONS_TMP asselect* fromREGIONS wherecreated_datetime >= TRUNC(SYSDATE,'YEAR');createtableCOUNTRIES_TMP asselect* fromCOUNTRIES wherecreated_datetime >= TRUNC(SYSDATE,'YEAR');createtableLOCATIONS_TMP asselect* fromLOCATIONS wherecreated_datetime >= TRUNC(SYSDATE,'YEAR');createtableDEPARTMENTS_TMP asselect* fromDEPARTMENTS wherecreated_datetime >= TRUNC(SYSDATE,'YEAR');createtableJOBS_TMP asselect* fromJOBS wherecreated_datetime >= TRUNC(SYSDATE,'YEAR');createtableEMPLOYEES_TMP asselect* fromEMPLOYEES wherecreated_datetime >= TRUNC(SYSDATE,'YEAR');createtableJOB_HISTORY_TMP asselect* fromJOB_HISTORY wherecreated_datetime >= TRUNC(SYSDATE,'YEAR'); | 
| 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 | SQL> -- This SQL generates an SQL script toSQL> -- drop the legacy tables from the schema.SQL> -- It excludes the newly createdSQL> -- tables ie those that end in _TMP.SQL> select'drop table '  2       ||  object_name  3       || ';'  4       drop_legacy_tables  5  fromuser_objects  6  whereobject_type = 'TABLE'  7  andobject_name notlike'%_TMP';DROP_LEGACY_TABLES-----------------------------------------------droptableREGIONS;droptableCOUNTRIES;droptableLOCATIONS;droptableDEPARTMENTS;droptableJOBS;droptableEMPLOYEES;droptableJOB_HISTORY; | 
| 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 | SQL> -- This SQL generates an SQL scriptSQL> -- to remove the _TMP suffix from theSQL> -- new  tables in the schema.SQL> select'rename '  2       ||  object_name  3       ||' to '  4       || substr(object_name,0,length(object_name) - 4)  5       || ';'  6       rename_slim_tables  7  fromuser_objects  8  whereobject_type = 'TABLE'  9  andobject_name like'%_TMP';RENAME_SLIM_TABLES---------------------------------------rename REGIONS_TMP toREGIONS;rename COUNTRIES_TMP toCOUNTRIES;rename LOCATIONS_TMP toLOCATIONS;rename DEPARTMENTS_TMP toDEPARTMENTS;rename JOBS_TMP toJOBS;rename EMPLOYEES_TMP toEMPLOYEES;rename JOB_HISTORY_TMP toJOB_HISTORY; | 
 
 
No comments:
Post a Comment