We may want to slim down a database schema, pruning old data from all tables, for a number of reasons. Let’s say to set up the test environment for a data migration project, a replica of the legacy database schema has been created. The legacy database schema has millions of records spanning multiple years, but for testing migration scripts in the replica schema, just the most recent records – those of the current year – will do.
What is the simplest way to set up the schema so that it contains only the current year’s rows, and the rest of the rows are deleted? Consider that each table in the schema has a standard audit column CREATED_DATETIME to store when the row was first inserted into the table.
We tried out two approaches to delete old data from the schema: one a direct method, the other a slightly roundabout but better performing method. The two approaches are:
- The DELETE script approach, which is easier to write but less efficient to execute.
- The CTAS-Drop-Rename script approach, which involves a few more steps to write but performs better for large data volumes.
The steps, scripts used and their execution is given below.
The Direct DELETE Script Approach
Steps:
1. Using an SQL script generator, generate script delete_statements.sql to delete rows where CREATED_DATETIME does not fall in the current year.
2. Run the auto-generated script delete_statements.sql of step 1, to delete rows older than the current year.
2. Run the auto-generated script delete_statements.sql of step 1, to delete rows older than the current year.
Script Generators:
Here is the SQL that generates the script delete_statements.sql as output:
1
2
3
4
5
6
7
8
9
10
-- This SQL creates an SQL script to
-- delete old rows from all tables
-- in the schema.
select
'delete from '
|| object_name
||
' where created_datetime <='
||
' TRUNC(SYSDATE,'
||
''
''
||
'YEAR'
||
''
''
||
');'
delete_old_rows
from
user_objects
where
object_type =
'TABLE'
;
1
2
3
4
5
6
7
8
9
10
| -- This SQL creates an SQL script to -- delete old rows from all tables -- in the schema. select 'delete from ' || object_name || ' where created_datetime <=' || ' TRUNC(SYSDATE,' || '' '' || 'YEAR' || '' '' || ');' delete_old_rows from user_objects where object_type = 'TABLE' ; |
Execution:
The script in action:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
SQL>
-- This SQL creates an SQL script to
SQL>
-- delete old rows from all tables
SQL>
-- in the schema.
SQL>
select
'delete from '
2 || object_name
3 ||
' where created_datetime <='
4 ||
'TRUNC(SYSDATE,'
||
''
''
||
'YEAR'
||
''
''
||
');'
5 delete_old_rows
6
from
user_objects
7
where
object_type =
'TABLE'
;
DELETE_OLD_ROWS
--------------------------------------------------------------------------
delete
from
REGIONS
where
created_datetime <= TRUNC(SYSDATE,
'YEAR'
);
delete
from
COUNTRIES
where
created_datetime <= TRUNC(SYSDATE,
'YEAR'
);
delete
from
LOCATIONS
where
created_datetime <= TRUNC(SYSDATE,
'YEAR'
);
delete
from
DEPARTMENTS
where
created_datetime <= TRUNC(SYSDATE,
'YEAR'
);
delete
from
JOBS
where
created_datetime <= TRUNC(SYSDATE,
'YEAR'
);
delete
from
EMPLOYEES
where
created_datetime <= TRUNC(SYSDATE,
'YEAR'
);
delete
from
JOB_HISTORY
where
created_datetime <= TRUNC(SYSDATE,
'YEAR'
);
Run delete_old_rows.sql i.e. the output received from the SQL above. Verify that the result is fine after deletion, then commit.
This approach is perhaps the most intuitive way to delete old data from the entire schema, but massive DELETEs would generate large redo and undo data. If that is a concern, approach 2 would work better.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
| SQL> -- This SQL creates an SQL script to SQL> -- delete old rows from all tables SQL> -- in the schema. SQL> select 'delete from ' 2 || object_name 3 || ' where created_datetime <=' 4 || 'TRUNC(SYSDATE,' || '' '' || 'YEAR' || '' '' || ');' 5 delete_old_rows 6 from user_objects 7 where object_type = 'TABLE' ; DELETE_OLD_ROWS -------------------------------------------------------------------------- delete from REGIONS where created_datetime <= TRUNC(SYSDATE, 'YEAR' ); delete from COUNTRIES where created_datetime <= TRUNC(SYSDATE, 'YEAR' ); delete from LOCATIONS where created_datetime <= TRUNC(SYSDATE, 'YEAR' ); delete from DEPARTMENTS where created_datetime <= TRUNC(SYSDATE, 'YEAR' ); delete from JOBS where created_datetime <= TRUNC(SYSDATE, 'YEAR' ); delete from EMPLOYEES where created_datetime <= TRUNC(SYSDATE, 'YEAR' ); delete from JOB_HISTORY where created_datetime <= TRUNC(SYSDATE, 'YEAR' ); |
The CTAS-Drop-Rename Script Approach
Steps:
1. Using an SQL script generator, generate script create_slim_tables.sql to create tables via CTAS including only the rows in which CREATED_DATETIME falls in the current year. In the names of all the newly created tables, append a common suffix that does not clash with any of the old table names, say _TMP.
2. Using an SQL script generator, generate script drop_legacy_tables.sql to drop tables that do not have the suffix _TMP.
3. Using an SQL script generator, generate script rename_slim_tables.sql to rename _TMP tables as the original table. Recreate the indexes, constraints and grants on the tables.
2. Using an SQL script generator, generate script drop_legacy_tables.sql to drop tables that do not have the suffix _TMP.
3. Using an SQL script generator, generate script rename_slim_tables.sql to rename _TMP tables as the original table. Recreate the indexes, constraints and grants on the tables.
Script generators:
The script generator for create_slim_tables.sql:
1
2
3
4
5
6
7
8
9
10
11
12
13
-- This SQL generates an SQL script to
-- create _TMP tables with only the
-- latest rows from corresponding tables
-- in the schema.
select
'create table '
|| object_name ||
'_TMP'
||
' as select * from '
|| object_name
||
' where created_datetime >='
||
' TRUNC(SYSDATE,'
||
''
''
||
'YEAR'
||
''
''
||
');'
create_slim_tables
from
user_objects
where
object_type =
'TABLE'
;
The script generator for drop_legacy_tables.sql:
1
2
3
4
5
6
7
8
9
10
11
-- This SQL generates an SQL script to
-- drop the legacy tables from the schema.
-- It excludes the newly created
-- tables ie those that end in _TMP.
select
'drop table '
|| object_name
||
';'
drop_legacy_tables
from
user_objects
where
object_type =
'TABLE'
and
object_name
not
like
'%_TMP'
;
The script generator for rename_slim_tables.sql :
1
2
3
4
5
6
7
8
9
10
11
12
-- This SQL generates an SQL script
-- to remove the _TMP suffix from the
-- new tables in the schema.
select
'rename '
|| object_name
||
' to '
|| substr(object_name,0,length(object_name)-4)
||
';'
rename_slim_tables
from
user_objects
where
object_type =
'TABLE'
and
object_name
like
'%_TMP'
;
1
2
3
4
5
6
7
8
9
10
11
12
13
| -- This SQL generates an SQL script to -- create _TMP tables with only the -- latest rows from corresponding tables -- in the schema. select 'create table ' || object_name || '_TMP' || ' as select * from ' || object_name || ' where created_datetime >=' || ' TRUNC(SYSDATE,' || '' '' || 'YEAR' || '' '' || ');' create_slim_tables from user_objects where object_type = 'TABLE' ; |
1
2
3
4
5
6
7
8
9
10
11
| -- This SQL generates an SQL script to -- drop the legacy tables from the schema. -- It excludes the newly created -- tables ie those that end in _TMP. select 'drop table ' || object_name || ';' drop_legacy_tables from user_objects where object_type = 'TABLE' and object_name not like '%_TMP' ; |
1
2
3
4
5
6
7
8
9
10
11
12
| -- This SQL generates an SQL script -- to remove the _TMP suffix from the -- new tables in the schema. select 'rename ' || object_name || ' to ' || substr(object_name,0,length(object_name)-4) || ';' rename_slim_tables from user_objects where object_type = 'TABLE' and object_name like '%_TMP' ; |
Execution:
The scripts in action – the one to create _TMP tables first. Spool or export the output of the SQL below into file create_slim_tables.sql.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
SQL>
-- This SQL generates an SQL script to
SQL>
-- create _TMP tables with only the
SQL>
-- latest rows from corresponding tables
SQL>
-- in the schema.
SQL>
select
'create table '
2 || object_name ||
'_TMP'
3 ||
' as select * from '
4 || object_name
5 ||
' where created_datetime >='
6 ||
' TRUNC(SYSDATE,'
||
''
''
||
'YEAR'
||
''
''
||
');'
7 create_slim_tables
8
from
user_objects
9
where
object_type =
'TABLE'
;
CREATE_SLIM_TABLES
--------------------------------------------------------------------------------
create
table
REGIONS_TMP
as
select
*
from
REGIONS
where
created_datetime >= TRUNC(SYSDATE,
'YEAR'
);
create
table
COUNTRIES_TMP
as
select
*
from
COUNTRIES
where
created_datetime >= TRUNC(SYSDATE,
'YEAR'
);
create
table
LOCATIONS_TMP
as
select
*
from
LOCATIONS
where
created_datetime >= TRUNC(SYSDATE,
'YEAR'
);
create
table
DEPARTMENTS_TMP
as
select
*
from
DEPARTMENTS
where
created_datetime >= TRUNC(SYSDATE,
'YEAR'
);
create
table
JOBS_TMP
as
select
*
from
JOBS
where
created_datetime >= TRUNC(SYSDATE,
'YEAR'
);
create
table
EMPLOYEES_TMP
as
select
*
from
EMPLOYEES
where
created_datetime >= TRUNC(SYSDATE,
'YEAR'
);
create
table
JOB_HISTORY_TMP
as
select
*
from
JOB_HISTORY
where
created_datetime >= TRUNC(SYSDATE,
'YEAR'
);
Run create_slim_tables.sql i.e. the output received from the SQL above. After this step, for each table [X] in the schema, a new table [X]_TMP will get created with the latest records from table [X]. Next, we get rid of the old tables from the legacy database schema i.e. those tables that do not have the suffix _TMP. Spool or export the output of the SQL below into drop_legacy_tables.sql.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SQL>
-- This SQL generates an SQL script to
SQL>
-- drop the legacy tables from the schema.
SQL>
-- It excludes the newly created
SQL>
-- tables ie those that end in _TMP.
SQL>
select
'drop table '
2 || object_name
3 ||
';'
4 drop_legacy_tables
5
from
user_objects
6
where
object_type =
'TABLE'
7
and
object_name
not
like
'%_TMP'
;
DROP_LEGACY_TABLES
-----------------------------------------------
drop
table
REGIONS;
drop
table
COUNTRIES;
drop
table
LOCATIONS;
drop
table
DEPARTMENTS;
drop
table
JOBS;
drop
table
EMPLOYEES;
drop
table
JOB_HISTORY;
Run drop_legacy_tables.sql i.e. the output received from the SQL above. After this step, all the old tables will be dropped from the schema. Make sure that you have got a copy of the index/constraint/grant creation scripts on the original tables before dropping them. Next, we reset the names of all the _TMP tables to replace their originals. Spool or export the output of the SQL below into rename_slim_tables.sql.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
SQL>
-- This SQL generates an SQL script
SQL>
-- to remove the _TMP suffix from the
SQL>
-- new tables in the schema.
SQL>
select
'rename '
2 || object_name
3 ||
' to '
4 || substr(object_name,0,length(object_name) - 4)
5 ||
';'
6 rename_slim_tables
7
from
user_objects
8
where
object_type =
'TABLE'
9
and
object_name
like
'%_TMP'
;
RENAME_SLIM_TABLES
---------------------------------------
rename REGIONS_TMP
to
REGIONS;
rename COUNTRIES_TMP
to
COUNTRIES;
rename LOCATIONS_TMP
to
LOCATIONS;
rename DEPARTMENTS_TMP
to
DEPARTMENTS;
rename JOBS_TMP
to
JOBS;
rename EMPLOYEES_TMP
to
EMPLOYEES;
rename JOB_HISTORY_TMP
to
JOB_HISTORY;
Run rename_slim_tables.sql i.e. the output of the SQL above. This will rename every table of the form [X]_TMP to [X]. Finally, run the script to recreate the indexes, constraints and grants.
At this stage we have got what we wanted – we have now replaced the legacy tables with their slimmer versions.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
| SQL> -- This SQL generates an SQL script to SQL> -- create _TMP tables with only the SQL> -- latest rows from corresponding tables SQL> -- in the schema. SQL> select 'create table ' 2 || object_name || '_TMP' 3 || ' as select * from ' 4 || object_name 5 || ' where created_datetime >=' 6 || ' TRUNC(SYSDATE,' || '' '' || 'YEAR' || '' '' || ');' 7 create_slim_tables 8 from user_objects 9 where object_type = 'TABLE' ; CREATE_SLIM_TABLES -------------------------------------------------------------------------------- create table REGIONS_TMP as select * from REGIONS where created_datetime >= TRUNC(SYSDATE, 'YEAR' ); create table COUNTRIES_TMP as select * from COUNTRIES where created_datetime >= TRUNC(SYSDATE, 'YEAR' ); create table LOCATIONS_TMP as select * from LOCATIONS where created_datetime >= TRUNC(SYSDATE, 'YEAR' ); create table DEPARTMENTS_TMP as select * from DEPARTMENTS where created_datetime >= TRUNC(SYSDATE, 'YEAR' ); create table JOBS_TMP as select * from JOBS where created_datetime >= TRUNC(SYSDATE, 'YEAR' ); create table EMPLOYEES_TMP as select * from EMPLOYEES where created_datetime >= TRUNC(SYSDATE, 'YEAR' ); create table JOB_HISTORY_TMP as select * from JOB_HISTORY where created_datetime >= TRUNC(SYSDATE, 'YEAR' ); |
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
| SQL> -- This SQL generates an SQL script to SQL> -- drop the legacy tables from the schema. SQL> -- It excludes the newly created SQL> -- tables ie those that end in _TMP. SQL> select 'drop table ' 2 || object_name 3 || ';' 4 drop_legacy_tables 5 from user_objects 6 where object_type = 'TABLE' 7 and object_name not like '%_TMP' ; DROP_LEGACY_TABLES ----------------------------------------------- drop table REGIONS; drop table COUNTRIES; drop table LOCATIONS; drop table DEPARTMENTS; drop table JOBS; drop table EMPLOYEES; drop table JOB_HISTORY; |
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
| SQL> -- This SQL generates an SQL script SQL> -- to remove the _TMP suffix from the SQL> -- new tables in the schema. SQL> select 'rename ' 2 || object_name 3 || ' to ' 4 || substr(object_name,0,length(object_name) - 4) 5 || ';' 6 rename_slim_tables 7 from user_objects 8 where object_type = 'TABLE' 9 and object_name like '%_TMP' ; RENAME_SLIM_TABLES --------------------------------------- rename REGIONS_TMP to REGIONS; rename COUNTRIES_TMP to COUNTRIES; rename LOCATIONS_TMP to LOCATIONS; rename DEPARTMENTS_TMP to DEPARTMENTS; rename JOBS_TMP to JOBS; rename EMPLOYEES_TMP to EMPLOYEES; rename JOB_HISTORY_TMP to JOB_HISTORY; |
No comments:
Post a Comment