Reverse Engineering a Data Model

Using the Oracle Data Dictionary
The Problem
Let's say you have to work with an Oracle database, using a data model that
somebody else wrote. Maybe you're extending the data model or building an
application that references it. There's only one problem: whoever created the
original data model left without writing a line of documentation.
What do you do? How do you reverse engineer the data model to unearth the table
definitions, constraints, indexes, views, sequences, triggers, and PL/SQL functions
and procedures?
This ends up being an easy task if you use the Oracle data dictionary.
The Oracle Data Dictionary
Just like you use Oracle tables to store your data, Oracle uses tables to store its data.
A set of tables, called the Oracle data dictionary, contains information about all the
structures (tables, views, etc.) and procedural code (triggers, PL/SQL procedures,
etc.) created by each user.
For example, there's a table called USER_TAB_COLUMNS that contains
information about all the columns you've defined, including: what table the column
belongs to, the data type (number, varchar, etc.), what the default value is, whether
the column can be null, etc.
The Oracle data dictionary is huge and contains a lot of esoteric stuff, but when you
whittle it down to only the info you need, it's not so menacing. Here are the data
dictionary tables I find useful. You can do SELECTs on them, just as you would any
other table in Oracle:
USER_TABLES          Lists each table that belongs to your Oracle user.
USER_TAB_COMMENTS    Shows comments on the tables and views.
USER_TAB_COLUMNS     Tells you the names, data types, default values, etc.
                     of each column in each table.
USER_COL_COMMENTS    Shows comments on the columns.
USER_CONSTRAINTS     Gives you all constraints (either single- or multi-column),
                     such as primary key, foreign key, not null,
                     check constraints, etc.
USER_CONS_COLUMNS    Maps constraints to columns (since a constraint can
                     act on one or many columns).
USER_INDEXES         Lists indexes defined on columns (either defined
                     explicitly when creating the data model or defined
                     automatically by Oracle, as is the case with indexes
                     on primary keys).
USER_IND_COLUMNS     Maps indexes to columns.
USER_VIEWS           Lists all views, along with the text used to
                     originally create them.
USER_SYNONYMS        Lists the synonyms and original table names.
USER_SEQUENCES       Lists all sequences, including min value, max value,
                     and amount by which to increment.
USER_TRIGGERS        Contains trigger names, criteria for activating each
                     trigger, and the code that is run.
USER_SOURCE          Contains the source code for all PL/SQL objects,
                     including functions, procedures, packages, and
                     package bodies.
All of the above tables (the USER_* tables) only contain objects defined by the
current Oracle user. Oracle also maintains a set of tables of nearly identical structure
that start with ALL_* (they typically add an OWNER column). These show you every object
that you have access to, regardless of whether you created that object (e.g., our beloved
friend DUAL). Similarly, Oracle provides DBA_* tables that contain info about all users'
objects, but this group of tables is only accessible to the database administrator.
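To make the USER_* versus ALL_* distinction concrete, here is a minimal sketch of a query against ALL_TABLES. It relies on the OWNER column that ALL_TABLES carries and on the built-in USER function, which returns the name of the current Oracle user; it is an illustration, not one of the article's tested queries.

```sql
-- Tables you have access to but did not create yourself.
select OWNER, TABLE_NAME
from ALL_TABLES
where OWNER <> USER
order by OWNER, TABLE_NAME
```

Dropping the WHERE clause lists your own tables as well, which is roughly what USER_TABLES shows without the OWNER column.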
Reverse Engineering the Data Model
In the following sections, I'll show you the queries you need to do to find the following:
   o Table names
   o Table comments
   o Columns, including names, data types, default values
   o Column comments
   o Constraints
   o Indexes
   o Views
   o Sequences
   o Triggers
   o PL/SQL Objects (procedures, functions, packages, package bodies)
Note: the following queries have been tested with Oracle 8i and Oracle 9i. I have not
tried them out on other versions.
The queries:
1. Table Names
Find out what tables have been defined in your system:
select TABLE_NAME
from USER_TABLES
TABLE_NAME is really the only important info we can get from Oracle's data
dictionary table USER_TABLES. When tables are created, most of the action
takes place in the definition of individual columns, which we'll look at later.
For example, if you have four tables defined in your system, your query will
return four rows:
TABLE_NAME
EMPLOYEES
OFFICES
SOFTBALL_TEAMS
EMPLOYEES_AUDIT
2. Table Comments
For each table, get any comments written by the data model author:
select COMMENTS
from USER_TAB_COMMENTS
where TABLE_NAME = 'TABLE_NAME'
and COMMENTS is not null
Note that the TABLE_NAME must be written in all uppercase letters.
Example: if we do this query for the EMPLOYEES table, we find the following
comment:
COMMENTS
This is a table to hold all current, past, and future employees. Application
developers might find the views EMPLOYEES_CURRENT,
EMPLOYEES_PAST and EMPLOYEES_FUTURE useful.
In my experience, very few developers document their tables within Oracle (if the
tables are documented, the documentation is generally done in some file
somewhere else). But if you want to be a conscientious developer and ensure that
your comments show up in the data dictionary for future programmers to find,
you can use the command:
comment on table TABLE_NAME
is 'This is my comment.'
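If you do not yet know which tables carry comments, a variation on the query above can survey the whole schema in one pass. This is a sketch rather than a tested query from the article; it uses the TABLE_TYPE column of USER_TAB_COMMENTS, which distinguishes tables from views.

```sql
-- Every table or view in your schema that has a comment.
select TABLE_NAME, TABLE_TYPE, COMMENTS
from USER_TAB_COMMENTS
where COMMENTS is not null
order by TABLE_TYPE, TABLE_NAME
```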
3. Columns
If you only want basic info about each column (name, type, and whether it's
nullable), the easiest way to get it is to DESCRIBE the table (or DESC, for short).
Let's see what columns the EMPLOYEES table contains:
SQL> desc employees;
 Name                  Null?    Type
 --------------------- -------- --------------
 EMPLOYEE_ID           NOT NULL NUMBER(38)
 LAST_NAME             NOT NULL VARCHAR2(200)
 FIRST_NAME            NOT NULL VARCHAR2(200)
 EMAIL                          VARCHAR2(100)
 PRIMARY_OFFICE_ID     NOT NULL NUMBER(38)
 START_DATE            NOT NULL DATE
 END_DATE                       DATE
 SALARY                         NUMBER(9,2)
 YEARS_EXPERIENCE               NUMBER
 MANAGEMENT_TRACK_P             CHAR(1)
 SHORT_BIO                      VARCHAR2(4000)
 LIFE_STORY                     CLOB
 PHOTO                          BLOB
But if you want more detailed -- and parseable -- information about your tables,
you will have to query from the data dictionary. Here's how we get the column
info (note: this does not include the comments, constraints, and indexes, which
are stored elsewhere in the data dictionary):
select COLUMN_NAME, DATA_TYPE, DATA_LENGTH, DATA_PRECISION,
DATA_SCALE, NULLABLE, DATA_DEFAULT
from USER_TAB_COLUMNS
where TABLE_NAME = 'TABLE_NAME'
For example, if we do the above query for the EMPLOYEES table, we get back
the following results:
COLUMN_NAME         DATA_TYPE  DATA_LENGTH  DATA_PRECISION  DATA_SCALE  NULLABLE  DATA_DEFAULT
EMPLOYEE_ID         NUMBER              22                           0  N
LAST_NAME           VARCHAR2           200                              N
FIRST_NAME          VARCHAR2           200                              N
EMAIL               VARCHAR2           100                              Y
PRIMARY_OFFICE_ID   NUMBER              22                           0  N
START_DATE          DATE                 7                              N         sysdate
END_DATE            DATE                 7                              Y
SALARY              NUMBER              22               9           2  Y
YEARS_EXPERIENCE    NUMBER              22                              Y
MANAGEMENT_TRACK_P  CHAR                 1                              Y         'f'
SHORT_BIO           VARCHAR2          4000                              Y
LIFE_STORY          CLOB              4000                              Y
PHOTO               BLOB              4000                              Y
Useful facts for deciphering the above:
   o The internal data type VARCHAR2 (the data type Oracle uses)
     corresponds to the external data type VARCHAR (the data type you use in
     table declarations). A column with DATA_TYPE = VARCHAR2 and
     DATA_LENGTH = 200 would be defined as VARCHAR(200).
   o The internal data type NUMBER corresponds to the external data type:
        - INTEGER if the scale is 0 and precision is null
        - NUMBER if the scale and precision are both null
        - NUMBER(9,2) if the precision is 9 and the scale is 2
        - NUMBER(3) if the precision is 3 and scale is 0
   o DATA_LENGTH is irrelevant for NUMBERs, DATEs, CLOBs and BLOBs
Based on this, we can derive the following table definition:
create table employees (
    employee_id         integer not null,
    last_name           varchar(200) not null,
    first_name          varchar(200) not null,
    email               varchar(100),
    primary_office_id   integer not null,
    start_date          date default sysdate not null,
    end_date            date,
    salary              number(9,2),
    years_experience    number,
    management_track_p  char(1) default 'f',
    short_bio           varchar(4000),
    life_story          clob,
    photo               blob
);
Note that we still haven't looked up any constraints, indexes, or column
comments.
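The deciphering rules listed earlier in this section can also be applied inside the query, so that the declared type comes back ready to paste into a table definition. The following is a sketch under some assumptions: it uses a searched CASE expression, so it needs a release that supports CASE in SQL; it treats DATA_LENGTH as the declared length of VARCHAR2 and CHAR columns, which holds for byte-length columns like those in the example; and DECLARED_TYPE and NULL_CLAUSE are column aliases made up for this illustration.

```sql
-- Rebuild each column's declared type from USER_TAB_COLUMNS.
select COLUMN_NAME,
       case
         when DATA_TYPE in ('VARCHAR2', 'CHAR')
           then DATA_TYPE || '(' || DATA_LENGTH || ')'
         when DATA_TYPE = 'NUMBER' and DATA_PRECISION is not null and DATA_SCALE = 0
           then 'NUMBER(' || DATA_PRECISION || ')'
         when DATA_TYPE = 'NUMBER' and DATA_PRECISION is not null
           then 'NUMBER(' || DATA_PRECISION || ',' || DATA_SCALE || ')'
         when DATA_TYPE = 'NUMBER' and DATA_SCALE = 0
           then 'INTEGER'
         else DATA_TYPE
       end as DECLARED_TYPE,
       decode(NULLABLE, 'N', 'NOT NULL', null) as NULL_CLAUSE
from USER_TAB_COLUMNS
where TABLE_NAME = 'EMPLOYEES'
order by COLUMN_ID
```

DATA_DEFAULT is left out on purpose: it is a LONG column, which can be selected as-is but not used inside expressions like the ones above.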
4. Column Comments
select COLUMN_NAME, COMMENTS
from USER_COL_COMMENTS
where TABLE_NAME = 'TABLE_NAME'
The EMPLOYEES table has two columns with comments:
COLUMN_NAME         COMMENTS
PRIMARY_OFFICE_ID   The office that the employee spends most of their
                    time in.
MANAGEMENT_TRACK_P  Has the employee expressed a desire and aptitude
                    for management training?
Note that if you want to put comments into the data dictionary for future
programmers to find, you can use the following syntax:
comment on column TABLE_NAME.COLUMN_NAME
is 'This is my comment.'
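When documenting a table, it can be handy to see each column's comment next to its type instead of running two separate queries. Here is a minimal sketch that joins the two dictionary tables, using the same (+) outer-join notation as the other queries in this article; it was not part of the original tested set.

```sql
-- Columns of EMPLOYEES in definition order, with any comments alongside.
select UTC.COLUMN_NAME, UTC.DATA_TYPE, UTC.NULLABLE, UCC.COMMENTS
from USER_TAB_COLUMNS UTC, USER_COL_COMMENTS UCC
where UTC.TABLE_NAME = UCC.TABLE_NAME(+)
and UTC.COLUMN_NAME = UCC.COLUMN_NAME(+)
and UTC.TABLE_NAME = 'EMPLOYEES'
order by UTC.COLUMN_ID
```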
5. Constraints
select UCC.CONSTRAINT_NAME, UCC.COLUMN_NAME,
UC.CONSTRAINT_TYPE, UC.SEARCH_CONDITION, UC2.TABLE_NAME as
REFERENCES_TABLE
from USER_CONS_COLUMNS UCC, USER_CONSTRAINTS UC,
USER_CONSTRAINTS UC2
where UCC.CONSTRAINT_NAME = UC.CONSTRAINT_NAME
and UC.R_CONSTRAINT_NAME = UC2.CONSTRAINT_NAME(+)
and UCC.TABLE_NAME = 'TABLE_NAME'
order by UCC.CONSTRAINT_NAME
For the EMPLOYEES table, we get:
CONSTRAINT_NAME  COLUMN_NAME         CONSTRAINT_TYPE  SEARCH_CONDITION                                  REFERENCES_TABLE
SYS_C0057015     LAST_NAME           C                "LAST_NAME" IS NOT NULL
SYS_C0057016     FIRST_NAME          C                "FIRST_NAME" IS NOT NULL
SYS_C0057017     PRIMARY_OFFICE_ID   C                "PRIMARY_OFFICE_ID" IS NOT NULL
SYS_C0057018     START_DATE          C                "START_DATE" IS NOT NULL
SYS_C0057019     MANAGEMENT_TRACK_P  C                management_track_p in ('t','f')
SYS_C0057020     SHORT_BIO           C                short_bio is not null or life_story is not null
SYS_C0057020     LIFE_STORY          C                short_bio is not null or life_story is not null
SYS_C0057021     EMPLOYEE_ID         P
SYS_C0057022     EMAIL               U
SYS_C0057023     PRIMARY_OFFICE_ID   R                                                                  OFFICES
There are four types of constraint:
   o P: primary key
   o U: unique
   o R: references
   o C: check
Note that the constraint SYS_C0057020 appears twice above; this is because it is
a multi-column constraint. Note also that the "not null" constraints appear here
even though they also appear in USER_TAB_COLUMNS (a little redundancy).
Based on the information we have so far, we can document the table as follows:
-- This is a table to hold all current, past, and future employees. Application
-- developers might find the views EMPLOYEES_CURRENT, EMPLOYEES_PAST and
-- EMPLOYEES_FUTURE useful.
create table employees (
    employee_id         integer primary key,
    last_name           varchar(200) not null,
    first_name          varchar(200) not null,
    email               varchar(100) unique,
    -- The office that the employee spends most of their time in.
    primary_office_id   integer not null references offices,
    start_date          date default sysdate not null,
    end_date            date,
    salary              number(9,2),
    years_experience    number,
    -- Has the employee expressed a desire and aptitude for management training?
    management_track_p  char(1) default 'f'
                        check(management_track_p in ('t','f')),
    short_bio           varchar(4000),
    life_story          clob,
    photo               blob,
    check(short_bio is not null or life_story is not null)
);
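The constraints query in this section reports which table a foreign key references, but not which column. A sketch of one way to recover that: join USER_CONS_COLUMNS a second time through R_CONSTRAINT_NAME and line the columns up by POSITION. It assumes the referenced table lives in your own schema (otherwise the ALL_* versions and the R_OWNER column would be needed), and the aliases CHILD, PARENT and REFERENCES_COLUMN are names chosen for this illustration.

```sql
-- Foreign keys on EMPLOYEES, with the referenced table and column.
select UC.CONSTRAINT_NAME,
       CHILD.COLUMN_NAME,
       PARENT.TABLE_NAME as REFERENCES_TABLE,
       PARENT.COLUMN_NAME as REFERENCES_COLUMN
from USER_CONSTRAINTS UC,
     USER_CONS_COLUMNS CHILD,
     USER_CONS_COLUMNS PARENT
where UC.CONSTRAINT_TYPE = 'R'
and UC.TABLE_NAME = 'EMPLOYEES'
and CHILD.CONSTRAINT_NAME = UC.CONSTRAINT_NAME
and PARENT.CONSTRAINT_NAME = UC.R_CONSTRAINT_NAME
and PARENT.POSITION = CHILD.POSITION
order by UC.CONSTRAINT_NAME, CHILD.POSITION
```

Matching on POSITION keeps multi-column foreign keys paired with the right referenced columns.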
6. Indexes
SELECT INDEX_NAME, COLUMN_NAME
FROM USER_IND_COLUMNS
WHERE TABLE_NAME='TABLE_NAME'
ORDER BY INDEX_NAME, COLUMN_POSITION
The indexes on EMPLOYEES:
INDEX_NAME          COLUMN_NAME
EMPLOYEE_DATES_IDX  START_DATE
EMPLOYEE_DATES_IDX  END_DATE
EMPLOYEE_YE_IDX     YEARS_EXPERIENCE
SYS_C0057021        EMPLOYEE_ID
SYS_C0057022        EMAIL
EMPLOYEE_DATES_IDX appears twice because it is a multi-column index.
Oracle automatically created the index on EMPLOYEE_ID because it is a
primary key. Oracle automatically created the index on EMAIL because that
column has a unique constraint.
From this, we can see that the original index definitions were:
create index employee_dates_idx on employees(start_date, end_date);
create index employee_ye_idx on employees(years_experience);
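To separate the indexes someone created explicitly from the ones Oracle created to enforce constraints, the index list can be filtered against USER_CONSTRAINTS. This sketch assumes that a system-created index shares its name with the primary key or unique constraint it supports, as SYS_C0057021 and SYS_C0057022 do in the example; that need not hold in every schema. It also pulls UNIQUENESS from USER_INDEXES, which tells you whether to write "create unique index".

```sql
-- Explicitly created indexes on EMPLOYEES, columns in index order.
select UI.INDEX_NAME, UI.UNIQUENESS, UIC.COLUMN_POSITION, UIC.COLUMN_NAME
from USER_INDEXES UI, USER_IND_COLUMNS UIC
where UI.INDEX_NAME = UIC.INDEX_NAME
and UI.TABLE_NAME = 'EMPLOYEES'
and not exists (
    select 1
    from USER_CONSTRAINTS UC
    where UC.CONSTRAINT_NAME = UI.INDEX_NAME
    and UC.CONSTRAINT_TYPE in ('P', 'U')
)
order by UI.INDEX_NAME, UIC.COLUMN_POSITION
```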
7. Views
select UV.VIEW_NAME, UV.TEXT, UTC.COMMENTS
from USER_VIEWS UV, USER_TAB_COMMENTS UTC
where UV.VIEW_NAME = UTC.TABLE_NAME(+)
In our example data model, we have the following views defined:
VIEW_NAME: EMPLOYEES_CURRENT
COMMENTS:  All employees who've already started working here and who have not
           yet ended their employment.
TEXT:
select "EMPLOYEE_ID","LAST_NAME","FIRST_NAME",
"EMAIL","PRIMARY_OFFICE_ID","START_DATE","END_DATE",
"SALARY","YEARS_EXPERIENCE","MANAGEMENT_TRACK_P",
"SHORT_BIO","LIFE_STORY","PHOTO"
from employees
where start_date <= sysdate
and end_date >= sysdate

VIEW_NAME: EMPLOYEES_FUTURE
COMMENTS:
TEXT:
select "EMPLOYEE_ID","LAST_NAME","FIRST_NAME",
"EMAIL","PRIMARY_OFFICE_ID","START_DATE","END_DATE",
"SALARY","YEARS_EXPERIENCE","MANAGEMENT_TRACK_P",
"SHORT_BIO","LIFE_STORY","PHOTO"
from employees
where start_date > sysdate

VIEW_NAME: EMPLOYEES_PAST
COMMENTS:
TEXT:
select "EMPLOYEE_ID","LAST_NAME","FIRST_NAME",
"EMAIL","PRIMARY_OFFICE_ID","START_DATE","END_DATE",
"SALARY","YEARS_EXPERIENCE","MANAGEMENT_TRACK_P",
"SHORT_BIO","LIFE_STORY","PHOTO"
from employees
where end_date < sysdate

VIEW_NAME: OFFICES_REGION_I
COMMENTS:
TEXT:
select "OFFICE_ID","OFFICE_NAME","STATE_OR_PROVINCE"
from offices
where state_or_province in ('CA','WA','OR','HI','AZ')
Based on this, we know that the view OFFICES_REGION_I was created with the
following statement:
create or replace view OFFICES_REGION_I as
select "OFFICE_ID","OFFICE_NAME","STATE_OR_PROVINCE"
from offices
where state_or_province in ('CA','WA','OR','HI','AZ')
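One practical wrinkle if you run the views query in SQL*Plus: the TEXT column is a LONG, and SQL*Plus shows only the first 80 characters of a LONG unless told otherwise. A minimal sketch of pulling the full text of a single view follows; the value 20000 is an arbitrary limit chosen for illustration.

```sql
-- Raise the SQL*Plus display limit for LONG columns, then fetch one view.
set long 20000

select TEXT
from USER_VIEWS
where VIEW_NAME = 'OFFICES_REGION_I';
```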
8. Sequences
select SEQUENCE_NAME, MIN_VALUE, MAX_VALUE, INCREMENT_BY,
CYCLE_FLAG, ORDER_FLAG, CACHE_SIZE
from USER_SEQUENCES
Here are the sequences in our example system:
SEQUENCE_NAME  MIN_VALUE    MAX_VALUE   INCREMENT_BY  CYCLE_FLAG  ORDER_FLAG  CACHE_SIZE
EMPLOYEE_SEQ   1            1.0000E+27  1             N           N           20
MISC_SEQ       1            1.0000E+13  2             Y           N           10
MISC2_SEQ      -1.000E+26   -1          -1            N           N           20
Let's decipher these values. All of the values in the EMPLOYEE_SEQ row above
are Oracle's default values, so we know it was created with the simple statement
"create sequence employee_seq".
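The other two sequences had optional arguments specified. The START WITH value is
not among the columns queried above, but the other arguments can be read straight
from each row. The original sequence definitions were: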
create sequence employee_seq;
create sequence misc_seq
increment by 2
start with 314
maxvalue 10000000000000
cycle
cache 10;
create sequence misc2_seq
increment by -1;
As an aside, notice that the max value for EMPLOYEE_SEQ is 1.0000E+27 (or
1,000,000,000,000,000,000,000,000,000). Sometimes novice Oracle programmers
feel uncomfortable using sequences to generate primary keys because they fear
the sequences might "run out" of values. But even if each of the six billion people
in the world orders a quadrillion items from your online store, there will still be
plenty of sequence values left for their future purchases.
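The data dictionary does not record a sequence's original START WITH value, but USER_SEQUENCES does expose LAST_NUMBER, which is the usual stand-in when rebuilding a sequence somewhere else. A sketch, with the caveat that for a cached sequence LAST_NUMBER runs ahead of the values handed out so far, so it is a safe starting point rather than the exact next value:

```sql
-- LAST_NUMBER can serve as START WITH when recreating a sequence.
select SEQUENCE_NAME, LAST_NUMBER, INCREMENT_BY, CACHE_SIZE
from USER_SEQUENCES
order by SEQUENCE_NAME
```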
9. Triggers
select TRIGGER_NAME, TRIGGER_TYPE, TRIGGERING_EVENT,
TABLE_NAME, WHEN_CLAUSE, DESCRIPTION, TRIGGER_BODY
from USER_TRIGGERS
In the example system, we have three triggers defined:
DESCRIPTION:
softball_teams_tr
after insert on offices
for each row
WHEN_CLAUSE:
TRIGGER_BODY:
begin
insert into softball_teams (
team_id, team_name
) values (
misc_seq.nextval, :new.office_name
);
end;

DESCRIPTION:
softball_teams_update_tr
after update on offices
for each row
WHEN_CLAUSE:
old.office_name != new.office_name
TRIGGER_BODY:
begin
update softball_teams
set team_name = :new.office_name
where team_name = :old.office_name;
end;

DESCRIPTION:
employees_audit_tr
before update or delete on employees
for each row
WHEN_CLAUSE:
TRIGGER_BODY:
begin
insert into employees_audit (
employee_id, last_name, first_name, email,
primary_office_id, start_date, end_date, salary,
years_experience, management_track_p, short_bio,
life_story, photo
) values (
:old.employee_id, :old.last_name, :old.first_name, :old.email,
:old.primary_office_id, :old.start_date, :old.end_date, :old.salary,
:old.years_experience, :old.management_track_p, :old.short_bio,
:old.life_story, :old.photo
);
end;
From this, it's easy to put together the original trigger definitions, for example:
create or replace trigger softball_teams_tr
after insert on offices
for each row
begin
insert into softball_teams (
team_id, team_name
) values (
misc_seq.nextval, :new.office_name
);
end;
/
show errors;
and:
create or replace trigger softball_teams_update_tr
after update on offices
for each row
when (old.office_name != new.office_name)
begin
update softball_teams
set team_name = :new.office_name
where team_name = :old.office_name;
end;
/
show errors;
The general form is:
create or replace trigger TRIGGER_DESCRIPTION
when (WHEN_CLAUSE)    [leave out this line if the WHEN_CLAUSE is null]
TRIGGER_BODY
/
show errors;
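Applying the general form to the third trigger in the example results, EMPLOYEES_AUDIT_TR, gives the following. Its WHEN_CLAUSE is null, so that line is left out; the indentation is added here for readability and is not something the data dictionary preserves.

```sql
create or replace trigger employees_audit_tr
before update or delete on employees
for each row
begin
    -- Copy the row's old values into the audit table before they change.
    insert into employees_audit (
        employee_id, last_name, first_name, email,
        primary_office_id, start_date, end_date, salary,
        years_experience, management_track_p, short_bio,
        life_story, photo
    ) values (
        :old.employee_id, :old.last_name, :old.first_name, :old.email,
        :old.primary_office_id, :old.start_date, :old.end_date, :old.salary,
        :old.years_experience, :old.management_track_p, :old.short_bio,
        :old.life_story, :old.photo
    );
end;
/
show errors;
```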
10. PL/SQL Objects (functions, procedures, packages, and package bodies)
select NAME, TYPE, LINE, TEXT
from USER_SOURCE
order by NAME, TYPE, LINE
Our example results show that we have four PL/SQL objects defined:
NAME                  TYPE          LINE  TEXT
HUMAN_RESOURCES       PACKAGE          1  package human_resources
HUMAN_RESOURCES       PACKAGE          2  is
HUMAN_RESOURCES       PACKAGE          3  function add_office
HUMAN_RESOURCES       PACKAGE          4  (
HUMAN_RESOURCES       PACKAGE          5  v_office_name IN varchar
HUMAN_RESOURCES       PACKAGE          6  ) return number;
HUMAN_RESOURCES       PACKAGE          7
HUMAN_RESOURCES       PACKAGE          8  end human_resources;
HUMAN_RESOURCES       PACKAGE BODY     1  package body human_resources
HUMAN_RESOURCES       PACKAGE BODY     2  is
HUMAN_RESOURCES       PACKAGE BODY     3  function add_office
HUMAN_RESOURCES       PACKAGE BODY     4  (
HUMAN_RESOURCES       PACKAGE BODY     5  v_office_name IN varchar
HUMAN_RESOURCES       PACKAGE BODY     6  ) return number is
HUMAN_RESOURCES       PACKAGE BODY     7  v_office_id number;
HUMAN_RESOURCES       PACKAGE BODY     8  begin
HUMAN_RESOURCES       PACKAGE BODY     9  select misc_seq.nextval into v_office_id from dual;
HUMAN_RESOURCES       PACKAGE BODY    10
HUMAN_RESOURCES       PACKAGE BODY    11  insert into offices
HUMAN_RESOURCES       PACKAGE BODY    12  (office_id, office_name)
HUMAN_RESOURCES       PACKAGE BODY    13  values
HUMAN_RESOURCES       PACKAGE BODY    14  (v_office_id, v_office_name);
HUMAN_RESOURCES       PACKAGE BODY    15
HUMAN_RESOURCES       PACKAGE BODY    16  return v_office_id;
HUMAN_RESOURCES       PACKAGE BODY    17  end add_office;
HUMAN_RESOURCES       PACKAGE BODY    18
HUMAN_RESOURCES       PACKAGE BODY    19
HUMAN_RESOURCES       PACKAGE BODY    20  end human_resources;
SOFTBALL_TEAM_DELETE  PROCEDURE        1  procedure softball_team_delete (
SOFTBALL_TEAM_DELETE  PROCEDURE        2  v_team_id IN number
SOFTBALL_TEAM_DELETE  PROCEDURE        3  )
SOFTBALL_TEAM_DELETE  PROCEDURE        4  is
SOFTBALL_TEAM_DELETE  PROCEDURE        5  begin
SOFTBALL_TEAM_DELETE  PROCEDURE        6  delete from softball_teams
SOFTBALL_TEAM_DELETE  PROCEDURE        7  where team_id = v_team_id;
SOFTBALL_TEAM_DELETE  PROCEDURE        8  end softball_team_delete;
To query the source for just one PL/SQL object, do the following:
select TEXT
from USER_SOURCE
where name='OBJECT_NAME'
and type='OBJECT_TYPE'
order by LINE
If we do this for our procedure SOFTBALL_TEAM_DELETE, we get
procedure softball_team_delete (
v_team_id IN number
)
is
begin
delete from softball_teams
where team_id = v_team_id;
end softball_team_delete;
Based on this, we know that the original procedure definition would have been:
create or replace procedure softball_team_delete (
v_team_id IN number
)
is
begin
delete from softball_teams
where team_id = v_team_id;
end softball_team_delete;
/
show errors;
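Before reading through pages of source, it can help to get a one-line-per-object inventory. USER_SOURCE holds only the text, so this sketch turns to another dictionary table, USER_OBJECTS, which is not covered elsewhere in this article; its STATUS column shows whether each object currently compiles (VALID or INVALID).

```sql
-- Inventory of PL/SQL objects and triggers, with compilation status.
select OBJECT_NAME, OBJECT_TYPE, STATUS, LAST_DDL_TIME
from USER_OBJECTS
where OBJECT_TYPE in ('PROCEDURE', 'FUNCTION', 'PACKAGE', 'PACKAGE BODY', 'TRIGGER')
order by OBJECT_NAME, OBJECT_TYPE
```

The values to plug into OBJECT_NAME and OBJECT_TYPE in the single-object USER_SOURCE query come straight from this list.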
Putting It Into a Script
While you certainly can do all of the above queries by hand, it may be more useful to put
them into a script which you can run at your convenience.
Here is an example script. If you are running the ArsDigita Community System 3.x, you
can run this script and other useful functions by downloading the small Schema and Data
Browser module that I wrote. If you're using a different architecture, you can use this
script as an example to write your own. Please let me know if you write such a script; I'll
add a link to it from here.
Download
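If you are not running the ArsDigita Community System, one low-tech option is a SQL*Plus script that spools the queries from this article into a text file. The following is a rough sketch, not the script referred to above: the file name data_model.txt and the SET values are placeholders, and it covers the whole schema rather than one table at a time.

```sql
-- Run from SQL*Plus while connected as the owner of the data model.
set pagesize 1000
set linesize 200
set long 20000

spool data_model.txt

prompt ==== Tables and comments ====
select UT.TABLE_NAME, UTC.COMMENTS
from USER_TABLES UT, USER_TAB_COMMENTS UTC
where UT.TABLE_NAME = UTC.TABLE_NAME(+)
order by UT.TABLE_NAME;

prompt ==== Columns ====
select TABLE_NAME, COLUMN_NAME, DATA_TYPE, DATA_LENGTH,
       DATA_PRECISION, DATA_SCALE, NULLABLE, DATA_DEFAULT
from USER_TAB_COLUMNS
order by TABLE_NAME, COLUMN_ID;

prompt ==== Constraints ====
select UCC.TABLE_NAME, UCC.CONSTRAINT_NAME, UCC.COLUMN_NAME,
       UC.CONSTRAINT_TYPE, UC.SEARCH_CONDITION
from USER_CONS_COLUMNS UCC, USER_CONSTRAINTS UC
where UCC.CONSTRAINT_NAME = UC.CONSTRAINT_NAME
order by UCC.TABLE_NAME, UCC.CONSTRAINT_NAME, UCC.POSITION;

prompt ==== Indexes ====
select TABLE_NAME, INDEX_NAME, COLUMN_NAME
from USER_IND_COLUMNS
order by TABLE_NAME, INDEX_NAME, COLUMN_POSITION;

prompt ==== Views ====
select VIEW_NAME, TEXT
from USER_VIEWS
order by VIEW_NAME;

prompt ==== Sequences ====
select SEQUENCE_NAME, MIN_VALUE, MAX_VALUE, INCREMENT_BY,
       CYCLE_FLAG, ORDER_FLAG, CACHE_SIZE
from USER_SEQUENCES
order by SEQUENCE_NAME;

prompt ==== Triggers ====
select DESCRIPTION, WHEN_CLAUSE, TRIGGER_BODY
from USER_TRIGGERS
order by TRIGGER_NAME;

prompt ==== PL/SQL source ====
select NAME, TYPE, LINE, TEXT
from USER_SOURCE
order by NAME, TYPE, LINE;

spool off
```

The "set long" line matters because DATA_DEFAULT, SEARCH_CONDITION, the view TEXT and TRIGGER_BODY are LONG columns, which SQL*Plus otherwise cuts off after 80 characters.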