Reverse Engineering a Data Model Using the Oracle Data Dictionary

The Problem

Let's say you have to work with an Oracle database, using a data model that somebody else wrote. Maybe you're extending the data model or building an application that references it. There's only one problem: whoever created the original data model left without writing a line of documentation. What do you do? How do you reverse engineer the data model to unearth the table definitions, constraints, indexes, views, sequences, triggers, and PL/SQL functions and procedures?

This ends up being an easy task if you use the Oracle data dictionary.

The Oracle Data Dictionary

Just like you use Oracle tables to store your data, Oracle uses tables to store its data. A set of tables, called the Oracle data dictionary, contains information about all the structures (tables, views, etc.) and procedural code (triggers, PL/SQL procedures, etc.) created by each user.

For example, there's a table called USER_TAB_COLUMNS that contains information about all the columns you've defined, including: what table the column belongs to, the data type (number, varchar, etc.), what the default value is, whether the column can be null, etc.

The Oracle data dictionary is huge and contains a lot of esoteric stuff, but when you whittle it down to only the info you need, it's not so menacing. Here are the data dictionary tables I find useful. You can do SELECTs on them, just as you would any other table in Oracle:

USER_TABLES
    Lists each table that belongs to your Oracle user.
USER_TAB_COMMENTS
    Shows comments on the tables and views.
USER_TAB_COLUMNS
    Tells you the names, data types, default values, etc. of each column in each table.
USER_COL_COMMENTS
    Shows comments on the columns.
USER_CONSTRAINTS
    Gives you all constraints (either single- or multi-column), such as primary key, foreign key, not null, check constraints, etc.
USER_CONS_COLUMNS
    Maps constraints to columns (since a constraint can act on one or many columns).
USER_INDEXES
    Lists indexes defined on columns (either defined explicitly when creating the data model or defined automatically by Oracle, as is the case with indexes on primary keys).
USER_IND_COLUMNS
    Maps indexes to columns.
USER_VIEWS
    Lists all views, along with the text used to originally create them.
USER_SYNONYMS
    Lists the synonyms and original table names.
USER_SEQUENCES
    Lists all sequences, including min value, max value, and amount by which to increment.
USER_TRIGGERS
    Contains trigger names, criteria for activating each trigger, and the code that is run.
USER_SOURCE
    Contains the source code for all PL/SQL objects, including functions, procedures, packages, and package bodies.

All of the above tables (the USER_* tables) only contain objects defined by the current Oracle user. Oracle also maintains a set of tables of identical structure that start with ALL_*. These show you every object that you have access to, regardless of whether you created that object (e.g., our beloved friend DUAL). Similarly, Oracle provides DBA_* tables that contain info about all users' objects, but this group of tables is only accessible to the database administrator.

Reverse Engineering the Data Model

In the following sections, I'll show you the queries you need to run to find the following:

- Table names
- Table comments
- Columns, including names, data types, default values
- Column comments
- Constraints
- Indexes
- Views
- Sequences
- Triggers
- PL/SQL objects (procedures, functions, packages, package bodies)

Note: the following queries have been tested with Oracle 8i and Oracle 9i. I have not tried them out on other versions.

The queries:

1. Table Names

Find out what tables have been defined in your system:

    select TABLE_NAME
    from USER_TABLES

TABLE_NAME is really the only important info we can get from Oracle's data dictionary table USER_TABLES. When tables are created, most of the action takes place in the definition of individual columns, which we'll look at later.
For example, if you have four tables defined in your system, your query will return four rows:

    TABLE_NAME
    ---------------
    EMPLOYEES
    OFFICES
    SOFTBALL_TEAMS
    EMPLOYEES_AUDIT

2. Table Comments

For each table, get any comments written by the data model author:

    select COMMENTS
    from USER_TAB_COMMENTS
    where TABLE_NAME = 'TABLE_NAME'
    and COMMENTS is not null

Note that the TABLE_NAME must be written in all uppercase letters.

Example: if we do this query for the EMPLOYEES table, we find the following comment:

    COMMENTS
    --------
    This is a table to hold all current, past, and future employees. Application developers might find the views EMPLOYEES_CURRENT, EMPLOYEES_PAST and EMPLOYEES_FUTURE useful.

In my experience, very few developers document their tables within Oracle (if the tables are documented, the documentation is generally done in some file somewhere else). But if you want to be a conscientious developer and ensure that your comments show up in the data dictionary for future programmers to find, you can use the command:

    comment on table TABLE_NAME is 'This is my comment.'

3. Columns

If you only want basic info about each column (name, type, and whether it's nullable), the easiest way to get it is to DESCRIBE the table (or DESC, for short). Let's see what columns the EMPLOYEES table contains:

    SQL> desc employees;
     Name                 Null?    Type
     -------------------- -------- --------------
     EMPLOYEE_ID          NOT NULL NUMBER(38)
     LAST_NAME            NOT NULL VARCHAR2(200)
     FIRST_NAME           NOT NULL VARCHAR2(200)
     EMAIL                         VARCHAR2(100)
     PRIMARY_OFFICE_ID    NOT NULL NUMBER(38)
     START_DATE           NOT NULL DATE
     END_DATE                      DATE
     SALARY                        NUMBER(9,2)
     YEARS_EXPERIENCE              NUMBER
     MANAGEMENT_TRACK_P            CHAR(1)
     SHORT_BIO                     VARCHAR2(4000)
     LIFE_STORY                    CLOB
     PHOTO                         BLOB

But if you want more detailed (and parseable) information about your tables, you will have to query the data dictionary.
Here's how we get the column info (note: this does not include the comments, constraints, and indexes, which are stored elsewhere in the data dictionary):

    select COLUMN_NAME, DATA_TYPE, DATA_LENGTH, DATA_PRECISION, DATA_SCALE, NULLABLE, DATA_DEFAULT
    from USER_TAB_COLUMNS
    where TABLE_NAME = 'TABLE_NAME'

For example, if we do the above query for the EMPLOYEES table, we get back the following results:

    COLUMN_NAME         DATA_TYPE  DATA_LENGTH  DATA_PRECISION  DATA_SCALE  NULLABLE  DATA_DEFAULT
    ------------------  ---------  -----------  --------------  ----------  --------  ------------
    EMPLOYEE_ID         NUMBER     22                           0           N
    LAST_NAME           VARCHAR2   200                                      N
    FIRST_NAME          VARCHAR2   200                                      N
    EMAIL               VARCHAR2   100                                      Y
    PRIMARY_OFFICE_ID   NUMBER     22                           0           N
    START_DATE          DATE       7                                        N         sysdate
    END_DATE            DATE       7                                        Y
    SALARY              NUMBER     22           9               2           Y
    YEARS_EXPERIENCE    NUMBER     22                                       Y
    MANAGEMENT_TRACK_P  CHAR       1                                        Y         'f'
    SHORT_BIO           VARCHAR2   4000                                     Y
    LIFE_STORY          CLOB       4000                                     Y
    PHOTO               BLOB       4000                                     Y

Useful facts for deciphering the above:

- The internal data type VARCHAR2 (the data type Oracle uses) corresponds to the external data type VARCHAR (the data type you use in table declarations). A column with DATA_TYPE = VARCHAR2 and DATA_LENGTH = 200 would be defined as VARCHAR(200).
- The internal data type NUMBER corresponds to the external data type:
    - INTEGER if the scale is 0 and precision is null
    - NUMBER if the scale and precision are both null
    - NUMBER(9,2) if the precision is 9 and the scale is 2
    - NUMBER(3) if the precision is 3 and scale is 0
- DATA_LENGTH is irrelevant for NUMBERs, DATEs, CLOBs and BLOBs.

Based on this, we can derive the following table definition:

    create table employees (
        employee_id         integer not null,
        last_name           varchar(200) not null,
        first_name          varchar(200) not null,
        email               varchar(100),
        primary_office_id   integer not null,
        start_date          date default sysdate not null,
        end_date            date,
        salary              number(9,2),
        years_experience    number,
        management_track_p  char(1) default 'f',
        short_bio           varchar(4000),
        life_story          clob,
        photo               blob
    );

Note that we still haven't looked up any constraints, indexes, or column comments.

4. Column Comments

    select COLUMN_NAME, COMMENTS
    from USER_COL_COMMENTS
    where TABLE_NAME = 'TABLE_NAME'

The EMPLOYEES table has two columns with comments:

    COLUMN_NAME         COMMENTS
    ------------------  --------
    PRIMARY_OFFICE_ID   The office that the employee spends most of their time in.
    MANAGEMENT_TRACK_P  Has the employee expressed a desire and aptitude for management training?

Note that if you want to put comments into the data dictionary for future programmers to find, you can use the following syntax:

    comment on column TABLE_NAME.COLUMN_NAME is 'This is my comment.'

5. Constraints

    select UCC.CONSTRAINT_NAME, UCC.COLUMN_NAME, UC.CONSTRAINT_TYPE, UC.SEARCH_CONDITION, UC2.TABLE_NAME as REFERENCES_TABLE
    from USER_CONS_COLUMNS UCC, USER_CONSTRAINTS UC, USER_CONSTRAINTS UC2
    where UCC.CONSTRAINT_NAME = UC.CONSTRAINT_NAME
    and UC.R_CONSTRAINT_NAME = UC2.CONSTRAINT_NAME(+)
    and UCC.TABLE_NAME = 'TABLE_NAME'
    order by UCC.CONSTRAINT_NAME

For the EMPLOYEES table, we get:

    CONSTRAINT_NAME  COLUMN_NAME         CONSTRAINT_TYPE  SEARCH_CONDITION                                  REFERENCES_TABLE
    ---------------  ------------------  ---------------  ------------------------------------------------  ----------------
    SYS_C0057015     LAST_NAME           C                "LAST_NAME" IS NOT NULL
    SYS_C0057016     FIRST_NAME          C                "FIRST_NAME" IS NOT NULL
    SYS_C0057017     PRIMARY_OFFICE_ID   C                "PRIMARY_OFFICE_ID" IS NOT NULL
    SYS_C0057018     START_DATE          C                "START_DATE" IS NOT NULL
    SYS_C0057019     MANAGEMENT_TRACK_P  C                management_track_p in ('t','f')
    SYS_C0057020     SHORT_BIO           C                short_bio is not null or life_story is not null
    SYS_C0057020     LIFE_STORY          C                short_bio is not null or life_story is not null
    SYS_C0057021     EMPLOYEE_ID         P
    SYS_C0057022     EMAIL               U
    SYS_C0057023     PRIMARY_OFFICE_ID   R                                                                  OFFICES

There are four types of constraint:

- P: primary key
- U: unique
- R: references
- C: check

Note that the constraint SYS_C0057020 appears twice above; this is because it is a multi-column constraint. Note also that the "not null" constraints appear here even though they also appear in USER_TAB_COLUMNS (a little redundancy).
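
The constraint rows above can also be folded back into declarations mechanically. The following is a minimal Python sketch of that idea, not part of the original article. It assumes the rows have already been fetched with the constraint query in this section, in the same column order, and that the columns of a multi-column constraint arrive in the intended order. The function name constraint_clauses is hypothetical, and skipping checks of the form "COLUMN" IS NOT NULL relies on the search-condition text looking exactly like the sample output above.

```python
import re

# System-generated not-null checks look like: "LAST_NAME" IS NOT NULL
# (assumption: the text matches the sample output shown in this section).
NOT_NULL_CHECK = re.compile(r'^"[^"]+" IS NOT NULL$')


def constraint_clauses(rows):
    """Turn constraint rows into table-level constraint clauses.

    Each row is a tuple in the order of the query above:
    (constraint_name, column_name, constraint_type,
     search_condition, references_table)
    """
    grouped = {}  # constraint name -> details; dicts keep insertion order
    for name, column, ctype, condition, ref_table in rows:
        entry = grouped.setdefault(
            name,
            {"type": ctype, "condition": condition, "ref": ref_table, "columns": []},
        )
        # A multi-column constraint shows up once per column.
        entry["columns"].append(column.lower())

    clauses = []
    for entry in grouped.values():
        cols = ", ".join(entry["columns"])
        if entry["type"] == "P":
            clauses.append(f"primary key ({cols})")
        elif entry["type"] == "U":
            clauses.append(f"unique ({cols})")
        elif entry["type"] == "R":
            clauses.append(f"foreign key ({cols}) references {entry['ref'].lower()}")
        elif entry["type"] == "C":
            condition = (entry["condition"] or "").strip()
            if NOT_NULL_CHECK.match(condition):
                continue  # already reported as NULLABLE = 'N' in USER_TAB_COLUMNS
            clauses.append(f"check ({condition})")
    return clauses


if __name__ == "__main__":
    sample = [
        ("SYS_C0057015", "LAST_NAME", "C", '"LAST_NAME" IS NOT NULL', None),
        ("SYS_C0057020", "SHORT_BIO", "C",
         "short_bio is not null or life_story is not null", None),
        ("SYS_C0057020", "LIFE_STORY", "C",
         "short_bio is not null or life_story is not null", None),
        ("SYS_C0057021", "EMPLOYEE_ID", "P", None, None),
        ("SYS_C0057023", "PRIMARY_OFFICE_ID", "R", None, "OFFICES"),
    ]
    for clause in constraint_clauses(sample):
        print(clause)
```

The sketch emits every constraint as a table-level clause; rewriting single-column constraints inline, next to their column, is a matter of taste.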
Based on the information we have so far, we can document the table as follows:

    -- This is a table to hold all current, past, and future employees. Application
    -- developers might find the views EMPLOYEES_CURRENT, EMPLOYEES_PAST and
    -- EMPLOYEES_FUTURE useful.
    create table employees (
        employee_id         integer primary key,
        last_name           varchar(200) not null,
        first_name          varchar(200) not null,
        email               varchar(100) unique,
        -- The office that the employee spends most of their time in.
        primary_office_id   not null references offices,
        start_date          date default sysdate not null,
        end_date            date,
        salary              number(9,2),
        years_experience    number,
        -- Has the employee expressed a desire and aptitude for management training?
        management_track_p  char(1) default 'f' check(management_track_p in ('t','f')),
        short_bio           varchar(4000),
        life_story          clob,
        photo               blob,
        check(short_bio is not null or life_story is not null)
    );

6. Indexes

    SELECT INDEX_NAME, COLUMN_NAME
    FROM USER_IND_COLUMNS
    WHERE TABLE_NAME='TABLE_NAME'
    ORDER BY INDEX_NAME

The indexes on EMPLOYEES:

    INDEX_NAME          COLUMN_NAME
    ------------------  ----------------
    EMPLOYEE_DATES_IDX  START_DATE
    EMPLOYEE_DATES_IDX  END_DATE
    EMPLOYEE_YE_IDX     YEARS_EXPERIENCE
    SYS_C0057021        EMPLOYEE_ID
    SYS_C0057022        EMAIL

EMPLOYEE_DATES_IDX appears twice because it is a multi-column index. Oracle automatically created the index on EMPLOYEE_ID because it is a primary key. Oracle automatically created the index on EMAIL because that column has a unique constraint.

From this, we can see that the original index definitions were:

    create index employee_dates_idx on employees(start_date, end_date);
    create index employee_ye_idx on employees(years_experience);

7. Views

    select UV.VIEW_NAME, UV.TEXT, UTC.COMMENTS
    from USER_VIEWS UV, USER_TAB_COMMENTS UTC
    where UV.VIEW_NAME = UTC.TABLE_NAME(+)

In our example data model, we have the following views defined:

    VIEW_NAME: EMPLOYEES_CURRENT
    TEXT:
        select "EMPLOYEE_ID","LAST_NAME","FIRST_NAME",
        "EMAIL","PRIMARY_OFFICE_ID","START_DATE","END_DATE",
        "SALARY","YEARS_EXPERIENCE","MANAGEMENT_TRACK_P",
        "SHORT_BIO","LIFE_STORY","PHOTO"
        from employees
        where start_date <= sysdate and end_date >= sysdate
    COMMENTS: All employees who've already started working here and who have not yet ended their employment.

    VIEW_NAME: EMPLOYEES_FUTURE
    TEXT:
        select "EMPLOYEE_ID","LAST_NAME","FIRST_NAME",
        "EMAIL","PRIMARY_OFFICE_ID","START_DATE","END_DATE",
        "SALARY","YEARS_EXPERIENCE","MANAGEMENT_TRACK_P",
        "SHORT_BIO","LIFE_STORY","PHOTO"
        from employees
        where start_date > sysdate
    COMMENTS: (null)

    VIEW_NAME: EMPLOYEES_PAST
    TEXT:
        select "EMPLOYEE_ID","LAST_NAME","FIRST_NAME",
        "EMAIL","PRIMARY_OFFICE_ID","START_DATE","END_DATE",
        "SALARY","YEARS_EXPERIENCE","MANAGEMENT_TRACK_P",
        "SHORT_BIO","LIFE_STORY","PHOTO"
        from employees
        where end_date < sysdate
    COMMENTS: (null)

    VIEW_NAME: OFFICES_REGION_I
    TEXT:
        select "OFFICE_ID","OFFICE_NAME","STATE_OR_PROVINCE"
        from offices
        where state_or_province in ('CA','WA','OR','HI','AZ')
    COMMENTS: (null)

Based on this, we know that the view OFFICES_REGION_I was created with the following statement:

    create or replace view OFFICES_REGION_I
    as
    select "OFFICE_ID","OFFICE_NAME","STATE_OR_PROVINCE"
    from offices
    where state_or_province in ('CA','WA','OR','HI','AZ')

8. Sequences

    select SEQUENCE_NAME, MIN_VALUE, MAX_VALUE, INCREMENT_BY, CYCLE_FLAG, ORDER_FLAG, CACHE_SIZE
    from USER_SEQUENCES

Here are the sequences in our example system:

    SEQUENCE_NAME  MIN_VALUE   MAX_VALUE   INCREMENT_BY  CYCLE_FLAG  ORDER_FLAG  CACHE_SIZE
    -------------  ----------  ----------  ------------  ----------  ----------  ----------
    EMPLOYEE_SEQ   1           1.0000E+27  1             N           N           20
    MISC_SEQ       1           1.0000E+13  2             Y           N           10
    MISC2_SEQ      -1.000E+26  -1          -1            N           N           20

Let's decipher these values.
All of the values in the EMPLOYEE_SEQ row above are Oracle's default values, so we know it was created with the simple statement "create sequence employee_seq". The other two sequences had optional arguments specified. We can deduce that the original sequence definitions were:

    create sequence employee_seq;
    create sequence misc_seq increment by 2 start with 314 maxvalue 10000000000000 cycle cache 10;
    create sequence misc2_seq increment by -1;

(Strictly speaking, the "start with 314" value cannot be read from this query, since USER_SEQUENCES does not record a sequence's original starting value.)

As an aside, notice that the max value for EMPLOYEE_SEQ is 1.0000E+27 (or 1,000,000,000,000,000,000,000,000,000). Sometimes novice Oracle programmers feel uncomfortable using sequences to generate primary keys because they fear the sequences might "run out" of values. But even if each of the six billion people in the world orders a quadrillion items from your online store, there will still be plenty of sequence values left for their future purchases.

9. Triggers

    select TRIGGER_NAME, TRIGGER_TYPE, TRIGGERING_EVENT, TABLE_NAME, WHEN_CLAUSE, DESCRIPTION, TRIGGER_BODY
    from USER_TRIGGERS

In the example system, we have three triggers defined (only the DESCRIPTION, WHEN_CLAUSE, and TRIGGER_BODY columns are shown):

    DESCRIPTION:
        softball_teams_tr
        after insert on offices
        for each row
    WHEN_CLAUSE: (null)
    TRIGGER_BODY:
        begin
            insert into softball_teams (
                team_id, team_name
            ) values (
                misc_seq.nextval, :new.office_name
            );
        end;

    DESCRIPTION:
        softball_teams_update_tr
        after update on offices
        for each row
    WHEN_CLAUSE: old.office_name != new.office_name
    TRIGGER_BODY:
        begin
            update softball_teams
            set team_name = :new.office_name
            where team_name = :old.office_name;
        end;

    DESCRIPTION:
        employees_audit_tr
        before update or delete on employees
        for each row
    WHEN_CLAUSE: (null)
    TRIGGER_BODY:
        begin
            insert into employees_audit (
                employee_id, last_name, first_name, email, primary_office_id,
                start_date, end_date, salary, years_experience,
                management_track_p, short_bio, life_story, photo
            ) values (
                :old.employee_id, :old.last_name, :old.first_name, :old.email,
                :old.primary_office_id, :old.start_date, :old.end_date,
                :old.salary, :old.years_experience, :old.management_track_p,
                :old.short_bio, :old.life_story, :old.photo
            );
        end;
From this, it's easy to put together the original trigger definitions, for example:

    create or replace trigger softball_teams_tr
    after insert on offices
    for each row
    begin
        insert into softball_teams (
            team_id, team_name
        ) values (
            misc_seq.nextval, :new.office_name
        );
    end;
    /
    show errors;

and:

    create or replace trigger softball_teams_update_tr
    after update on offices
    for each row
    when (old.office_name != new.office_name)
    begin
        update softball_teams
        set team_name = :new.office_name
        where team_name = :old.office_name;
    end;
    /
    show errors;

The general form is:

    create or replace trigger TRIGGER_DESCRIPTION
    when (WHEN_CLAUSE)    [leave out this line if the WHEN_CLAUSE is null]
    TRIGGER_BODY
    /
    show errors;

10. PL/SQL Objects (functions, procedures, packages, and package bodies)

    select NAME, TYPE, LINE, TEXT
    from USER_SOURCE
    order by NAME, TYPE, LINE

Our example results show that we have four PL/SQL objects defined. Each row holds one line of source; the rows are shown here grouped by NAME and TYPE, with the TEXT values in LINE order:

    NAME: HUMAN_RESOURCES    TYPE: PACKAGE    LINE: 1 through 8
    TEXT:
        package human_resources
        is
            function add_office (
                v_office_name IN varchar
            ) return number;
        end human_resources;

    NAME: HUMAN_RESOURCES    TYPE: PACKAGE BODY    LINE: 1 through 20
    TEXT:
        package body human_resources
        is
            function add_office (
                v_office_name IN varchar
            ) return number
            is
                v_office_id number;
            begin
                select misc_seq.nextval into v_office_id from dual;
                insert into offices
                (office_id, office_name)
                values
                (v_office_id, v_office_name);
                return v_office_id;
            end add_office;
        end human_resources;

    NAME: SOFTBALL_TEAM_DELETE    TYPE: PROCEDURE    LINE: 1 through 8
    TEXT:
        procedure softball_team_delete (
            v_team_id IN number
        )
        is
        begin
            delete from softball_teams
            where team_id = v_team_id;
        end softball_team_delete;

To query the source for just one PL/SQL object, do the following:

    select TEXT
    from USER_SOURCE
    where name='OBJECT_NAME'
    and type='OBJECT_TYPE'
    order by LINE

If we do this for our procedure SOFTBALL_TEAM_DELETE, we get:

    procedure softball_team_delete (
        v_team_id IN number
    )
    is
    begin
        delete from softball_teams
        where team_id = v_team_id;
    end softball_team_delete;

Based on this, we know that the original procedure definition would have been:

    create or replace procedure softball_team_delete (
        v_team_id IN number
    )
    is
    begin
        delete from softball_teams
        where team_id = v_team_id;
    end softball_team_delete;
    /
    show errors;

Putting It Into a Script

While you certainly can do all of the above queries by hand, it may be more useful to put them into a script which you can run at your convenience. Here is an example script. If you are running the ArsDigita Community System 3.x, you can run this script and other useful functions by downloading the small Schema and Data Browser module that I wrote.

If you're using a different architecture, you can use this script as an example to write your own. Please let me know if you write such a script; I'll add a link to it from here.
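
As one possible starting point for a script of your own, here is a minimal Python sketch. It is not the script referred to above. It covers only the column step: it applies the type-mapping rules from section 3 to rows that are assumed to have been fetched already from USER_TAB_COLUMNS, in the column order of the query in that section. The function names column_type, column_definition, and create_table are hypothetical, and fetching the rows (with whatever Oracle driver you use) is left out.

```python
def column_type(data_type, length, precision, scale):
    """Map USER_TAB_COLUMNS type fields to a declaration, per section 3."""
    if data_type == "VARCHAR2":
        return f"varchar({length})"
    if data_type == "CHAR":
        return f"char({length})"
    if data_type == "NUMBER":
        if precision is None:
            # scale 0 with no precision is INTEGER; neither set is plain NUMBER
            return "integer" if scale == 0 else "number"
        if not scale:
            # precision with a scale of 0 (or missing), e.g. NUMBER(3)
            return f"number({precision})"
        return f"number({precision},{scale})"
    # DATE, CLOB, BLOB, ...: DATA_LENGTH is irrelevant for these types
    return data_type.lower()


def column_definition(row):
    """Build one column line from a USER_TAB_COLUMNS row.

    row = (COLUMN_NAME, DATA_TYPE, DATA_LENGTH, DATA_PRECISION,
           DATA_SCALE, NULLABLE, DATA_DEFAULT)
    """
    name, data_type, length, precision, scale, nullable, default = row
    parts = [name.lower(), column_type(data_type, length, precision, scale)]
    # Strip defensively: the stored default text may carry stray whitespace.
    if default is not None and default.strip():
        parts.append("default " + default.strip())
    if nullable == "N":
        parts.append("not null")
    return " ".join(parts)


def create_table(table_name, rows):
    """Assemble a create table statement (columns only, no constraints)."""
    body = ",\n".join("    " + column_definition(row) for row in rows)
    return f"create table {table_name.lower()} (\n{body}\n);"


if __name__ == "__main__":
    # A few rows copied from the EMPLOYEES example in section 3.
    sample_rows = [
        ("EMPLOYEE_ID", "NUMBER", 22, None, 0, "N", None),
        ("LAST_NAME", "VARCHAR2", 200, None, None, "N", None),
        ("START_DATE", "DATE", 7, None, None, "N", "sysdate"),
        ("SALARY", "NUMBER", 22, 9, 2, "Y", None),
        ("MANAGEMENT_TRACK_P", "CHAR", 1, None, None, "Y", "'f'"),
    ]
    print(create_table("EMPLOYEES", sample_rows))
```

The other steps (comments, constraints, indexes, views, sequences, triggers, PL/SQL source) could be handled the same way: run the query from the corresponding section, then format the rows.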