Getting data

advertisement

The first step in data analysis

Learning Objective

Using SAS/BASE® to connect to third-party relational data base software to extract data needed for

 program evaluation research using administrative data operational reports e.g. routine surveillance

1.

What is a relational database?

2.

Contact your DBA for how to connect to your database(s)?

3.

How to write queries using PROC SQL

SHRUG, 2014-05-02 1

What is a relational database?

 Set of tables

 tables made up of rows and columns

 Trade names of relational databases (RDB):

 Oracle, Teradata, SQL Server, DB2, Access

 RDB is software which is designed to retain large amounts of data

 transactional DB

 reporting/warehousing DB

SHRUG, 2014-05-02 2

What is a relational database?

 Transactional DB designed to increase the speed for frontend users

 complex table and table join structures

 Warehousing DB designed for efficient storage and retrieval for reporting

 simpler table designs and table join structures

 Queries for either design use same syntax (code)

 queries for warehouses will be simpler to write

SHRUG, 2014-05-02 3

What is a relational database?

 Why use relational databases?

 relational databases use a concept called “normalization”

Normalization reduces the amount of redundant data and allows for updates to data with less error

There are degrees of normalization

 first degree second degree third degree and higher degrees

SHRUG, 2014-05-02 4

First degree normalization

 First degree normalization

 each row pertains to a single entity: a patient, an encounter, a physician each column pertains to a characteristic of the entity: e.g. date of birth, sex, date of encounter, etc

Table 1: Subjects with demographic information

ID

0001

0002

FirstName

John

Devbani

Gender

M

F

BirthCity

Moncton

Kolkata

BirthCountry

Canada

India

SHRUG, 2014-05-02 5

Violation of first degree normalization

Table 1: Subjects with improper 1NF

SubjID

0001

FirstName

John

Gender

43

BirthCity

Moncton

BirthCountry

New Brunswick

0002 Raha F West Bengal India

 What impact does violating the first degree normalization have on your query

 if you want all patients born in Canada?

 if you want all male patients?

SHRUG, 2014-05-02 6

Second degree normalization

 Table 2 has employer information about rows in Table 1

Table 2: Business addresses

Name City Prov PostalCode

John

Devbani

Halifax

Halifax

NS

NS

B3K 6R8

B3H 2Y9

 The table above has some redundant information:

 name is repeated from Table 1, province is embedded in the postal code

 Better design – two or even 3 tables

SHRUG, 2014-05-02 7

Second degree normalization

Table 2: Revised with 2NF

SubjID PostalCode

0001

0002

B3K 6R8

B3H 2Y9

Table 3: Creating a secondary table for 2NF

PostalCode

B3K 6R8

B3H 2Y9

City

Halifax

Halifax

Prov

NS

NS

SHRUG, 2014-05-02 8

Second degree normalization

 Table 2 now no longer contains name – it’s replaced with the subject ID

 to get the subject’s name we link the table to the table in the first example, using SUBJID/ID column

 we get the province and city by linking Table 2 and 3 using the POSTALCODE column

 SUBJID is a primary key in Tables 1 and 2

 POSTALCODE is a foreign key in Table 2, but a primary key in

Table 3

SHRUG, 2014-05-02 9

Primary/Foreign Keys

 primary key – a column or combination of columns that uniquely identify each row in the table

 e.g. patient medical record needs at least 3 columns to identify a unique record: patient ID, date of encounter, and provider ID

 foreign key – a column or combination of columns that is used to link data between two tables

SHRUG, 2014-05-02 10

Questions about 2NF?

 Can you see the advantage of splitting the data into different tables?

 share examples of your data where normalization is used

 higher degrees of normalization work similarly to the examples above

 you have to go through more tables for higher levels of normalization in order to link to the data that you need

SHRUG, 2014-05-02 11

Getting access to data: What do you need from DBA?

 Explain to DBA that you need to query data, but have no need to write to the database

 this helps them to determine where you belong on a user matrix

 DBA or IT install necessary software on your machine

 Google has lots of information on SAS Connect

 SAS Connect documentation

SHRUG, 2014-05-02 12

How SAS authenticates

proc sql;

• User name is provided by DBA/IT

In this example the password is held in the macro DBPASS

• Statement to have Oracle print any messages to the SAS log connect to oracle

(user = <userid> password="&dbpass” path = prod );

%put &sqlxmsg;

This is an example of “pass-through” code

SHRUG, 2014-05-02 13

Using a LIBNAME to connect

 Recall that slide 13 showed pass-through facility in SAS

 most of the query is done on the database

 Can use libname statement to connect instead of passthrough

 advantage to this method is that you are programming in

SAS (using SAS functions and formats)

 SAS determines which program (SAS or RDB) will handle statements more efficiently

SHRUG, 2014-05-02 14

Using a LIBNAME to connect

Example using a libname statement:

1.

2.

3.

libname onco odbc dsn='Oncolog' schema=dbo;

1.

2.

The name of the library

Tells SAS that you are using an ODBC engine

3.

DSN – use the name of the database that was used to set up the odbc connection

NOTE: schema statement is not always required

SHRUG, 2014-05-02 15

Seeing your data - Views

 Once view is created, you use the EXPLORER tab in SAS and use as normal dataset

SHRUG, 2014-05-02 16

Seeing your data - Views

Using the “view columns” in SAS EXPLORER

SHRUG, 2014-05-02 17

Seeing your data - Views

• Double click on table to get to see the data

• NOTE: columns that identify personal information have been removed from this screen shot

SHRUG, 2014-05-02 18

Other ways to view data

 You may have software from the RDB:

 TOAD (for Oracle)

SQL Developer (for Oracle)

SQL Server

 Teradata

 All vendors may have some limited function

“development” software that allows:

Viewing data

Viewing the “type” of a column: char, num, date, etc.

 Writing SQL queries

SHRUG, 2014-05-02 19

Sample view from SQL Developer

SHRUG, 2014-05-02 20

Syntax: Single table - 1 of 2

PROC SQL DATA STEP proc sql; create <table/view> <name> as select <var1>

, <var2>

, etc from <table/view> where <apply data filters> quit; data <dataset name>; set <dataset> ( keep= <list of variables> where=(<apply filters>)); run;

Example: Create a dataset (table) with men aged 50 to 74.

Assume the source table is called “demographics” and contains variables: subjectID, age and sex

SHRUG, 2014-05-02 21

Syntax: Single table – 2 of 2

PROC SQL proc sql; create table men5074 as select subjectID

, age from work.demographics

where sex=‘M’ and age

DATA STEP data men5074

(drop=sex); set work.demographics

(keep=subjectid sex age where=(sex='M' and

50<=age<=74));

; quit;

SHRUG, 2014-05-02 22

Syntax: Multiple tables

Request for report received

Realize as you look through data elements needed for complete the request that relevant columns reside in two or more tables

Background Information

1.

Need to know which tables and which columns are relevant. Useful to have data dictionary, otherwise 3 rd party software very helpful

2.

Need to know what filters to apply: sex, time period of interest, diagnosis codes, etc are all commonly applied filters

SHRUG, 2014-05-02 23

Map: Multiple tables

• Create a map to guide your query

• names of tables that go in ‘FROM’ statement of

SQL or ‘SET’ statement in DATA step

• names of columns that you need

• use meaningful arrows to connect work. table1 uniqueID filter

<other columns> work.table2

uniqueID filter 1 filter 2

<other columns>

SHRUG, 2014-05-02 24

Syntax: Multiple table DATA step –

1 of 2

*** sort the first dataset; proc sort data=<dataset1>; by <var(s)>; run;

*** sort the second dataset; proc sort data=<dataset2>; by <var(s)>; *** same var(s) as first sort; run;

SHRUG, 2014-05-02 25

Syntax: Multiple table DATA step -

2 of 2

*** find records common to both tables; data <result dataset>; merge <dataset1> (in=in_a)

<dataset2> (in=in_b); by <var(s)>;

*** we only want a list of records with data in table a AND in table b; if in_a and in_b; run;

SHRUG, 2014-05-02 26

Syntax: Multiple table PROC SQL using temporary tables proc sql feedback; create table <result table name> as select <columns from either or both tables below> from

*** temporary table from table1;

(select <column(s)> from <table 1> where <apply filter(s)>) a inner join

*** temporary table from table2;

(select <column(s)> from <table 2> where <apply filter(s)>) b on a.<pk>=b.<pk>

; quit;

SHRUG, 2014-05-02 27

Syntax: Multiple table PROC SQL. “Oraclestyle (PL/SQL)” proc sql feedback; create table <result table> as select <columns from one or more tables> from <table1> a

, <table2> b where a.pk=b.pk

<apply additional filters>

; quit;

SHRUG, 2014-05-02 28

Types of joins

RIGHT join – join Table A to B only if an observation exists in Table B

SHRUG, 2014-05-02 29

Compare syntax

PROC SQL

 no need to sort temporary tables needed to think about type of join

 in this case wanted patients only if they were in both tables join columns need to be same type but can have different names (slides 6 and 9)

DATA STEP

 needed to sort data by subjectID

 key variable to join demographic to cancersite table

 “by” variables need to have same name and type

What would you do if you found out that one record in table 1 matched to multiple records in table 2?

SHRUG, 2014-05-02 30

Issues arising from multiple table queries

 Table relationships are important:

 one-to-one: each record in first table has a maximum of one record in the second table (through primary key)

 one-to-many: each record in one table may have multiple rows in second table. Example:

Table 1 contains all patients with a flag indicating whether or not they are “active”

Table 2 contains all GP appointments for each patient

 many-to-many

SHRUG, 2014-05-02 31

Task 1 - single table

Task 2 – two tables

Task 3 – multiple tables

Task 4 – reusing a table multiple times

SHRUG, 2014-05-02 32

Task 1a - Participants

You are asked to provide a count of the female participants in a cancer screening program who are aged 50 years as of

May 31, 2013. Break down the birth dates by month

Approach 1

Create view of the table required and use SAS to analyze data

SHRUG, 2014-05-02 33

Task 1a – background information

 demographic data for participants is stored in table

“PARTICIPANTS”

 sex_cd is a coded variable: 222=F, 223=M, 240=U

 birth_dt is the column containing birth dates

 although birth_dt appears as a date type column, in SAS

Oracle dates are datetime types in SAS

 For a participant to be considered 50 years of age on May

31, 2013, their birthday must occur between June 1, 1962 and May 31, 1963

SHRUG, 2014-05-02 34

Task 1a – Create view using pass-through code proc sql feedback noprint; connect to oracle as myconn (user=&userid password=&pw path=&path); create view participant as select * from connection to myconn

(select * from csprod.participant

where sex_cd= 222 and trunc(birth_dt) between to_date('19620601','YYYYMMDD‘) and to_date('19630501','YYYYMMDD‘) and del_dt is null

); disconnect from myconn; quit;

SHRUG, 2014-05-02 35

Task 1a – Create view: Parsing the code. Slide 1 of 3

 Create view participant

 this syntax translates to “Create a view called ‘participant’

 select *

 ‘*’ is a wildcard and means select all

 where

 “%”, “_” – see Task 1b

 multiple (%) or single (_) byte of data, in contrast to the entire column. Only used to scan a column.

SHRUG, 2014-05-02 36

Task 1a – Create view: Parsing the code. Slide 2 of 3

 trunc()

 recall that SAS will treat Oracle, Teradata, SQL dates as

DATETIME

 trunc() is an Oracle function that looks only at the DATE part of the column

SHRUG, 2014-05-02 37

Task 1a – Create view: Parsing the code. Slide 3 of 3

 to_date(‘<yourdate>’, ‘<yourdate format>’)

 in this example to_date(‘19620601’, ‘YYYYMMDD’)

 take the string 19620601 and treat it as a date with the format YYYYMMDD

 could use other formats: YYYYMONDD, MM-DD-YYYY, etc

 BETWEEN operator – works as you would expect, includes both lower and upper limits specified

SHRUG, 2014-05-02 38

Task 1a – Contents of view

Variables in Creation Order

# Variable

1 PARTICIPANT_ID

2 FIRST_NAME

3 LAST_NAME

4 SEX_CD

5 BIRTH_DT

6 MIDDLE_NAME

20 DEL_USER_ID

21 DEL_REASON_CD

22 VERSION_NUM

23 RELIGION_CD

Type Len Flags Format

Num

Num

Num

Char

Num

Num

Num

8 P--

Char 100 P--

Char 100 P--

8 P--

8 P--

Char 100 P--

7 PRFD_NAME

8 PRFD_LANG_CD

9 NO_FUTURE_CONTACT_IND

Char

Num

Char

20 P--

8 P--

1 P--

10 EMAIL Char 100 P--

11 PRFD_CONTACT_METHOD_CD Num 8 P--

12 PART_STATUS_CD

13 STATUS_DT

14 STATUS_SOURCE_CD

Num

Num

Num

8 P--

8 P--

8 P--

Char 1 P-15 TWIN_IND

16 COMMENT_TXT

17 ROW_DT

18 ROW_USER_ID

19 DEL_DT

Char 200 P--

Num

Char

Num

8 P--

20 P--

8 P--

20 P--

8 P--

8 P--

8 P--

21.

$100.

$100.

21.

$100.

$20.

21.

$1.

$100.

21.

21.

21.

$1.

$200.

$20.

$20.

21.

21.

21.

Informat

21.

$100.

$100.

21.

Label

PARTICIPANT_ID

FIRST_NAME

LAST_NAME

SEX_CD

DATETIME20.

DATETIME20.

BIRTH_DT

$100.

$20.

21.

$1.

$100.

21.

21.

MIDDLE_NAME

PRFD_NAME

PRFD_LANG_CD

NO_FUTURE_CONTACT_IND

EMAIL

PRFD_CONTACT_METHOD_CD

PART_STATUS_CD

DATETIME20.

DATETIME20.

STATUS_DT

21.

$1.

$200.

STATUS_SOURCE_CD

TWIN_IND

COMMENT_TXT

DATETIME20.

DATETIME20.

ROW_DT

$20.

ROW_USER_ID

DATETIME20.

DATETIME20.

DEL_DT

$20.

21.

21.

21.

DEL_USER_ID

DEL_REASON_CD

VERSION_NUM

RELIGION_CD

SHRUG, 2014-05-02

NOTE: The contents show exactly the same columns as slide 31

39

Task 1a – Count of birth month

dob Frequency Percent

June 1184 7.56

July 1207 7.71

August

September

October

1241

1232

1232

7.92

7.87

7.87

November

December

January

February

March

April

May

1104

1053

3657

1113

1327

1266

45

7.05

6.72

23.35

7.11

8.47

8.08

0.29

Cumulative

Frequency

1184

2391

3632

4864

6096

7200

8253

11910

13023

14350

15616

15661

Cumulative

Percent

7.56

15.27

23.19

31.06

38.92

45.97

52.70

76.05

83.16

91.63

99.71

100.00

• More detailed analysis uncovered that missing month/day combinations were defaulted to January 1

SHRUG, 2014-05-02 40

Task 1b - Like operator (1 of 4)

REQUEST

Find a list of CCI codes for hysterectomy

 use of single table

 example of PROC SQL with SAS data

 filter using “like” operator

SHRUG, 2014-05-02 41

Task 1b – Like operator (2 of 4)

SHRUG, 2014-05-02 42

Task 1b – Like operator (3 of 4)

proc sql feedback; create table hystcd as select a.f1 as cci_code

, a.f3 as long_desc

, substr(a.f1,1,5) as rubric from work.cci_raw as a where upcase(a.f3) like '%HYSTERECTOMY%' or (upcase(a.f3) like '%EXCISION%' and upcase(a.f3) like '%UTERUS%') or (upcase(a.f3) like '%EXCISION%' and upcase(a.f3) like '%CERVIX%') order by a.f1

; quit;

SHRUG, 2014-05-02 43

Task 1b – Like operator (4 of 4)

RESULTING DATASET

SHRUG, 2014-05-02 44

Task 2 – Participants in the CRC program

 Report the number of men and women who turned 60 as of May 31, 2013, enrolled in the colorectal cancer screening program. Do not include participants with unknown sex

 participant table contains demographic information: sex, birth date

 participant_program table contains data for participants and screening program

 program_id=1 indicates colorectal cancer screening program program_status_cd=263 indicates that a participant is active in the program

SHRUG, 2014-05-02 45

Task 2 – Mapping your query

CSPROD.PARTICIPANT

participant_id sex_cd ≠ 240 birth_dt between 01Jun1952 and 31May1953 del_dt is null

CSPROD.PARTICIPANT_PROGRAM

participant_id program_status_cd=263 program_id=1 del_dt is null

SHRUG, 2014-05-02 46

Translating your map into sql code

- 1 of 3

*** METHOD 2 - Oracle pass through. Simple code; proc sql feedback noprint; connect to oracle as myconn (user=&userid password=&pw path=&path); create table part60 as select * from connection to myconn

(select ptc.gender

, count(*) from (select participant_id

, sex_cd

SHRUG, 2014-05-02 47

Translating your map into sql code

– 2 of 3

*** METHOD 2 - Oracle pass through. Simple code;

, case when sex_cd=222 then 'F' else 'M' end as gender from csprod.participant

where trunc(birth_dt) between to_date('19520601','YYYYMMDD') and to_date('19530531','YYYYMMDD') and sex_cd <> 240 and del_dt is null) ptc

SHRUG, 2014-05-02 48

Translating your map into sql code

– 3 of 3

inner join

(select participant_id from csprod.participant_program

where program_id=1 and program_status_cd=263 and del_dt is null) pp on ptc.participant_id=pp.participant_id

group by ptc.gender

; disconnect from myconn; quit;

SHRUG, 2014-05-02 49

Parsing Code – 1 of 3

proc sql feedback noprint; connect to oracle as myconn

(user=&userid password=&pw path=&path); create table part60 as select * from connection to myconn

Create a SAS dataset called “part60” select all columns from the query (seen after connection statement)

SHRUG, 2014-05-02 50

Parsing the code - 2 of 3

(select ptc.gender

, count(*) from (select participant_id

, sex_cd

, case when sex_cd=222 then 'F' else 'M' end as gender from csprod.participant

where trunc(birth_dt) between to_date('19520601','Y

YYYMMDD') and to_date('19530531','Y

YYYMMDD‘) and sex_cd <> 240 and del_dt is null) ptc

Put these columns in the SAS dataset part60

Create a temporary table called ‘ptc’

Table PTC contains columns as listed from the

PARTICIPANT table, with the restrictions shown in the

WHERE clause

SHRUG, 2014-05-02 51

Parsing the code – 3 of 3

inner join

(select participant_id from csprod.participant_program

where program_id=1 and program_status_cd=263 and del_dt is null)pp on ptc.participant_id= pp.participant_id

group by ptc.gender

; disconnect from myconn; quit;

Create temporary table, ‘PP’ from

PARTICIPANT_PROGRAM with restrictions defined in the

WHERE clause

SHRUG, 2014-05-02 52

Joins – for joining two or more tables

 This example shows an inner join: want participants, and the # males and females participating in CRC screening program age 60 as of May 31, 2013

PTC

C

PP

• Area C is the result of the inner join

• Temporary table PTC: a subset of csprod.participant

• Temporary table PP: a subset of csprod.part_program

SHRUG, 2014-05-02 53

Task 2 - Results

 What will be the query result?

 What’s the table/dataset name?

 How many rows?

 How many columns?

 What are the columns called?

SHRUG, 2014-05-02 54

Task 2 - Results

SHRUG, 2014-05-02 55

Task 3 – Patients with kidney cancer

REQUEST

 Find number of patients with invasive kidney cancer (ICD-O-

3=C64.9) diagnosed between 2008 and 2010. Breakdown counts by age and sex. Interested in age < 60 and age ≥ 60

BACKGROUND

 remove any patients who were deleted

 remove any tumors that were deleted

 diagnoses are in table called “oldiagnostic”

 sex is in table called “olpatient”

 birth date in table called “person”

SHRUG, 2014-05-02 56

Task 3 - Map

onco.oldiagnostic

personser ≠ deleted diagnosticser ≠ deleted dxstate=‘NS’ substr(icdohistocode,6,1)='3' year(dateinitial..) in (2008,

2009, 2010) onco.olpatient

personser olsex onco.person

personser persontype=‘patient’ datepart(dateofbirth)

SHRUG, 2014-05-02 57

Task 3 – Code (1 of 5)

proc sql feedback; create table onco_coh as select a.*

, b.olsex

, f.birth_dt

, floor(yrdif(f.birth_dt,a.initdx_dt,'act/act')) as ageatdx from

SHRUG, 2014-05-02 58

Task 3 – Code (2 of 5)

/*** get cases ***/

(select o.personser

, o.diagnosticser

, datepart(o.DateInitialDiagnosis) as initdx_dt format=date9.

, o.icdositecode

from onco.oldiagnostic

o where o.icdositecode in ('C64.9')

/*** only invasive cancers ***/ and substr(o.icdohistocode,6,1)='3' and 2010 and year(o.dateinitialdiagnosis) between 2008 and o.dxstate

='NS’

SHRUG, 2014-05-02 59

Task 3 – Code (3 of 5)

/*** patient not deleted ***/ and o.personser not in

(SELECT ps1.Personser

FROM onco.OlPatientSup ps1

WHERE ps1.PersonSer = o.PersonSer

and ps1.identifier = 'CCRPatientReportingStatu'

AND ps1.String IN ('04','05') and ps1.FieldSeq = 0)

SHRUG, 2014-05-02 60

Task 3 – Code (4 of 5)

/*** diagnosis not deleted ***/ and o.diagnosticser not in

(SELECT ds1.diagnosticser

FROM onco.OLdiagnosticsup ds1

WHERE ds1.PersonSer = o.PersonSer

and o.diagnosticser = ds1.diagnosticser

and ds1.identifier = 'CCRPrimaryReportingStatu'

AND ds1.String IN ('04','05') and ds1.FieldSeq = 0)) a

SHRUG, 2014-05-02 61

Task 3 – Code (5 of 5)

/*** get patient's sex ***/ left join

(select personser

, olsex from onco.olpatient) b on a.personser=b.personser

/*** get birth date ***/ left join

(select personser

, datepart(DateOfBirth) as birth_dt format=date9.

from onco.person

where lowcase(persontype)='patient') f on a.personser=f.personser

; quit;

SHRUG, 2014-05-02 62

Task 3 - Results

Sex

M

F

Total

Age at diagnosis

Under 60 60 and older Total

128

76

204

247

147

394

375

223

598

SHRUG, 2014-05-02 63

Self-join

Correlated sub-query

Outer from and where

UNION

SHRUG, 2014-05-02 64

What is the sound of one table joining?

77 /* select candidates for babes becoming mothers */

78 proc

79 ; sql

80 create table Candidates as

81 select B1.BrthDate

82 , B1.BirthID

83 , B2.DLMBDate

84 , B2.ContctID

85 from SASDM.DelnBrth as B1 /* babes */

86 , SASDM.DelnBrth as B2 /* mums */

87 where B1.BrthDate = B2.DLMBDate;

NOTE: Table WORK.CANDIDATES created, with 855040 rows and 4 columns.

SHRUG, 2014-05-02 65

Correlated Sub-query

 OB/Research has data in:

 Clinical ultrasound db

 Maternal serum screening db

 Objective: find all mothers with abnormal screening and see if the ultrasound indicated risk for restricted growth

(small baby)

SHRUG, 2014-05-02 66

Correlated Sub-Query

create table Work.WithAtlee as

/* VP data only available after 2003 not

2000 */ select One18.* /* 18-wk US */

, M.MO365/*perinatal data*/

, M.Wgt4Age

/* 45 lines omitted here */

SHRUG, 2014-05-02 67

The Sub-Query

, M.DLPrvNND

, M.DLPrvFTD /* no such variable as IUGR in */

, M.MotherID in /* previous pregnancy - back link */

( select Prev.MotherID

from SASDM.DelnBrth as Prev

, M.DLPrvLBW

, M.Prev_PTD

where Prev.MotherID = M.MotherID

and Prev.Wgt4Age in ( 1, 2)/* pick<5th*/ and Prev.BrthDate < M.BrthDate ) as Previous_IUGR

SHRUG, 2014-05-02 68

Outer From and Where

from Work.VP1USper18 as One18

, SASDM.Monster as M /*inner join-only want */ where (One18.ContctID = M.ContctID)

/* matches, in both databases, and linkable */ and ( M.DLDschD8 between

'01Oct2003'D and '30Sep2008'D ) )

/* 5 full years, 10/2003 on */ and ( not Major_Anom ); exclusion */

/* clarified this as complete

SHRUG, 2014-05-02 69

For the UNION makes us strong

create table Work.NoPlace as select VS.*

, . as GA

, . as CountyNum from Work.VS_2302 as VS where VS.VS_Deaths_ID not in

( select VS_Deaths_ID from Work.FADBPlace

union select VS_Deaths_ID from Work.NSAPDPlace ) and ( VS.Age_Code in ( 1, 2 ) ) and (BrthD8 between '01Jan2010'D and '31Dec2010'D);

SHRUG, 2014-05-02 70

 Review code, functions, operators

 Recap learning objectives

SHRUG, 2014-05-02 71

Code/Functions/Operators

 proc sql skeleton

 libname to connect to database

 pass-through to connect to database

 trunc(): an Oracle function

 to_date(): an Oracle function

 case-when: SQL statements

 is null/is not null

 wildcards: “*”, “_”

SHRUG, 2014-05-02 72

Recap of objectives

RDB tables normalization keys: primary/foreign joins

Connecting in SAS

ODBC

Oracle client server info from IT connecting: libname vs pass-through

PROC SQL compare to datastep single table multiple tables self-joins http://support.sas.com/documentation/cdl/en/acreldb/65247/PDF/default/acreldb.pdf

SHRUG, 2014-05-02 73

John Fahey

Biostatistician, Halifax shrug.president@gmail.com

Devbani Raha

Cancer Care Nova Scotia shrugexec@gmail.com

74 Photograph taken by Ken DeBacker, 2011

SHRUG, 2014-05-02

Download