Data Management

連賢明
Department of Public Finance, National Chengchi University
Statistical software
  General purpose
  Microeconometrics
    STATA
    SAS
    LIMDEP
  Advanced software
    MATLAB
    GAUSS
Advantages of STATA
  Easy to learn
  Fast execution
  The software license is perpetual
  Well-developed websites
    http://www.stata.com/
    http://www.ats.ucla.edu/stat/stata/
  Requires plenty of computer memory
Stat/Transfer
  Converts other file formats to STATA data files
  File types supported by Stat/Transfer
    Excel
    Limdep
    SAS
    SPSS
    Many others
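Stat/Transfer
  Input File Type: select the file type of the source data
  File Specification: enter the path of the source data file
  Output File Type: select the file type to convert to
  File Specification: enter the path where the converted file will be saved
  On the Variables tab, check the variables you need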
STATA interface
  The Command window: where commands are typed
  The Results window: shows the output of executed commands
  The Review window: shows the commands already executed
  The Variables window: lists all variables

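1.1 Read the data

Read the ASCII file
  infile: must provide the variable name, width, and format
Read the Excel file
  insheet: variable names need to be specified
Read the Stata file
  use c:\regstata\elemapi
  from the internet
File-handling commands
  cd: change the working directory
  dir: list the files in the working directory
  use: load a Stata data file
  save: save the data in memory as a Stata data file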
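The commands above might be combined as in the following sketch. The folder c:\regstata and the file elemapi come from the slide. The names elemapi_copy, scores.txt, scores.csv, id, score1, and score2 are hypothetical and only illustrate the syntax. Note that insheet reads comma- or tab-delimited text, so an Excel sheet has to be saved in that form first. In Stata 13 and later, import delimited supersedes insheet, although insheet still runs.

```stata
* Sketch only: file and variable names other than elemapi are hypothetical.

* Move to the data folder and see what is there
cd c:\regstata
dir

* Load the Stata data file, then save a copy under a new name
use elemapi, clear
save elemapi_copy, replace

* Free-format ASCII file with three numeric columns
infile id score1 score2 using scores.txt, clear

* Comma-delimited file exported from Excel, variable names in the first row
insheet using scores.csv, clear
```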

1.2 Describe the data

Describe the data
  Data size
  Observations
  Variable name
  Variable type (string, byte, float, etc.)
  In the dialog, just click OK
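The same information is available from the command window. A minimal sketch, assuming the elemapi file sits in the folder used earlier in these slides:

```stata
* Load the data, then report size, observations, variable names and types
use c:\regstata\elemapi, clear
describe

* The report can also be limited to selected variables
describe api00 acs_k3 meals full
```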

Variables
  api00: academic performance of the school
  acs_k3: the average class size in kindergarten through 3rd grade
  meals: the percentage of students receiving free meals
  full: the percentage of teachers who have full teaching credentials

List
  All observations
  Some observations
  Some variables
  In the dialog, select the variables
Notice the missing values of meals.
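The three ways of listing can be typed directly, as in this sketch. The observation ranges 1/5 and 1/10 are arbitrary choices for illustration, and the data are assumed to be in c:\regstata.

```stata
use c:\regstata\elemapi, clear

* All observations, all variables (long output)
list

* Some observations: the first five
list in 1/5

* Some variables, for the first ten observations
list api00 acs_k3 meals full in 1/10

* Count the observations where meals is missing
count if missing(meals)
```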

Codebook
  Number of unique values
  Missing values
  Distribution of values
  In the dialog, select the variables and click OK
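As a command, the dialog above corresponds roughly to the following sketch, again assuming the data are in c:\regstata:

```stata
use c:\regstata\elemapi, clear

* Unique values, missing values, and distribution for four variables
codebook api00 acs_k3 meals full
```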

summarize
  Provides concise information about variables
  Observations
  Basic statistics (mean, s.d., min, max)
  Option: detail
  In the dialog, select the variables and click OK
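A short sketch of both forms, assuming the data are in c:\regstata. The detail option adds percentiles, skewness, and kurtosis to the basic statistics.

```stata
use c:\regstata\elemapi, clear

* Observations, mean, s.d., min, max
summarize api00 acs_k3 meals full

* Extended statistics for one variable
summarize acs_k3, detail
```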
1.3 Tab the data

Tabulate
  Tabulate class size
  Look at the school and district numbers to check whether they are from the same district
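These two steps might look like the sketch below. The variable names snum (school number) and dnum (district number) are assumptions, since the slides do not name them. The condition acs_k3 < 0 is also an assumption, based on the negative class sizes reported elsewhere in these slides.

```stata
use c:\regstata\elemapi, clear

* Frequency table of class size
tabulate acs_k3

* School and district numbers for the suspicious observations
* (snum and dnum are assumed variable names)
list snum dnum acs_k3 if acs_k3 < 0
```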
1.4 Graph the data

Use graphs to examine the data
  Histogram
  Stem-and-leaf plot
A stem-and-leaf plot would also have helped to identify these observations. This plot shows the exact values of the observations, indicating that there were three -21s, two -20s, and one -19.
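Both plots can be produced from the command window. A minimal sketch, assuming the plots on the slides are of acs_k3 (the variable with negative values) and the data are in c:\regstata:

```stata
use c:\regstata\elemapi, clear

* Histogram of class size
histogram acs_k3

* Stem-and-leaf plot, which prints the exact values
stem acs_k3
```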


Quiz 1: do a histogram on full
Let's look at the frequency distribution of full to see if we can understand this better.
The values go from 0.42 to 1.0, then jump to 37 and go up from there. It appears that some of the percentages were entered as proportions, e.g., 0.42 was entered instead of 42, and 0.96 instead of 96.
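One possible way to work through the quiz and the frequency distribution, assuming the data are in c:\regstata:

```stata
use c:\regstata\elemapi, clear

* Histogram of the share of fully credentialed teachers
histogram full

* Frequency distribution, to see where the values jump
tabulate full
```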

Again, let's see which districts these data came from. We note that all 104 observations in which full was less than or equal to one came from district 401.
Let's count how many observations there are in district 401 using the count command.
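The district check and the count could be written as in this sketch. The variable name dnum for the district number is an assumption, since the slides do not name it.

```stata
use c:\regstata\elemapi, clear

* Districts of the observations where full looks like a proportion
* (dnum is an assumed variable name)
tabulate dnum if full <= 1

* Number of observations in that district
count if dnum == 401
```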

Two-way graphs
Scatterplot: shows the joint distribution of two variables
Let's look at the scatterplot matrix for the variables:

[Figure: scatterplot matrix of api 2000, avg class size k-3, pct free meals, and pct full credential]
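A scatterplot matrix like the one described can be drawn with graph matrix. This is a sketch, assuming the data are in c:\regstata. The half option, which draws only the lower triangle, is a choice made here and may not match the slide. The pair api00 and meals in the single scatterplot is likewise just an example.

```stata
use c:\regstata\elemapi, clear

* Scatterplot matrix of the four variables (lower triangle only)
graph matrix api00 acs_k3 meals full, half

* A single scatterplot of two of them
scatter api00 meals
```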
Correct the variable mistakes

acs_k3
  Replace the negative values with positive ones
  replace acs_k3 = -acs_k3 if acs_k3 < 0
full
  Change from proportion to percentage
  replace full = full*100 if full <= 1

save elemapi2, replace
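A quick check that the corrections took effect could look like the sketch below. It assumes elemapi2 was saved in the current working directory. Both counts should be zero if the two replace commands ran as written.

```stata
* Reload the corrected file
use elemapi2, clear

* No negative class sizes should remain
count if acs_k3 < 0

* No value of full should still be on the proportion scale
count if full <= 1

* Inspect the ranges after the correction
summarize acs_k3 full
```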