Estimate ratios

advertisement
Title
stata.com
ratio — Estimate ratios
Syntax
Remarks and examples
Also see
Menu
Stored results
Description
Methods and formulas
Options
References
Syntax
Basic syntax
ratio name: varname / varname
Full syntax
ratio ( name: varname / varname)
( name: varname / varname) . . .
if
in
weight
, options
options
Description
Model
stdize(varname)
stdweight(varname)
nostdrescale
variable identifying strata for standardization
weight variable for standardization
do not rescale the standard weight variable
if/in/over
over(varlist , nolabel ) group over subpopulations defined by varlist; optionally,
suppress group labels
SE/Cluster
vce(vcetype)
vcetype may be linearized, cluster clustvar, bootstrap, or
jackknife
Reporting
level(#)
noheader
nolegend
display options
set confidence level; default is level(95)
suppress table header
suppress table legend
control column formats and line width
coeflegend
display legend instead of statistics
bootstrap, jackknife, mi estimate, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
1
2
ratio — Estimate ratios
Menu
Statistics
>
Summaries, tables, and tests
>
Summary and descriptive statistics
>
Ratios
Description
ratio produces estimates of ratios, along with standard errors.
Options
Model
stdize(varname) specifies that the point estimates be adjusted by direct standardization across the
strata identified by varname. This option requires the stdweight() option.
stdweight(varname) specifies the weight variable associated with the standard strata identified in
the stdize() option. The standardization weights must be constant within the standard strata.
nostdrescale prevents the standardization weights from being rescaled within the over() groups.
This option requires stdize() but is ignored if the over() option is not specified.
if/in/over
over(varlist , nolabel ) specifies that estimates be computed for multiple subpopulations, which
are identified by the different values of the variables in varlist.
When this option is supplied with one variable name, such as over(varname), the value labels of
varname are used to identify the subpopulations. If varname does not have labeled values (or there
are unlabeled values), the values themselves are used, provided that they are nonnegative integers.
Noninteger values, negative values, and labels that are not valid Stata names are substituted with
a default identifier.
When over() is supplied with multiple variable names, each subpopulation is assigned a unique
default identifier.
nolabel requests that value labels attached to the variables identifying the subpopulations be
ignored.
SE/Cluster
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (linearized), that allow for intragroup correlation (cluster clustvar), and
that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce option.
vce(linearized), the default, uses the linearized or sandwich estimator of variance.
Reporting
level(#); see [R] estimation options.
noheader prevents the table header from being displayed. This option implies nolegend.
nolegend prevents the table legend identifying the subpopulations from being displayed.
display options: cformat(% fmt) and nolstretch; see [R] estimation options.
The following option is available with ratio but is not shown in the dialog box:
coeflegend; see [R] estimation options.
ratio — Estimate ratios
Remarks and examples
3
stata.com
Example 1
Using the fuel data from example 3 of [R] ttest, we estimate the ratio of mileage for the cars
without the fuel treatment (mpg1) to those with the fuel treatment (mpg2).
. use http://www.stata-press.com/data/r13/fuel
. ratio myratio: mpg1/mpg2
Ratio estimation
myratio: mpg1/mpg2
Ratio
myratio
.9230769
Number of obs
Linearized
Std. Err.
.032493
=
12
[95% Conf. Interval]
.8515603
.9945936
Using these results, we can test to see if this ratio is significantly different from one.
. test _b[myratio] = 1
( 1) myratio = 1
F( 1,
11) =
Prob > F =
5.60
0.0373
We find that the ratio is different from one at the 5% significance level but not at the 1% significance
level.
Example 2
Using state-level census data, we want to test whether the marriage rate is equal to the death rate.
. use http://www.stata-press.com/data/r13/census2
(1980 Census data by state)
. ratio (deathrate: death/pop) (marrate: marriage/pop)
Ratio estimation
Number of obs
=
deathrate: death/pop
marrate: marriage/pop
Ratio
deathrate
marrate
.0087368
.0105577
Linearized
Std. Err.
.0002052
.0006184
. test _b[deathrate] = _b[marrate]
( 1) deathrate - marrate = 0
F( 1,
49) =
6.93
Prob > F =
0.0113
50
[95% Conf. Interval]
.0083244
.009315
.0091492
.0118005
4
ratio — Estimate ratios
Stored results
ratio stores the following in e():
Scalars
e(N)
e(N over)
e(N stdize)
e(N clust)
e(k eq)
e(df r)
e(rank)
Macros
e(cmd)
e(cmdline)
e(varlist)
e(stdize)
e(stdweight)
e(wtype)
e(wexp)
e(title)
e(cluster)
e(over)
e(over labels)
e(over namelist)
e(namelist)
e(vce)
e(vcetype)
e(properties)
e(estat cmd)
e(marginsnotok)
Matrices
e(b)
e(V)
e( N)
e( N stdsum)
e( p stdize)
e(error)
Functions
e(sample)
number of observations
number of subpopulations
number of standard strata
number of clusters
number of equations in e(b)
sample degrees of freedom
rank of e(V)
ratio
command as typed
varlist
varname from stdize()
varname from stdweight()
weight type
weight expression
title in estimation output
name of cluster variable
varlist from over()
labels from over() variables
names from e(over labels)
ratio identifiers
vcetype specified in vce()
title used to label Std. Err.
b V
program used to implement estat
predictions disallowed by margins
vector of ratio estimates
(co)variance estimates
vector of numbers of nonmissing observations
number of nonmissing observations within the standard strata
standardizing proportions
error code corresponding to e(b)
marks estimation sample
Methods and formulas
Methods and formulas are presented under the following headings:
The ratio estimator
Survey data
The survey ratio estimator
The standardized ratio estimator
The poststratified ratio estimator
The standardized poststratified ratio estimator
Subpopulation estimation
ratio — Estimate ratios
5
The ratio estimator
Let R = Y /X be the ratio to be estimated, where Y and X are totals; see [R] total. The estimate
b = Yb /X
b (the ratio of the sample totals). From the delta method (that is, a first-order
for R is R
b is
Taylor expansion), the approximate variance of the sampling distribution of the linearized R
b ≈
V (R)
o
1 n b
b + R2 V (X)
b
V (Y ) − 2RCov(Yb , X)
2
X
b, R
b, and the estimated variances and covariance of X
b and Yb leads to the
Direct substitution of X
following variance estimator:
n
o
bCov(
d Yb , X)
b +R
b2 Vb (X)
b
b = 1 Vb (Yb ) − 2R
Vb (R)
b2
X
(1)
Survey data
See [SVY] variance estimation, [SVY] direct standardization, and [SVY] poststratification for
discussions that provide background information for the following formulas.
The survey ratio estimator
Let Yj and Xj be survey items for the j th individual in the population, where j = 1, . . . , M and
M is the size of the population. The associated population ratio for the items of interest is R = Y /X
where
M
M
X
X
Y =
Yj
and
X=
Xj
j=1
j=1
Let yj and xj be the corresponding survey items for the j th sampled individual from the population,
where j = 1, . . . , m and m is the number of observations in the sample.
b for the population ratio R is R
b = Yb /X
b , where
The estimator R
Yb =
m
X
j=1
wj yj
and
b=
X
m
X
w j xj
j=1
and wj is a sampling weight. The score variable for the ratio estimator is
b
b
b
b = yj − Rxj = Xyj − Y xj
zj (R)
b
b2
X
X
6
ratio — Estimate ratios
The standardized ratio estimator
Let Dg denote the set of sampled observations that belong to the g th standard stratum and define
IDg (j) to indicate if the j th observation is a member of the g th standard stratum; where g = 1,
. . . , LD and LD is the number of standard strata. Also, let πg denote the fraction of the population
that belongs to the g th standard stratum, thus π1 + · · · + πLD = 1. Note that πg is derived from the
stdweight() option.
The estimator for the standardized ratio is
LD
X
bD =
R
πg
g=1
where
Ybg =
m
X
Ybg
bg
X
IDg (j) wj yj
j=1
bg is similarly defined. The score variable for the standardized ratio is
and X
bD ) =
zj (R
LD
X
πg IDg (j)
g=1
bg yj − Ybg xj
X
bg2
X
The poststratified ratio estimator
Let Pk denote the set of sampled observations that belong to poststratum k , and define IPk (j)
to indicate if the j th observation is a member of poststratum k , where k = 1, . . . , LP and LP is
the number of poststrata. Also, let Mk denote the population size for poststratum k . Pk and Mk are
identified by specifying the poststrata() and postweight() options on svyset; see [SVY] svyset.
The estimator for the poststratified ratio is
bP
bP = Y
R
bP
X
where
Yb P =
LP
X
Mk
k=1
ck
M
Ybk =
LP
m
X
Mk X
k=1
ck
M
IPk (j) wj yj
j=1
b P is similarly defined. The score variable for the poststratified ratio is
and X
bP ) =
zj (R
bP zj (X
bP )
b P zj (Yb P ) − Yb P zj (X
bP )
zj (Yb P ) − R
X
=
b P )2
bP
(X
X
where
bP
zj (Y ) =
LP
X
k=1
b P ) is similarly defined.
and zj (X
Mk
IPk (j)
ck
M
Ybk
yj −
ck
M
!
ratio — Estimate ratios
7
The standardized poststratified ratio estimator
The estimator for the standardized poststratified ratio is
bDP =
R
LD
X
πg
g
g=1
where
YbgP =
Lp
X
Mk
k=1
ck
M
Ybg,k =
YbgP
bP
X
Lp
m
X
Mk X
k=1
ck
M
IDg (j)IPk (j) wj yj
j=1
bgP is similarly defined. The score variable for the standardized poststratified ratio is
and X
bDP ) =
zj (R
LD
X
πg
g=1
where
zj (YbgP )
=
LP
X
k=1
and
bP )
zj (X
g
bgP )
bgP zj (YbgP ) − YbgP zj (X
X
bgP )2
(X
Mk
IPk (j)
ck
M
(
Ybg,k
IDg (j)yj −
ck
M
)
is similarly defined.
Subpopulation estimation
Let S denote the set of sampled observations that belong to the subpopulation of interest, and
define IS (j) to indicate if the j th observation falls within the subpopulation.
bS = Yb S /X
b S , where
The estimator for the subpopulation ratio is R
Yb S =
m
X
IS (j) wj yj
bS =
X
and
j=1
m
X
IS (j) wj xj
j=1
Its score variable is
bS
bS
bS
bS ) = IS (j) yj − R xj = IS (j) X yj − Y xj
zj (R
bS
b S )2
X
(X
The estimator for the standardized subpopulation ratio is
bDS =
R
LD
X
g=1
where
YbgS =
m
X
πg
YbgS
bS
X
g
IDg (j)IS (j) wj yj
j=1
b S is similarly defined. Its score variable is
and X
g
bDS ) =
zj (R
LD
X
g=1
πg IDg (j)IS (j)
bgS yj − YbgS xj
X
bgS )2
(X
8
ratio — Estimate ratios
The estimator for the poststratified subpopulation ratio is
bPS
bP S = Y
R
bPS
X
where
Yb P S =
LP
X
Mk
k=1
ck
M
YbkS =
LP
m
X
Mk X
ck
M
k=1
IPk (j)IS (j) wj yj
j=1
b P S is similarly defined. Its score variable is
and X
bPS b PS
bPS
bPS
bP S ) = X zj (Y ) − Y zj (X )
zj (R
b P S )2
(X
where
bPS
zj ( Y
)=
LP
X
k=1
Mk
IPk (j)
ck
M
(
Yb S
IS (j) yj − k
ck
M
)
b P S ) is similarly defined.
and zj (X
The estimator for the standardized poststratified subpopulation ratio is
bDP S =
R
LD
X
πg
g=1
where
YbgP S =
Lp
X
Mk
k=1
ck
M
S
Ybg,k
=
Lp
m
X
Mk X
k=1
ck
M
YbgP S
bgP S
X
IDg (j)IPk (j)IS (j) wj yj
j=1
bgP S is similarly defined. Its score variable is
and X
bDP S ) =
zj ( R
LD
X
g=1
where
zj (YbgP S ) =
LP
X
k=1
and
bgP S )
zj (X
πg
bgP S zj (YbgP S ) − YbgP S zj (X
bgP S )
X
bgP S )2
(X
Mk
IPk (j)
ck
M
(
S
Ybg,k
IDg (j)IS (j) yj −
ck
M
)
is similarly defined.
References
Cochran, W. G. 1977. Sampling Techniques. 3rd ed. New York: Wiley.
Stuart, A., and J. K. Ord. 1994. Kendall’s Advanced Theory of Statistics: Distribution Theory, Vol I. 6th ed. London:
Arnold.
ratio — Estimate ratios
Also see
[R] ratio postestimation — Postestimation tools for ratio
[R] mean — Estimate means
[R] proportion — Estimate proportions
[R] total — Estimate totals
[MI] estimation — Estimation commands for use with mi estimate
[SVY] direct standardization — Direct standardization of means, proportions, and ratios
[SVY] poststratification — Poststratification for survey data
[SVY] subpopulation estimation — Subpopulation estimation for survey data
[SVY] svy estimation — Estimation commands for survey data
[SVY] variance estimation — Variance estimation for survey data
[U] 20 Estimation and postestimation commands
9
Download