Automating the Production of Descriptive Tables at Statistics Canada

mog.ado, a user-written program with
quality controls
Questions and comments may be sent to the author at matt.hurst@statcan.gc.ca
Contents
▪ Environment where mog was developed: Statistics Canada
▪ Purpose of mog
▪ Examples
▪ Options: present and future
Statistics Canada • Statistique Canada
Statistics Canada
▪ Statistics Canada produces statistics that help Canadians better understand their country: its population, resources, economy, society and culture.
▪ Objective statistical information is vital to an open and democratic society. It provides a solid foundation for informed decisions by elected representatives, businesses, unions and non-profit organizations, as well as individual Canadians.
▪ As Canada’s central statistical agency, Statistics Canada is legislated to serve this function for the whole of Canada and each of the provinces.
▪ In addition to conducting a Census every five years, Statistics Canada runs about 350 active surveys on virtually all aspects of Canadian life.
  • Data uses include GDP, the CPI and the unemployment rate, as well as health, social and education statistics.
▪ We at Statistics Canada are committed to protecting the confidentiality of all information entrusted to us and to ensuring that the information we deliver is timely and relevant to Canadians.
▪ Visit us at www.statcan.gc.ca for more information.
Source: http://www.statcan.gc.ca/about-apercu/overview-apercu-eng.htm
Collection and Dissemination
▪ Collecting data (census, administrative data and surveys)
  • Questionnaire development, testing, collection and data processing
▪ Checking data
  • Verification (errors in processing, coding mistakes)
  • Certification (comparing estimates with other data sources)
▪ Preparing for dissemination (e.g. of an analysis of the data)
  • Reliability (the estimates are acceptably reliable)
  • Suppression (the confidentiality of respondents is protected)
  • Significance testing between estimates
Purpose of mog
▪ mog is designed to automate three quality-control steps required before dissemination: reliability, suppression and significance testing
▪ It also displays the estimates in tabular form, by up to two classification variables
▪ Result: a table giving estimates (mean or total) of one variable across the categories of one or two categorical variables
▪ Useful for simple, descriptive statistics
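To make "estimates of one variable over two categorical variables" concrete, here is a minimal Python sketch (not mog's Stata code) that computes a weighted mean or total for each cell, together with the unweighted cell count that a suppression check would need. The record layout, the function name `weighted_table` and the sample records are hypothetical, and the survey variance estimation that mog takes from svyset is omitted.

```python
from collections import defaultdict


def weighted_table(records, stat="mean"):
    """Weighted mean or total of one variable for each (row, column) cell.

    records: iterable of (row_category, column_category, value, weight).
    Returns {(row, col): (estimate, unweighted_count)}.
    """
    weight_sum = defaultdict(float)
    weighted_value_sum = defaultdict(float)
    count = defaultdict(int)
    for row, col, value, weight in records:
        key = (row, col)
        weight_sum[key] += weight
        weighted_value_sum[key] += weight * value
        count[key] += 1  # unweighted count, used for suppression checks

    table = {}
    for key in weight_sum:
        if stat == "mean":
            estimate = weighted_value_sum[key] / weight_sum[key]
        elif stat == "total":
            estimate = weighted_value_sum[key]
        else:
            raise ValueError("stat must be 'mean' or 'total'")
        table[key] = (estimate, count[key])
    return table


# Hypothetical microdata: retired is 0/1 and the last field is a survey weight.
records = [
    ("45 to 65", "high school diploma", 0, 100.0),
    ("45 to 65", "high school diploma", 1, 50.0),
    ("Over 75", "high school diploma", 1, 80.0),
]
print(weighted_table(records, "mean"))
print(weighted_table(records, "total"))
```

Returning the count alongside each estimate keeps everything a later reliability or suppression check needs in one place, which mirrors the "all in one place" goal of mog.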
Example I
Make a table showing the mean of “retired” by education and age categories (similar to “table education age, c(m retired)”), but with quality-control checks
mog retired education age, nodetail survey dec(0)
Means of retired by education and age
Estimation technique for standard errors: linearized
Table       doctorate/maste~  diploma/certifi~  some university~  high school dip~  some secondary/~
45 to 65    20                16                18                18                26
66 to 75    87^               86^               83^               88^               76*^
Over 75     88^               97^               92^               79^               76*^
Notes
* significantly different from the reference group of the variable
educ5, category number 1, p < .05
^ significantly different from the reference group of the variable
age3, category number 1, p < .05
The data in the table is not real.
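The * and ^ markers compare each cell with a reference category. The Python sketch below (not mog's implementation) shows one standard way to make such a comparison: a Wald test of the difference between two estimates, using their variances and covariance. It assumes a normal approximation (Stata's survey tests typically use a t or F reference distribution instead), and the estimates and covariance matrix are made up. Because the covariance matrix is computed only once, the markers can be recomputed for a different reference category without re-estimating anything, which is presumably why mog can reshow a table with new reference groups cheaply.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def marker(est, cov, i, ref, symbol, alpha=0.05):
    """Return `symbol` if estimate i differs from estimate ref at level alpha.

    est: list of estimates; cov: their covariance matrix (list of lists).
    Indices are 0-based, so ref=0 corresponds to "category number 1".
    """
    if i == ref:
        return ""  # the reference cell is never flagged against itself
    # Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b)
    var_diff = cov[i][i] + cov[ref][ref] - 2.0 * cov[i][ref]
    z = (est[i] - est[ref]) / math.sqrt(var_diff)
    return symbol if two_sided_p(z) < alpha else ""


# Hypothetical estimates for the three age groups in one education column.
est = [20.0, 87.0, 88.0]
cov = [[4.0, 0.0, 0.0],
       [0.0, 9.0, 0.0],
       [0.0, 0.0, 9.0]]

# Same covariance matrix, two different reference categories.
for ref in (0, 1):
    print(ref, [marker(est, cov, i, ref, "^") for i in range(len(est))])
# 0 ['', '^', '^']
# 1 ['^', '', '']
```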
Example II
Same as Example I, with additional options
mog retired education age, nodetail ///
survey dec(0) ref2(2) pubs pubdichot underscores varwidth(40)
Means of retired by education and age
Estimation technique for standard errors: linearized
Table     doctorate/masters/bachelor's_degree  diploma/certificate_from_community_colle~  some_university/community_college  high_school_diploma  some_secondary/elementary/no_schooling
45_to_65  20^E                                 16^                                        18^E                               18^                  26^
66_to_75  87X                                  86                                         83X                                88X                  76*
Over_75   88X                                  97^X                                       92X                                79                   76*
Notes
* significantly different from the reference group of the variable
educ5, category number 1, p < .05
^ significantly different from the reference group of the variable
age3, category number 2, p < .05
The data in the table is not real.
Example I: the Long Way
▪ At Statistics Canada, creating the table in our example so that it meets key confidentiality and quality requirements (there are others) would require running the following commands:
  • One table command to create a table of estimates
  • One mean command and one estimates table command to examine the individual significance of the 15 estimates
  • 22 test or lincom commands, each requiring visual inspection of the results
  • One tabulate command and a visual inspection of the 15 cell counts
  • In total, 26 lines of code and 52 numbers to inspect visually, as opposed to one line of code to run mog and an inspection of the 15 estimates it produces, all in one place
▪ The work multiplies with every table you produce
▪ All of the above must be redone if the sample changes
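The counts above can be reproduced with a little arithmetic. The Python sketch below assumes (an inference, not something the slides state) that each estimate in the 5 × 3 table is tested only against the reference category of each classification variable: 4 education comparisons in each of the 3 age groups plus 2 age comparisons in each of the 5 education groups gives 22 tests. The function name is hypothetical.

```python
def long_way_workload(n_cols: int, n_rows: int) -> dict:
    """Tally the manual workload for an n_rows x n_cols table of estimates.

    Assumes each cell is compared only with the reference column (within
    its row) and with the reference row (within its column).
    """
    cells = n_cols * n_rows
    tests_vs_col_ref = (n_cols - 1) * n_rows  # e.g. education vs. category 1
    tests_vs_row_ref = (n_rows - 1) * n_cols  # e.g. age vs. category 1
    tests = tests_vs_col_ref + tests_vs_row_ref
    # table + mean + estimates table + one test/lincom per comparison + tabulate
    lines_of_code = 1 + 1 + 1 + tests + 1
    # individual significance results + comparison results + cell counts
    numbers_to_inspect = cells + tests + cells
    return {
        "cells": cells,
        "tests": tests,
        "lines_of_code": lines_of_code,
        "numbers_to_inspect": numbers_to_inspect,
    }


print(long_way_workload(n_cols=5, n_rows=3))
# {'cells': 15, 'tests': 22, 'lines_of_code': 26, 'numbers_to_inspect': 52}
```

Under this assumption the manual workload grows with the number of cells and comparisons, while the mog call stays a single line.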
Copying Process
▪ Select the table rows in the mog output
▪ Right-click and select:
  • “copy table” if copying to a spreadsheet or word processor (when pasting into a Word table, first select enough rows and columns in the destination table)
  • Other options include:
    ◦ “copy text” if copying to a word processor where you will use a fixed-width font
    ◦ “copy html” if copying to a location where you want a table to be generated automatically
▪ mog’s underscores option is useful when value labels contain spaces: it ensures that the correct number of columns is created
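Why spaces in value labels matter can be seen with a small Python sketch. It assumes that the receiving program splits each pasted line into columns on runs of whitespace; whether a particular spreadsheet does so depends on its import settings. The helper names are hypothetical. The replacement of spaces by underscores matches the labels shown in Example II (for example, 45_to_65).

```python
def split_row(label, cells):
    """Join one table row as plain text, then split it the way a naive
    whitespace-delimited importer would (an assumption of this sketch)."""
    line = "  ".join([label] + cells)
    return line.split()


def with_underscores(label):
    """Replace spaces in a value label, as in the Example II output."""
    return label.replace(" ", "_")


values = ["20", "16", "18", "18", "26"]  # first row of Example I
raw = split_row("45 to 65", values)
fixed = split_row(with_underscores("45 to 65"), values)
print(len(raw), raw)      # 8 columns: the label spills into three
print(len(fixed), fixed)  # 6 columns: one label column plus five estimates
```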
Other Options
▪ Display options:
  • Number of decimal places displayed; number rounding
  • Control of column width (columns automatically widen if large numbers or many decimal places are to be displayed)
  • Reshow the table by typing mog with no arguments
  • Reshow the table with different reference groups (or other display options) without re-estimating the variances (a time saver when bootstrapping)
▪ Can show quality-control symbols that indicate:
  • the individual statistical significance of results, based on the coefficient of variation (CV), at two user-defined thresholds (e.g. F = do not publish if CV > 1/3; E = publish with warning if 1/6 <= CV <= 1/3); and
  • whether the estimate is based on enough observations (e.g. X if too few)
  • The cut-offs and symbols can be changed to suit the user’s needs
  • Statistics Canada surveys have “User Guides” that indicate these values
▪ Analysis:
  • The significance level used for tests between classification levels can be changed (.05, .01, …)
  • mog is “byable” (it can be used with the by prefix)
  • mog uses svyset information in variance estimation via the “survey” option (not through the svy prefix)
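The CV-based flags can be written as a small rule. The Python sketch below uses the example cut-offs from this slide (F if CV > 1/3, E if 1/6 <= CV <= 1/3, X if too few observations). The function name, the minimum cell count used in the example, the handling of a zero estimate and the way the flags are combined into one string are all hypothetical; actual thresholds should come from the relevant survey’s User Guide.

```python
import math


def quality_flags(estimate, se, n_obs, min_obs, cv_suppress=1/3, cv_warn=1/6):
    """Return publication flags for one estimate.

    F: do not publish (CV above cv_suppress)
    E: publish with warning (cv_warn <= CV <= cv_suppress)
    X: too few observations (n_obs below min_obs)
    """
    # Coefficient of variation; undefined for a zero estimate, so it is
    # treated as unreliable here (an assumption made for this sketch).
    cv = se / abs(estimate) if estimate != 0 else math.inf
    flags = ""
    if cv > cv_suppress:
        flags += "F"
    elif cv >= cv_warn:
        flags += "E"
    if n_obs < min_obs:
        flags += "X"
    return flags


# min_obs=30 is an illustrative value, not a Statistics Canada rule.
for est, se, n in [(50, 5, 100), (50, 10, 100), (50, 20, 100), (50, 5, 10)]:
    print(est, se, n, repr(quality_flags(est, se, n, min_obs=30)))
```

Keeping the cut-offs as parameters reflects the slide’s point that the thresholds and symbols are user-configurable rather than fixed.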
Future Options
▪ Save the table as a CSV file
▪ Show standard errors/t-ratios under the estimates
▪ Harmonize syntax with Stata: use the over() option to specify classification variables
▪ Use estimates based on different populations by one classification variable
▪ Use with the proportion command
▪ Find an alternative to the underscores option
Requests for the Program
▪ Contact me directly at matt.hurst@statcan.gc.ca and I will send you the program
▪ Please send me any comments you may have on bugs, wording, inconsistencies, etc.
▪ After receiving enough feedback, I will update the program and make it available online at one of the Stata program archive sites