Automating the Production of Descriptive Tables at Statistics Canada
mog.ado, a user-written program with quality controls
Questions and comments may be sent to the author at matt.hurst@statcan.gc.ca

Contents
  • Environment in which mog was developed: Statistics Canada
  • Purpose of mog
  • Examples
  • Options: present and future

Statistics Canada • Statistique Canada

Statistics Canada
Statistics Canada produces statistics that help Canadians better understand their country: its population, resources, economy, society and culture
Objective statistical information is vital to an open and democratic society. It provides a solid foundation for informed decisions by elected representatives, businesses, unions and non-profit organizations, as well as individual Canadians
As Canada’s central statistical agency, Statistics Canada is legislated to serve this function for the whole of Canada and each of the provinces
In addition to conducting a Census every five years, Statistics Canada runs about 350 active surveys on virtually all aspects of Canadian life
  • Data uses include: GDP, CPI, unemployment rate; health, social and education statistics
We at Statistics Canada are committed to protecting the confidentiality of all information entrusted to us and to ensuring that the information we deliver is timely and relevant to Canadians
Visit us at www.statcan.gc.ca for more information
Source: http://www.statcan.gc.ca/about-apercu/overview-apercu-eng.htm

Collection and Dissemination
Collecting data (census, administrative data and surveys)
  • Questionnaire development, testing, collection, and data processing
Check data
  • Verification (errors in processing, coding mistakes)
  • Certification (compare estimates to other data sources)
Preparations for dissemination (e.g.
for an analysis made on the data)
  • Reliability (the reliability of the estimates is acceptable)
  • Suppression (the confidentiality of respondents is protected)
  • Significance testing between estimates

Purpose of mog
mog is designed to automate the dissemination quality control steps of reliability, suppression, and significance testing
It also displays estimates by up to two other classification variables in tabular form
Result: a table giving estimates (mean or total) of one variable over one or two other categorical variables
Useful for simple, descriptive statistics

Example I
Make a table showing the mean of “retired” by age and education categories (similar to “table education age, c(m retired)”), but with quality control checks

    mog retired education age, nodetail survey dec(0)

Means of retired by education and age
Estimation technique for standard errors: linearized

Table
          doctorate/maste~  diploma/certifi~  some university~  high school dip~  some secondary/~
45 to 65  20                16                18                18                26
66 to 75  87^               86^               83^               88^               76*^
Over 75   88^               97^               92^               79^               76*^

Notes
* significantly different from the reference group of the variable educ5, category number 1, p < .05
^ significantly different from the reference group of the variable age3, category number 1, p < .05

The data in the table are not real.

Example II
Same as Example I, with additional options

    mog retired education age, nodetail ///
        survey dec(0) ref2(2) pubs pubdichot underscores varwidth(40)

Means of retired by education and age
Estimation technique for standard errors: linearized

Table
          doctorate/masters/bachelor's_degree  diploma/certificate_from_community_colle~  some_university/community_college  high_school_diploma  some_secondary/elementary/no_schooling
45_to_65  20^E                                 16^                                        18^E                               18^                  26^
66_to_75  87X                                  86                                         83X                                88X                  76*
Over_75   88X                                  97^X                                       92X                                79                   76*

Notes
* significantly different from the reference group of the variable educ5, category number 1, p < .05
^ significantly different from the reference group of the variable age3, category number 2, p < .05

The data in the table are not real.

Example I: the Long Way
At Statistics Canada, creating the table in our example so that it meets key confidentiality and quality requirements (there are others) requires running the following commands:
  • One table command to create a table of estimates
  • One mean command and one estimates table command to examine the individual significance of the 15 estimates
  • 22 test or lincom commands, each requiring visual inspection of results
  • One tabulate command and a visual inspection of 15 cell counts
  • In total, 26 lines of code (1 + 1 + 1 + 22 + 1) and 52 numbers (15 + 22 + 15) that need to be visually inspected, as opposed to 1 line of code to run mog and 15 estimates to inspect, all in one place
The work multiplies with each additional table
All of the above must be redone if the sample changes

Copying Process
Select the table rows from the mog output
Right click and select:
  • “copy table” if copying to a spreadsheet or word processor (in a Word table, first select enough rows and columns in the table into which you are copying)
  • Other options include:
    “copy text” if copying to a word processor where you will use a fixed-width font
    “copy html”
    if copying to a location where you want a table to be generated automatically
mog’s underscores option is useful when value labels contain spaces: it ensures that the correct number of columns is created

Other Options
Display Options:
  • Number of decimal places displayed; number rounding
  • Control of column width (although columns will automatically enlarge if large numbers or many decimal places are to be displayed)
  • Reshow the table by typing mog with no arguments
  • Reshow the table with different reference groups (or other display options) without re-estimating the variances (a time saver when bootstrapping)
Can show quality control symbols that indicate:
  • The individual statistical significance of results at two user-defined thresholds (e.g. F = do not publish if cv > 1/3, E = publish with warning if 1/3 >= cv >= 1/6)
  • Whether the estimate is based on enough observations (e.g. X if too few)
  • The cut-offs and symbols can be changed to suit the user’s needs
  • Statistics Canada surveys have “User Guides” that indicate these values
Analysis
  • The significance level used for tests between classification levels can be changed (.05, .01, …)
  • mog is “byable”
  • mog will use svyset information in variance estimation via the “survey” option (not through the svy prefix)

Future Options
  • Save the table as a csv file
  • Show standard errors/t-ratios under the estimates
  • Harmonize syntax with Stata: use an over() option to specify classification variables
  • Use estimates based on different populations by one classification variable
  • Use with the proportion command
  • Find an alternative to the underscores option

Requests for the Program
Contact me directly at matt.hurst@statcan.gc.ca and I will send you the program
Please provide me with any comments you may have on bugs, wording, inconsistencies, etc.
After receiving enough feedback, I will update the program and make it available online at one of the Stata program archive sites
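
Appendix: sketch of the quality control symbols
The reliability and suppression symbols listed under Other Options follow a simple decision rule, and a short sketch may make it easier to see. The Python below is only an illustration and is not mog's own code. The cv cut-offs (1/3 and 1/6) and the symbols F, E and X come from the examples in the slides; the function name quality_flag, the minimum cell count of 10, and the choice to check X before the cv are assumptions made here for illustration. Real cut-offs should come from the relevant survey's User Guide.

```python
def quality_flag(estimate, std_error, n_obs,
                 min_obs=10, cv_warn=1 / 6, cv_suppress=1 / 3):
    """Return a quality symbol for one table cell.

    'X' : too few observations (min_obs is a hypothetical cut-off)
    'F' : cv above cv_suppress, do not publish
    'E' : cv between cv_warn and cv_suppress, publish with warning
    ''  : no flag needed
    """
    # Assumption: the cell-count check takes precedence over the cv check.
    if n_obs < min_obs:
        return "X"
    # Assumption: a zero estimate has an undefined cv and is treated as unreliable.
    if estimate == 0:
        return "F"
    cv = abs(std_error / estimate)  # coefficient of variation
    if cv > cv_suppress:
        return "F"
    if cv >= cv_warn:
        return "E"
    return ""


if __name__ == "__main__":
    # Made-up cells: (estimate, standard error, number of observations)
    cells = [(20, 2, 50), (20, 5, 50), (20, 8, 50), (20, 2, 3)]
    for est, se, n in cells:
        print(f"{est:.0f}{quality_flag(est, se, n)}")
```

With the made-up cells above, the first estimate is unflagged (cv = 0.10), the second gets E (cv = 0.25), the third gets F (cv = 0.40), and the fourth gets X because it rests on only 3 observations.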