Data entry – principles and practices
Module 2 Session 2

Overview

This session is concerned with the principles and practices of data entry so that participants can:
i. advise others on how to do effective data entry
ii. explain principles of good data entry through practice with a small set of data

Data management cycle

Stages in the cycle:
- Conception
- Design survey
- Design questionnaire
- Enumerators collect data in the field
- Manual checking, editing etc.
- Data entered onto computer (now we start looking at entering data)
- Computer data management
- Data analysis
- Reporting of results

Contents

- Review different types of questions that can be found on questionnaires
- Review different data types
- Enter a small dataset onto the computer
- Summarise steps in data entry, and principles of good data entry
The Epi Info software is used for data entry.

Learning Objectives

At the end of this session participants should be able to:
- enter questionnaire data onto the computer
- summarise the steps in the data entry process
- produce a checklist of data entry principles
- describe double data entry

Questions and data types

A preliminary review of the questions on a questionnaire gives the data entry person an idea of:
- the types of data to be entered
- the complexity of the data to be entered
- the quality of the data on the questionnaires.
It is also essential for designing the computer data entry screens.
Here we look at some example questions.

Types of questions

These are examples of numeric data:
- What is (NAME'S) age in completed years?
- For how long has (NAME) stayed in the household during the last 12 months? (In months)
The first one takes values of 1, 2, 3, etc. Units are in years. What is the maximum value?
The second one can take values of 0, 1, 2, 3, etc. Units are in months. Duration cannot be more than 12 months.

Types of questions

Sex
  Male .... 1
  Female .... 2
- This is an example of categorical data.
- It has two possible values, male and female, coded as 1 and 2.
- The codes are entered onto the computer.
- Other similar examples are Yes/No types of response. Coding is often Yes = 1, No = 0; or Yes = 1, No = 2. [Should be consistent throughout.]

Types of questions

What is the relationship of (NAME) to the head of household?
  Head .... 1
  Spouse .... 2
  Son/daughter .... 3
  Grand child .... 4
  Step child .... 5
  Parent of head or spouse .... 6
  Sister/Brother of head or spouse .... 7
  Nephew/Niece .... 8
  Other relatives .... 9
  Servant .... 10
  Non Relative .... 11
  Others .... 12
- This is also categorical data.
- There are 12 possible values, coded 1 to 12.
- Need sufficient space in the computer system to be able to enter up to 2-digit numbers.

Types of questions

What is [NAME'S] current schooling status?
  Never attended .... 01
  Left school .... 02
  Currently attending:
    Nursery .... 03
    Primary .... 04
    Post primary .... 05
    Secondary .... 06
    Post secondary** .... 07
    A diploma course .... 08
    University .... 09
    Apprenticeship .... 10
- This is also a categorical variable.
- Are the categories in any particular order?
- Are the categories mutually exclusive?

Multiple response questions

Multiple response questions can be in the form of:
- Multiple dichotomy
- Responses listed but not ordered
- Ranked, e.g. list 1st, 2nd, 3rd.
How should these be entered?

Example: Multiple dichotomy

Question from UNHS, S5b10: Does this household own any of the following? (Yes = 1, No = 2)
Example responses:
  Motor vehicle .... 1
  Motor cycle .... 2
  Bicycle .... 1
  Boat/canoe .... 2
  Donkey .... 2

Example: Listed but not ordered multiple responses

UNHS S3a3: What sort of sickness/injury did [X] suffer? (column 3)
  Malaria .... 01
  Respiratory .... 02
  Measles .... 03
  Diarrhoea .... 04
  AIDS .... 05
  Pregnancy related problems .... 06
  Dental .... 07
  Accident .... 08
  Intestinal worms .... 09
  Skin infections .... 10
  Others .... 11
If code 01 (malaria) in column (3):
5. What type of drug did [X] take?
  None .... 1
  Chloroquine .... 2
  Fansidar .... 3
  Camaquine .... 4
  Quinine .... 5
  Panadol .... 6
  Aspirin .... 7
  Herbs .... 8
  Others .... 9
Example entry: boxes (5a), (5b), (5c) filled in with 2, 5, 3.

Example: Ranked multiple responses

UNHS S3bq3: What are the main channels of communication from which you receive AIDS/HIV information and education? (Note that the channels should be ranked in order of the three most important; use codes at the bottom of page.)
Example entry: 1st, 2nd and 3rd, in columns (3), (4) and (5), filled in with 08, 01, 07.
Channels of communication (codes for col. (3), (4) and (5)):
  Radio .... 01
  TV .... 02
  Film .... 03
  Drama .... 04
  Posters .... 05
  Billboards .... 06
  Family .... 07
  Friends .... 08
  Teachers .... 09
  Political leaders .... 10
  Trad. leaders .... 11
  Religious leaders .... 12

Computerisation

- The dichotomous multiple response questions require one column for each Yes/No (or 1/0) response, each one indicating whether the respondent ticked / did not tick that item in the list.
- In the ordered or ranked multiple responses, there can be as many columns as there are alternatives in the question, but the first records the most important, the second the next most important, etc.

More complex questions

Did [NAME] fall sick or get injured during the last 30 days?
  Yes .... 1
  No .... 2
  Don't know .... 3
  (if no or don't know, skip to col. (11))
If code 1 in col [2]: What sort of sickness/injury did [NAME] suffer?
  Malaria .... 01
  Respiratory .... 02
  Measles .... 03
  Diarrhoea .... 04
  AIDS .... 05
  Pregnancy Related Problems .... 06
  Dental .... 07
  Accident .... 08
  Intestinal Infections .... 09
  Skin Infections .... 10
  Hyper-tension .... 11
  Ulcers .... 12
  Mental Illness .... 13
  Other fever .... 14
  Others .... 15
How many days were lost (suffered) by [NAME] due to the illness/injury?
If code 01 (Malaria) in col (3): What type of drugs did [NAME] take?
  None .... 1
  Chloroquine .... 2
  Fansidar .... 3
  Camaquine .... 4
  Quinine .... 5
  Panadol .... 6
  Aspirin .... 7
  Herbs .... 8
  Other (specify) .... 9
How should these data be entered?

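One possible answer to "How should these data be entered?" is to check the skip rules for each record as it is entered. The following is a minimal sketch in Python, not something Epi Info provides in this form. The field names (sick, illness, days_lost, drugs) and the 0 to 30 limit on days lost are assumptions made for illustration; the codes are those from the example question.

```python
# Hypothetical field names; codes follow the example question above.
VALID_SICK = {1, 2, 3}              # 1 = Yes, 2 = No, 3 = Don't know
VALID_ILLNESS = set(range(1, 16))   # 01 Malaria ... 15 Others
VALID_DRUG = set(range(1, 10))      # 1 None ... 9 Other (specify)
MALARIA = 1


def check_record(rec):
    """Return a list of problems found in one entered record (a dict)."""
    problems = []
    sick = rec.get("sick")
    illness = rec.get("illness")
    days_lost = rec.get("days_lost")
    drugs = rec.get("drugs") or []

    if sick not in VALID_SICK:
        problems.append("sick: invalid code")
        return problems

    if sick != 1:
        # No / Don't know: the questionnaire skips ahead,
        # so the follow-up fields should be left empty.
        if illness is not None or days_lost is not None or drugs:
            problems.append("follow-up fields filled although sick != 1")
        return problems

    if illness not in VALID_ILLNESS:
        problems.append("illness: invalid code")
    # Assumption: days lost cannot exceed the 30-day reference period.
    if days_lost is None or not 0 <= days_lost <= 30:
        problems.append("days_lost: must be between 0 and 30")
    if illness == MALARIA:
        # The drug question applies only to malaria cases.
        if not drugs or any(d not in VALID_DRUG for d in drugs):
            problems.append("drugs: missing or invalid code")
    elif drugs:
        problems.append("drugs filled although illness is not malaria")
    return problems
```

A record such as {"sick": 2, "illness": 4} would be flagged, because the illness column should have been skipped. The drug answers are held as a short list here, one entry per box (5a), (5b), (5c); on the computer these would normally be three separate columns.
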
Missing values

- Surveys will always have missing data
- Data can be missing for a variety of reasons:
  - respondent did not know the answer;
  - respondent refused to answer;
  - question was not applicable;
  - question was missed by the fieldworker;
  - response was not recorded clearly;
  - etc.

Coding missing data

- Assigning codes to missing data avoids blanks in the data.
- Code must not be a possible value.
- For numeric data (e.g. Age) a negative value is often used (e.g. -99)
- For categorical data use a code higher than any valid code for the question (e.g. 99)

Missing value codes

Different codes could be used for different types of missing data.
  99 or -99 = question missed by fieldworker
  88 or -88 = question not applicable
  77 or -77 = don’t know or refused to answer
Should be consistent throughout

Unique Identifier

- Each set of data should have a unique identifier.
- Often referred to as a Primary Key.
- In household surveys, for example, you often have a Household ID. This would be unique for each household and enables you to easily find the data for the household.

Activity 2

- In pairs. Look at questionnaires. Identify types of questions, and types of data.
- Class discussion.

Brief introduction to Epi Info…

Epi Info is a series of freely distributable programs for Microsoft Windows, for managing databases (especially public health ones):
- can customize the data entry process (layout similar to questionnaire),
- enter and analyse data.

Brief introduction to Epi Info…

Epi Info contains:
- Projects (file, .mdb) which have
  - Data Tables: store the data
  - View: info about the screen appearance, or how the survey looks, and how data is entered into the data table. It has fields (variables) which are created to hold data.

Data entry in Epi Info

Points to note:
- View can span several “pages”
- Space assigned for “Other, specify” text
- Questions can be skipped if not relevant
Demonstrate data entry using the Household Survey data

Activity 4 & 5

- Entering a small dataset into Epi Info.
- Record some principles of good data entry.
- Record the steps in the data entry process.

Double data entry

- Data entry needs to be checked.
- If dataset is small, can print out and check manually.
- If dataset is large, this can be resource-intensive and time consuming
  - How many records do you need to check?
- Double data entry = dataset is entered twice (by different people) and datasets compared. Discrepancies are checked and corrected.

Data Compare Utility

- Utilities -> Data Compare
- File -> New Script
  - Step 1: Epi Info View: select the files to compare
  - Step 2: Checks that structure of the files is the same
  - Step 3: Select the unique identifier
  - Step 4: Select the fields to compare (all)
- View -> Read-Only
Demonstration of Data Compare using data1 and data2

Activity 7

Use the Data Compare utility to compare data entered in Activity 4 with data entered by another group
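
The idea behind double data entry can also be sketched outside Epi Info. The Python below is an illustrative sketch only and is not how the Data Compare utility is implemented: it matches the records of two entry passes on a unique identifier and lists every field where they disagree. The function name compare_entries, the key name hhid and the sample values are all hypothetical; only the names data1 and data2 echo the demonstration files.

```python
MISSING = "<no record>"


def compare_entries(first, second, key="hhid"):
    """List discrepancies between two data entry passes.

    Each pass is a list of dicts. Records are matched on `key`, which is
    assumed to be unique within each pass (duplicates would overwrite
    each other here). Returns tuples of
    (identifier, field, value in first pass, value in second pass).
    """
    index_1 = {row[key]: row for row in first}
    index_2 = {row[key]: row for row in second}
    differences = []
    for ident in sorted(set(index_1) | set(index_2)):
        row_1 = index_1.get(ident)
        row_2 = index_2.get(ident)
        if row_1 is None or row_2 is None:
            # The record was entered in only one of the two passes.
            differences.append((
                ident,
                key,
                MISSING if row_1 is None else ident,
                MISSING if row_2 is None else ident,
            ))
            continue
        for field in sorted(set(row_1) | set(row_2)):
            value_1 = row_1.get(field)
            value_2 = row_2.get(field)
            if value_1 != value_2:
                differences.append((ident, field, value_1, value_2))
    return differences


# Made-up example data for two entry passes of the same questionnaires.
data1 = [
    {"hhid": 1, "age": 34, "sex": 1},
    {"hhid": 2, "age": 27, "sex": 2},
]
data2 = [
    {"hhid": 1, "age": 43, "sex": 1},
    {"hhid": 2, "age": 27, "sex": 2},
    {"hhid": 3, "age": 5, "sex": 1},
]
differences = compare_entries(data1, data2)
```

Here differences holds two entries: household 1 has age 34 in one pass and 43 in the other, and household 3 was entered only in the second pass. Each discrepancy still has to be checked against the paper questionnaire before it is corrected.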