Remove Records - Customer and Business Analytics

Page 78: If the selected records are not removed, the chi-square values are not calculated. Unfortunately, the record removal method described in step 10 is painful. Here is a useful bit of code for removing records from the data efficiently.

First, the R Commander limitations to be aware of. In the Data Clean menu, both remove-records options are problematic:

Remove records with missing data: this menu in the Rcmdr GUI only operates properly when the “include all variables” box is checked.

Remove selected records: this menu, introduced on page 78 in step 10 of the Contingency Table tutorial, requires a messy manual process.

When you want to remove a record (i.e., a row) from a data set whenever a variable takes on a certain value, it can be done easily with two lines of script. Suppose you want to remove rows from a dataset (e.g., jack.jill) when a variable (e.g., Age) takes a particular value (e.g., “No female head”). Enter the line below into the script window of Rcmdr, highlight it, and press Submit. It’s generally a good idea to write a new data set with a new name, e.g., newdataset, and keep the old data rather than overwriting it when you make changes that might be painful to undo.

First line:

    newdataset <- subset(dataset, subset = (variable != value))

If the value is character, as is common with categorical variables, remember to put the value in quotes to tell R to treat it as a character string. A value of a continuous variable does not need quotes.

For example, if you want to remove all the cases in the dataset jack.jill where the Age variable is “No female head”:

    NfhAgejack.jill <- subset(jack.jill, Age != "No female head")

(!= means ‘not equal to’.) This writes all cases where Age is not “No female head” to the new dataset, NfhAgejack.jill. View the data set to see that the cases have been removed. When you are working with a continuous variable, you are done.
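The first line can be tried on a small made-up data frame before running it on the real data. This is only a sketch: the data frame toy, its Income column, and all of its values are invented for illustration and are not taken from jack.jill.

```r
# Toy stand-in for jack.jill; every value here is invented for illustration.
toy <- data.frame(
  Age       = factor(c("Under 35", "No female head", "35 to 54",
                       "No female head", "55 plus")),
  Spend.Cat = factor(c("Low", "High", "Low", "Low", "High")),
  Income    = c(42, 58, 61, 37, 70)
)

# Categorical value: quoted. Keeps every row where Age is NOT "No female head".
toy.age <- subset(toy, Age != "No female head")

# Continuous value: no quotes. Drops the row where Income equals 61.
toy.income <- subset(toy, Income != 61)

nrow(toy)         # 5 rows to start with
nrow(toy.age)     # 3 rows remain
nrow(toy.income)  # 4 rows remain

# The removed category is still recorded as a level of the factor.
levels(toy.age$Age)
```

The last call still lists “No female head” among the levels even though no row has that value any more.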
However, as in this case where you have a categorical “factor” variable, the information that there was once a level called “No female head” is still associated with the new data set. This should be removed with a second line of code (make sure that the active data set NfhAgejack.jill is selected on the Data set button).

Second line:

    newdataset$variable <- factor(newdataset$variable)

For example, for the Age variable:

    NfhAgejack.jill$Age <- factor(NfhAgejack.jill$Age)

Using the Rcmdr menu Explore and Test > Summarize > Active data set, you should see that Age no longer has the “No female head” category. (R still thinks it exists for the other categorical variables, though.) If so, you can now use Age and Spend.Cat in a contingency table and get the chi-square calculation and p-value.

“No female head” occurs in other factor variables (Employment, Education, etc.) as well, but those are the same cases, so they have already been removed. However, you will have to execute the second line of code for each of those variables to get rid of the meta-information before you can use them.

Note 1: If you want newdataset to keep the records with a given value rather than remove them, use the double-equals comparison == rather than the not-equals !=:

    newdataset <- subset(dataset, subset = (variable == value))

Note 2: NA, the missing value code, is not valid with comparison operators (a comparison with NA returns NA rather than TRUE or FALSE), so this won’t work when you want to remove cases where the value of a specific variable is missing. (That is what you would expect the “Remove records with missing data” command to do when the “include all variables” box is unchecked.) One way around this is to recode the NA to a different value, say “Miss” for categorical variables or 99999 for continuous variables, and then use the script above.
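The second line and the two notes can be sketched together on a small made-up data frame. The data frame toy and all of its values are invented for illustration. Two things here go beyond the original text and should be read as suggestions: applying factor() to every factor column in one step with lapply(), and testing for missing values with base R's is.na() as an alternative to recoding NA.

```r
# Toy stand-in for jack.jill; every value here is invented for illustration.
toy <- data.frame(
  Age        = factor(c("Under 35", "No female head", "35 to 54", "55 plus")),
  Employment = factor(c("Full time", "No female head", "Part time", "Full time")),
  Income     = c(42, 58, NA, 37)
)

# First line: drop the rows where Age is "No female head".
toy.nfh <- subset(toy, Age != "No female head")

# Second line, repeated for every factor column at once (an assumption:
# the original applies factor() to one variable at a time).
is.fac <- sapply(toy.nfh, is.factor)
toy.nfh[is.fac] <- lapply(toy.nfh[is.fac], factor)

levels(toy.nfh$Age)         # "No female head" is no longer a level
levels(toy.nfh$Employment)  # nor here

# Note 1: keep only the matching rows by using == instead of !=.
toy.only <- subset(toy, Age == "No female head")
nrow(toy.only)              # 1 row

# Note 2 alternative (not from the original): is.na() tests for missing
# values directly, so no recoding to "Miss" or 99999 is needed.
toy.complete <- subset(toy.nfh, !is.na(Income))
nrow(toy.complete)          # 2 rows
```

Recoding NA as described in Note 2 gives the same rows for this toy example; is.na() simply avoids changing the stored values.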
