Chapter 8 - R - Department of Statistics and Analytical Sciences

Chapter 8 – R
What is R?
Unlike MS Excel, SPSS, and Minitab, yet similar to SAS, R is a command-driven programming environment for
performing statistical analysis. Unlike all of the other software packages we have discussed, which are proprietary [1]
(including SAS), R is an open-source program that is free and readily available for download from the internet.
Of all the packages, we acknowledge that both R and SAS represent substantial challenges for students.
However, like SAS, R is among the most analytically comprehensive and most flexible of the statistical software
applications. Furthermore, R is becoming quite popular in quantitative analysis in many fields including
statistics, social science research (Psychology, Sociology, Education, etc.), marketing research, business
intelligence, etc.
R is an implementation of the S programming language, which was originally developed at Bell Labs in the
1970s. S-Plus is a commercial implementation of the same language. Therefore, S-Plus and R code are often
interchangeable, and instructions for one program will usually be applicable to the other.
Obtaining R
Before importing the WidgeOne example data into R and subsequently tackling the core STAT 3010 statistical
concepts, we will first discuss how to obtain R.
R is available on the Citrix server; however, it is also available for free download from the internet [2]. Follow these
steps to download and install R.
[1] And therefore very expensive.
[2] We recommend that you download R and install it on your local computer so that you will not be negatively affected by heavy
demand on the server during peak times.
Step 1. R is officially distributed through CRAN (The Comprehensive R Archive Network). Therefore, search for
CRAN [3] in your favorite internet search engine (see Figure 8.1).
[3] The URL for CRAN is http://cran.r-project.org/
Developed and maintained by the Mathematics and Statistics Department of Kennesaw State University
Figure 8.1: Searching for CRAN, the R Website.
Step 2. From the main CRAN page, select "Download R for MacOS X" or "Download R for Windows".
Step 3. Next, select "base" to download the basic R program.
Step 4. Next, select "Download R XX for XX" to download the R installation program (where XX is the version
number).
Step 5. Save and then run the R-XX.exe file (where XX is the version number).
Step 6. Follow the steps of the R setup wizard.
R Basics & Orientation
After installing and launching R for the first time, the user is presented with the basic R interface (see Figure 8.2).
The main component of this interface is the R Console. This is where the user submits commands
to the program AND where R prints the results of those commands. However, users often avoid typing commands
directly into the console because it is easy to make an error and difficult to re-create what you did at a later
time. Instead, one can write, develop (debug), and submit R code from a separate, savable file called a
script. If you are a SAS user, an R script is very much like the SAS programming file that you develop in the
Enhanced Editor (i.e., the .sas file).
One can create a new script or open an existing script by going to the File option in the R main menu (see
Figure 8.2). A few important facts about R scripts: 1) Files with a .R file extension are associated with (and easily
recognized by) the R program. 2) R script files, regardless of the file extension, are simple text files (so you
could open and view them with any text editing software; however, they will only run in R). 3) When you go to
save an R script (it is highly recommended that you SAVE OFTEN, no matter what software package you are
using), unlike most software packages, R does NOT automatically save the script with the .R extension. The user
actually has to type the .R extension at the end of the file name in the File name field when saving the script
file. We will discuss script files and using them in more detail later on.
Figure 8.2: The R Interface.
Figure 8.3: The R Interface.
R as a Calculator
For some, it may be easy to feel overwhelmed with the R environment at first. However, it is simple. At first, just
think of R as a super graphing-calculator, much like your old Texas Instruments TI-83, but more. Notice that one
can simply type mathematical expressions into the R console, hit "Enter" and the result is printed in the R console
for your viewing (see Figure 8.4).
Figure 8.4: R as a Calculator.
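Since the screenshot is not reproduced here, the following is a minimal sketch of the kind of expressions one might type into the console. The particular expressions are our own choices for illustration and are not necessarily the ones shown in Figure 8.4.

```r
# Each line is an expression; R evaluates it and prints the value,
# just as a calculator would.
3 + 4                 # addition
3 * 4                 # multiplication
3 / 4                 # division
2^2                   # raising to a power
sqrt(4)               # square root
log(100, base = 10)   # base-10 logarithm
round(4.60517)        # rounding to the nearest whole number
```

Typing an expression and pressing "Enter" is all that is required; nothing needs to be declared or compiled first.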
Much like any good calculator, there are a large number of mathematical and statistical functions that are
available to the user. Table 8.1 presents a few of these. Figure 8.5 shows some of these examples implemented
in R. Try them out.
Notice that the main argument in the mathematical functions in Table 8.1 is a single real number. However, the
statistical functions have multiple real numbers as the main argument. This brings up a very important point in
understanding how R operates and/or "thinks", if you will. R is often called an object-oriented programming
language. For our purposes, this means that all of the data are stored in objects and that all R functions operate on objects.
Objects can be single numbers or character strings, a list of numbers or character strings (conceptualized as a
vector, but you can simply think of it as a column in a data set), or multiple lists of numbers or character strings
(conceptualized as a matrix, very much like a data set in SPSS or SAS and a worksheet in MS Excel). Put these
ideas on "hold" for the moment and we will return to them shortly.
For now, realize that, like any good graphing calculator, R can be used to make variable assignments. These
assignments allow the user to generalize and re-use code (less typing for us!!). Variable assignment is done
using an assignment statement with the "<-" (pronounced "gets") operator. The gets operator is nothing special:
it literally is the less than sign (<) followed immediately by the hyphen (-). Therefore, the statement:
a <- 4
is read/pronounced "a gets 4".
Table 8.1: Some Basic Functions in R.
Function/Operation/Symbol | Description | Example | Result
Mathematical
+ | Addition | 3+4 | 7
- | Subtraction | 3-4 | -1
* | Multiplication | 3*4 | 12
/ | Division | 3/4 | 0.75
x^2 | The power function. | 2^2 | 4
sqrt(x) | The square root of x. | sqrt(4) | 2
log(x) | The natural logarithm of x (default base of e = 2.718281…) | log(100) | 4.60517
log(x,base=10) | The logarithm of x (base of 10) | log(100,base=10) | 2
exp(x) | The exponential of x. | exp(10) | 22026.47
sin(x) | The sine function of x. | sin(100) | -0.50637
cos(x) | The cosine function of x. | cos(100) | 0.862319
tan(x) | The tangent function of x. | tan(100) | -0.58721
asin(x) | The arc-sine function of x. | asin(.5) | 0.523599
round(x) | The rounding function. | round(4.60517) | 5
Statistical
mean(x) | The mean of x. | mean(c(3,4,5)) | 4
median(x) | The median of x. | median(c(3,4,5)) | 4
sd(x) | The standard deviation of x. | sd(c(3,4,5)) | 1
var(x) | The variance of x. | var(c(3,4,5)) | 1
min(x) | The minimum of x. | min(c(3,4,5)) | 3
max(x) | The maximum of x. | max(c(3,4,5)) | 5
Figure 8.5: Basic expressions and functions in R.
Now, anytime "a" is used in an expression or function call, 4 is substituted for a.
So, "a" is an object. In particular, it is called a scalar (a special mathematical name for a single real number).
Now let's create a list of numbers (a vector). Once again, this is done with the gets operator; however, now we
need to tell R that the variable (or object) is a list of numbers. We do this using, think about... yes!: a function
(everything in R is performed using functions). In this case it is the concatenate function or simply "c" for short.
For an example, let's enter the first five values for the years on the job (YRONJOB) variable from the example
WidgeOne data set into a new variable simply named "b".
Essentially, this statement reads: "b gets the list of values of 11.10, 11.00...". We hit the "Enter" button on our
keyboard after typing this. Notice that we do not get any feedback from R. Nothing happens. This is actually a
good thing. If we did it wrong, we would get an error. For example, if we forgot the concatenate function (the
"c") then we would get something like:
Not good. So, the fact that we did not get any feedback earlier when we entered the statement correctly is
ok. The object (or vector or list of values or variable) has been properly saved in R's working memory with the
name "b". If we want to actually see it, we must type its name.
As a side note, the user can always get a list of all objects currently saved in the work space using the ls()
function.
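The assignment steps described above can be sketched as follows. The first two values of b (11.10 and 11.00) come from the text; the remaining three are made-up placeholders, because the original values are not reproduced here. The use of parse() and tryCatch() is our own device for showing the error safely in a script; at the console one would simply see the error message.

```r
a <- 4        # read as "a gets 4"
a * 2         # wherever a appears, 4 is substituted

# Create a vector with the concatenate function c().
# 11.10 and 11.00 are from the text; 9.50, 8.25, and 10.00 are placeholders.
b <- c(11.10, 11.00, 9.50, 8.25, 10.00)
b             # typing the name prints the stored values

# Forgetting c() is a syntax error. Here we capture the message
# instead of letting the error stop the script.
oops <- tryCatch(parse(text = "b <- 11.10, 11.00, 9.50"),
                 error = function(e) conditionMessage(e))
oops

ls()          # lists the objects currently saved in the workspace
```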
So, right now, we have two objects saved in the work space: a & b. Once again, they are saved in R's
temporary working memory. If we were to close the program, they would be erased. We will talk about saving a
session permanently later on. As another side note, the user can always click on the R console and press "Ctrl+L"
to clear the console (when it gets cluttered).
Now, realize since we have defined b as a list of numbers, we can use the statistical functions in Table 8.1 and
specify "b" as the main argument. This saves us from having to type all the data again!
We can also save these values as variables and then use them in subsequent expressions. Here we save the
mean of the vector b as a new variable called simply "m" for short.
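A minimal sketch of these two ideas, assuming the same placeholder vector as before (only the first two values of b are taken from the text):

```r
# Only 11.10 and 11.00 are from the text; the rest are placeholders.
b <- c(11.10, 11.00, 9.50, 8.25, 10.00)

mean(b)       # the statistical functions accept the whole vector
median(b)
sd(b)

m <- mean(b)  # save the mean as a new object named m
m             # print it
m - 1         # and reuse it in later expressions
```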
Ok, it is not part of the STAT 3010 curriculum, but you most likely remember Z-scores from elementary statistics
(one of the prerequisites for 3010, check your transcripts!). For a refresher, remember we subtract the mean
from each value of the variable of interest and then divide by its standard deviation to get a Z-score for each
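value. Here is the formula (that I'm sure you know and love): $z_i = \frac{x_i - \bar{x}}{s}$, where $\bar{x}$ is the mean and $s$ is the standard deviation of the variable.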
We are using this example because it is SO EASY to do in R and really showcases R's power and utility. Check
this out:
It really is that simple. Once again, the first statement reads "a new vector (variable) called z gets the value of b
minus the mean of b divided by the standard deviation of b". You would not believe how difficult this is to do in
a SAS DATA step...(of course, there is a special SAS procedure for this, however, it is still WAY too complicated to
do in a DATA step...).
Also, this showcases how R performs operations element-wise. This means that R performs a given operation on
each value of a vector separately and produces an entire vector of results whose length (the number of values
or elements in a vector) is equal to the length of the input vector. This is true unless specific matrix algebra
operations are called, which you do not need to worry about for the purposes of this course.
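The Z-score calculation described above can be sketched as follows, again assuming a placeholder vector b (only its first two values come from the text). Note the parentheses around b - mean(b): without them, R would divide mean(b) by sd(b) first.

```r
# Only 11.10 and 11.00 are from the text; the rest are placeholders.
b <- c(11.10, 11.00, 9.50, 8.25, 10.00)

# Subtract the mean from every element, then divide every element by the
# standard deviation. R does this element-wise without an explicit loop.
z <- (b - mean(b)) / sd(b)
z

length(z) == length(b)   # one Z-score per input value
```

A quick sanity check on any set of Z-scores is that they have a mean of 0 and a standard deviation of 1 (up to rounding error).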
Now pretend that we had already saved both the mean and standard deviation of b before we wanted to
calculate the Z-scores.
Then, the statement to calculate the Z-scores is even simpler:
Note: R is case-sensitive. That means that objects named "m" and "M" are different. For example:
The object m was previously defined as the mean of the vector b. However, the object M has not been
previously defined, therefore R produces an error message.
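A sketch of both points, under the same placeholder data as before. The object name s for the standard deviation is our own choice, and tryCatch() is used only so that the error can be shown without stopping a script.

```r
# Only 11.10 and 11.00 are from the text; the rest are placeholders.
b <- c(11.10, 11.00, 9.50, 8.25, 10.00)

m <- mean(b)        # saved mean
s <- sd(b)          # saved standard deviation (name s is our choice)
z <- (b - m) / s    # the Z-score statement is now shorter
z

# R is case-sensitive: m exists, M does not.
m
msg <- tryCatch(M, error = function(e) conditionMessage(e))
msg                 # the text of the error R reports for the undefined M
```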
Working with Scripts
As mentioned previously, it is often impractical to keep typing R code directly into the console, for a number
of reasons. Therefore, we use scripts. Scripts allow the user to develop, debug, and save code for later use
during an R session. To open a brand new script, select New Script from the File drop down menu. A new
window within the R session will appear entitled simply "Untitled - R Editor" (see Figure 8.6).
We suggest resizing the script window and placing it side by side with the console (see Figure 8.7: the blank
script is on the right). Now you can write R code and double-check it before submitting it to the console. To
submit code to the console from the script, you have two options: 1) highlight the desired piece of code (most
often one does not want to submit a whole script at once) and copy and paste it into the console (we suggest
using "Ctrl+C" and "Ctrl+V") or 2) highlight the desired piece of code and press "Ctrl+R". We recommend option
#2.
Oftentimes, R users will not write brand-new code for a new project, but instead work from existing code that
they developed in the past. For example, there is a sample script entitled stat.3010.R that contains all the code
necessary to perform a full STAT 3010-style analysis of the WidgeOne data. In order to open an existing script,
select Open script... from the File drop down menu, navigate to where the desired script is saved, and either
double-click on the file or single click on the file and then select Open (see Figure 8.8).
Note: you probably noticed that there are several lines in the stat.3010.R file that begin with the hash mark (#).
The hash mark in R signifies the beginning of a comment. A comment in typical computer programming is a
note to the human-users that aids in understanding the purpose of code. These comments are not processed
by the computer. In R, comments begin with a hash and continue for the rest of that line.
Figure 8.6: A New Script in R.
Figure 8.7: A Resized Script in R.
Figure 8.8: Opening an Existing Script.
Getting Help in R
Obtaining help documentation in R is rather simple; however, the usefulness of that documentation is
debatable. Because most everything in R is accomplished using functions, the typical R user will have questions
about the use of one or more functions. In order to obtain the help page for a given function submit one of the
two options below to the R console:
help(function-name)
?function-name
In the following example, we obtain the R help page for the log function:
help(log)
or
?log
When either of these commands is submitted to the R console, the appropriate help page is opened in your
primary internet browser (however, you do not have to be connected to the internet; R just uses the
browser as a document viewer). Our example of the log help page is presented in Figure 8.9.
Figure 8.9: A Typical R Help Page.
Now, as hinted at earlier, the utility of these help pages is debatable. It has been our experience that they
often are written for people who already know a great deal about R, and therefore are not very useful to the
nascent user. Consequently, it is a good idea to have more "help resources" in your toolbox. The most powerful
of these is the official R Help list serv.
We highly recommend that you use the R-Help list serv. You can either browse the existing discussions for a
situation like the one you are encountering (see https://stat.ethz.ch/pipermail/r-help/) or you can email the list
serv a specific question/issue that you are dealing with. Most often when you email the list serv, you will obtain
top-notch assistance for your specific problem from half a dozen "R professionals" within a short amount of time.
For more information about the list serv, go to http://www.r-project.org/mail.html. Warning: be sure to read the
posting guide before emailing the list serv (see http://www.r-project.org/posting-guide.html): There are
standards for online etiquette.
Saving and Loading a Workspace
There are actually some options here; however, we have a lot to cover, so we are only going to present the
easiest approach to saving your work in R and returning to it at a later date.
To save your work, left-click on the R console in a null space so that it becomes active. Next, from the File drop
down menu, select Save Workspace... Now, specify the desired physical location and file name to which you
want to save the file and select Save. This is a very nice function: it saves all objects (data) in the current
working memory. Note that the workspace file does not contain your script; save the script separately from the script editor.
To load or re-start a previously saved R session, yup, you guessed it: Launch a new session of R, select Load
Workspace... from the File drop-down menu, navigate to the appropriate sub-directory (folder), select the
desired file, and select Open.
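The same idea can be sketched in code rather than through the menus. This example uses save() and load() with a temporary file; the file name session.RData is hypothetical, and the values in b beyond the first two are placeholders. The related function save.image() saves every object in the workspace at once.

```r
a <- 4
b <- c(11.10, 11.00, 9.50, 8.25, 10.00)   # last three values are placeholders

# Write the chosen objects to a file (hypothetical name, temporary folder).
f <- file.path(tempdir(), "session.RData")
save(a, b, file = f)

# Simulate closing R: the objects disappear from working memory.
rm(a, b)

# Load the saved objects back. Here they are loaded into a separate
# environment so that we can inspect exactly what the file contained.
restored <- new.env()
load(f, envir = restored)
ls(restored)
```

Calling load(f) without the envir argument would restore the objects directly into the current workspace, which is closer to what the Load Workspace... menu option does.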
Getting Data into R (Importing Data)
From what we have discussed so far, it would be possible for us to enter our data into R one column at a time
using the concatenate function. However, we have better things to do with our time (Besides, if you graduate
with a minor in statistics, you will be over-qualified for simple data entry...). So, what do we do? This is R: We use
a function!
The base R package has a number of functions that can be used to import data. The most common one is
read.table(). However, our data (i.e., the WidgeOne example data set) are saved in MS Excel. Admittedly,
importing data from Excel is something that R does not do very well. There are some special add-on
packages for this task (see xlsx & xlsReadWrite); however, in our experience they are not very reliable (in
other words, sometimes they work and sometimes they don't...). R is, however, very good at importing non-proprietary file formats (*.txt, *.csv, *.dbf, etc.). Therefore, the most reliable and stable method for importing MS
Excel data into R is to open the file in MS Excel, save it as a .csv file (comma-separated values file), and then use the
proper function in R to import the .csv file.
To save a MS Excel file as a .csv file, do the following:
1) Open the file in MS Excel.
2) Select the Office Button.
3) Select "Save As..." (see Figure 8.10).
4) In the resulting dialog box, select CSV (Comma delimited) (*.csv) from the Save as type drop down menu
(see Figure 8.11).
Figure 8.10: The Save As Option in MS Excel.
Figure 8.11: The CSV File Type in the Save As Type Field.
Now we are ready to submit the proper function call to R to import these data. We could use the read.table()
function; however, we would need to customize the function call, and it is easier to use the read.csv() function,
which is already customized to import comma-delimited data. The call to read.csv() is presented in the STAT
3010 example script entitled stat.3010.R. We highly recommend that you open that file and follow along with
this discussion on your own computer from here on out.
Notice that in the example script, we have specified the pathway (the physical location of where the CSV file
resides) that is specific and unique to each computer setup. You will need to customize this pathway to your
situation. The easiest way to do this is to open My Computer, navigate to the sub-directory (i.e., the folder)
where you saved the CSV file, then copy the pathway from the address bar in that dialog box (see Figure 8.12)
and paste it into the proper location in the R script. Note: You do have to specify both the pathway and the file
name in the call to read.csv(), so when pasting the pathway into the R script, do not highlight the file name.
That way you only replace the pathway during this process, not the file name (this is desired). Next, and this is
VERY IMPORTANT: the backslash character (\) in R is a special character, so after you copy and paste the
pathway, you WILL NEED to add a second backslash for the pathway to be correctly specified in R parlance.
Therefore, every \ in the pathway needs to become \\.
Once you have made the necessary changes to the call to read.csv(), notice what it does: you are giving R
instructions to read in data from the WidgeOne.csv file and save it to an R object named widge (remember, R
is case-sensitive, so widge is not the same as Widge or WIDGE). When you are ready, highlight the code and
press "Ctrl+R" to submit it to the R console (see Figure 8.13). Notice that the command is copied to the console;
however, nothing else happens. This is ok. With assignment statements, no feedback from the
console is usually good news.
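The actual call lives in stat.3010.R and is not reproduced here, so the following is only a sketch of the general pattern. It writes a tiny stand-in CSV file to a temporary folder so that the example runs anywhere; the column name ID, the third data row, and the Windows path shown are all hypothetical.

```r
# Create a small stand-in for WidgeOne.csv (three rows only).
csv_path <- file.path(tempdir(), "WidgeOne.csv")
writeLines(c("ID,Gender,YRONJOB",
             "1,F,11.10",
             "2,M,11.00",
             "3,F,9.50"),
           csv_path)

# Import the file and save the result to an object named widge.
widge <- read.csv(csv_path)

# No feedback is printed by the assignment; type the name to view the data.
widge

# On Windows, a pasted pathway needs doubled backslashes (hypothetical path).
win_path <- "C:\\data\\WidgeOne.csv"
cat(win_path, "\n")   # each \\ is stored and printed as a single backslash
```

As far as we are aware, R also accepts forward slashes in Windows pathways (for example "C:/data/WidgeOne.csv"), which avoids the doubling altogether.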
Your next step should be to verify that the data were correctly imported into R. The easiest way to do this is to
simply view the data. As mentioned previously, we view objects in R by typing their name and pressing the
"Enter" key (see Figure 8.14). So far, everything looks good!
Figure 8.12: Copying the Pathway to the Raw Data using the Address Bar in My Computer.
Figure 8.13: Importing data into R.
Figure 8.14: Viewing the WidgeOne Data After Importation in the R Console.
Oftentimes, when you are working with very large datasets, it is not useful to print the entire data set at once. R
has a very nice function called head() that prints only the first few rows of data (6 by default) with the corresponding column
names.
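A minimal sketch, using a made-up stand-in for the widge data frame (the real data are not reproduced here, and the column name ID is hypothetical):

```r
# Stand-in data frame with 40 rows; the values are invented for illustration.
widge <- data.frame(ID = 1:40,
                    YRONJOB = seq(0.5, 20, by = 0.5))

head(widge)          # first 6 rows by default
head(widge, n = 5)   # ask for exactly 5 rows with the n argument
```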
A few side notes are important to keep in mind from here on out:
1) The read.table() and read.csv() functions return a special kind of R object: the data frame. In other words,
the widge data as currently saved in R's working memory is a data frame. A data frame can be thought of as a special kind of
matrix. A matrix can be thought of as a collection of column vectors (or simply columns of data). However, in R,
a matrix must consist of all numeric or all character vectors. Statistical data, however, are most often a
combination of both numeric and character data. A data frame behaves like a matrix that may consist of
a mixture of numeric and character column vectors: exactly
what we need for most statistical applications.
2) Often times we need to work with only parts of a data frame (or matrix, or vector). There are a number of
ways to subset objects in R.
a) We may want to perform an operation (using a function!) on just one column of the widge data frame (in
other words, just one variable in the WidgeOne data). We may do this using a combination of the data frame name
and the column name. The two are delimited by the special character $. For example, earlier we obtained the
mean of the first five observations of the variable years on the job (YRONJOB). Now, let's obtain the mean for all
N = 40 observations of that variable:
Once again, notice that we delimited the object name from the column name by a $. Try this with any numeric
variable in the WidgeOne data.
b) We can perform the same operation using explicit subsetting of the parent data frame (the source of the
data, in this case the widge data frame). For example, in order to perform the exact same operation using
subsetting, we specify the widge data frame name with the square brackets [ ]. R expects two arguments with
the square brackets: the rows to be used and the columns to be used. These are delimited within the brackets
with a single comma (,). Furthermore, if we leave one (or both) of these blank, R assumes we want to select all
rows and/or columns. Let's look at some examples:
YRONJOB is the eighth column or variable in the widge data frame (counting from left to right). Therefore, in
order to select (in this case print) all N = 40 observations of YRONJOB, we submit the following to the R console:
If we want the mean of YRONJOB for all N = 40, then:
Notice we obtain the exact same result that we obtained when we specified YRONJOB by name in item a)
above.
Now, perhaps we want the mean of only the first five observations of YRONJOB. We could use either of the
following:
This instructs R to obtain the mean of YRONJOB for observations 1 through (:) 5. Notice that there is only one
argument within the square brackets (there is no comma separating the rows and columns). In other words, 1:5
is considered a single row specification by R. Furthermore, because widge$YRONJOB is a single column, we do not
need to specify a column number as in the example above, where the object being subset (the widge data frame)
had multiple columns.
Alternatively, we could subset the data frame. Here we will need to supply both a row and column argument
within the square brackets:
Notice that here we have specified the first five observations (1:5) of the 8th column of widge. We obtain the
same results.
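The subsetting approaches in items a) and b) can be compared side by side. The sketch below builds a stand-in widge data frame with 40 rows and 9 columns in which Gender is the 3rd column and YRONJOB is the 8th, as described in the text; every other column name (ID, V2, V4, and so on) and all of the values are invented.

```r
set.seed(1)

# Five numeric columns; the fourth of these becomes column 8 overall.
num <- as.data.frame(matrix(round(runif(40 * 5, 0, 20), 2), nrow = 40))
names(num) <- c("V5", "V6", "V7", "YRONJOB", "V9")

# Stand-in data frame: 40 rows, 9 columns (placeholder names and values).
widge <- data.frame(ID = 1:40, V2 = "x",
                    Gender = rep(c("F", "M"), 20), V4 = "y", num)

# All 40 observations of YRONJOB, selected two ways.
mean(widge$YRONJOB)        # by name, using $
mean(widge[, 8])           # by position: all rows, column 8

# Only the first five observations, selected two ways.
mean(widge$YRONJOB[1:5])   # one argument: the column is already a vector
mean(widge[1:5, 8])        # two arguments: rows 1 to 5, column 8
```

Selecting by name is usually safer than selecting by position, because the code keeps working if the columns are ever reordered.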
c) Now, we often want to work with variables in R and let's face it, typing the data frame name along with the
$ character is a pain. We can make temporary copies of all columns in an object (either a data frame or
matrix) to R's working memory. Then, we could refer to them just by the column name. This is easily done using
the attach() function.
If we attempt to access the YRONJOB variable BEFORE attaching the widge data frame, R essentially tells us
that it does not exist:
Now, let's attach it and attempt to access the data using the exact same call to the column name:
Now, we can obtain the mean of YRONJOB with the following AFTER attaching the widge data frame:
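A sketch of the before-and-after behavior, using a small invented stand-in for widge. We use exists() here to check whether R can find YRONJOB, rather than triggering the error itself.

```r
# Stand-in data frame; the values are invented for illustration.
widge <- data.frame(Gender = rep(c("F", "M"), 20),
                    YRONJOB = seq(0.5, 20, by = 0.5))

before <- exists("YRONJOB")   # FALSE: R cannot find the column on its own
attach(widge)
after <- exists("YRONJOB")    # TRUE: columns are now reachable by name
avg <- mean(YRONJOB)          # no widge$ prefix needed
avg
detach(widge)                 # undo the attach when finished
```

One caution: changes made to widge after attaching it are not reflected in the attached copies, so some R users prefer to keep typing widge$YRONJOB, or to use with(widge, mean(YRONJOB)), instead.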
3) We talked about subsetting a moment ago. In a similar vein, you can always obtain the total number of rows
and the total number of columns of a data frame or matrix by using the dim() function:
The dim function returns an object (i.e., a vector) of length 2: The first element is the total number of rows, the
second the total number of columns. Therefore, we now know that the widge data consists of N = 40
employees and 9 characteristics (traits, variables, columns, etc.) for those individuals. The dim() function is
appropriate for multi-dimensional arrays (i.e., matrices and data frames).
In order to obtain the length of a single column (vector), we use the length() function in like manner:
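Both functions can be sketched on a stand-in data frame with the same shape as the widge data (40 rows, 9 columns); the contents are just zeros, since only the dimensions matter here.

```r
# Stand-in with the same dimensions as widge; contents are placeholders.
widge <- as.data.frame(matrix(0, nrow = 40, ncol = 9))

dim(widge)           # rows first, then columns
dim(widge)[1]        # just the number of rows
dim(widge)[2]        # just the number of columns

length(widge[, 8])   # length of a single column (a vector)
length(widge)        # careful: for a data frame this is the number of columns
```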
4) Before moving on, you should be aware that R uses the placeholder "NA" for missing data. This is
much like a period for missing numeric data in SAS or SPSS. Therefore, do not be alarmed if you see "NA" values
peppered throughout your data.
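A short sketch of how NA behaves, using a made-up vector x:

```r
x <- c(3, NA, 5)        # the second value is missing

is.na(x)                # flags which elements are missing
sum(is.na(x))           # counts the missing values

mean(x)                 # NA: one missing value makes the result missing
mean(x, na.rm = TRUE)   # removes the NA first, then averages 3 and 5
```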
5) We have already discussed how to create a new object using the gets operator. FYI: In order to remove or
delete an object from R's working memory, we use the remove() or rm() (either one works!) functions:
If we want to remove multiple objects at once, we delimit their names by commas in the reference to them in
the remove() function:
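A minimal sketch of both forms; the objects and their values are made up for the example.

```r
a <- 4
b <- c(1, 2, 3)
m <- mean(b)

rm(a)           # remove a single object
remove(b, m)    # remove several objects, separated by commas

# Confirm that none of the three names remain in the workspace.
gone <- !any(c("a", "b", "m") %in% ls())
gone
```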
Free R Documentation & Manuals
There are a number of free, readily available manuals for R on the internet. We recommend the following:
1) This manual!
2) R for SAS and SPSS Users by Bob Muenchen at: http://oit.utk.edu/scc/RforSAS&SPSSusers.pdf
3) The Quick-R website at: http://www.statmethods.net/
8.1 Using R for: Measurements of Central Tendency
We have already seen a demonstration of the mean() function. We can obtain the median using the median()
function in similar fashion.
We can also obtain the mean or median (or any other summary function [4]) for multiple variables at once. To do
this we simply specify the appropriate columns from the widge data frame using subsetting operations we
discussed previously:
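The original calls are not reproduced here, so the following is a sketch under our own assumptions. In current versions of R, calling mean() directly on several data frame columns does not work, so this sketch swaps in sapply(), which applies a function to each column in turn. The column AGE and all of the values are invented.

```r
# Stand-in data frame; AGE is a hypothetical second numeric column.
widge <- data.frame(YRONJOB = c(2, 4, 9),
                    AGE     = c(30, 41, 52))

median(widge$YRONJOB)      # median of a single variable

# Apply a summary function to each selected column.
col_means   <- sapply(widge[, c("YRONJOB", "AGE")], mean)
col_medians <- sapply(widge[, c("YRONJOB", "AGE")], median)
col_means
col_medians
```

With the real widge data, the column selection could be written by position instead, using the bracket subsetting described earlier.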
[4] Generally in statistics, a summary function is any statistical function that "summarizes" a variable of length N in fewer
than N values. In other words, a summary function usually reduces a variable to a single value, and at most to N-1 values.
Essentially, it is a dimension reduction. Examples include the mean, median, standard deviation, range,
quartiles, etc. The use of the term summary function here should not be confused with the actual summary() function in R (the next
topic of discussion).
However, if we want the measures of central tendency AND other distributional information for several columns
at once, then this approach is inefficient. Alternatively, we can use the summary() function:
This is very nice: We get the mean, median, first and third quartiles, and the minimum and maximum for all
numeric variables in the data set and a basic frequency count for all character variables. IMPORTANT:
WARNING: CAUTION: Notice that R analyzes the Employee ID numbers. Is this an appropriate/meaningful/useful
analysis? Obviously the computer does not know any better, however, you, as the analyst, are held to a higher
standard. If this is confusing, you need to read Part 2 of the STAT 3010 Supplemental Text for a discussion of
identifier variables.
Notice that the TRUE quantitative variables in the WidgeOne data reside in columns 5 through 9 in the widge
data frame. Therefore, using what we learned about subsetting objects in the last section, we can
obtain summary results for ONLY the quantitative variables with the following call to the summary() function:
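Both calls can be sketched on a stand-in data frame laid out like widge (40 rows, 9 columns, quantitative variables in columns 5 through 9). All column names other than Gender and YRONJOB, and all values, are invented.

```r
set.seed(1)

# Five numeric columns that will occupy positions 5 through 9.
num <- as.data.frame(matrix(round(runif(40 * 5, 0, 20), 2), nrow = 40))
names(num) <- c("V5", "V6", "V7", "YRONJOB", "V9")

# Stand-in data frame (placeholder names and values).
widge <- data.frame(ID = 1:40, V2 = "x",
                    Gender = rep(c("F", "M"), 20), V4 = "y", num)

summary(widge)                          # every column, including ID

quant_summary <- summary(widge[, 5:9])  # only the quantitative variables
quant_summary
```

A note on versions: since R 4.0, read.csv() and data.frame() keep text columns as plain character data by default, and summary() reports only their length and type. To get frequency counts for such a column, convert it first, for example with widge$Gender <- factor(widge$Gender).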
A Stratified Analysis in R
Remember we had a detailed discussion of a stratified analysis in SAS in the STAT 3010 Supplemental Text.
Stratified analyses can also be obtained in R using the by() function:
This call to by() uses four arguments. The first argument is the numeric vector to be analyzed. In this
example, we are interested in estimating the average number of years on the job (YRONJOB). Therefore, we
specify the 8th column of the widge data frame, which is the variable YRONJOB. Next, we specify the
stratification factor. Here we want a separate analysis for each of two groups, males and females. Therefore,
we specify Gender as the stratification factor. We could have also typed widge[,3] because Gender is the third
column vector in the widge data frame. Here, Gender works because we previously attached the widge data
frame (we would have received an error otherwise!). Next, we specify which summary function is of interest.
Here we instruct R to return the mean. Last, the na.rm argument, which by() passes along to the summary
function, instructs R how to deal with missing values. This argument takes two values: TRUE or T removes missing
values of the analysis variable within each group before the summary is computed, while FALSE or F does not
remove them (in this case, if missing values do exist, R returns NA (missing) for the value of the function).
FALSE is the default.
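A call matching this description can be sketched as follows. The data are invented (with one missing value added on purpose), and we write widge$YRONJOB and widge$Gender so the example does not depend on the data frame being attached.

```r
# Stand-in data: invented values, with one missing YRONJOB on purpose.
widge <- data.frame(
  Gender  = c("F", "M", "F", "M", "F", "M"),
  YRONJOB = c(2, 7, 12, 4, NA, 15)
)

# Arguments: analysis variable, stratification factor, summary function,
# and na.rm, which by() hands on to mean()
strat_mean <- by(widge$YRONJOB, widge$Gender, mean, na.rm = TRUE)
print(strat_mean)

# Without na.rm = TRUE the group containing the NA reports NA
strat_mean_na <- by(widge$YRONJOB, widge$Gender, mean)
print(strat_mean_na)
```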
Consider the following call to the by() function. What is being asked?
See the end of this chapter for the answer.
Now, in order to obtain frequency tables for categorical variables outside of the summary function, we use the
table() function:
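For instance, a frequency table for a categorical vector could be obtained like this (the Gender values below are invented for illustration):

```r
# Invented stand-in for the Gender column
Gender <- c("F", "M", "F", "F", "M", "F")

gender_counts <- table(Gender)   # counts per category
print(gender_counts)
```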
8.2 Using R for: Measurements of Dispersion
Much like the mean() and median() functions, we can obtain measures of dispersion in R. The standard
deviation and the variance of a variable are obtained with the sd() and var() functions.
Just like with the other summary functions, we can obtain the measures of dispersion for multiple variables at
once using subsetting operations on the data frame of interest:
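A small sketch of both uses follows, with invented values. For several columns at once we apply the function column by column with sapply(); this is one approach that works in current versions of R, where sd() no longer accepts a whole data frame.

```r
# Stand-in data with invented values
widge <- data.frame(
  YRONJOB = c(2, 4, 6, 8),
  JOBSAT  = c(6.5, 7.0, 8.2, 5.9)
)

sd_yr  <- sd(widge$YRONJOB)    # standard deviation of one variable
var_yr <- var(widge$YRONJOB)   # variance of one variable
print(c(sd = sd_yr, var = var_yr))

# Several variables at once: apply the function to each selected column
all_sd  <- sapply(widge[, c("YRONJOB", "JOBSAT")], sd)
all_var <- sapply(widge[, c("YRONJOB", "JOBSAT")], var)
print(all_sd)
print(all_var)
```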
Using R to Categorize a Continuous Variable
Often it is of interest to categorize or create meaningful groups or "bins" out of a continuous variable. This is
often done in applied biomedical and social science research. For example, researchers often take continuous
attributes like age, income, etc. and create groups from them. This can easily be accomplished in R using
assignment statements with the subsetting operator [ ]. See the example code below.
Here is what is happening with these lines of code. First, we are creating a new column in the widge data
frame called Jobten. Moreover, the values of Jobten are conditional on the value of YRONJOB. The first
statement essentially says, any row (and remember rows in this data frame represent employees) with a value
less than 5 for YRONJOB gets the value "New" for Jobten. Next, any row with a value of YRONJOB between 5
and just less than 10 gets the value "Experienced" for Jobten. Finally, any row with a value of greater than or
equal to 10 for YRONJOB gets the value of "Mature" for Jobten. (REMEMBER, we can reference YRONJOB
directly here because we attached the parent data frame (widge), otherwise we would need to specify
widge$YRONJOB).
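The three assignment statements described above can be sketched as follows. The YRONJOB values are invented, and we write widge$YRONJOB inside the brackets so the code works whether or not the data frame is attached.

```r
# Stand-in data with invented years-on-the-job values
widge <- data.frame(YRONJOB = c(2, 5, 9.5, 10, 23, 4))

# Each statement fills in Jobten only for the rows that meet the condition
widge$Jobten[widge$YRONJOB < 5] <- "New"
widge$Jobten[widge$YRONJOB >= 5 & widge$YRONJOB < 10] <- "Experienced"
widge$Jobten[widge$YRONJOB >= 10] <- "Mature"

print(widge)                 # double-check the new column
print(table(widge$Jobten))   # frequency table of the new variable
```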
Now, we double-check our work by printing the data frame:
Notice that the new variable Jobten was added as the 10th column to the widge data frame and the values
of Jobten are conditional on the corresponding values of YRONJOB. Look back at the code: We didn't have to
type much code in order to do this: R is very efficient at operations like this.
Since we added a new column to widge, we re-attach the data frame so that Jobten is available via column
name only and then we obtain a frequency table of the newly created variable in order to summarize the
amount of professional experience of these 40 employees.
Notice, as we re-attach the data frame, R gives us a warning that it is copying over the old attached versions
of the column vectors.
8.3 Using R for: Visualization/Organization of Univariate Data
Unlike all of the other software packages discussed (with the possible exception of Minitab), R has excellent
graphing capabilities and allows the user to create and customize presentation-quality graphics.
To replicate the pie chart developed in Chapter 2, execute the following code:
Notice now nothing happens in the R console, but another graphics window opens up and the pie chart is
printed to the new window.
To replicate the bar chart in Chapter 2, execute the following code. Here we add an informative x-axis label
using the xlab argument. This argument can be used in almost every call to an R graphing function.
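A pie chart and a bar chart of a categorical variable can both be drawn from a frequency table. The sketch below assumes the charted variable is Jobten (with invented values), and the axis label text is our own wording.

```r
# Invented stand-in for the Jobten column
Jobten <- c("New", "Experienced", "Mature", "New", "Experienced", "New")
counts <- table(Jobten)

# Pie chart: output goes to the graphics window, not the console
pie(counts)

# Bar chart with an informative x-axis label; barplot() returns bar midpoints
mids <- barplot(counts, xlab = "Job Tenure")
```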
Now, what do you think Tufte would say about this graphic? Is it appropriate? Is it correct? NO!! Why not? The
answer is because the variable Jobten is an ordinal variable and this graphic does not reflect the natural order
of the categories. Therefore, more revision is necessary in order to get this right. BTW, read Part 5 of the STAT
3010 Supplemental Text for more information on this topic.
In order to specify any variable as an ordinal variable in R, we specify it as an ordered factor. A factor is a
special variable type that instructs R that a variable is categorical by nature. We specify a variable as an
ordered factor using the ordered() function:
Notice that the old reference to the variable Jobten is now replaced by:
ordered(Jobten,c("New","Experienced","Mature"))
This is the beauty of R: you don't even have to create a new variable in order to do this (although you could...)
and because functions can be called within other functions (this is called nesting or nested functions) you can
do all of this in a few simple lines of code5. For the ordered() function, the first argument is the input variable
that you want to be treated as an ordinal variable. The second argument is a character vector (notice the
values are enclosed in quotes and delimited by commas) created with the concatenate function (c()). This character
vector communicates the proper order of the ordinal variable values to R. See the resulting figure on the next
page.
5 Calling a function within another function call is often done in more advanced R programming. When one function call resides
within another function call these are "called" (HA!) nested functions.
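As a sketch, nesting ordered() inside table() inside barplot() could look like this (the Jobten values are invented and the axis label is our own wording):

```r
# Invented stand-in for the Jobten column
Jobten <- c("New", "Experienced", "Mature", "New", "Experienced", "New")

# Without ordered(), the categories would appear alphabetically
counts <- table(ordered(Jobten, c("New", "Experienced", "Mature")))
print(counts)

# Bars now follow the natural order of the ordinal variable
barplot(counts, xlab = "Job Tenure")
```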
The histogram is generated in R using the hist() function.
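A minimal sketch, assuming the variable of interest is JOBSAT (the values, title, and label below are invented for illustration):

```r
# Invented job satisfaction scores
JOBSAT <- c(6.5, 7.0, 8.2, 5.9, 7.4, 6.8, 7.1, 6.2, 7.9, 6.6)

# hist() draws the plot and also returns the bin counts it used
h <- hist(JOBSAT, main = "Job Satisfaction Scores", xlab = "Job Satisfaction")
print(h$counts)
```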
A simple box plot is generated using the boxplot() function.
Side by side box plots are also generated using the boxplot() function. However, the structure of the arguments
is quite different here. If you want side by side box plots, boxplot() expects that you specify an expression in the
form of: "a quantitative variable is modeled as (the tilde (~) in R is read as "is modeled as") the categorical
variable (or stratification factor)". So, in the example below, we are obtaining side by side box plots of job
satisfaction stratified by job position. JOBSAT~POSITION is read as "job satisfaction is modeled as (or by) job
position". Notice now we must include the data= argument in the call to boxplot().
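Both forms of the call can be sketched as follows. The data are invented, and the POSITION codes "H" and "M" are hypothetical stand-ins for the abbreviated category labels.

```r
# Stand-in data: invented scores; "H"/"M" are hypothetical position codes
widge <- data.frame(
  POSITION = c("H", "M", "H", "M", "H", "M", "H", "M"),
  JOBSAT   = c(6.5, 7.0, 8.2, 5.9, 7.4, 6.8, 7.1, 6.2)
)

# Simple box plot of one quantitative variable
boxplot(widge$JOBSAT, ylab = "Job Satisfaction")

# Side by side box plots: JOBSAT "is modeled as" POSITION
bp <- boxplot(JOBSAT ~ POSITION, data = widge)
```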
Now what would Tufte think about this graphic? Does it stand on its own? No! There are abbreviations (for
Hourly & Management for the x-axis tick mark labels)! These abbreviations are an unnecessary source of
confusion that should be avoided at all costs. Professional presentation quality statistical evidence (usually in
the form of tables and graphs) should not be confusing. Instead they should be clear, concise, easily-digestible
for the audience, and informative! We can correct this graphic using the following where we explicitly tell R
what we want printed as the x-axis tick mark labels.
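One way to do this is the names argument of boxplot(), shown in the sketch below with invented data. The labels are listed in the same order as the groups ("H" before "M" alphabetically here), and the axis label text is our own wording.

```r
# Stand-in data: invented scores; "H"/"M" are hypothetical position codes
widge <- data.frame(
  POSITION = c("H", "M", "H", "M", "H", "M", "H", "M"),
  JOBSAT   = c(6.5, 7.0, 8.2, 5.9, 7.4, 6.8, 7.1, 6.2)
)

# names= replaces the abbreviated tick mark labels with full words
bp <- boxplot(JOBSAT ~ POSITION, data = widge,
              names = c("Hourly", "Management"),
              xlab = "Job Position", ylab = "Job Satisfaction")
```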
8.4 Using R for: Visualization/Organization of Multivariate Data
We can also obtain 2-way contingency tables using the table() function: we simply add another column name
as a second argument (and, of course, arguments are delimited by commas). Remember, N-way contingency
tables are appropriate for summarizing the joint and marginal distributions of 2 or more categorical variables.
Here notice that the first column will be the row variable (Plant) and the second column will be the column
variable (Gender) in the resulting contingency table:
Likewise, we can obtain total percents6 for the 2-way table above by specifying the table() function as the
argument to the prop.table() function. This is an excellent example of nested functions, which we introduced
earlier.
6 We still call them percents even though prop.table() returns proportions. REMEMBER: In order to transform a proportion into a percent
simply multiply it by 100.
In order to obtain row percents for this table, we add an optional second argument to the prop.table()
function (REMEMBER: You can learn more about prop.table() by submitting either: help(prop.table) or
?prop.table to the R console).
Alternatively, you could assign the results of the table() function to a matrix called t1, for example, and then
submit the call to prop.table() using t1 as the first argument:
You can also obtain the column percents in like manner:
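The 2-way table and its total, row, and column percents can be sketched together as follows. The Plant and Gender values are invented (the plant labels "A" and "B" are hypothetical), and we multiply by 100 to turn proportions into percents.

```r
# Invented stand-ins for the Plant and Gender columns
Plant  <- c("A", "A", "B", "B", "A", "B")
Gender <- c("F", "M", "F", "F", "M", "M")

t1 <- table(Plant, Gender)   # rows = Plant, columns = Gender
print(t1)

total_pct <- prop.table(t1) * 100      # each cell as a percent of all cases
row_pct   <- prop.table(t1, 1) * 100   # second argument 1: percents within rows
col_pct   <- prop.table(t1, 2) * 100   # second argument 2: percents within columns
print(total_pct)
print(row_pct)
print(col_pct)
```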
The stacked bar chart is an excellent visualization of a 2-way contingency table. Like the simple bar chart, the
stacked bar chart is also generated in R using the barplot() function. Notice here that the first argument to this
call to barplot() is not the raw widge data, but rather the results of the table() function: Another example of
nested functions. Notice, also, that a legend is necessary for this graphic to be meaningful and we are
supplying information for the legend to be extracted from the row names of the results of the table() function.
Notice we have abbreviation issues again. Therefore, we do it again, and explicitly tell R what we want printed
in the legend using the concatenate function (c()). Realize, however, it is helpful to generate the incorrect
graph once so we know for sure the order of the groups in the legend. Then, we refine it and generate a final
product appropriate for our audience. Now, we redo it:
Now, this graphic is incorrect for the same reason that our first univariate bar chart was incorrect: It suffers from
misrepresenting the ordered nature of the ordinal variable Jobten. Just like before, we use the ordered()
function nested within barplot() to instruct R how to order the categories:
Notice that the old reference to the variable Jobten is now replaced by:
ordered(Jobten,c("New","Experienced","Mature"))
The resulting graphic is printed on the next page.
Now, we are not as draconian about this, but you will notice that the printing of the legend looks a little less
than ideal here. We can actually tell R where to print the legend (do this in your assignments and REALLY
impress us!). Consider the following code:
Here we are telling R to suppress the printing of the legend through the barplot() function and using a separate
call to the legend() function where we have more control. Obtain the R help page on legend() for more details
on how this works. BTW, we figured out the appropriate x and y coordinates for the placement of the legend
here just by trial and error. The final graphic is printed on the next page.
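The sketch below pulls these refinements together: an ordered Jobten, no legend drawn by barplot(), and a separate call to legend(). We assume the row variable is POSITION with hypothetical codes "H" and "M"; the data, labels, and coordinates are invented for this toy example, and xlim is widened only to leave room for the legend.

```r
# Invented stand-ins; "H"/"M" are hypothetical position codes
POSITION <- c("H", "M", "H", "H", "M", "H", "M", "H")
Jobten   <- c("New", "New", "Experienced", "Mature",
              "Mature", "New", "Experienced", "Experienced")

t2 <- table(POSITION, ordered(Jobten, c("New", "Experienced", "Mature")))

# No legend.text argument, so barplot() draws no legend of its own
barplot(t2, xlab = "Job Tenure", ylab = "Number of Employees",
        xlim = c(0, 5))

# Place the legend ourselves; x and y are in the plot's own units.
# gray.colors(2) matches barplot()'s default fill for a 2-row table.
legend(x = 3.8, y = 3, legend = c("Hourly", "Management"),
       fill = gray.colors(2))
```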
Now, we can easily generate a 100% stacked bar chart simply by nesting the table() function within the
prop.table() function in the call to barplot() (Yes, there is a lot of nesting going on here. Don't forget a
parenthesis!!).
So, essentially what we are doing is generating our 100% stacked bar chart from the column percents. The only
problem is that prop.table() returns these in the form of proportions, not percents. As a result, the y-axis of our
resulting graphic ranges between 0 and 1.0. See the next page.
Now, here is how cool R is: You can actually specify a mathematical expression within the call to barplot().
Therefore, all we have to do to correct this is to multiply the column proportions from prop.table() by 100 WITHIN
the call to barplot(). Notice we also added a y-axis label using the ylab argument and we were forced to change
the y coordinate specification (the 2nd argument) in the call to the legend() function.
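A sketch of the 100% stacked bar chart on the percent scale follows. As before, the data, the POSITION codes, the labels, and the legend coordinates are invented for illustration.

```r
# Invented stand-ins; "H"/"M" are hypothetical position codes
POSITION <- c("H", "M", "H", "H", "M", "H", "M", "H")
Jobten   <- c("New", "New", "Experienced", "Mature",
              "Mature", "New", "Experienced", "Experienced")

t2 <- table(POSITION, ordered(Jobten, c("New", "Experienced", "Mature")))

# Column proportions times 100: every bar now sums to 100 percent
pct <- prop.table(t2, 2) * 100
print(pct)

# The same expression could be written directly inside barplot()
barplot(pct, xlab = "Job Tenure", ylab = "Percent of Employees",
        xlim = c(0, 5))

# The y coordinate is now on the 0-100 scale
legend(x = 3.8, y = 100, legend = c("Hourly", "Management"),
       fill = gray.colors(2))
```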
A scatterplot is generated using the plot() function. The first argument is the x-axis variable, the second the
y-axis variable. In order to obviate abbreviations from the start, we use the xlab and ylab arguments to provide
proper labeling for the audience.
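For example, assuming we want years on the job on the x-axis and job satisfaction on the y-axis (our choice of variables; the values are invented):

```r
# Invented stand-ins for two quantitative columns
YRONJOB <- c(2, 7, 12, 4, 9, 15, 1, 20)
JOBSAT  <- c(6.5, 7.0, 8.2, 5.9, 7.4, 6.8, 6.1, 8.0)

# First argument = x-axis variable, second = y-axis variable
plot(YRONJOB, JOBSAT,
     xlab = "Years on the Job", ylab = "Job Satisfaction")
```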
8.5 Using R for: Random Number Generation and Simple Random Sampling
As we have seen through our previous STAT 3010 studies, there is great utility in the ability to generate random
numbers ranging from sampling applications to random assignment of observations and developing computer
simulations (ok, simulations are beyond the scope of 3010, but you will encounter these if you continue on your
journey in studying statistics). R is extraordinarily effective and efficient as a random number generator. Like the
other packages, R uses the computer clock time as the default seed for all random number functions. To
generate uniformly distributed random numbers, we use the runif() function:
In the example above, we generate 40 random numbers and store them in the vector named Ran and then
print them to the console7. The runif() function has one mandatory argument, the number of random numbers
to generate. The default is to generate numbers between 0 and 1 (which is nice). Pretend for a moment that
we really wanted a set of N = 40 random whole numbers that varied between 0 and 100. We could obtain this
by multiplying Ran by 100 and using the round function in order to round the numbers to the nearest whole
number (here we named the result R100, but this is completely arbitrary):
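These two steps can be sketched as follows. No seed is set, so the values differ from run to run.

```r
# 40 uniform random numbers between 0 and 1 (the default range)
Ran <- runif(40)
print(Ran)

# Rescale to 0-100 and round to the nearest whole number
R100 <- round(Ran * 100)
print(R100)
```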
7 Obviously, you should not expect to obtain the same exact results as we do here due to the use of the default seed.
Verify that the seed is set to the clock time by re-submitting the same code. You should obtain different values
for your N = 40 generated numbers. Next, use the set.seed() function to set the seed so that you can obtain the
same exact results at a later date (this is often desirable):
Note: to reproduce the same values, it is necessary to call the set.seed() function with the same initial value
(here the value 974) before each new call to the random number generating function. Also, it is important to keep
in mind that set.seed() only uses the integer portion of the initial seed value. Therefore, if a fractional value
is supplied to the function, set.seed() automatically drops the fractional part (be mindful!).
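The effect of resetting the seed can be checked with a short sketch like this one (we draw only 5 numbers to keep the output small):

```r
set.seed(974)
first <- runif(5)

set.seed(974)            # reset the seed before the next call
second <- runif(5)

third <- runif(5)        # no reset: the generator simply continues

print(identical(first, second))   # same seed, same numbers
print(identical(second, third))   # continuing the stream gives new numbers
```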
Now if we desired to create statistically independent groups from the WidgeOne data, we use simple
assignment statements much like we did when we categorized a continuous variable.
Notice that the first assignment statement in the example above reads "the new variable Group appended to
the widge data frame gets a value of 1 if the associated random number is less than .5". The second statement
is read in similar manner. We then print the results in order to confirm the effectiveness of our code.
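A sketch of random group assignment under these rules follows. The data frame is a stand-in with a hypothetical EmpID column, and we create Group as missing first; that extra line is our own precaution, not part of the description above.

```r
set.seed(974)   # only so the sketch is reproducible

# Stand-in data frame; EmpID is a hypothetical identifier column
widge <- data.frame(EmpID = 1:8)

Ran <- runif(nrow(widge))   # one random number per employee

widge$Group <- NA           # precaution: start with every row unassigned
widge$Group[Ran < .5]  <- 1
widge$Group[Ran >= .5] <- 2

print(widge)                # confirm every row received a group
```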
After performing random group assignment, it is often desirable to sort the data by the new group membership.
This is easily done in R using the order() function specified within the square bracket operators:
Notice that we began by re-attaching the data frame (otherwise we would have to specify widge$Group
instead of simply Group when referencing the new group membership). In this example, we not only sorted the
data by group membership, but then within groups we sorted by employee ID. Notice that we created a new
version of the widge data frame (widge2) that is sorted. The operative statement reads something like "a new
data frame named widge2 gets the old version of widge after it is sorted in ascending order (the default) by
Group and then by employee ID within Group". Also, it is important to realize that the order() function is called
in the area within the square brackets that is associated with rows. Therefore, we are sorting rows, not columns.
Packages like MS Excel, SPSS, and SAS generally sort only rows in this way; R, however, is much more
flexible in this regard.
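The sort can be sketched as follows, with invented data and a hypothetical EmpID column name. We use widge$ prefixes so the example does not rely on attach().

```r
# Stand-in data: invented IDs and group memberships
widge <- data.frame(
  EmpID = c(104, 101, 103, 102),
  Group = c(2, 1, 2, 1)
)

# order() sits in the row position of [rows, columns]:
# sort by Group first, then by EmpID within Group
widge2 <- widge[order(widge$Group, widge$EmpID), ]
print(widge2)
```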
In order to obtain a simple random sample of the WidgeOne data, we use: Yes! the sample() function! In the
example below, we desire to sample the rows of the parent data frame, so the sample() function is specified
just like the order() function in the example above (i.e., in the row area within the square brackets):
Here we create a new data frame named sam1. The first argument of the sample() function is the row numbers
of the parent object to sample from. Therefore, we want to sample from 1 through 40 (the nrow() function
returns the number of rows of a 2-dimensional R object (like a matrix or data frame)). The second
argument is the size of the sample. So in this example, we want a sample of 30 employees from the original
data containing N = 40 employees. Finally, we specify not to perform sampling with replacement so that the
same employee cannot be chosen twice for inclusion in the sample.
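A sketch of this call, using a stand-in data frame of 40 rows with invented values and a hypothetical EmpID column:

```r
# Stand-in data frame with 40 employees (invented values)
widge <- data.frame(
  EmpID  = 1:40,
  JOBSAT = round(runif(40, min = 1, max = 10), 1)
)

# sample() sits in the row position: 30 of the rows 1..40, no repeats
sam1 <- widge[sample(1:nrow(widge), 30, replace = FALSE), ]
print(nrow(sam1))
```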
8.6 Using R for: Confidence Intervals
Unlike in SPSS and SAS, we are not aware of a "canned" (i.e., ready-made) function in R whose sole job is to
calculate confidence intervals (CIs) for the user (although t.test() does report a CI for a mean as part of its
output). HOWEVER, this is a great opportunity to showcase how easily this kind of
thing can be done with a little bit of user generated code. The following code performs the CI calculation and
generates a little report:
There is a lot going on here. First, we set the alpha level to .05 which, of course, corresponds with a 95%
confidence level. Notice that alpha is not a function or an argument to a function. Here it is a simple
user-defined (which means that we made it up...) R object (in this case, it is a scalar). Then, we count the number of
non-missing values of the vector JOBSAT. FYI: The is.na() function returns a logical vector of the same length as
the input vector with a TRUE or FALSE for each element answering the question "is this value/element missing
(NA)?":
We obtained 40 FALSE's because there are no missing values of the variable JOBSAT. Now, the sum function
works here because, just like SAS, R interprets TRUE as 1 and FALSE as 0. So, sum(is.na(JOBSAT)) counts the
number of missing values in the JOBSAT vector.
Now, we want the number of non-missing values, so we add the ! operator to the expression. The ! operator
means NOT in R.
As a result, we are now counting the number of non-missing values of the input vector. Of course, we use this
information to determine the degrees of freedom in the calculation of the margin of error of these CIs. Next, we
calculate the lower confidence limit for the mean (lclm) using a number of R functions (e.g., round(),
mean(), sd(), sqrt(), and qt()). Thus far, we have discussed all of these except qt(). Like any good statistical
package, R contains a number of functions to obtain values of reference statistical distributions (like the normal,
t, chi-square, and F-distributions). The qt() function returns the appropriate quantile from Student's t-distribution
given a probability value (here, 1-alpha/2) and the correct degrees of freedom (here, n-1). We then do the
same for the upper limit of this interval. Next, we calculate the associated sample mean value. Finally, we use
the cbind() (short for column bind) function to "paste" or bind the four computed scalars into a little matrix (with
only 1 row, sort of like a row vector) for ease of printing and viewing. This is very much like the output one would
obtain from SAS; however, we customized it to exactly the information we wanted.
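The calculation walked through above can be sketched as follows. This is our reconstruction from the description, not the original listing: the JOBSAT values are invented (with one missing value added to exercise is.na()), the object names xbar, se, uclm, and report are our own, and we add na.rm = TRUE so the missing value does not spoil the mean and standard deviation.

```r
# Invented job satisfaction scores, one missing on purpose
JOBSAT <- c(6.5, 7.0, 8.2, 5.9, 7.4, 6.8, NA, 7.1)

alpha <- .05                       # 95% confidence level

print(is.na(JOBSAT))               # TRUE marks a missing element
n <- sum(!is.na(JOBSAT))           # TRUE counts as 1, so this counts non-missing

xbar <- mean(JOBSAT, na.rm = TRUE)
se   <- sd(JOBSAT, na.rm = TRUE) / sqrt(n)
tval <- qt(1 - alpha / 2, n - 1)   # t quantile with n-1 degrees of freedom

lclm <- round(xbar - tval * se, digits = 2)   # lower confidence limit
uclm <- round(xbar + tval * se, digits = 2)   # upper confidence limit

# Bind the four scalars into a one-row matrix for printing
report <- cbind(n, lclm, mean = round(xbar, digits = 2), uclm)
print(report)
```

As a cross-check, the limits should agree (after rounding) with the interval stored in t.test(JOBSAT)$conf.int.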
REMEMBER: When reporting CIs ALWAYS, ALWAYS, ALWAYS provide the appropriate interpretation of the results.
For example, “Based on a representative sample of 40 employees, we are 95% confident that job satisfaction
for all employees is between 6.53 and 7.17”.
8.7 R Lagniappe
Part 1: Writing Your Own Functions
As we have seen so far, R's utility and power is a result of its efficiency and ease in customizing your own
programs and results. Let's take this a step further.
We introduced and discussed several functions that are available to the user through the base package.
Additionally, there are a number of add-on packages that allow you to use functions that other users have
written and developed (see http://www.statmethods.net/interface/packages.html for more information on R
packages). Now, we can also write functions of our own...cool.
Let's use our code for generating CIs in the previous section. What if we could generalize and package that
code so that all the user had to do is type 1 line of code to call all of our source code and compute and print
the CIs for any variable they want? It's actually pretty easy to do in R (If you are a SAS user, this would be like
writing your own procedure, however, that is not an option in SAS).
How do we write our own function? This is R!: We use a function! And, in this case, it is actually called function
(ok, we did not mean to be confusing here...). Check this out:
So, here CI gets or is defined as a function (rather than a data object such as a vector) with 2 arguments: x
(which we assume is a continuous random variable8) and alpha, the significance level associated with the desired
confidence level. Then the curly braces are used to instruct R that everything within the braces is the body of
the function. Notice we made some slight changes (added a field for the variable name, the confidence level, and
the margin of error (me)). Once the function is defined, all we or anyone else with this function loaded into
their R session has to do is call the CI function while supplying the appropriate information for the 2 arguments,
and the function returns the desired confidence limits and all the information associated with them. We provide
3 instances of calling the function and obtaining the results in the example below. Pretty sweet!
8 Here the term "random variable" is used as it is used in statistical theory: "random variable" or stochastic variable refers to a variable
whose value results from a measurement on some type of random process. It should not be confused with random number
generation, the topic of the previous section.
Now, let's put this in hyper-drive. Let's add a default value to the alpha argument and another argument, an
optional argument, that allows the user to specify the decimal precision of the results (i.e., the number of
decimal places used in the results).
Here we add alpha=.05. Then .05 becomes the default value of alpha. The user can change it, however, if they
don't specify anything, they get 95% CIs (just like SAS!). Also we add the dec=3 specification in the call to the
function() function (HA!) and replace the value of the digits argument with dec. Now look at the sample calls
to this function on the next page.
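A function along these lines might be sketched as below. It is our reconstruction from the description rather than the original listing: the field names are our own, the data are invented, and we return a one-row data frame (instead of a cbind() matrix) so that the variable name can sit next to the numeric results.

```r
# Sketch of a CI function with default values for alpha and dec
CI <- function(x, alpha = .05, dec = 3) {
  n    <- sum(!is.na(x))                       # non-missing count
  xbar <- mean(x, na.rm = TRUE)
  me   <- qt(1 - alpha / 2, n - 1) * sd(x, na.rm = TRUE) / sqrt(n)
  data.frame(
    variable   = deparse(substitute(x)),       # name of the input as typed
    n          = n,
    conf.level = 1 - alpha,
    lclm       = round(xbar - me, digits = dec),
    mean       = round(xbar, digits = dec),
    uclm       = round(xbar + me, digits = dec),
    me         = round(me, digits = dec)
  )
}

# Invented data for the sample calls
JOBSAT <- c(6.5, 7.0, 8.2, 5.9, 7.4, 6.8, NA, 7.1)

ci95 <- CI(JOBSAT)                 # defaults: 95% CI, 3 decimal places
ci99 <- CI(JOBSAT, alpha = .01)    # 99% CI
ci1d <- CI(JOBSAT, dec = 1)        # 95% CI, 1 decimal place
print(ci95)
print(ci99)
print(ci1d)
```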
Part 2: Outputting Results from R
Ok, if one were working on a... homework, for example, one may desire to output the results they receive to a
format that can easily be used in a homework document. In that case, we will discuss 2 options for outputting R
results for 1) tabular output and 2) graphics.
Outputting Tabular Output in R
Arguably, this is another major shortcoming of R: There is no function at the present time that allows the user to
easily create properly formatted tables from R output9. The best way to create presentation-quality tables from
R output is to copy and paste the results from the console into MS EXCEL and then properly format the tabular
information in EXCEL (e.g., adding titles, table lines, replacing abbreviations, etc.). Unfortunately, even this
approach requires several steps.
9 In other words, there is no analog to SAS's ODS RTF statement in R.
1) Highlight and copy output from the R console.
2) Paste R output into MS EXCEL. Unfortunately, these "pastes" are often pasted into a single cell in EXCEL.
Therefore, the user will often have to use the Text to Columns function in the Data tab.
3) Select Finish from the resulting dialog box.
4) The information is now separated into separate columns.
5) Next, use basic MS EXCEL functionality to properly format the table.
6) Finally, copy and paste this formatted table into a word processing document.
Outputting Graphics in R
The easiest way (this is R, there are a number of ways to do this) to incorporate a graphic generated in R into a
word processing document (we are going to use MS Word for this example) is to instruct R to generate a JPEG
(*.jpg) image file of the graphic of interest and then insert that image into MS Word.
First, to instruct R to generate the JPEG file, we use the jpeg() function:
Notice that we specify only 1 argument in the jpeg function: the physical location (pathway) and filename in
quotes. If you copy and paste the pathway from My Computer (as discussed at the very beginning of this
chapter), REMEMBER you will need to change the single backslashes (\) to double backslashes (\\) for R to
read the pathway correctly. Next, specify the desired call to an R graphing function (we highly recommend
that you develop, debug, and confirm that this function call is error-free BEFORE attempting to use it to
generate the JPEG file). It is best to specify only 1 graphics call with each call to the jpeg() function. Next, we
turn off the jpeg output stream using the dev.off() function.
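The three steps (open the JPEG file, draw the graphic, close the file) can be sketched as follows. The file location is hypothetical: we write to R's temporary directory so the sketch runs anywhere, whereas in practice you would supply your own pathway, such as "C:\\MyFolder\\jobsat_hist.jpg" on Windows. The data are invented, and the capabilities() check is our own guard for R installations without JPEG support.

```r
# Invented data for the graphic
JOBSAT <- c(6.5, 7.0, 8.2, 5.9, 7.4, 6.8, 7.1, 6.2, 7.9, 6.6)

# Hypothetical output location (a temporary folder)
out_file <- file.path(tempdir(), "jobsat_hist.jpg")

if (capabilities("jpeg")) {
  jpeg(out_file)                            # 1) open the JPEG output stream
  hist(JOBSAT, xlab = "Job Satisfaction")   # 2) one graphics call
  dev.off()                                 # 3) close the stream; file is written
  print(file.exists(out_file))
}
```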
One can view the contents of the sub-directory (folder) where the file was saved using My Computer (in
Windows) (This is optional):
The *.jpg file is a simple image file which is accessible via a number of
software packages. When we double-click on the file, the file opens using
HP MediaSmart Photo (this depends on the particular software that is set as
the default image-viewing software on your machine).
To insert this image into MS Word, navigate to the location within the Word document where you want the
image and select the Insert tab.
From the Insert tab, select Picture in the Illustrations group:
In the resulting Insert Picture dialog box, navigate to the sub-directory where you instructed R to save the JPEG
file, select the file, and select the Insert button.
The image is inserted into the document.
It is usually most appropriate to center graphics. With the graphic highlighted (the default after inserting it),
navigate to the Home tab.
Select the Center button in the Paragraph group. Note: The lines around the graphic will disappear when you
select a null space in the document. Nice = presentation quality.
8.8 R Chapter Answers (Actually there is only one...)
This call to by() requests a stratified analysis of the standard deviation of the Productivity Scores by Plant while
removing rows with missing values.
Congratulations. You are now a Geek. Take a bow.