Supplemental Materials:
Two methods are used in this paper to analyze the effect of various hypothetical influences on these
patterns. In the first, used mostly for qualitative analysis, a normally distributed population of 1000
values with a mean of 100 and a standard deviation of 15 was generated (using Wessa 2008). The
populations so generated were then manipulated in Excel as described in the text, and differences in the
population variances were tested for statistical significance (at the .05 level) with the F-Test in Excel.
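As a rough illustration of this first method, the same kind of check can be sketched in R. This is not the
procedure used in the paper, which relied on the Wessa generator and Excel; var.test from base R stands in
for Excel's F-Test, and the seed and the manipulation applied to the second population (subtracting a
uniformly drawn integer from 0 to 10) are assumptions chosen only for the example.

```r
# Illustrative sketch only: the paper ran this step in Excel, not R.
set.seed(1)  # hypothetical seed so the sketch is repeatable

# two populations of 1000 values, mean 100, standard deviation 15
pop1 <- rnorm(1000, mean = 100, sd = 15)
pop2 <- rnorm(1000, mean = 100, sd = 15)

# hypothetical manipulation: subtract a random integer from 0 to 10 from each value
pop2.adj <- pop2 - sample(0:10, length(pop2), replace = TRUE)

# F test of equality of variances (two-sided by default)
ft <- var.test(pop1, pop2.adj)

# TRUE if the variances differ significantly at the .05 level
ft$p.value < 0.05
```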
In the second, R (with the “car” package installed) was used to provide better quantitative estimates of
the power of the statistical tests to detect differences in variance. Here, Levene's test for homogeneity
of variance was used to test for statistical significance (at the .05 level), and each simulation was
repeated for 10,000 loops.
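To make this second method concrete, the sketch below estimates power on a small scale using base R only.
It assumes the median-centered form of Levene's test (the default in car's leveneTest), computed here as a
one-way ANOVA on the absolute deviations from each group's median. The function name levene.p, the reduced
sample sizes and loop count, the seed, and the standard deviations of 15 and 20 are illustrative
assumptions, not settings taken from the paper.

```r
# Illustrative sketch: median-centered Levene's test written with base R only.
# levene.p is a hypothetical helper name, not part of the car package.
levene.p <- function(x1, x2) {
  y <- c(x1, x2)
  g <- factor(rep(c(1, 2), times = c(length(x1), length(x2))))
  # absolute deviation of each value from its own group's median
  z <- abs(y - ave(y, g, FUN = median))
  # one-way ANOVA on the deviations; return the p-value for the group effect
  anova(lm(z ~ g))[1, "Pr(>F)"]
}

set.seed(1)   # hypothetical seed
loops <- 200  # reduced from 10,000 so the sketch runs quickly

# p-value from each simulated pair of samples with unequal standard deviations
p.values <- replicate(loops, levene.p(rnorm(500, 100, 15), rnorm(500, 100, 20)))

# estimated power: proportion of runs significant at the .05 level
power <- mean(p.values < 0.05)
power
```

Setting both standard deviations to the same value turns the same loop into an estimate of the false
positive rate instead of power.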
References:
R Development Core Team. 2013. R: A language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria.
Wessa, P. 2008. Random Number Generator for the Normal Distribution (v1.0.8) in Free Statistics
Software (v1.1.23-r7). Office for Research Development and Education. URL
http://www.wessa.net/rwasp_rngnorm.wasp/
R-Scripts for Simulations:
#Script for treating X-Factors as multiple independent factors each of which has a uniform distribution:
library(car)
setwd("your working directory here")
#set the number of loops you want.
loops <- 10000
#set means for each population
m1 <- 100
m2 <- 100
#set standard deviations of each population
s1 <- 15
s2 <- 15
#set the size of the samples of each population
size1 <- 5000
size2 <- 5000
#now to subtract some uniformly distributed numbers from sample 2
#set the upper limit of the range of numbers (0 to number.limit) you want to subtract from each number in sample 2
number.limit <- 10
#set the number of numbers, each randomly selected from this range, that you want to subtract from each number in sample 2
number.numbers <- 3
results <- NULL
for(i in 1:loops){
#generates the samples for each run in the loop
sample1 <- rnorm(size1, mean = m1, sd = s1)
sample2 <- rnorm(size2, mean = m2, sd = s2)
#generates the population(s) of numbers to be subtracted from each number in sample2 (which will be re-generated for each run in the loop)
number.pop <- matrix(nrow = size2, ncol = number.numbers)
#fills each column with integers drawn at random from 0 to number.limit
for(k in 1:ncol(number.pop)){
for(j in 1:nrow(number.pop)){
number.pop[j,k] <- sample(0:number.limit, 1)
}
}
#subtracts every column of numbers from sample2
sample3 <- sample2
for(k in 1:ncol(number.pop)){
sample3 <- sample3 - number.pop[,k]
}
#Levene's test of homoscedasticity of variances of sample1 versus sample3
y <- c(sample1, sample3)
group <- as.factor(c(rep(1, length(sample1)), rep(2, length(sample3))))
results <- c(results, leveneTest(y, group)[1,3])
}
#set the alpha for determining significance
alpha = 0.05
#reports how many results were significant
length(which(results < alpha))
#notes the total number of results as a reminder
length(results)
#reports what proportion of the results are less than the significance threshold set above
length(which(results < alpha))/length(results)
#Script for treating X-Factors as a single factor with a particular normal distribution:
library(car)
setwd("your working directory here")
#set the number of loops
loops <- 10000
#set means for each population
m1 <- 100
m2 <- 100
#set standard deviations of each population
s1 <- 15
s2 <- 15
#set the size of the samples of each population
size1 <- 5000
size2 <- 5000
#now to subtract some normally distributed population of numbers from sample 2
#set the mean of the population of numbers to subtract from each number in sample 2
number.mean <- 15
#set the standard deviation of the population of numbers to subtract from each number in sample 2
number.sd <- 2.25
results <- NULL
for(i in 1:loops){
#generates the samples for each run in the loop
sample1 <- rnorm(size1, mean = m1, sd = s1)
sample2 <- rnorm(size2, mean = m2, sd = s2)
#generates the population of numbers to be subtracted from each number in sample2 (which will be re-generated for each run in the loop)
number.pop <- rnorm(size2, mean = number.mean, sd = number.sd)
#subtracts the population of numbers, centered around the mean specified above, from sample2
sample3 <- sample2 - number.pop
#Levene's test of homoscedasticity of variances of sample1 versus sample3
y <- c(sample1, sample3)
group <- as.factor(c(rep(1, length(sample1)), rep(2, length(sample3))))
results <- c(results, leveneTest(y, group)[1,3])
}
#helper that returns the percent rank of x within table (defined here but not called below)
percentrank <- function(table, x = table) {
table <- sort(table)
ties <- ifelse(match(x, table, nomatch = 0), "min", "ordered")
len <- length(table)
f <- function(x, ties)
(approx(table, seq(0, len = len), x, ties = ties)$y) / (len - 1)
mapply(f, x, ties)
}
#set the alpha for determining significance
alpha = 0.05
#reports how many results were significant
length(which(results < alpha))
#notes the total number of results as a reminder
length(results)
#reports what proportion of the results are less than the significance threshold set above
length(which(results < alpha))/length(results)
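
The final proportion is a Monte Carlo estimate, so it carries some sampling error of its own. A minimal
sketch of the usual binomial standard error follows; p.hat and n.loops are hypothetical stand-ins for the
proportion and loop count produced by the scripts above, and the value 0.80 is made up for the example.

```r
# Illustrative sketch: standard error of a simulated proportion.
p.hat <- 0.80     # hypothetical proportion of significant results
n.loops <- 10000  # number of loops behind that proportion

# binomial standard error of the estimated proportion
mc.se <- sqrt(p.hat * (1 - p.hat) / n.loops)
mc.se
```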