Understanding Research in Second Language Learning, by James Dean Brown
Presented by Zahra Farajnezhad, March 2022

Chapter 1: What Is Research? Types of Research

Research is defined as a systematic inquiry to describe, explain, predict, and control an observed phenomenon. Research involves both inductive and deductive methods. Inductive methods are used to analyze an observed event, while deductive methods are used to verify the observed event. Inductive approaches are associated with qualitative research, and deductive methods are more commonly associated with quantitative research.

Characteristics of Research
- A systematic approach must be followed to obtain accurate data; rules and procedures are an integral part of the process.
- Research is based on logical reasoning and involves both inductive and deductive methods.
- The data or knowledge derived comes from actual observations in natural settings.
- All data collected are analyzed in depth so that no anomalies are associated with them.
- Research creates a path for generating new questions; existing data help create more opportunities for research.
- Research is analytical in nature: it makes use of all the available data so that there is no ambiguity in inference.
- Accuracy is one of the most important aspects of research: the information obtained should be accurate and true to its nature.

Primary Research
A primary source provides direct or firsthand evidence about an event, object, person, or work of art. Primary sources include historical and legal documents, eyewitness accounts, results of experiments, statistical data, pieces of creative writing, audio and video recordings, speeches, and art objects. Primary research consists of primary, firsthand information: primary data are the firsthand data gathered by the researcher himself or herself.

Secondary Research
Unlike primary research, secondary research is developed with information from secondary sources, which are generally based on the scientific literature and other documents compiled by another researcher. Secondary data are data collected earlier by someone else. Typical secondary sources are government publications, websites, books, journal articles, internal records, and so on, whereas primary data come from surveys, observations, experiments, questionnaires, personal interviews, etc.

Primary research vs. secondary research
Data collected by: primary, you (or a company you hire); secondary, someone else.
Examples: primary, surveys, focus groups, interviews, observations, experiments; secondary, N/A, since the act of looking for existing data is itself secondary research.
Qualitative/quantitative: either approach can be both.
Key benefits: primary, specific to your needs and you control the quality; secondary, usually cheap and quick.
Disadvantages: primary, usually costs more and takes a lot of time; secondary, data can be too old or not specific enough for your needs.

Primary research is subdivided into:
1. Case studies
2. Statistical studies
Statistical Studies
Statistical studies fall into two additional subcategories: survey studies and experimental studies.

A survey study is a quantitative and qualitative method with two important characteristics. First, the variables of interest are measured using self-reports: survey researchers ask their participants (who are often called respondents in survey research) to report directly on their own thoughts, feelings, and behaviour. Second, considerable attention is paid to the issue of sampling. Survey researchers have a strong preference for large random samples because they provide the most accurate estimates of what is true in the population. Surveys can be long or short, and they can be conducted in person, by telephone, through the mail, or over the Internet. Survey research is defined as the process of conducting research using surveys that researchers send to survey respondents. The data collected from surveys are then statistically analyzed to draw meaningful research conclusions.

An experimental study is one in which a treatment, procedure, or program is intentionally introduced and a result or outcome is observed. True experiments have four elements: manipulation, control, random assignment, and random selection. The most important of these elements are manipulation and control. Manipulation means that something in the environment is purposefully changed by the researcher. Control is used to prevent outside factors from influencing the study outcome. Experiments involve highly controlled and systematic procedures in an effort to minimize error and bias, which also increases our confidence that the manipulation "caused" the outcome. You can conduct experimental research in the following situations:
- Time is a vital factor in establishing a relationship between cause and effect.
- There is invariable behavior between cause and effect.
- You wish to understand the importance of the cause and effect.

Characteristics of Statistical Studies
[1] Systematic research: it has a clear structure with definite procedural rules that must be followed. These rules can help you read, interpret, and critique statistical studies.
[2] Logical research: the rules and procedures underlying these studies form a straightforward, logical pattern, a step-by-step progression of building blocks, each of which is necessary for the logic to succeed. If the procedures are violated, one or more building blocks may be missing and the logic will break down.
[3] Tangible research: it is based on the collection and manipulation of data from the real world. The set of data may take the form of test scores, students' ranks on course grades, the number of language learners who have certain characteristics, and so on. The data must be quantifiable: each datum must be a number that represents some well-defined quantity, rank, or category. It is the manipulation or processing of these data that links the study to the real world.
[4] Replicable research: the researcher's proper presentation and explanation of the system, logic, data collection, and data manipulation in a study should make it possible for the reader to replicate the study (do it again under the same conditions).
[5] Reductive research: statistical research can reduce the confusion of facts that language and language teaching frequently present, sometimes on a daily basis.
You may discover new patterns in the facts, or, through these investigations and the eventual agreement among many researchers, general patterns and relationships may emerge that clarify the field as a whole. It is these qualities that make statistical research reductive.

The Value of Statistical Research
Surveys and experimental studies can provide important information on individuals and groups that is not available from other types of research. Statistical studies are:
1. Systematically structured, with definite procedural rules
2. Based on step-by-step logical patterns
3. Based on tangible, quantified information, called data
4. Replicable, in that it should be possible to do them again
5. Reductive, in that they can help form patterns in the seeming confusion of facts that surround us

____________________________________________________________________________

Chapter 2: Variables

Definition: a variable is an element, feature, or factor that is liable to vary or change. A variable is any characteristic, number, or quantity that can be measured or counted. A variable may also be called a data item. Age, sex, business income and expenses, country of birth, capital expenditure, class grades, eye colour, and vehicle type are examples of variables.

Operationalization of variables
Operationalization means turning abstract concepts into measurable observations. Although some concepts, like height or age, are easily measured, others, like spirituality or anxiety, are not. Through operationalization, you can systematically collect data on processes and phenomena that aren't directly observable. In quantitative research, it's important to precisely define the variables that you want to study. Without specific operational definitions, researchers may measure irrelevant concepts or apply methods inconsistently. Operationalization reduces subjectivity and increases the reliability of your study.

Operationalization example: the concept of social anxiety can't be directly measured, but it can be operationalized in many different ways, for example:
- self-rating scores on a social anxiety scale
- number of recent behavioral incidents of avoidance of crowded places
- intensity of physical anxiety symptoms in social situations

Independent vs. Dependent Variables
In research, variables are any characteristics that can take on different values, such as height, age, temperature, or test scores. Researchers often manipulate or measure independent and dependent variables in studies to test cause-and-effect relationships.

The independent variable is the cause. Its value is independent of other variables in your study. An independent variable is the variable you manipulate or vary in an experimental study to explore its effects. It's called "independent" because it's not influenced by any other variables in the study.

The dependent variable is the effect. Its value depends on changes in the independent variable. A dependent variable is the variable that changes as a result of the independent variable manipulation. It's the outcome you're interested in measuring, and it "depends" on your independent variable.

Example (independent and dependent variables): You design a study to test whether changes in room temperature have an effect on math test scores. Your independent variable is the temperature of the room: you vary the room temperature by making it cooler for half the participants and warmer for the other half. Your dependent variable is math test scores. You measure the math skills of all participants using a standardized test and check whether they differ based on room temperature.
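To make the independent/dependent distinction concrete, here is a minimal Python sketch of how the room-temperature example could be summarized by comparing the mean test score of each group; the scores and variable names are purely illustrative assumptions, not data from the example.

# Hypothetical math test scores for the two room-temperature conditions
# (independent variable: room temperature; dependent variable: test score).
scores = {
    "cool": [78, 85, 90, 72, 88],   # participants tested in the cooler room
    "warm": [70, 75, 80, 68, 74],   # participants tested in the warmer room
}

for condition, values in scores.items():
    mean_score = sum(values) / len(values)
    print(f"{condition} room: mean math score = {mean_score:.1f}")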
Mediator vs. Moderator Variables
A mediating variable (or mediator) explains the process through which two variables are related, while a moderating variable (or moderator) affects the strength and direction of that relationship. Including mediators and moderators in your research helps you go beyond studying a simple relationship between two variables for a fuller picture of the real world. These variables are important to consider when studying complex correlational or causal relationships between variables.

Mediating variables
A mediator is a way in which an independent variable impacts a dependent variable. It's part of the causal pathway of an effect, and it tells you how or why an effect takes place. If something is a mediator:
- It's caused by the independent variable.
- It influences the dependent variable.
- When it's taken into account, the statistical correlation between the independent and dependent variables is higher than when it isn't considered.
Mediation analysis is a way of statistically testing whether a variable is a mediator, using linear regression analyses or ANOVAs.

Moderating variables
A moderator influences the level, direction, or presence of a relationship between variables. It shows you for whom, when, or under what circumstances a relationship will hold. Moderators usually help you judge the external validity of your study by identifying the limitations of when the relationship between variables holds. For example, while social media use can predict levels of loneliness, this relationship may be stronger for adolescents than for older adults; age is a moderator here. Moderators can be:
- categorical variables such as ethnicity, race, religion, favorite colors, health status, or stimulus type
- quantitative variables such as age, weight, height, income, or visual stimulus size

Extraneous Variables
In an experiment, an extraneous variable is any variable that you're not investigating that can potentially affect the outcomes of your research study. If left uncontrolled, extraneous variables can lead to inaccurate conclusions about the relationship between independent and dependent variables. Extraneous variables can threaten the internal validity of your study by providing alternative explanations for your results. In an experiment, you manipulate an independent variable to study its effects on a dependent variable.

Confounding variables
A confounding variable is a type of extraneous variable that is associated with both the independent and dependent variables. A confounding variable influences the dependent variable and also correlates with or causally affects the independent variable. To ensure the internal validity of your research, you must account for confounding variables; if you fail to do so, your results may not reflect the actual relationship between the variables you are interested in. A variable must meet two conditions to be a confounder:
- It must be correlated with the independent variable. This may be a causal relationship, but it does not have to be.
- It must be causally related to the dependent variable.

Control Variables
A control variable is anything that is held constant or limited in a research study. It's a variable that is not of interest to the study's aims but is controlled because it could influence the outcomes.
Variables may be controlled directly by holding them constant throughout a study (e.g., by controlling the room temperature in an experiment), or they may be controlled indirectly through methods like randomization or statistical control (e.g., to account for participant characteristics like age in statistical tests). Control variables enhance the internal validity of a study by limiting the influence of confounding and other extraneous variables. This helps you establish a correlational or causal relationship between your variables of interest.

Aside from the independent and dependent variables, all variables that can impact the results should be controlled. If you don't control relevant variables, you may not be able to demonstrate that they didn't influence your results; uncontrolled variables are alternative explanations for your results. In an experiment, a researcher is interested in understanding the effect of an independent variable on a dependent variable. Control variables help you ensure that your results are solely caused by your experimental manipulation.

_______________________________________________________________________

Chapter 3: Scales

What is a scale? A scale is a device or an object used to measure or quantify any event or another object.

Levels of Measurement
Data can be defined as being on one of four scales:
- Nominal scale
- Ordinal scale
- Interval scale
- Ratio scale

Nominal Scale
A nominal scale is the 1st level of measurement, in which the numbers serve as "tags" or "labels" to classify or identify objects. A nominal scale usually deals with non-numeric variables or with numbers that do not have any value.

Characteristics of the nominal scale:
- A nominal variable is classified into two or more categories; in this measurement mechanism, the answer should fall into one of the classes.
- It is qualitative. The numbers are used here only to identify the objects; they do not define the objects' characteristics.
- The only permissible operation on numbers in the nominal scale is "counting."

Example: What is your gender? M (male) or F (female). Here the values are used as tags, and the answer to this question should be either M or F.

Ordinal Scale
The ordinal scale reports the ordering and ranking of data without establishing the degree of variation between them. Ordinal represents "order." Ordinal data are known as qualitative or categorical data; they can be grouped, named, and also ranked.

Characteristics of the ordinal scale:
- It shows the relative ranking of the variables.
- It identifies and describes the magnitude of a variable.
- Along with the information provided by the nominal scale, ordinal scales give the rankings of those variables.
- The interval properties are not known.
- Surveyors can quickly analyse the degree of agreement concerning the identified order of variables.

Examples:
- Ranking of school students: 1st, 2nd, 3rd, etc.
- Ratings in restaurants
- Evaluating the frequency of occurrences: very often, often, not often, not at all
- Assessing the degree of agreement: totally agree, agree, neutral, disagree, totally disagree

Interval Scale
The interval scale is the 3rd level of measurement. It is defined as a quantitative measurement scale in which the difference between two values is meaningful.
In other words, the variables are measured in an exact manner, not in a relative way in which the presence of zero is arbitrary.

Characteristics of the interval scale:
- The interval scale is quantitative, as it can quantify the difference between values.
- It allows calculating the mean and median of the variables.
- To understand the difference between the variables, you can subtract the values.
- The interval scale is a preferred scale in statistics, as it helps assign numerical values to arbitrary assessments such as feelings, calendar types, etc.

Examples: Likert scale, Net Promoter Score (NPS), bipolar matrix table.

Ratio Scale
The ratio scale is the 4th level of measurement and is quantitative. It is a type of variable measurement scale that allows researchers to compare differences or intervals. The ratio scale has a unique feature: it possesses the character of an origin, or zero point.

Characteristics of the ratio scale:
- It has a feature of absolute zero.
- It doesn't have negative numbers, because of its zero-point feature.
- It affords unique opportunities for statistical analysis: the variables can be added, subtracted, multiplied, and divided, and the mean, median, and mode can all be calculated.
- It has unique and useful properties; one such feature is that it allows unit conversions like kilogram to calories, gram to calories, etc.

Example of a ratio scale: What is your weight in kg? Less than 55 kg / 55-75 kg / 76-85 kg / 86-95 kg / more than 95 kg.

_________________________________________________________________________________

Chapter 4: Controlling Extraneous Variables

Validity refers to how accurately a method measures what it is intended to measure. If research has high validity, it produces results that correspond to real properties, characteristics, and variations in the physical or social world. High reliability is one indicator that a measurement is valid. The validity of a study will be approached from four perspectives:
- Environmental issues
- Grouping issues
- People issues
- Measurement issues

Environmental issues
1. Naturally occurring variables: some variables can affect the results. Primary among the environmental variables that may be encountered in studies of language teaching are noise, temperature, adequacy of light, time of day, and seating arrangements. All such potential environmental variables should be considered by the researcher as well as by the reader.
2. Artificiality: another environmental issue that may alter the intentions of a study is the artificiality of the arrangements within the study. This issue is analogous to the question of whether experimental mice will perform the same way in a laboratory, in which the conditions are artificial, as they would in the real world.

Grouping issues
1. Self-selection
2. Mortality
3. Maturation

Self-selection
Self-selection generally refers to the practice of letting the subjects decide which group to join. There are certain dangers inherent in this practice. Volunteering is just one way that self-selection may occur; another, not-so-obvious form of self-selection may happen when two existing classes are compared. Inequality between groups may occur.

Mortality
Mortality in the sample refers to a problem that arises when employing a longitudinal design and participants who start the research process are unable to complete the process.
Essentially, the persons begin the research process but then fail to complete the entire set of research procedures and measurements. The impact of the mortality, or dropout, rate of participants poses a threat to any empirical evaluation because the reasons for the lack of completion may relate to some underlying process that undermines any claim. This section describes the process of mortality and its implications for the conduct of research, as well as different means of assessing or preventing its occurrence.

Maturation
The maturation effect occurs when any biological or psychological process within an individual that takes place with the passage of time has an impact on research findings. When a study focuses on people, maturation is likely to threaten the internal validity of findings. Internal validity is concerned with correctly concluding that an independent variable, and not some extraneous variable, is responsible for a change in the dependent variable. Over time, people change, and these maturation processes can affect findings. Most participants can, over time, improve their performance regardless of treatment. This can apply to many types of studies in the physical or social sciences, psychology, management, education, and many other fields of study.

People issues
1. Hawthorne effect
2. Halo effect
3. Subject expectancy
4. Researcher expectancy

Hawthorne effect
The Hawthorne effect refers to the increase in performance of individuals who are noticed, watched, and paid attention to by researchers or supervisors. It is a tendency in some individuals to alter their behavior in response to their awareness of being observed. This phenomenon implies that when people become aware that they are subjects in an experiment, the attention they receive from the experimenters may cause them to change their conduct.

Halo effect
The halo effect is a well-documented social-psychology phenomenon that causes people to be biased in their judgments by transferring their feelings about one attribute of something to other, unrelated attributes. For example, a tall or good-looking person will be perceived as being intelligent and trustworthy, even though there is no logical reason to believe that height or looks correlate with smarts and honesty. The halo effect works in both positive and negative directions: if you like one aspect of something, you will have a positive predisposition toward everything about it; if you dislike one aspect of something, you will have a negative predisposition toward everything about it.

Subject expectancy
In scientific research, the subject-expectancy effect is a form of reactivity that occurs when a research subject expects a given result and therefore unconsciously affects the outcome, or reports the expected result. Because this effect can significantly bias the results of experiments (especially on human subjects), double-blind methodology is used to eliminate it.

Researcher expectancy
Researcher expectancy refers to how the perceived expectations of an observer can influence the people being observed. This term is usually used in the context of research to describe how the presence of a researcher can influence the behavior of participants in their study. This may lead researchers to draw inaccurate conclusions. Specifically, since the observer-expectancy effect is characterized by participants being influenced by the researcher's expectations, it may lead the research team to conclude that their hypothesis was correct.
False positives in research can have serious implications. The observer-expectancy effect arises due to demand characteristics, which are subtle cues given by the researcher to the participant about the nature of the study, as well as confirmation bias, which occurs when the researcher collects and interprets data in a way that confirms their hypothesis and ignores information that contradicts it.

Measurement issues
1. Practice effect
2. Reactivity effect
3. Instability of measures and instruments

Practice effect
Practice effects, defined as improvements in cognitive test performance due to repeated evaluation with the same test materials, have traditionally been viewed as sources of error variance rather than diagnostically useful information. Recently, however, it has been suggested that this psychometric phenomenon might prove useful in predicting cognitive outcomes. A practice effect is any change or improvement that results from practice or repetition of task items or activities. The practice effect is of particular concern in experimentation involving within-subjects designs, as participants' performance on the variable of interest may improve simply from repeating the activity rather than from any study manipulation imposed by the researcher.

Reactivity effect
Reactivity, also known as the observer effect, takes place when the act of doing the research changes the behavior of participants, thereby making the findings of the research subject to error. Reactivity occurs when the subject of the study (e.g., a survey respondent) is affected either by the instruments of the study or by the individuals conducting the study in a way that changes whatever is being measured. In survey research, the term reactivity applies when the individual's response is influenced by some part of the survey instrument (e.g., an item on a questionnaire), the interviewer, the survey organization or sponsor conducting the study, or the environment where the survey is taking place. For example, the respondent may respond positively or negatively based on the interviewer's reactions to an answer: a smile, nod, frown, or laugh may alter how the subject chooses to respond to subsequent questions.

Instability of measures and instruments
Instability of measures refers to the degree to which the results on the measures are consistent. Instability of the results of a study refers to the degree to which the results would be likely to recur if the study were replicated. It is not possible to control both types of instability; statistical procedures provide the researcher with tools to determine the degree to which measures are stable or consistent.

Double-Blind Study
In experimental research, subjects are randomly assigned to either a treatment or a control group. A double-blind study withholds each subject's group assignment from both the participant and the researcher performing the experiment. If participants know which group they are assigned to, there is a risk that they might change their behavior in a way that would influence the results. If researchers know which group a participant is assigned to, they might act in a way that reveals the assignment or directly influences the results. Double blinding guards against these risks, ensuring that any difference between the groups can be attributed to the treatment. When the researchers administering the experimental treatment are aware of each participant's group assignment, they may inadvertently treat those in the control group differently from those in the treatment group. This could reveal to participants their group assignment, or even directly influence the outcome itself. In double-blind experiments, the group assignment is hidden from both the participant and the person administering the experiment.
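As a rough illustration of the mechanics of blinding, the Python sketch below randomly assigns hypothetical participants to two groups identified only by neutral codes; the participant IDs, codes, and the idea of a separately held key are illustrative assumptions, not a procedure described in the text.

import random

# Hypothetical participants; IDs and group codes are illustrative only.
participants = ["P01", "P02", "P03", "P04", "P05", "P06"]
conditions = ["treatment", "control"]

random.shuffle(participants)
half = len(participants) // 2

# The experimenter only ever sees the neutral codes "A" and "B";
# the key linking codes to conditions is held by someone not running the sessions.
blinding_key = {"A": conditions[0], "B": conditions[1]}
assignments = {p: "A" for p in participants[:half]}
assignments.update({p: "B" for p in participants[half:]})

print(assignments)   # what the experimenter and participants see
print(blinding_key)  # revealed only after data collection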
Controlling extraneous variables: four perspectives, potential problems, and steps toward control

A. Environment
  1. Natural variables: prearrangement of conditions
  2. Artificiality: approximation of "natural" conditions
B. Grouping
  1. Self-selection: random, matched-pair, or stratified assignment
  2. Mortality: short duration; track down missing subjects
  3. Maturation: short duration, or built-in moderator or control variables
C. People
  1. Hawthorne effect: double-blind technique
  2. Halo effect: build in general attitude as a moderator or control variable
  3. Subject expectancy: minimize obviousness of aims; provide distraction from aims
  4. Researcher expectancy: double-blind technique
D. Measurement
  1. Practice effect: counterbalancing
  2. Reactivity: careful study of measures
  3. Instability of measures and results: statistical estimates of stability and probability

Counterbalancing
Counterbalancing is a procedure that allows a researcher to control the effects of nuisance variables in designs where the same participants are repeatedly subjected to conditions, treatments, or stimuli (e.g., within-subjects or repeated-measures designs). Counterbalancing refers to the systematic variation of the order of conditions in a study, which enhances the study's internal validity. In the context of experimental designs, the most common nuisance factors (confounds) to be counterbalanced are procedural variables (i.e., temporal or spatial position) that can create order and sequence effects. In quasi-experimental designs, blocking variables (e.g., age, gender) can also be counterbalanced to control their effects on the dependent variable of interest, thus compensating for the lack of random assignment and the potential confounds due to systematic selection bias.

_____________________________________________________________________________________

Chapter 6: Mean, Mode, Median

What is frequency in research?
A frequency is the number of times a data value occurs. For example, if four people have an IQ of between 118 and 125, then an IQ of 118 to 125 has a frequency of 4. Frequency is often represented by the letter f.

Frequency distribution
The frequency (f) of a particular value is the number of times the value occurs in the data. The distribution of a variable is the pattern of frequencies, meaning the set of all possible values and the frequencies associated with these values. Frequency distributions are portrayed as frequency tables or charts. Frequency distributions can show either the actual number of observations falling in each range or the percentage of observations; in the latter case, the distribution is called a relative frequency distribution. Frequency distribution tables can be used for both categorical and numeric variables. Continuous variables should only be used with class intervals, which will be explained shortly. A frequency distribution can be visualized using a pie chart (nominal variable), a bar chart (nominal or ordinal variable), a line chart (ordinal or discrete variable), or a histogram (continuous variable).
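As a concrete illustration, the short Python sketch below builds a simple frequency table, with absolute and relative frequencies, for a small set of hypothetical course grades; the grade data are invented for illustration only.

from collections import Counter

# Hypothetical course grades for 12 learners (categorical data).
grades = ["B", "A", "C", "B", "B", "A", "D", "C", "B", "A", "B", "C"]

freq = Counter(grades)  # absolute frequency of each grade
n = len(grades)

print("Value  f  relative f")
for value, f in sorted(freq.items()):
    print(f"{value:>5}  {f}  {f / n:.2f}")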
Measures of central tendency
The best way to summarize a data set with a single value is to find the most representative value, the one that indicates where the centre of the distribution is. This is called the central tendency. The three most commonly used measures of central tendency are:
- the arithmetic mean, which is the sum of all values divided by the number of values;
- the median, which is the middle value when all values are arranged in increasing order;
- the mode, which is the most typical value, the one that appears most often in the data set.

Calculating the mean
The mean can be calculated only for numeric variables, no matter whether they are discrete or continuous. It is obtained by simply dividing the sum of all values in a data set by the number of values. The calculation can be done from raw data or from data aggregated in a frequency table.
Example: Mount Rival hosts a soccer tournament each year. This season, in 10 games, the lead scorer for the home team scored 7, 5, 0, 7, 8, 5, 5, 4, 1, and 5 goals. What is the mean score of this player? The sum of all values is 47 and there are 10 values, so the mean is 47 ÷ 10 = 4.7 goals per game.

Calculating the median
The median is the value in the middle of a data set, meaning that 50% of data points have a value smaller than or equal to the median and 50% of data points have a value higher than or equal to the median. For a small data set, you first count the number of data points (n) and arrange the data points in increasing order. If the number of data points is odd, you add 1 to the number of points and divide the result by 2 to get the rank of the data point whose value is the median. The rank is the position of the data point after the data set has been arranged in increasing order: the smallest value is rank 1, the second-smallest value is rank 2, and so on. The advantage of using the median instead of the mean is that the median is more robust, which means that an extreme value added to one extremity of the distribution does not have as big an impact on the median as it does on the mean. Therefore, it is important to check whether the data set includes extreme values before choosing a measure of central tendency.
Example: imagine that a top running athlete in a typical 200-metre training session runs the following times: 26.1 seconds, 25.6 seconds, 25.7 seconds, 25.2 seconds, 25.0 seconds, 27.8 seconds, and 24.1 seconds. How would you calculate the median time? There are n = 7 data points, which is an odd number. The median will be the value of the data point of rank (n + 1) ÷ 2 = (7 + 1) ÷ 2 = 4. The median time is 25.6 seconds.

Calculating the mode
The mode is the value that appears most often in a data set, and it can be used as a measure of central tendency, like the median and mean. But sometimes there is no mode, or there is more than one mode. There is no mode when all observed values appear the same number of times in a data set. There is more than one mode when the highest frequency was observed for more than one value in a data set. In both of these cases, the mode cannot be used to locate the centre of the distribution. The mode can be used to summarize categorical variables, while the mean and median can be calculated only for numeric variables; this is the main advantage of the mode as a measure of central tendency. It is also useful for discrete variables and for continuous variables when they are expressed as intervals. The mode is not used as much for continuous variables because with this type of variable it is likely that no value will appear more than once. For example, if you ask 20 people their personal income in the previous year, it is possible that many will have amounts of income that are very close, but you will never get exactly the same value for two people. In such a case, it is useful to group the values into mutually exclusive intervals and to visualize the results with a histogram to identify the modal-class interval.
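The Python sketch below, using only the standard library, reproduces the calculations from the two worked examples above and also reports the mode of the soccer scores; the numbers come directly from those examples.

from statistics import mean, median, mode

goals = [7, 5, 0, 7, 8, 5, 5, 4, 1, 5]              # soccer example
times = [26.1, 25.6, 25.7, 25.2, 25.0, 27.8, 24.1]  # 200-metre example

print(mean(goals))    # 4.7 goals per game
print(median(times))  # 25.6 seconds, the middle value of the 7 sorted times
print(mode(goals))    # 5, the most frequent number of goals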
Measures of dispersion
Measures of central tendency aim to identify the most representative value of a data set, that is, the centre of a distribution. To better describe the data, it is also good to have a measure of the spread of the data around the centre of the distribution. This measure is called a measure of dispersion. The most commonly used measures of dispersion are:
- the range, which is the difference between the highest value and the smallest value;
- the interquartile range, which is the range of the 50% of the data that is central to the distribution;
- the variance, which is the mean squared distance between each point and the centre of the distribution;
- the standard deviation, which is the square root of the variance.

Key notes
Frequency distribution in statistics is a representation that displays the number of observations within a given interval. The representation of a frequency distribution can be graphical or tabular so that it is easier to understand. Frequency distributions are particularly useful for normal distributions, which show the observations of probabilities divided among standard deviations. In finance, traders use frequency distributions to take note of price action and identify trends.

_______________________________________________________________________________________

Chapter 8: Reliability, Its Types, and Validity

The 4 types of reliability
Reliability tells you how consistently a method measures something. When you apply the same method to the same sample under the same conditions, you should get the same results. If not, the method of measurement may be unreliable. There are four main types of reliability. Each can be estimated by comparing different sets of results produced by the same method.

Test-retest reliability
Test-retest reliability measures the consistency of results when you repeat the same test on the same sample at a different point in time. You use it when you are measuring something that you expect to stay constant in your sample. Many factors can influence your results at different points in time: for example, respondents might experience different moods, or external conditions might affect their ability to respond accurately. Test-retest reliability can be used to assess how well a method resists these factors over time. The smaller the difference between the two sets of results, the higher the test-retest reliability.
How to measure it: you conduct the same test on the same group of people at two different points in time. Then you calculate the correlation between the two sets of results.
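To illustrate that calculation, here is a minimal Python sketch that computes the Pearson correlation between two hypothetical administrations of the same test to the same five people; the scores are invented, and in practice you would normally use a statistics package rather than hand-rolling the formula.

# Hypothetical scores from the same five respondents at time 1 and time 2.
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# A correlation close to 1 indicates high test-retest reliability.
print(round(pearson(time1, time2), 3))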
Interrater reliability
Interrater reliability measures the degree of agreement between different people observing or assessing the same thing. You use it when data are collected by researchers assigning ratings, scores, or categories to one or more variables. People are subjective, so different observers' perceptions of situations and phenomena naturally differ. Reliable research aims to minimize subjectivity as much as possible so that a different researcher could replicate the same results. When designing the scale and criteria for data collection, it's important to make sure that different people will rate the same variable consistently with minimal bias. This is especially important when there are multiple researchers involved in data collection or analysis.
How to measure it: different researchers conduct the same measurement or observation on the same sample. Then you calculate the correlation between their different sets of results. If all the researchers give similar ratings, the test has high interrater reliability.

Internal consistency
Internal consistency assesses the correlation between multiple items in a test that are intended to measure the same construct. You can calculate internal consistency without repeating the test or involving other researchers, so it's a good way of assessing reliability when you only have one data set. When you devise a set of questions or ratings that will be combined into an overall score, you have to make sure that all of the items really do reflect the same thing. If responses to different items contradict one another, the test might be unreliable.
How to measure it: two common methods are used to measure internal consistency.
- Average inter-item correlation: for a set of measures designed to assess the same construct, you calculate the correlation between the results of all possible pairs of items and then calculate the average.
- Split-half reliability: you randomly split a set of measures into two sets. After testing the entire set on the respondents, you calculate the correlation between the two sets of responses.

Which type of reliability applies to my research?
It's important to consider reliability when planning your research design, collecting and analyzing your data, and writing up your research. The type of reliability you should calculate depends on the type of research and your methodology.

The 4 types of validity
Validity tells you how accurately a method measures something. If a method measures what it claims to measure, and the results closely correspond to real-world values, then it can be considered valid. There are four main types of validity:
- Construct validity: Does the test measure the concept that it's intended to measure?
- Content validity: Is the test fully representative of what it aims to measure?
- Face validity: Does the content of the test appear to be suitable to its aims?
- Criterion validity: Do the results accurately measure the concrete outcome they are designed to measure?
Note that this section deals with types of test validity, which determine the accuracy of the actual components of a measure. If you are doing experimental research, you also need to consider internal and external validity, which deal with the experimental design and the generalizability of results.

Construct validity
Construct validity evaluates whether a measurement tool really represents the thing we are interested in measuring. It's central to establishing the overall validity of a method.
What is a construct? A construct refers to a concept or characteristic that can't be directly observed but can be measured by observing other indicators that are associated with it. Constructs can be characteristics of individuals, such as intelligence, obesity, job satisfaction, or depression; they can also be broader concepts applied to organizations or social groups, such as gender equality, corporate social responsibility, or freedom of speech.
Example: there is no objective, observable entity called "depression" that we can measure directly. But based on existing psychological research and theory, we can measure depression based on a collection of symptoms and indicators, such as low self-confidence and low energy levels.

Construct validity is about ensuring that the method of measurement matches the construct you want to measure. If you develop a questionnaire to diagnose depression, you need to know: does the questionnaire really measure the construct of depression? Or is it actually measuring the respondent's mood, self-esteem, or some other construct? To achieve construct validity, you have to ensure that your indicators and measurements are carefully developed based on relevant existing knowledge. The questionnaire must include only relevant questions that measure known indicators of depression.

Content validity
Content validity assesses whether a test is representative of all aspects of the construct. To produce valid results, the content of a test, survey, or measurement method must cover all relevant parts of the subject it aims to measure. If some aspects are missing from the measurement (or if irrelevant aspects are included), the validity is threatened.
Example: a mathematics teacher develops an end-of-semester algebra test for her class. The test should cover every form of algebra that was taught in the class. If some types of algebra are left out, then the results may not be an accurate indication of students' understanding of the subject. Similarly, if she includes questions that are not related to algebra, the results are no longer a valid measure of algebra knowledge.

Face validity
Face validity considers how suitable the content of a test seems to be on the surface. It's similar to content validity, but face validity is a more informal and subjective assessment.
Example: you create a survey to measure the regularity of people's dietary habits. You review the survey items, which ask questions about every meal of the day and snacks eaten in between for every day of the week. On its surface, the survey seems like a good representation of what you want to test, so you consider it to have high face validity.
As face validity is a subjective measure, it's often considered the weakest form of validity. However, it can be useful in the initial stages of developing a method.

Criterion validity
Criterion validity evaluates how well a test can predict a concrete outcome, or how well the results of your test approximate the results of another test. To evaluate criterion validity, you calculate the correlation between the results of your measurement and the results of the criterion measurement. If there is a high correlation, this gives a good indication that your test is measuring what it intends to measure.
Example: a university professor creates a new test to measure applicants' English writing ability. To assess how well the test really does measure students' writing ability, she finds an existing test that is considered a valid measurement of English writing ability, and compares the results when the same group of students take both tests. If the outcomes are very similar, the new test has high criterion validity.
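In the same spirit as the professor's comparison above, the following Python sketch correlates hypothetical scores on a new writing test with scores on an established criterion test for the same students; the scores are invented for illustration, and the snippet assumes Python 3.10 or later for statistics.correlation.

from statistics import correlation  # available in Python 3.10+

# Hypothetical scores for the same six students on the new writing test
# and on an established test of English writing ability (the criterion).
new_test = [62, 71, 55, 80, 68, 74]
criterion_test = [65, 70, 58, 83, 66, 77]

# A high positive correlation suggests the new test has high criterion validity.
print(round(correlation(new_test, criterion_test), 3))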