Summated rating scales

SOURCES OF ERROR IN ……….
• Social desirability
– Giving politically correct answers
• Response order
– Recency - Respondent stops reading once s/he gets to the response s/he likes
– Primacy - Remember better the initial choices
– Fatigue
• Response sets
– All yes, or all no responses
• Acquiescence
– Telling you what you want to hear
• Item order
– Answers to later items may be affected by earlier items (simple, factual items first)
– Respondent may not know how to answer earlier questions
• Personal bias
– Wants to send a message
Downloaded from: ………….. 23/8/2012
EVALUATING THE INSTRUMENT
Three issues to consider
– Validity: Does the instrument measure what it is supposed to measure?
– Reliability: Does it consistently repeat the same measurement?
– Practicality: Is this a practical instrument?
Source: Dr. Ir. Pudji Muljono, M.Si. Presented at the Lokakarya Peningkatan Suasana Akademik (Workshop on Improving the Academic Atmosphere), Jurusan Ekonomi FIS-UNJ, 5 to 9 August 2002
Downloaded from: https://docs.google.com/viewer?a=v&q=cache:k1SsN7H88fAJ:repository.ipb.ac.id/bitstream/handle/
EVALUATING THE INSTRUMENT
Concept validation process through a panel
1. Examine the instrument, from the construct through to the writing of items
In this regard, several points need attention:
1. Are the dimensions an accurate elaboration of the construct as defined, and suitable for measuring the construct of the variable to be measured?
2. Are the indicators an accurate elaboration of the dimensions as defined, and suitable for measuring the construct of the variable to be measured?
3. Are the instrument items suitable for measuring the indicators of the variable to be measured?
2. Assess the items
The items that have been written are given to a panel to be judged, still with reference to the criteria above.
Items can be judged in several ways, for example with the Thurstone method and paired comparison.
TYPES OF VALIDITY
• Face validity
– Does the instrument, on its face, appear to measure what it is supposed to measure?
• Content validity
– Degree to which the content of the items adequately represents the universe of all relevant items under study
– Generally arrived at through a panel of experts
TYPES OF VALIDITY
Content validity
“Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (AERA/APA/NCME, 1999).
Content validity refers to the degree to which the content of the items reflects the content domain of interest (APA, 1954).
Content validity addresses the adequacy and representativeness of the items to the domain of testing purposes.
Content validity is not usually quantified, possibly due to:
1.) subsuming it within construct validity;
2.) ignoring it as important; and/or
3.) relying on accepted expert agreement procedures
Downloaded from: plaza.ufl.edu/.../CONTENT%20VALIDITY.p... 25/8/2012
TYPES OF VALIDITY
• Criterion related
– Degree to which the predictor is adequate in capturing the relevant aspects of the criterion
– Uses correlation analysis
– Concurrent validity
• Criterion data is available at the same time as the predictor score; requires high correlation between the two
– Predictive validity
• Criterion is measured after the passage of time
• Retrospective look at the validity of the measurement
• Known-groups
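The correlation analysis mentioned for criterion-related validity can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `pearson_r` and the two score lists are hypothetical, and the Pearson coefficient is assumed to be the correlation intended.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: instrument (predictor) scores and a criterion measured
# at the same time (concurrent) or later (predictive) for six respondents.
predictor = [12, 15, 11, 18, 20, 14]
criterion = [30, 36, 29, 41, 47, 33]
r = pearson_r(predictor, criterion)
print(f"criterion-related validity coefficient r = {r:.3f}")
```

The same function applies to test-retest reliability by passing the first and second administrations of the test as the two lists.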
TYPES OF RELIABILITY
• Stability
– Test-retest: Same test is administered twice to the same subjects over a short interval (3 weeks to 6 months)
– Look for high correlation between the test and retest
– Situational factors must be minimized
• Equivalence
– Degree to which alternative forms of the same measure produce the same or similar results
– Give parallel forms of the same test to the same group with a short delay to avoid fatigue
– Look for high correlation between the scores of the two forms of the test
– Inter-rater reliability
• Internal Consistency
– Degree to which instrument items are homogeneous and reflect the same underlying constructs
– Split-half testing where the test is split into two halves that contain the same types of questions
– Uses Cronbach’s alpha to determine internal consistency. Only one administration of the test is required
– Kuder-Richardson (KR20) for items with right and wrong answers
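As a rough sketch of the internal-consistency check above, Cronbach's alpha can be computed from a single administration with the usual formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the response matrix are hypothetical, and sample variances are assumed throughout.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Variance of each item across respondents
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Variance of the summated (total) score across respondents
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point Likert responses: 5 respondents x 4 items
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

For items scored 0/1 (right and wrong answers), the same calculation corresponds to the KR20 case, provided the variances are defined consistently.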
PRACTICALITY
• Is the survey economical
– Cost of producing and administering the survey
– Time requirement
– Common sense!
• Convenience
– Adequacy of instructions
– Easy to administer
• Can the measurement be interpreted by others
– Scoring keys
– Evidence of validity and reliability
– Established norms
A comparison of Likert scale and traditional measures of self-efficacy.
By Maurer, Todd J.; Pierce, Heather R.
Journal of Applied Psychology, Vol 83(2), Apr 1998, 324-329.
This study addressed whether a Likert-type measurement format can be used as an alternative to the
traditional format for measuring self-efficacy. Classical reliability, observed correlations with relevant criteria,
and confirmatory factor analyses were used to assess the similarity of the two formats in a sample of 128
college students.
The results indicated that Likert-type and traditional measures of self-efficacy have similar reliability–error
variance, provide equivalent levels of prediction, and have similar factor structure and similar discriminability.
Overall, considering both practicality and the apparent similarity of empirical results from the two methods, a
Likert scale seems to offer an acceptable alternative method of measuring self-efficacy.
Limitations and suggestions for future research are discussed.
Development of a Multi-item Scale
1. Develop Theory
2. Generate Initial Pool of Items: Theory, Secondary Data, and Qualitative Research
3. Select a Reduced Set of Items Based on Qualitative Judgement
4. Collect Data from a Large Pretest Sample
5. Statistical Analysis
6. Develop Purified Scale
7. Collect More Data from a Different Sample
8. Evaluate Scale Reliability, Validity, and Generalizability
9. Final Scale
SCALE EVALUATION
• Reliability
– Test/Retest
– Alternative Forms
– Internal Consistency
• Validity
– Content
– Criterion
– Construct
• Convergent
• Discriminant
• Nomological
• Generalizability
Transforming ordinal data to interval data with the Method of Successive Intervals (MSI)
Before ordinal data, usually obtained with a Likert scale or similar (questionnaire scores), can be used in regression analysis, they must first be transformed into interval data. One method that can be used is the Method of Successive Intervals (MSI).
At first glance this looks difficult, because we have to tabulate frequencies, then determine proportions, cumulative proportions, and so on.
The steps of the Method of Successive Intervals (MSI) are as follows:
1. Tabulate the frequency of each response category for each question item.
2. Compute the proportions by dividing the frequency of each response category by the total number of respondents.
3. Compute the cumulative proportions.
4. Determine the z value for each response category from the cumulative proportions obtained, using a standard normal (z) table.
5. Compute the scale value of each category using the scale-value formula.
6. Assign the scale values to the responses.
These assigned values are what is called the interval scale, and they can be used in regression analysis.
Downloaded from: http://jurnal-sdm.blogspot.com/2007/12/transformasi-data-ordinal-ke-interval.html ………….. 24/8/2012
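The MSI steps above can be sketched in a few lines. The source does not reproduce the scale-value formula, so this sketch assumes the commonly cited form: scale value = (density at lower limit - density at upper limit) / (area below upper limit - area below lower limit), followed by a shift so that the smallest value becomes 1. The function name `msi_scale_values` and the sample responses are hypothetical, and every category is assumed to have at least one response.

```python
from collections import Counter
from statistics import NormalDist

def msi_scale_values(responses, categories=(1, 2, 3, 4, 5)):
    """Map ordinal categories to interval scale values for one item (sketch)."""
    counts = Counter(responses)                      # step 1: frequencies
    n = sum(counts[c] for c in categories)
    if any(counts[c] == 0 for c in categories):
        raise ValueError("this sketch assumes every category is used")
    std_normal = NormalDist()                        # mean 0, sd 1
    scale_values = {}
    running = 0
    lower_density = 0.0                              # density at z = -infinity
    for i, c in enumerate(categories):
        proportion = counts[c] / n                   # step 2: proportion
        running += counts[c]                         # step 3: cumulative count
        if i == len(categories) - 1:
            upper_density = 0.0                      # density at z = +infinity
        else:
            z = std_normal.inv_cdf(running / n)      # step 4: z value
            upper_density = std_normal.pdf(z)
        # step 5 (assumed formula): density difference over category area
        scale_values[c] = (lower_density - upper_density) / proportion
        lower_density = upper_density
    # step 6: shift so the lowest category receives the value 1
    shift = 1 - min(scale_values.values())
    return {c: v + shift for c, v in scale_values.items()}

# Hypothetical responses to one 5-point Likert item from 100 respondents
item = [1] * 10 + [2] * 20 + [3] * 40 + [4] * 20 + [5] * 10
transformed = msi_scale_values(item)
print({c: round(v, 3) for c, v in transformed.items()})
```

The transformation is applied item by item; each respondent's original score is then replaced by the scale value of the category chosen.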
TRANSFORMING ORDINAL DATA INTO INTERVAL DATA
Primary data are data given directly by respondents through interviews or through a list of questions that is designed, arranged, and presented in the form of a scale, whether nominal, ordinal, interval, or ratio.
This data collection technique is commonly used because, besides fixing the measurement scale directly, it can also complement the results of interviews with respondents.
Transforming the "scale" of the data from ordinal to interval is intended not only to conform to accepted practice but also to help satisfy the normal-distribution requirement when parametric statistics are used. According to Sambas Ali Muhidin and Maman Abdurahman, "one frequently used transformation method is the method of successive intervals (MSI)".
There are two differing opinions about the scores assigned to the response alternatives of a Likert measurement scale.
The first opinion holds that the scores 1, 2, 3, 4, and 5 are interval data.
The second opinion holds that the Likert measurement scale is ordinal.
The argument that a Likert scale is an interval scale is that an attitude scale places a person's attitude on a continuum of feeling that ranges from "very positive", meaning supportive of the psychological object under study, to "very negative", meaning not supportive at all of the object under study.
The specific characteristic of data obtained with an ordinal measurement scale is that ordinal data are qualitative rather than numeric: they consist of words or sentences such as strongly agree, somewhat disagree, and disagree when the question concerns agreement about some event. Another example is the response to the presence of a bank "PQR" in an area, which can range from strongly disagree, disagree, undecided, and agree to strongly agree.
Interval data are quantitative data, numeric in form, consisting of numbers rather than words or sentences. Researchers who conduct research using a quantitative approach, including
Downloaded from: myunanto.staff.gunadarma.ac.id/.../Transformasi+Data+Ordinal+Men...………….. 24/8/2012
IS IT NECESSARY TO TRANSFORM ORDINAL DATA TO INTERVAL DATA WITH MSI?
Posted by: Muji Gunarto on: 25 December 2008
If ordinal Likert-scale data coded STS (1, strongly disagree), TS (2, disagree), R (3, undecided), S (4, agree), SS (5, strongly agree) are rescaled to interval, the interval scores keep nearly the same order as the original ordinal scores and correlate with them at about 99%.
So the original ordinal data are practically equivalent to the interval data and can be treated as interval.
What differs is how the fitted model is interpreted for ordinal versus interval data.
Suppose we have the following regression model:
Y = a + b1X1 + b2X2
Y = 0.50 + 0.25X1 + 0.30X2
With interval data, for example Y = rice production (ton/Ha), X1 = urea fertilizer (kg/Ha), and X2 = seed (kg/Ha), the interpretation is that each additional 1 kg/Ha of urea raises rice production by 0.25 ton/Ha, and each additional 1 kg/Ha of seed raises it by 0.30 ton/Ha, holding the other input constant.
With ordinal (qualitative) data, for example Y = job satisfaction, X1 = commitment, X2 = motivation, the coefficients cannot be read that way: we cannot say that a one-unit rise in commitment raises satisfaction by 0.25, because the data are qualitative. We can only say that commitment has a (significant) effect on job satisfaction; the size of the effect is unknown.
Even after the ordinal data have been converted to interval, we still cannot interpret them like quantitative data, because the original data are qualitative.
Downloaded from:
http://mujigunarto.wordpress.com/2008/12/25/perlukah-data-ordinal-di-transformasi-ke-interval-dengan-
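As a small worked illustration of reading the fitted equation Y = 0.50 + 0.25X1 + 0.30X2 with interval data: in a linear model, each coefficient is the expected change in Y for a one-unit change in that predictor while the other is held fixed. The function name and the input values below are hypothetical.

```python
def predicted_rice_yield(urea_kg_per_ha, seed_kg_per_ha):
    """Fitted model from the text: Y = 0.50 + 0.25*X1 + 0.30*X2 (ton/Ha)."""
    return 0.50 + 0.25 * urea_kg_per_ha + 0.30 * seed_kg_per_ha

# Hypothetical inputs: raise urea by 1 kg/Ha, keep seed fixed
base = predicted_rice_yield(10, 20)
plus_one_urea = predicted_rice_yield(11, 20)
print(f"change in predicted yield: {plus_one_urea - base:.2f} ton/Ha")
```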
Questionnaire design
For a questionnaire to fulfill a researcher’s purposes, the questions
must meet the basic criteria of relevance and accuracy.
To achieve these ends, a researcher who is systematically planning a
questionnaire’s design will be required to make several decisions—
typically, but not necessarily, in the following order:
1. What should be asked?
2. How should questions be phrased?
3. In what sequence should the questions be arranged?
4. What questionnaire layout will best serve the research objectives?
5. How should the questionnaire be pretested? Does the questionnaire need to be revised?
Downloaded from: http://www.cengage.com/marketing/book_content/1439080674_zikmund/book/ch15.pdf ………….. 25/8/2012
Questionnaire design
What Should Be Asked?
Certain decisions made during the early stages of the research process will influence the
questionnaire design. The preceding chapters stressed good problem definition and clear
research questions.
This leads to specific research hypotheses that, in turn, clearly indicate what must be
measured.
Different types of questions may be better at measuring certain things than are others. In
addition, the communication medium used for data collection—that is, telephone interview,
personal interview, or self-administered questionnaire—must be determined.
This decision is another forward linkage that influences the structure and content of the
questionnaire. Therefore, the specific questions to be asked will be a function of previous
decisions made in the research process.
At the same time, the latter stages of the research process will also have an important
impact on questionnaire wording and measurement.
For example, when designing the questionnaire, the researcher should consider the types
of statistical analysis that will be conducted.
Downloaded from: http://www.cengage.com/marketing/book_content/1439080674_zikmund/book/ch15.pdf ………….. 25/8/2012
Questionnaire design
A survey is only as good as the questions it asks
Steps in constructing a questionnaire:
Step 1:
• Formulate the hypotheses
• Decide which type of survey to use
• Write the survey questions
• Define the response categories
• Design the survey layout
Step 2:
• Plan how the data will be collected
• Pretest the measurement instrument
Step 3:
• Define the target population
• Choose the sampling technique (random sampling, non-random sampling)
• Determine the sample size
• Select the sample
Step 4:
• Locate the respondents
• Conduct the interviews
• Collect the data carefully
Downloaded from:
http://jurnal-sdm.blogspot.com/2011/06/penyusunan-kuesioner-penelitian.html ………….. 24/8/2012
What should you ask?
• The questions asked are a function of
previous decisions
• The questions asked are a function of future
decisions (such as statistical analysis)
Ecosystem services (also called environmental services or nature’s services) are
benefits provided by ecosystems to humans, that contribute to making human life
both possible and worth living. Many of these goods and services are traditionally
viewed as free benefits to society, or "public goods" - wildlife habitat and diversity,
watershed services, carbon storage, and scenic landscapes, for example. Lacking a
formal market, these natural assets are traditionally absent from society’s balance
sheet; their critical contributions are often overlooked in public, corporate, and
individual decision-making.
Downloaded from: http://www.trunity.net/oceanresource/topics/view/55385/ ………….. 25/8/2012
Key criteria
• Questionnaire relevancy
– No unnecessary information is collected and only information needed to solve the problem is obtained. Be specific about your data needs; tie each question to an objective
• Questionnaire accuracy
– Information is both reliable and valid
What is LCA?
In the context of environmental challenges and the need for more sustainable production modes, Life
Cycle Assessment (LCA) has been brought forward as an important and comprehensive method for
analyzing the environmental impact of products and services. While it has long been used in industry,
LCA has only been applied to agricultural systems for the last 10 years. (http://lcarice.cirad.fr/what_is_lca)
LCA is defined and framed by ISO standards. It involves 4 typical phases:
1. Goal and scope definition (where system is delineated, indicators are chosen, functional unit is
selected, ways of presenting results are decided upon, etc.)
2. Inventory analysis (where all inputs and resources used are inventoried and quantified, related to the
given functional unit; it is a kind of mass and energy balance, focused on environmentally relevant
flows)
3. Impact assessment (where environmental impact indicators are calculated, involving classification and
characterization stages)
4. Interpretation and presentation of results (with necessary caution regarding indicators -uncertainty
and errors should be considered, sensitivity analysis should be carried out).
Downloaded from: http://www.cengage.com/marketing/book_content/1439080674_zikmund/book/ch15.pdf ………….. 25/8/2012
Questionnaire Relevancy
A questionnaire is relevant to the extent that all information collected addresses a
research question that will help the decision maker address the current business
problem. Asking a wrong question or an irrelevant question is a common pitfall. If the
task is to pinpoint store image problems, questions asking for political opinions are likely
irrelevant.
The researcher should be specific about data needs and have a rationale for each item
requesting information. Irrelevant questions are more than a nuisance because they
make the survey needlessly long. In a study where two samples of the same group of
businesses received either a one-page or a three-page questionnaire, the response rate
was nearly twice as high for the one-page survey.
Conversely, many researchers, after conducting surveys, find that they omitted some
important questions. Therefore, when planning the questionnaire design, researchers
must think about possible omissions. Is information on the relevant demographic and
psychographic variables being collected?
Would certain questions help clarify the answers to other questions? Will the results of
the study provide the answer to the manager’s problem?
Downloaded from: http://www.cengage.com/marketing/book_content/1439080674_zikmund/book/ch15.pdf ………….. 25/8/2012
Questionnaire Accuracy
Once a researcher decides what should be asked, the criterion of accuracy becomes the
primary concern. Accuracy means that the information is reliable and valid. While
experienced researchers generally believe that questionnaires should use simple,
understandable, unbiased, unambiguous, and nonirritating words, no step-by-step
procedure for ensuring accuracy in question writing can be generalized across projects.
Obtaining accurate answers from respondents depends strongly on the researcher’s
ability to design a questionnaire that will facilitate recall and motivate respondents to
cooperate. Respondents tend to be more cooperative when the subject of the research
interests them. When questions are not lengthy, difficult to answer, or ego threatening,
there is a higher probability of obtaining unbiased answers.
Question wording and sequence also substantially influence accuracy, which can be
particularly challenging when designing a survey for technical audiences. The
Department of Treasury commissioned a survey of insurance companies to evaluate
their offering of terrorism insurance as required by the government’s terrorism
reinsurance program. But industry members complained that the survey misused terms
such as “contract” and “high risk,” which have precise meanings for insurers, and asked
for policy information “to date,” without specifying which date. These questions caused
confusion and left room for interpretation, calling the survey results into question.
Downloaded from: http://www.cengage.com/marketing/book_content/1439080674_zikmund/book/ch15.pdf ………….. 25/8/2012
Phrasing Questions
• Open ended response versus fixed alternative questions
• Decision criteria: type of research; time; method of delivery; budget; concerns regarding researcher bias
Open-ended response questions pose some problem or topic and ask respondents to answer in their own words. If the question is asked in a personal interview, the interviewer may probe for more information, as in the following examples:
1. What names of local banks can you think of?
2. What comes to mind when you look at this advertisement?
3. In what way, if any, could this product be changed or improved? I’d like you to tell me anything you can think of, no matter how minor it seems.
4. What things do you like most about working for Federal Express? What do you like least?
5. Why do you buy more of your clothing in Nordstrom than in other stores?
6. How would you describe your supervisor’s management style?
7. Please tell us how our stores can better serve your needs.
Open-ended response questions are free-answer questions.
Fixed-alternative questions, sometimes called closed-ended questions, give respondents specific limited-alternative responses and ask them to choose the one closest to their own viewpoints.
For example:
Did you use any commercial feed or supplement for livestock or poultry in 2010?
Yes
No
Would you say that the labor quality in Japan is higher, about the same, or not as good as it was 10 years ago?
Higher
About the same
Not as good
Avoid
• Leading questions (questions that "steer" the respondent)
• Overly complex questions
• Use of jargon
• Loaded questions (can use a counterbiasing statement)
• Ambiguity
• Double barreled questions
• Making assumptions
Avoid Leading and Loaded Questions
Leading and loaded questions are a major source of bias in question wording.
leading question = A question that suggests or implies certain answers.
A study of the dry cleaning industry asked this question:
Many people are using dry cleaning less because of improved wash-and-wear clothes. How do you feel wash-and-wear clothes have affected your use of dry cleaning facilities in the past 4 years?
Use less
No change
Use more
It should be clear that this question leads the respondent to report lower usage of dry cleaning. The potential “bandwagon effect” implied in this question threatens the study’s validity.
loaded question = A question that suggests a socially desirable answer or is emotionally charged.
Consider the following question from a survey about media influence on politics:
What most influences your vote in major elections?
1. My own informed opinion
2. Major media outlets such as CNN
3. Newspaper endorsements
4. Popular celebrity opinions
Downloaded from: http://www.cengage.com/marketing/book_content/1439080674_zikmund/book/ch15.pdf ………….. 25/8/2012
Order?
• Order bias results from an alternative answer’s position in a set of answers or from the sequencing of questions
– Funneling technique: general to specific helps understand the frame of reference first
• Anchoring effect: the first concept measured tends to become a comparison point from which subsequent evaluations are made
COUNTERBIASING STATEMENT
An introductory statement or preamble to a potentially embarrassing question that reduces a
respondent’s reluctance to answer by suggesting that certain behavior is not unusual.
An introductory counterbiasing statement or preamble to a question that reassures respondents that their
“embarrassing” behavior is not abnormal may yield truthful responses:
Some people have time to brush three times daily but others do not. How often did you brush your
teeth yesterday?
If a question embarrasses the respondent, it may elicit no answer or a biased response. This is particularly
true with respect to personal or classification data such as income or education. The problem may be
mitigated by introducing the section of the questionnaire with a statement such as this:
To help classify your answers, we’d like to ask you a few questions. Again, your answers will be kept in
strict confidence.
Downloaded from: http://www.cengage.com/marketing/book_content/1439080674_zikmund/book/ch15.pdf ………….. 25/8/2012
AVOID AMBIGUITY: BE AS SPECIFIC AS POSSIBLE
Items on questionnaires often are ambiguous because they are too general. Consider such indefinite words as often, occasionally, regularly, frequently, many, good, and poor. Each of these words has many different meanings. For one consumer, frequent reading of Fortune magazine may be reading all 25 issues in a year, while another might think 12, or even 6 issues a year is frequent. Earlier, we used the following question as an example of a checklist question:
Please check which, if any, of the following sources of information about investments you regularly use.
What exactly does regularly mean? It can certainly vary from respondent to respondent. How exactly does hardly any differ from occasionally? Where is the cutoff? It is much better to use specific time periods whenever possible.
A brewing industry study on point-of-purchase advertising (store displays) asked their distributors:
How often does the company shut down production for sanitary maintenance?
1. Annually (once a year)
2. Semiannually (once every six months)
3. Quarterly (about every three months)
4. At least once monthly
5. Less frequently (less often than once a year)
Here the researchers clarified the terms permanent, semipermanent, and temporary by defining them for the respondent. However, the
question remained somewhat ambiguous. Beer marketers often use a variety of point-of-purchase devices to serve different purposes—in this
case, what is the purpose? In addition, analysis was difficult because respondents were merely asked to indicate a preference rather than a
degree of preference. Thus, the meaning of a question may not be clear because the frame of reference is inadequate for interpreting the
context of the question.
A student research group asked this question: What media do you rely on most?
1. Television
2. Radio
3. Internet
4. Newspapers
This question is ambiguous because it does not provide information about the context. “Rely on most” for what—news,
Downloaded from: http://www.cengage.com/marketing/book_content/1439080674_zikmund/book/ch15.pdf ………….. 25/8/2012
Decisions
• Ranking, sorting, rating or choice?
• How many categories or response positions?
• Balanced or unbalanced?
• Forced choice or nonforced choice?
• Single measure or index?
The Air Quality Index (AQI) is an index for reporting daily air quality. The Environmental Protection Agency calculates the
AQI for five major air pollutants regulated by the Clean Air Act: ground-level ozone, particle pollution (also known as
particulate matter), carbon monoxide, sulfur dioxide and nitrogen dioxide. The higher the AQI value, the greater the level of
air pollution and the greater the health concern.
Downloaded from: http://aapnews.aappublications.org/content/25/6/279.2.full ………….. 25/8/2012
Types of fixed alternative questions…
• Single dichotomy or dichotomous-alternative questions
“Are you currently registered in a course at the University of
Lethbridge?
Yes____ No____”
• Respondent chooses one of two alternatives (yes/no;
male/female)
• What scale would this data create?
• Multi-choice alternative
– Respondent chooses from several alternatives
– Many types…
Multi-choice alternative questions…
• Determinant choice
– Choose only one from several possible responses
“Which faculty are you currently registered in at the University of
Lethbridge?
Management ___
Education ____
Arts/Science____
Health sciences____
Combined degree____”
• What type of scale would these data create?
• Frequency determination
– Asks for an answer about frequency of occurrence
In a typical week, how often do you purchase chocolate chip
cookies?
__never
__ once
__ 2 or more times
What type of scale would these data create?
CHECK LIST
• Check list
– Provide multiple answers to a single question
– Should be mutually exclusive and exhaustive
“What brands of chocolate chip cookies have you, to the best
of your memory, purchased in the past month (check all
that apply)?”
__ Dare
__ Chips A’hoy
__ Presidents Choice Decadent etc. etc.
• What type of scale would these data create?
CHECK LIST
The checklist question allows the respondent to provide multiple answers to a single
question. The respondent indicates past experience, preference, and the like merely
by checking off items. In many cases the choices are adjectives that describe a
particular object.
A typical checklist question might ask the following:
Please check which, if any, of the following sources of information about investments
you regularly use.
1. Personal advice of your broker(s)
2. Brokerage newsletters
3. Brokerage research reports
4. Investment advisory service(s)
5. Conversations with other investors
6. Web page(s)
7. None of these
8. Other (please specify) __________
ATTITUDE RATING SCALES
Attitude:
An enduring disposition to consistently respond to various aspects of the world, including persons, events and objects
Typically seen as having three components:
– Cognitive
– Affective
– Behavioural
Scaling Techniques for Measuring Data Gathered from Respondents
The term scaling is applied to attempts to measure attitude objectively. Attitude is the result of a number of external
and internal factors. Depending upon the attitude to be measured, appropriate scales are designed. Scaling is a technique
used for measuring qualitative responses of respondents such as those related to their feelings, perception, likes, dislikes,
interests and preferences.
Nominal Scale
This is a very simple scale. It consists of assigning facts/choices to various alternative categories, which are usually
exhaustive as well as mutually exclusive. These scales are just numerical and are the least restrictive of all the scales. Instances
of Nominal Scale are - credit card numbers, bank account numbers, employee id numbers etc. It is simple and widely used
when relationship between two variables is to be studied. In a Nominal Scale numbers are no more than labels and are used
specifically to identify different categories of responses.
How do you stock items at present?
[ ] By product category
[ ] At a centralized store
[ ] Department wise
[ ] Single warehouse.
Ordinal Scale
The ordinal scale is the simplest attitude-measuring scale used in marketing research. It is more powerful than a nominal
scale in that the numbers possess the property of rank order. The ranking of certain product attributes/benefits as deemed
important by the respondents is obtained through the scale.
Rank the following attributes (1 - 5), on their importance in a microwave oven.
Downloaded from: http://www.managementstudyguide.com/attitude-scales.htm ………….. 24/8/2012
Affective
The feelings or emotions toward an object
Cognitive
Knowledge and beliefs
Behavioral
Predisposition to action
Intentions
Behavioral expectations
Attitude Scales: Scaling Defined
The term scaling refers to procedures for attempting to determine quantitative
measures of subjective and sometimes abstract concepts.
It is defined as a procedure for the assignment of numbers to a property of
objects in order to impart some of the characteristics of numbers to the
properties in question.
Unidimensional Scaling: Procedures designed to measure only one attribute of a respondent or object
Multidimensional Scaling: Procedures designed to measure several dimensions of a respondent or object
THE PROCESS OF MEASURING ATTITUDES
• Ranking
• Rating
• Sorting
• Choice
A ranking is a relationship between a set of items such that, for any two items, the first is either 'ranked higher
than', 'ranked lower than' or 'ranked equal to' the second.
In mathematics, this is known as a weak order or total preorder of objects. It is not necessarily a total order of
objects because two different objects can have the same ranking.
The rankings themselves are totally ordered. For example, materials are totally preordered by hardness, while
degrees of hardness are totally ordered.
By reducing detailed measures to a sequence of ordinal numbers, rankings make it possible to evaluate
complex information according to certain criteria. Thus, for example, an Internet search engine may rank the
pages it finds according to an estimation of their relevance, making it possible for the user quickly to select the
pages they are likely to want to see.
Analysis of data obtained by ranking commonly requires non-parametric statistics.
In statistics, "ranking" refers to the data transformation in which numerical or ordinal values are replaced by
their rank when the data are sorted. For example, if the numerical data 3.4, 5.1, 2.6, 7.3 are observed, the ranks
of these data items would be 2, 3, 1 and 4 respectively.
Similarly, the ordinal data hot, cold, warm would be replaced by 3, 1, 2. In these examples, the ranks are
assigned to values in ascending order. (In some other cases, descending ranks are used.) Ranks are related
to the indexed list of order statistics, which consists of the original dataset rearranged into ascending order.
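The rank transformation described above can be sketched in a few lines. This is a minimal illustration that reproduces the two examples in the text; the function name `rank_transform` and its `order` parameter are assumptions made here, and ties are handled only crudely.

```python
def rank_transform(values, order=None):
    """Replace each value by its 1-based rank in ascending order.

    `order` optionally lists the categories of an ordinal variable from
    lowest to highest. Tied values all receive the lowest rank of their group.
    """
    key = (lambda v: order.index(v)) if order is not None else None
    sorted_vals = sorted(values, key=key)
    return [sorted_vals.index(v) + 1 for v in values]


# Numerical example from the text.
numeric_ranks = rank_transform([3.4, 5.1, 2.6, 7.3])

# Ordinal example from the text, with the category order stated explicitly.
ordinal_ranks = rank_transform(["hot", "cold", "warm"], order=["cold", "warm", "hot"])

print(numeric_ranks, ordinal_ranks)
```

The sorted list built inside the function is the list of order statistics mentioned above; each rank is simply the position of a value in that list.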
Some kinds of statistical tests employ calculations based on ranks:
Friedman test
Kruskal-Wallis test
Rank products
Spearman's rank correlation coefficient
Wilcoxon rank-sum test
Wilcoxon signed-rank test.
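As an illustration of a rank-based calculation, Spearman's rank correlation coefficient can be computed from the squared differences between paired ranks. The sketch below assumes there are no tied values (the shortcut formula used is exact only in that case) and uses made-up `y` data; a statistics library would normally be used in practice.

```python
def ranks(values):
    """1-based ascending ranks; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [3.4, 5.1, 2.6, 7.3]   # data from the ranking example in the text
y = [10, 30, 20, 40]       # hypothetical second variable
rho = spearman_rho(x, y)
print(rho)
```

Only the ranks enter the calculation, so any monotone rescaling of `x` or `y` leaves the coefficient unchanged.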
Types of attitude scales
• Simple attitude scales
• Most basic form – respondent responds to a single question
• Do not allow for fine distinctions or placement on continua
– You are at a company party and are feeling nervous, but you are obligated to be there. Do you:
__ find someone you know to buddy up with
__ take it as an opportunity to meet new people
What type of scale would these data create?
Attitude Scales
An attitude scale is a special type of questionnaire designed to produce scores indicating the intensity and direction
(for or against) of a person's feelings about an object or event. There are several types of scales that can be
constructed, but the most common is the Likert-type. The scale is constructed so that all its questions concern a
single issue.
Attitude scales are often used in attitude change experiments. One group of people is asked to fill out the scale twice,
once before some event, such as reading a persuasive argument, and again afterward. A control group fills out the
scale twice without reading the argument. The control group is used to measure exposure or practice effects. The
change in the scores of the experimental group relative to the control group, whether their attitudes have become
more or less favorable, indicates the effects of the argument.
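The comparison described above amounts to subtracting the control group's average change from the experimental group's average change. A minimal sketch with made-up scale scores; the function name and the data are assumptions for illustration, not part of the source.

```python
from statistics import mean


def attitude_change_effect(exp_before, exp_after, ctl_before, ctl_after):
    """Mean change in the experimental group minus mean change in the control group."""
    exp_change = mean([a - b for a, b in zip(exp_after, exp_before)])
    ctl_change = mean([a - b for a, b in zip(ctl_after, ctl_before)])
    return exp_change - ctl_change


# Hypothetical total scale scores for three people per group.
effect = attitude_change_effect(
    exp_before=[20, 22, 18], exp_after=[26, 27, 22],
    ctl_before=[21, 19, 20], ctl_after=[22, 20, 21],
)
print(effect)
```

A positive value suggests attitudes became more favorable beyond what exposure to the scale alone produced; whether the difference is statistically reliable is a separate question that this sketch does not address.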
Likert-type Scale
A Likert-type scale, named for Rensis Likert (1932) who developed this type of attitude
measurement, presents a list of statements on an issue to which the respondent indicates degree of agreement using
categories such as: Strongly Agree, Agree, Undecided, Disagree, and Strongly Disagree.
Downloaded from: http://famusoa.net/mpowers/trandp/docs/14%20Attitude%20and%20Rating%20Scales%20by%20Sommer.pdf ………….. 24/8/2012
CATEGORY SCALES
• Category scales
– More sensitive; provides more information
– Overall, how satisfied are you with the high speed performance of
your Mercedes:
__ very satisfied
__ somewhat satisfied
__ neither satisfied nor dissatisfied
__ somewhat dissatisfied
__ very dissatisfied
If you could choose, how long would each term be?
___26 weeks __ 13 weeks __ 6 weeks ___4 weeks
What type of scale would these data create?
Downloaded from: http://chemse.oxfordjournals.org/content/1/3/307.abstract ………….. 24/8/2012
CATEGORY SCALES
RATIO SCALES AND CATEGORY SCALES OF ODOUR INTENSITY
J. R. PIGGOTT and R. HARPER.
Chem. Senses (1975) 1 (3): 307-316. doi: 10.1093/chemse/1.3.307
The relation between a ratio scale obtained by magnitude estimation and a category scale of the
odour intensity of 1-butanol was studied, together with individual variations in the ratio scale. Series
of solutions of butanol in water in small bottles were presented to a panel for judgement, half using
the method of magnitude estimation, the other half a category scale. Plots were made of the
category scale against the ratio scale, and the ratio scales of individual members of the panel were
analysed.
A power function exponent of 0.48 was found for the panel's ratio scale, with individual values
ranging from 0.25 to 0.49.
The category scale was curved relative to the ratio scale; variability of the magnitude estimates was
approximately proportional to the magnitude estimates; and a small time-order error was found.
Odour intensity exhibits the three tested characteristics of a prothetic continuum, and the variability
of individual exponents was not as great as sometimes suggested.
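One way to read the reported exponent, assuming the "power function" has the usual power-law form relating judged magnitude to stimulus magnitude (the symbols \psi for judged intensity, \varphi for concentration and the constant k are introduced here for illustration and do not appear in the abstract):

```latex
\psi = k\,\varphi^{n}, \qquad n = 0.48
\quad\Longrightarrow\quad
\frac{\psi(2\varphi)}{\psi(\varphi)} = 2^{0.48} \approx 1.39
```

Under that reading, doubling the butanol concentration raises the judged odour intensity by roughly 39 percent rather than doubling it.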
Downloaded from: http://chemse.oxfordjournals.org/content/1/3/307.abstract ………….. 24/8/2012
Summated rating scales – the Likert scale
• Summated rating scales – the Likert scale
– Respondents indicate their attitudes by checking how strongly they agree or disagree with statements
– Chocolate chip cookies are my preferred variety of cookie
Strongly disagree (1)    Disagree (2)    Uncertain (3)    Agree (4)    Strongly Agree (5)
What type of scale would these data create?
Ratio scales, category scales, and variability in the production of loudness and softness.
Bruce Schneider, and Harlan Lane.
J. Acoust. Soc. Am. Volume 35, Issue 12, pp. 1953-1961 (December 1963).
Several studies have shown that category scales are nonlinearly related to ratio scales of subjective magnitude.
A variability model has been proposed previously to account for this departure from linearity.
This article examines the model in the light of the empirical relations that enter into it: the ratio scale of subjective magnitude, the
corresponding category scale, and the variability of judgments in both physical and psychological units.
These relations are determined, through repeated measurement with a single observer, for the psychological continuum, loudness, and its
inverse, softness. The ratio scales are shown to be reciprocals, and the category scales complements. The category scale of softness is more
concave downward, relative to its magnitude scale, than is the category scale of loudness.
This outcome is also derived mathematically from the empirical equations relating the four scales to physical magnitude.
Variability is found to increase with increasing stimulus magnitude at the same rate for both loudness and softness productions, expressed either
in physical units or in psychological units.
Hence, the variability model is found not to accord with the observed difference in concavity between softness and loudness category scales
relative to their respective psychological magnitude scales.
Downloaded from: http://iris.lib.neu.edu/psych_fac_pubs/15/ ………….. 24/8/2012
SEMANTIC DIFFERENTIAL RATING SCALE
Semantic Differential Rating scale
– An attitude measure consisting of a series of seven-point bipolar rating scales allowing response to a “concept”
Think of your favorite type of cookie. Rate it on each of the following continua:
Hard------------------------------------------------------Soft
Lots of chips---------------------------------------Fewer chips
Crispy---------------------------------------------------chewy
What type of scale would these data create?
Journal of Marketing Management, Vol. 9:3, Winter 1999, 114-123. ©1999
RATING THE RATING SCALES
Hershey H. Friedman, and Taiwo Amoo
Rating scales are used quite frequently in research, especially in surveys. Typically, an itemized rating scale asks subjects to
choose one response category from several arranged in hierarchical order.
Dishonest researchers can, of course, purposefully manipulate the outcome of their research, if they wish, but such biasing
may also be totally unintentional.
This paper examines issues involved in creating a relatively unbiased rating scale. These include: (1) Connotations of
category labels; (2) Response alternative effects; (3) Implicit assumptions of the question; (4) Forced-choice vs. non-forced-choice rating scales; (5) Unbalanced and balanced rating scales; (6) Order effects; (7) Direction of comparison; (8) Optimal
number of points; (9) Context effects; (10) Rating approach, e.g., improvement needed, performance, comparison to
expectations, comparison to ideal, etc.
Downloaded from: http://academic.brooklyn.cuny.edu/economic/friedman/rateratingscales.htm ………….. 24/8/2012
NUMERICAL RATING SCALE
Numerical Rating scale
– Similar to a semantic differential except that it uses numbers as response options
to identify response positions instead of verbal descriptions
Think of your favorite type of cookie. Rate it on each of the following continua:
Hard------------------------------------------------------------------------Soft
8        7        6        5        4        3        2        1
This scale is called an 8 point numerical scale, why?
What type of scale would these data create?
Numerical rating scale
“A scale used for the subjective measurement of a clinical sign/syndrome, in which numerical scores are
given (e.g. 0-4).
A description is given for each score. The observer chooses, for each individual observed, the number on
the scale which they consider most closely matches that individual.”
This system groups information in discrete units, which may place a constraint on the observer.
The NRS can also be used without a descriptor for each score, but is improved by the addition of the
descriptions.
Downloaded from: http://wildpro.twycrosszoo.org/S/00Ref/KeywordsContents/n/Numerical_rating_scale.htm ………….. 24/8/2012
NUMERICAL RATING SCALE
Validation of the numerical rating scale for pain intensity and unpleasantness in pediatric
acute postoperative pain: sensitivity to change over time
Pagé, M. Gabrielle; Katz, Joel; Stinson, Jennifer; Isaac, Lisa; Martin-Pichora, Andrea L.; Campbell, Fiona.
Journal of Pain, 13(4), 359-369 (2012).
This study evaluates the construct validity (including sensitivity to change) of the numerical rating scale
(NRS) for pain intensity (I) and unpleasantness (U) and participant pain scale preferences in
children/adolescents with acute postoperative pain.
Eighty-three children aged 8 to 18 years (mean = 13.8, SD = 2.4) completed 3 pain scales including NRS,
Verbal Rating Scale (VRS), and faces scales (Faces Pain Scale-Revised [FPS-R] and Facial Affective Scale
[FAS], respectively) for pain intensity (I) and unpleasantness (U) 48 to 72 hours after major surgery, and
the NRS, VRS and Functional Disability Index (FDI) 2 weeks after surgery. As predicted, the NRSI correlated
highly with the VRSI and FPS-R and the NRSU correlated highly with the VRSU and FAS 48 to 72 hours after
surgery.
The FDI correlated moderately with the NRS at both time points. Scores on the NRSI and NRSU at 48 to 72
hours were significantly higher than at 2 weeks after surgery. Children found the faces scales the easiest to
use while the VRS was liked the least and was the hardest to use. The NRS has adequate evidence of
construct validity including sensitivity for both pain intensity and unpleasantness. This study further
supports the validity of the NRS as a tool to measure both intensity and unpleasantness of acute pain in
children.
Downloaded from: http://pi.library.yorku.ca/dspace/handle/10315/14340
CONSTANT SUM SCALES
Constant Sum Scales
– Attributes are weighted by their importance to the person. Respondents are asked to divide a constant sum to indicate
the relative importance of attributes
Example: Suppose the photocopy budget per professor was $100 per month. How much should be allocated to each of the
following? Divide the $100 according to your preference:
____ photocopying for student needs;
____ photocopying for research needs;
____ photocopying for committee needs.
====
$100 TOTAL
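A constant-sum response is only usable if the allocations add up to the stated total. The following sketch checks that and converts the amounts to proportions; the function name and the example allocation are assumptions for illustration.

```python
def constant_sum_shares(allocations, total=100):
    """Validate a constant-sum response and return each item's share of the total."""
    allocated = sum(allocations.values())
    if allocated != total:
        raise ValueError(f"allocations sum to {allocated}, expected {total}")
    return {item: amount / total for item, amount in allocations.items()}


# Hypothetical answer to the photocopy budget question.
shares = constant_sum_shares(
    {"student needs": 50, "research needs": 30, "committee needs": 20}
)
print(shares)
```

Rejecting responses that do not sum to the constant, rather than silently rescaling them, keeps arithmetic slips by respondents visible to the researcher.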
Constant-Sum Scales
A scale that helps the researcher discover proportions is the constant-sum scale.
1. With this scale, the participant allocates points to more than one attribute or property indicant, such that they total a
constant sum, usually 100 or 10.
2. In the Exhibit 13-2 example, two categories are presented that must sum to 100.
In the restaurant example, the participant distributes 100 points among four categories to indicate the relative
importance of each attribute:
_____ Food Quality
_____ Atmosphere
_____ Service
_____ Price
100 TOTAL
3. Up to 10 categories may be used, but both participant precision and patience suffer when too many stimuli are
proportioned and summed.
1. A participant’s ability to add is also taxed in some situations; this is not a response strategy that can be
effectively used with children or the uneducated.
2. The advantage of the scale is its compatibility with percent (100 percent) and the fact that alternatives that are
Downloaded from: http://www.scribd.com/doc/82071910/157/Constant-Sum-Scales ………….. 24/8/2012
GRAPHIC RATING SCALES
Graphic Rating Scales
– An attitude measure consisting of a graphic continuum that allows respondents to rate an
object by choosing any point on the continuum
GRAPHIC RATING SCALES
1. The graphic rating scale was originally created to enable researchers to discern fine differences.
Theoretically, an infinite number of ratings are possible if participants are sophisticated enough to differentiate and
record them.
2. They are instructed to mark their response at any point along a continuum.
Usually, the score is a measure of length (millimeters) from either endpoint.
The results are treated as interval data.
3. The difficulty is in coding and analysis; this scale requires more time than scales with predetermined categories.
Never __X___________ Always
4. Other graphic rating scales use pictures, icons, or other visuals to communicate with the rater and represent a variety of
data types.
5. Graphic scales are often used with children, whose more limited vocabulary prevents the use of scales anchored with
words
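Scoring a graphic rating scale means turning the measured position of the respondent's mark into a number. A minimal sketch, assuming a 100 mm line measured from the left ("Never") endpoint and rescaled to 0-100; the line length, the output range and the function name are all illustrative assumptions.

```python
def graphic_scale_score(mark_mm, line_length_mm=100.0, scale_max=100.0):
    """Convert the distance of a mark from the left endpoint into a scale score."""
    if not 0 <= mark_mm <= line_length_mm:
        raise ValueError("mark must lie on the line")
    return mark_mm * scale_max / line_length_mm


# A mark measured 23 mm from the "Never" end of a 100 mm line.
score = graphic_scale_score(23.0)
print(score)
```

The measuring step is what makes coding slower than for scales with predetermined categories: every response has to be measured before it can be entered.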
Downloaded from: http://www.scribd.com/doc/82071910/157/Constant-Sum-Scales ………….. 24/8/2012
Rank-Order Scales
Scales in which the respondent compares one item with another or a group of items against each other and
ranks them.
A Rank Order scale gives the respondent a set of items and asks them to put the items in some form of order.
The measure of 'order' can be preference, importance, liking, effectiveness and so on.
The order is often a simple ordinal structure (A is higher than B). It can also be done by relative position (A scores 10 whilst
B scores 6).
Example
Please write a letter next to the four evening activities below to show your preference. Use A for your most preferred
activity, B for the next preferred, then C for the next and then D for the least preferred.
__ Staying in and watching television
__ Going bowling
__ Going out for a meal
__ Going to a bar with a friend
Discussion
Sorting of ordinal data can be done in several ways:
1. Priority sorting looks for the most important first, then the next most important and so on.
2. Block sorting sorts items into sub-groups and then sorts the sub-groups (this is more important, that is less important - then sort the 'more important' group).
3. Score sorting gives an absolute score to each item.
4. Pairwise sorting compares pairs of items, moving the more important item higher or giving it a higher score.
5. Q-Sorting is done by writing items on cards (Q-cards) and asking the subject to place these in order.
6. Swap-sorting uses pairwise comparison on cards or Post-It Notes in a vertical column, swapping each pair in turn until
the whole column is in order.
Rank order items are analyzed using Spearman or Kendall correlation.
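The swap-sorting procedure listed above (compare adjacent pairs and swap until the whole column is in order) can be sketched as a simple adjacent-swap loop. The `prefers` callback stands in for the respondent's pairwise judgment, and the importance numbers are hypothetical values used only to simulate those judgments.

```python
def swap_sort(items, prefers):
    """Order items from most to least preferred by swapping adjacent pairs.

    `prefers(a, b)` returns True when a should be placed above b.
    """
    column = list(items)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(column) - 1):
            # Swap when the lower card is preferred over the card above it.
            if prefers(column[i + 1], column[i]):
                column[i], column[i + 1] = column[i + 1], column[i]
                swapped = True
    return column


# Hypothetical preferences for the four evening activities in the example.
importance = {
    "Staying in and watching television": 2,
    "Going bowling": 1,
    "Going out for a meal": 4,
    "Going to a bar with a friend": 3,
}
ordered = swap_sort(importance, lambda a, b: importance[a] > importance[b])
print(ordered)
```

Each pass asks only for comparisons between neighbouring items, which is why the method works with cards or notes in a single column.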
The Rank Order scale is also known as the Ranking scale.
Downloaded from: http://changingminds.org/explanations/research/measurement/rank_ordering.htm ………….. 23/8/2012
LIKERT SCALE
The Likert scale is the most frequently used variation of the summated rating scale.
Summated rating scales consist of statements that express either a favorable or
an unfavorable attitude toward the object of interest.
1. The participant is asked to agree or disagree with each statement.
2. Each response is given a numerical score to reflect its degree of attitudinal
favorableness, and the scores may be summed to measure the participant’s
overall attitude.
3. Summation is not necessary and in some instances may actually be misleading.
The participant chooses one of five levels of agreement.
1. The numbers indicate the value to be assigned to each possible answer, with 1 the
least favorable impression of Internet superiority and 5 the most favorable.
2. Likert scales also use 7 and 9 scale points.
The Likert scale has many advantages that account for its popularity.
1. It is easy and quick to construct.
2. It is more reliable and provides more data than many other scales.
3. It produces interval data.
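The scoring described above (a numerical value per response, summed into an overall attitude score) can be sketched as follows. The response labels are borrowed from the five-point cookie example earlier in the deck, and the answers are made up; this is an illustration under those assumptions, not a prescribed procedure.

```python
# Value assigned to each level of agreement, 1 = least favorable, 5 = most favorable.
LIKERT_VALUES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Uncertain": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}


def summated_score(responses):
    """Sum the numerical values of one participant's responses."""
    return sum(LIKERT_VALUES[r] for r in responses)


# Hypothetical answers to four favorably worded statements.
total = summated_score(["Agree", "Strongly Agree", "Uncertain", "Agree"])
print(total)
```

The sketch assumes every statement is worded favorably; negatively worded statements would need their values reversed before summing.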
Downloaded from: http://www.scribd.com/doc/82071910/157/Constant-Sum-Scales ………….. 24/8/2012
LIKERT SCALE
Originally, creating a Likert scale involved a procedure known as item analysis.
• In the first step, a large number of statements were collected that met two criteria:
Each statement was relevant to the attitude being studied;
Each reflected a favorable or unfavorable position on that attitude.
• People similar to those who are going to be studied were asked to read each statement and
to state the level of their agreement with it, using a 5-point scale.
• A scale value of 1 indicated a strongly unfavorable attitude (strongly disagree). The other
intensities were 2 (disagree), 3 (neither agree nor disagree), 4 (agree), and 5 (strongly
agree), a strongly favorable attitude.
• To ensure consistent results, the assigned numerical values are reversed if the statement is
worded negatively (1 is always strongly unfavorable and 5 is always strongly favorable).
• Each person’s responses are then added to secure a total score.
• The next step is to array these total scores and select some portion representing the highest
and lowest total scores (generally the top and bottom 10 to 25 percent).
• The middle group (50 to 80 percent of participants) is excluded from the
subsequent analysis.
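The steps above (reverse negatively worded statements, total each person's responses, then keep only the highest and lowest scorers) can be sketched as follows. The pilot data, the choice of which statement is negatively worded, and the function names are hypothetical; the 25 percent cut-off is one value from the range given in the text.

```python
def reverse_score(value, points=5):
    """Reverse a response on a scale of `points` levels (1 <-> 5, 2 <-> 4, ...)."""
    return points + 1 - value


def total_scores(responses, negative_items, points=5):
    """Total score per participant, reversing negatively worded statements."""
    totals = []
    for row in responses:
        scored = [
            reverse_score(v, points) if i in negative_items else v
            for i, v in enumerate(row)
        ]
        totals.append(sum(scored))
    return totals


def extreme_groups(totals, fraction=0.25):
    """Indices of the lowest- and highest-scoring participants; the middle is dropped."""
    k = max(1, int(len(totals) * fraction))
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    return order[:k], order[-k:]


# Hypothetical pilot data: 8 participants x 3 statements,
# with the statement at index 2 worded negatively.
responses = [
    [5, 4, 1], [4, 4, 2], [2, 1, 5], [1, 2, 4],
    [3, 3, 3], [4, 5, 1], [2, 2, 4], [3, 4, 2],
]
totals = total_scores(responses, negative_items={2})
low, high = extreme_groups(totals, fraction=0.25)
print(totals, low, high)
```

With eight participants and a 25 percent cut-off, two people fall in each extreme group and the remaining four are set aside, matching the exclusion of the middle group described above.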
Downloaded from: http://www.scribd.com/doc/82071910/157/Constant-Sum-Scales ………….. 24/8/2012
Using Angler Characteristics and Attitudinal Data to Identify Environmental Preference
Classes: A Latent-Class Model
EDWARD MOREY, JENNIFER THACHER, and WILLIAM BREFFLE
Environmental & Resource Economics (2006) 34: 91–115
A latent-class model of environmental preference groups is developed and estimated with only the
answers to a set of attitudinal questions.
Economists do not typically use this type of data in estimation. Group membership is latent/unobserved.
The intent is to identify and characterize heterogeneity in the preferences for environmental amenities in
terms of a small number of preference groups. The application is to preferences over the fishing
characteristics of Green Bay. Anglers answered a number of attitudinal questions, including the
importance of boat fees, species catch rates, and fish consumption advisories on site choice.
The results suggest that Green Bay anglers separate into a small number of distinct classes with varying
preferences and willingness to pay for a PCB-free Green Bay.
The probability that an angler belongs to each class is estimated as a function of observable characteristics
of the individual.
Estimation is with the expectation–maximization (E–M) algorithm, a technique new to environmental
economics that can be used to do maximum-likelihood estimation with incomplete information.
As explained, a latent-class model estimated with attitudinal data can be melded with a latent-class
choice model.
Downloaded from: http://www.colorado.edu/economics/morey/papers/MoreyThacherBreffle2006.pdf ………….. 25/8/2012
Relating Environmental Ethical Attitudes and Contingent Valuation Responses Using Cluster Analysis,
Latent Class Analysis, and the NEP: A Comparison
G. Aldrich, K. Grimsrud, J. THACHER, and M. Kotchen
September 1, 2005
Environmental ethics and attitudes may be an important source of heterogeneity when
considering the welfare effects and equity implications of policy changes dealing with
environment and natural resources.
The New Ecological Paradigm (NEP) Scale is a set of 15 Likert questions and is intended
to indicate whether an individual holds pro-environmental or anti-environmental beliefs.
This paper provides an overview and comparison of three methodologies that may be
applied to NEP survey data to detect environmental ethics groups: total NEP score,
latent class analysis, and cluster analysis methods.
We find that while environmental attitudes do not significantly affect average
willingness to pay measures, there are significant differences in willingness to pay
across environmental attitude groups.
The willingness to pay estimates for each attitudinal group are consistent across the
different analytical measures.
Downloaded from: http://www2.bren.ucsb.edu/~kolstad/events/OccWkshp/Aldrich.pdf ………….. 25/8/2012
Environmental and Resource Economics 14: 95–117, 1999.
The Validity of Environmental Benefits Transfer: Further Empirical Testing
ROY BROUWER and FRANK A. SPANINKS.
This paper provides further empirical evidence of the validity of environmental benefits transfer based on CV
studies by expanding the analysis to include control factors which have not been accounted for in previous
studies. These factors refer to differences in respondent attitudes.
Questionnaires complying with Dillman’s (1978) ‘total design method’ for mail
surveys were sent to randomly selected households. Since management agreements in peat meadow areas
usually concentrate on the protection of meadow birds and ditch-side vegetation, these elements received most
attention in the questionnaires. Except for some minor differences in wording, both studies used the same
valuation scenarios.
Traditional population characteristics were taken into account, but these variables do not explain why respondents
from the same socio-economic group may still hold different beliefs, norms or values and hence have different
attitudes and consequently state different WTP amounts.
The test results are mixed. The function transfer approach is valid in one case, but is rejected in the 3 other cases
investigated in this paper. We provide further evidence that in the case of statistically valid benefits transfer, the
function approach results in a more robust benefits transfer than the unit value approach.
We also show that the equality of coefficient estimates is a necessary, but insufficient condition for valid benefit
function transfer and discuss the implications for previous and future validity testing.
Downloaded from: http://www.fnu.zmaw.de/fileadmin/fnu-files/courses/ere4_val/erebouwerspaninks.pdf ………….. 25/8/2012