INFERENTIAL STATISTICS: HYPOTHESIS TESTING FOR CORRELATION AND REGRESSION ANALYSIS (TESTS: rP, rS, rPb)
Rohani Ahmad Tarmizi - EDU5950

CORRELATION ANALYSIS
Correlation analysis is used to answer research questions such as:
• Is there a relationship between the two variables?
• How strong is the relationship?
• What is the direction of the relationship?

The analysis involves two categories of variables: predictor variables and criterion variables.
• The predictor variable is the one that affects or influences the second variable.
• The criterion variable is the one that receives the effect or influence of the first variable.
X (predictor) → Y (criterion)
X1, X2, X3, ... → Y (criterion)
However, this analysis only describes the relationship; it does not support a cause-and-effect conclusion.

For example, a researcher may want to determine the relationship between:
• Confidence in administering and leadership performance among principals
• The perceptions of senior teachers and of administrative staff regarding the principal's level of leadership in school
• Age and job satisfaction
• Dietary practices and confidence to take part in a marathon

Two Ways to Determine Correlation
1. Graphically, using a scatter diagram that shows the pattern formed by the paired points. From the scatter diagram we can judge the strength (magnitude) of the correlation and its direction.
2. Numerically, by computing a coefficient or index. From the coefficient we can tell the strength (magnitude) of the correlation and whether its direction is positive or negative.
Scatter Plots and Types of Correlation
[Figure: scatter plot of x = Math SAT score against y = GPA]
Positive correlation: as x increases, y increases.
[Figure: scatter plot of x = hours of training against y = number of accidents]
Negative correlation: as x increases, y decreases.
[Figure: scatter plot of x = height against y = IQ]
No linear correlation.

Correlation analysis shows three important things:
• Direction (positive or negative)
• Form (linear or non-linear)
• Magnitude (size of the coefficient)

CORRELATION COEFFICIENTS
There are several types of correlation coefficient:
• Pearson product-moment correlation: used when variables x and y are on an interval or ratio scale, or a combination of the two.
• Spearman rho correlation: used when variables x and y are on an ordinal scale, or when an ordinal variable is paired with an interval/ratio variable.
• Point-biserial correlation: used when variable x is dichotomous and variable y is on an interval or ratio scale.

Pearson coefficient
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$
n = number of pairs of scores
Σxy = sum of each x score multiplied by its y score
Σx = sum of the x scores
Σy = sum of the y scores

Spearman coefficient
$$ r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} $$
n = number of pairs of scores
D = difference between the two ranks in each pair

Point-biserial coefficient
$$ r_{pb} = \frac{\bar{y}_1 - \bar{y}_2}{s_y}\sqrt{\frac{n_1 n_2}{n(n-1)}} $$

Correlation coefficient: a measure of the strength and direction of a linear relationship between two variables. The range of r is from -1 to 1.
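The three formulas above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`pearson_r`, `average_ranks`, `spearman_rs`, `point_biserial`) and the toy values are assumptions chosen so that the expected answers are obvious. The Spearman function uses the simplified rank-difference formula from these notes, which is only approximate when there are tied ranks.

```python
import math


def pearson_r(x, y):
    """Pearson r from the raw-score formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den


def average_ranks(values):
    """Rank from 1 (smallest) to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman r_s = 1 - 6*sum(D^2) / (n*(n^2 - 1)) on average ranks."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def point_biserial(group, y, first, second):
    """r_pb = (mean1 - mean2) / s_y * sqrt(n1*n2 / (n*(n - 1)))."""
    y1 = [v for g, v in zip(group, y) if g == first]
    y2 = [v for g, v in zip(group, y) if g == second]
    n1, n2, n = len(y1), len(y2), len(y)
    mean_all = sum(y) / n
    # sample standard deviation (divisor n - 1) of the continuous variable
    s_y = math.sqrt(sum((v - mean_all) ** 2 for v in y) / (n - 1))
    diff = sum(y1) / n1 - sum(y2) / n2
    return diff / s_y * math.sqrt(n1 * n2 / (n * (n - 1)))


# Toy values (hypothetical): a perfectly increasing and a perfectly reversed pairing.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 1.0
print(spearman_rs([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```

The sign of `point_biserial` depends only on which group is passed as `first`, so the direction has to be interpreted from the coding of the dichotomous variable.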
• If r is close to -1, there is a strong negative correlation.
• If r is close to 0, there is no linear correlation.
• If r is close to 1, there is a strong positive correlation.

Guilford's Rule of Thumb
r            Strength of relationship
< 0.2        Negligible relationship
0.2 – 0.4    Low relationship
0.4 – 0.7    Moderate relationship
0.7 – 0.9    High relationship
> 0.9        Very high relationship

Other Strengths of Association, by Johnson and Nelson (1986)
r-value      Interpretation
0.00         No relationship
0.01-0.19    Low relationship
0.20-0.49    Slightly moderate relationship
0.50-0.69    Moderate relationship
0.70-0.99    Strong relationship
1.00         Perfect relationship
The same strength interpretations hold for negative values of r; only the direction of the association changes.

Association Between Two Scores: degree and strength of association
• .20–.35: there is only a slight relationship.
• .35–.65: correlations above .35 are useful for limited prediction.
• .66–.85: good prediction can result from one variable to the other. Coefficients in this range would be considered very good.
• .86 and above: correlations in this range are typically achieved for studies of construct validity or test-retest reliability.

Step 1. State the hypotheses
Research hypothesis: There is a significant POSITIVE relationship between principals' level of instructional leadership and school academic performance in Sabah.
Null hypothesis: There is no significant POSITIVE relationship between principals' level of instructional leadership and school academic performance in Sabah.

Step 1. State the hypotheses
Research hypothesis: There is a significant NEGATIVE relationship between principals' level of instructional leadership and the NUMBER OF DISCIPLINARY PROBLEMS in schools in Sabah.
Null hypothesis: There is no significant NEGATIVE relationship between principals' level of instructional leadership and the NUMBER OF DISCIPLINARY PROBLEMS in schools in Sabah.

Step 2.
SET THE ALPHA LEVEL (α = 0.01 / 0.05 / 0.10), THE SAMPLING DISTRIBUTION AND THE TEST STATISTIC
The alpha value is set by the researcher. It is the amount of error the researcher is willing to accept when making the hypothesis-testing decision, that is, the risk of rejecting a null hypothesis that is in fact true. The levels conventionally used are 0.01 (1%), 0.05 (5%) or 0.10 (10%). This value is also called the significance value, significance level, or alpha level.

Step 2. Sampling distribution
This is the distribution appropriate for the analysis being carried out. It is a model of how the correlation value is distributed, here taken to be normal.
• The critical region contains the "unusual" correlation values: a value here is taken as evidence that the research hypothesis (Ha, also written Hp) is true.
• The non-critical region contains the "usual" correlation values: a value here is consistent with H0.

Step 3. Critical value
The critical value is the boundary between the region where H0 is retained and the region where Hp is supported. It is the point at which the researcher decides whether there is enough evidence to reject H0 (and so accept Hp) or not enough evidence to reject H0 (so H0 is retained). The critical value depends on the alpha level and on the direction of the hypothesis test.

Step 4. Test statistic
This is the value computed from the data and used as evidence for whether the null hypothesis is true or false.
• If the test statistic falls in the critical region, H0 is rejected and Hp is accepted.
• If the test statistic falls in the non-critical region, H0 is not rejected.

Step 4. Test statistic
The tested r is the computed correlation coefficient. For ranked data:
$$ r_{\text{tested}} = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

Step 5. Making the decision, conclusion and interpretation
If the test statistic falls in the non-critical region, H0 is not rejected.

Step 5.
Making the Decision, Conclusion and Interpretation
If the test statistic falls in the critical region, H0 is rejected and Hp is accepted (meaning there is evidence that Hp is true).

Example of Pearson correlation
Data were collected from a randomly selected sample to determine the relationship between average assignment scores and test scores in statistics. The data are presented in the table below. Assume the data are normally distributed.
1. Calculate an appropriate correlation coefficient.
2. Describe the nature of the relationship between the two variables.
3. Test the hypothesis on the relationship at the 0.01 level of significance.

Data set:
Assign   8.5   6    9    10   8    7    5    6    7.5   5
Test     88    66   94   98   87   72   45   63   85    77

Calculate the test statistic
X      Y     XY       X²      Y²
8.5    88    748      72.25   7744
6      66    396      36      4356
9      94    846      81      8836
10     98    980      100     9604
8      87    696      64      7569
7      72    504      49      5184
5      45    225      25      2025
6      63    378      36      3969
7.5    85    637.5    56.25   7225
5      77    385      25      5929
Σ 72   775   5795.5   544.5   62441

$$ r = \frac{10(5795.5) - (72)(775)}{\sqrt{\left[10(544.5) - 72^2\right]\left[10(62441) - 775^2\right]}} = \frac{2155}{\sqrt{(261)(23785)}} = 0.865 $$

Steps in Hypothesis Testing
1. State the null and alternative hypotheses: H0: ρP = 0, HA: ρP ≠ 0
2. Calculate the test statistic: r = .865
3. Determine the critical value: df = n – 2 = 8, two-tailed, α = 0.01; r critical = 0.7646
4. Make your decision: r calculated (0.865) > r critical (0.7646), so reject the null hypothesis and accept the alternative hypothesis.
5. Make a conclusion: There is a significant relationship between assignment scores and test scores, r(8) = 0.87, p < 0.01.

Spearman's rank correlation coefficient
• Non-parametric method: less power but more robust.
• Does not assume a normal distribution.
• The correlation coefficient also varies between -1 and 1.

Example of Spearman correlation
Data solicited from a randomly selected sample of employees were used to measure the relationship between ratings of the working environment and one's work commitment.
1. Calculate and describe the appropriate correlation coefficient.
2.
Test the hypothesis on the relationship at the 0.05 level of significance.

ID   1   2   3   4   5   6   7   8   9   10   11
X    1   2   3   4   5   1   2   3   4   5    6
Y    1   1   2   3   4   3   3   2   5   5    5

Null hypothesis: There is no significant correlation between ratings of the working environment and one's work commitment among employees.
Research hypothesis: There is a significant correlation between ratings of the working environment and one's work commitment among employees.
[Figure: sampling distribution; the null hypothesis is retained in the central region and the research hypothesis is supported in the two tails]
Determine the critical values in the sampling distribution from the degrees of freedom. From Table r, r = ±.618.

Participant   Work environment (X)   Work commitment (Y)   Rank X   Rank Y   D      D²
1             1                      1                     1.5      1.5      0      0
2             2                      1                     3.5      1.5      2      4
3             3                      2                     5.5      3.5      2      4
4             4                      3                     7.5      6        1.5    2.25
5             5                      4                     9.5      8        1.5    2.25
6             1                      3                     1.5      6        -4.5   20.25
7             2                      3                     3.5      6        -2.5   6.25
8             3                      2                     5.5      3.5      2      4
9             4                      5                     7.5      10       -2.5   6.25
10            5                      5                     9.5      10       -0.5   0.25
11            6                      5                     11       10       1      1
                                                                             ΣD² =  50.5

$$ r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6(50.5)}{11(121 - 1)} = 1 - 0.230 = 0.77 $$
There is a positive and strong relationship between ratings of the working environment and one's work commitment among employees.

Make a decision: Reject the null hypothesis, hence accept the research hypothesis.
Conclusion: There was a statistically significant positive correlation between ratings of the working environment and one's work commitment among employees (rho = 0.77, p < 0.05, N = 11).

2. Test the hypothesis on the relationship between the two variables at the 0.05 level of significance.
a. State the null and alternative hypotheses: H0: ρs = 0, HA: ρs ≠ 0
b. rs = 0.77
c.
Determine the critical value: critical rs = 0.618
d. Decision: Since the calculated rs (0.77) is larger than the critical rs (0.618), we reject the null hypothesis and accept the alternative hypothesis.
e. Conclusion: There is a significant relationship between ratings of the work environment and level of work commitment at the 0.05 level of significance, rs(11) = 0.77, p < .05. The results show that more positive ratings of the work environment are associated with higher work commitment among employees.

Point-biserial Correlation
$$ r_{pb} = \frac{\bar{y}_1 - \bar{y}_2}{s_y}\sqrt{\frac{n_1 n_2}{n(n-1)}} $$
• ȳ1 = mean of group 1
• ȳ2 = mean of group 2
• s_y = standard deviation of the continuous variable
• n1 = number of subjects in group 1
• n2 = number of subjects in group 2
• n = total number of subjects

Example on point-biserial correlation
A psychologist hypothesizes an association between marital status (1 = single, 2 = married) and need for achievement. A questionnaire measuring need for achievement is administered to married and single people.
1. Calculate the appropriate correlation coefficient.
2. Describe the nature of the relationship between the two variables.
3. Test the hypothesis on the relationship at the 0.05 level of significance.

Marital status         2   2   1    1    1    2    1    2    2    1    1    2   1    1
Need for achievement   3   7   12   16   24   11   15   10   11   18   22   9   19   17

• Mean of married subjects = 8.5
• Mean of single subjects = 17.9
• Standard deviation of need for achievement scores = 5.89
• Number of married subjects (coded 2) = 6
• Number of single subjects (coded 1) = 8
• Total number of subjects = 14

$$ r_{pb} = \frac{17.9 - 8.5}{5.89}\sqrt{\frac{8 \times 6}{14(14 - 1)}} = 0.82 $$
The mean need for achievement is 17.9 for single individuals and 8.5 for married individuals. There is a strong relationship between marital status and need for achievement.

3. Test the hypothesis on the relationship between the two variables at the 0.05 level of significance.
a. State the null and alternative hypotheses: H0: ρpb = 0, HA: ρpb ≠ 0
b. rpb = 0.82
c.
Determine the critical value: critical rpb = 0.532
d. Decision: Since the calculated rpb (0.82) is greater than the critical rpb (0.532), we reject the null hypothesis and accept the alternative hypothesis.
e. Conclusion: There is a significant relationship between marital status and need for achievement, rpb(12) = .82, p < 0.05. The findings also indicate that single individuals showed a higher need for achievement than married individuals. Hence marital status is related to one's need for achievement.

REGRESSION ANALYSIS
Regression analysis is an extension of correlation analysis, carried out once a relationship has been found. It is performed after a linear pattern of relationship is expected and a coefficient has been determined showing that there is a linear relationship between the two variables. We can then predict one variable (the criterion variable) once the second variable (the predictor variable) is known.

The procedure for SIMPLE REGRESSION ANALYSIS consists of:
• Drawing the scatter diagram for the distribution of the paired scores
• Determining the equation of the regression line; this equation is also called the regression model
• The equation/model of this line is Y' = a + bx
• Using this equation, the value of y can then be determined for a given value of x, and vice versa.
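The procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the helper names `fit_line` and `predict` are hypothetical, the slope and intercept use the usual least-squares formulas b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = ȳ − b·x̄, and the data are the assignment and test scores from the Pearson example earlier in these notes.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxx = sum(p * p for p in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    a = sy / n - b * sx / n                        # intercept: mean(y) - b*mean(x)
    return a, b


def predict(a, b, x_new):
    """Estimated y for a given x on the fitted line."""
    return a + b * x_new


# Assignment (x) and test (y) scores from the Pearson example in these notes.
assign = [8.5, 6, 9, 10, 8, 7, 5, 6, 7.5, 5]
test_scores = [88, 66, 94, 98, 87, 72, 45, 63, 85, 77]

a, b = fit_line(assign, test_scores)
print(round(b, 3), round(a, 2))  # 8.257 18.05
```

A fitted line of this kind passes through the point of means, so `predict(a, b, 7.2)` returns the mean test score of 77.5 (up to floating-point rounding), which is a quick sanity check on the arithmetic.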
EQUATION OF THE REGRESSION LINE (LEAST-SQUARES REGRESSION LINE)
Y' = a + bx
Y' = estimated value of y
b = slope of the line
a = intercept on the y-axis

SLOPE OF THE REGRESSION LINE
$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} $$
n = number of pairs of scores
Σxy = sum of each x score multiplied by its y score
Σx = sum of the x scores
Σy = sum of the y scores

INTERCEPT ON THE Y-AXIS
$$ a = \bar{y} - b\bar{x} $$

Data: principals' level of leadership (X) and teachers' perception of the principal's level of leadership (Y)

REGRESSION ANALYSIS CALCULATION
X      Y     XY    X²    Y²
12     8     96    144   64
2      3     6     4     9
1      4     4     1     16
6      6     36    36    36
5      9     45    25    81
8      6     48    64    36
4      6     24    16    36
15     22    330   225   484
11     14    154   121   196
13     6     78    169   36
Σ 77   84    821   805   994

$$ b = \frac{10(821) - (77)(84)}{10(805) - 77^2} = \frac{1742}{2121} = 0.82 $$
• r = 0.70, so r² = 0.49: 49% of the variation in y is accounted for by x.
• The slope is 0.82.
• The mean of x is 7.7 and the mean of y is 8.4.
• a = 2.1 (intercept on the y-axis).
• The regression model is Y' = 0.82x + 2.1
• If x = 7, then Y' = 7.84. If x = 10, then Y' = 10.3. If x = 14, then Y' = 13.58.

Regression & Correlation
• A correlation measures the "degree of association" between two variables (interval (50, 100, 150, ...) or ordinal (1, 2, 3, ...)).
• Associations can be positive (an increase in one variable is associated with an increase in the other) or negative (an increase in one variable is associated with a decrease in the other).

Example: Height vs. Weight
[Figure: Graph One, relationship between height (cm) and weight (kg)]
• Strong positive correlation between height and weight.
• We can see how the relationship works, but cannot yet predict one from the other.
• If a person is 120 cm tall, how heavy are they?
Example: Symptom Index vs Drug A
[Figure: Graph Two, relationship between Symptom Index and Drug A (dose in mg)]
• Strong negative correlation.
• We can see how the relationship works, but cannot make predictions.
• What Symptom Index might we predict for a standard dose of 150 mg?

Correlation examples
[Figure: correlation examples]

Regression
Regression analysis procedures have as their primary purpose the development of an equation that can be used for predicting values on some DV for all members of a population. A secondary purpose is to use regression analysis as a means of explaining causal relationships among variables.
The most basic application of regression analysis is the bivariate situation, which is referred to as simple linear regression, or just simple regression. Simple regression involves a single IV and a single DV.
Goal: to obtain a linear equation so that we can predict the value of the DV if we have the value of the IV.
Simple regression capitalizes on the correlation between the DV and IV in order to make specific predictions about the DV. The correlation tells us how much information about the DV is contained in the IV. If the correlation is perfect (i.e. r = ±1.00), the IV contains everything we need to know about the DV, and we will be able to predict one from the other perfectly.
Regression analysis is the means by which we determine the best-fitting line, called the regression line. The regression line is the straight line that lies closest to all points in a given scatterplot. This line always passes through the centroid of the scatterplot, the point (mean of X, mean of Y).

Example: Symptom Index vs Drug A, "best-fit line"
[Figure: Graph Three, relationship between Symptom Index and Drug A (with best-fit line)]
A best-fit line allows us to describe the relationship between variables more accurately.
• We can now predict specific values of one variable from knowledge of the other.
• All points are close to the line.

Example: Symptom Index vs Drug B
[Figure: Graph Four, relationship between Symptom Index and Drug B (with best-fit line), with the residuals marked]
• We can still predict specific values of one variable from knowledge of the other.
• Will predictions be as accurate? Why not? Because of the "residuals": the points lie further from the line.

Three important facts about the regression line must be known:
• The extent to which points are scattered around the line
• The slope of the regression line
• The point at which the line crosses the Y-axis

The extent to which the points are scattered around the line is typically indicated by the degree of relationship between the IV (X) and DV (Y). This relationship is measured by a correlation coefficient: the stronger the relationship, the higher the degree of predictability between X and Y.
The degree of slope is determined by the amount of change in Y that accompanies a unit change in X. It is the slope that largely determines the predicted values of Y from known values of X.
It is important to determine exactly where the regression line crosses the Y-axis (this value is known as the Y-intercept).
The regression line is essentially an equation that expresses Y as a function of X.
The basic equation for simple regression is:
Y = a + bX
where Y is the predicted value for the DV, X is the known raw score value on the IV, b is the slope of the regression line, and a is the Y-intercept.

Simple Linear Regression
♠ Purpose
• To determine the relationship between two metric variables
• To predict the value of the dependent variable (Y) based on the value of the independent variable (X)
♠ Requirement
• DV: interval / ratio
• IV: interval / ratio
♠ Requirement
• The independent and dependent variables are normally distributed in the population
• The cases represent a random sample from the population

Simple Regression: how best to summarise the data?
[Figure: two scatter plots of Symptom Index against Drug A (dose in mg), without and with a best-fit line]
Adding a best-fit line allows us to describe the data simply.

General Linear Model (GLM): how best to summarise the data?
Establish the equation for the best-fit line: Y = a + bX
where:
a = y-intercept (constant)
b = slope of the best-fit line
Y = dependent variable
X = independent variable

Simple Regression: R², "goodness of fit"
• For simple regression, R² is the square of the correlation coefficient
• Reflects the variance accounted for in the data by the best-fit line
• Takes values between 0 (0%) and 1 (100%)
• Frequently expressed as a percentage rather than a decimal
• High values show good fit, low values show poor fit

Simple Regression: low values of R²
[Figure: scatter plot of DV against IV (regressor, predictor) with R² = 0]
R² = 0 (0%): randomly scattered points, no apparent relationship between X and Y. This implies that a best-fit line will be a very poor description of the data.

Simple Regression: high values of R²
[Figure: scatter plots of DV against IV with R² = 1]
R² = 1 (100%): the points lie directly on the line, a perfect relationship between X and Y. This implies that a best-fit line will be a very good description of the data.
Simple Regression: R², "goodness of fit"
[Figure: scatter plots of Symptom Index against Drug A and against Drug B (dose in mg), each with a best-fit line]
• Drug A: good fit, R² high, high variance explained
• Drug B: moderate fit, R² lower, less variance explained

Problem: to draw a straight line through the points that best explains the variance. The line can then be used to predict Y from X.
[Figure: scatter plot with a fitted straight line]

Regression
Establish the equation for the best-fit line: Y = a + bX
• The best-fit line is the same as the regression line
• b is the regression coefficient for x
• x is the predictor or regressor variable for y

Regression - Types
Step: Descriptive analysis
Derive the regression / prediction equation
● Calculate a and b
a = ȳ – b x̄
Ŷ = a + bX

Example on regression analysis
Data were collected from a randomly selected sample to determine the relationship between average assignment scores and test scores in statistics. The data are presented in the table below.
1. Calculate the coefficient of determination and the correlation coefficient.
2. Determine the prediction equation.
3. Test the hypothesis for the slope at the 0.05 level of significance.

Data set: Scores
ID       1     2    3    4    5    6    7    8    9     10
Assign   8.5   6    9    10   8    7    5    6    7.5   5
Test     88    66   94   98   87   72   45   63   85    77

1.
Derive the regression / prediction equation
Summary statistics: n = 10, ΣX = 72, ΣY = 775, ΣX² = 544.5, ΣY² = 62,441, ΣXY = 5,795.5
$$ b = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{215.5}{26.1} = 8.257 $$
$$ a = \bar{y} - b\bar{x} = 77.5 - 8.257(7.2) = 18.050 $$
Prediction equation: Ŷ = 18.05 + 8.257X

Interpretation of the regression equation
Ŷ = 18.05 + 8.257x
For every 1 unit change in X, Y will change by 8.257 units.
[Figure: regression line with intercept 18.05 and slope ΔY/ΔX]

Example on regression analysis: MARITAL SATISFACTION
Parents: X     1   3   7   9   8   4   5
Children: Y    3   2   6   7   8   6   3
To be computed for X: mean of X, number of pairs, ΣX, ΣX², standard deviation, ΣXY.
To be computed for Y: mean of Y, ΣY, ΣY², standard deviation.

1. Derive the regression / prediction equation
$$ a = \bar{y} - b\bar{x} = 5.00 - 0.65(5.29) = 1.56 $$
Prediction equation: Ŷ = 1.56 + 0.65x

Interpretation of the regression equation
Ŷ = 1.56 + 0.65x
For every 1 unit change in X, Y will change by 0.65 units.
[Figure: regression line with intercept 1.56 and slope ΔY/ΔX]

"CHI-SQUARE" ANALYSIS
This is also an analysis of relationship, but it is better known as an analysis of association.
It is used to determine the association between a pair of variables measured on a nominal or ordinal scale, or when one of them is paired with interval or ratio data. Examples of such variables are race, gender, liking/disliking a food, high/low achievement, and high/moderate/low anxiety.
Frequency data are obtained by counting the occurrences of each event. The analysis is suitable for survey research.
From the observed frequencies, chi-square analysis tells us whether or not there is an association between the two variables.
"CHI-SQUARE" ANALYSIS
Suppose a researcher collects information on respondents' race and on each respondent's category of dietary practice, OR surveys students in several schools on gender and interest/no interest in the science stream, OR surveys fathers and collects their level of education (high/medium/low) and relates it to salary category.
For all three examples the appropriate analysis is a non-parametric one (chi-square analysis), for which a contingency table or crosstabulation table is built.

"CHI-SQUARE" ANALYSIS
There are two types: the CHI-SQUARE TEST OF GOODNESS OF FIT and the TEST OF INDEPENDENCE/DEPENDENCE.
• TEST OF GOODNESS OF FIT: answers the question "is there a difference in the rates of a given thing/event/agreement?"
• TEST OF INDEPENDENCE/DEPENDENCE: answers the question "is there an association/dependence/relationship between two things?"

"CHI-SQUARE" ANALYSIS
The results of this analysis are usually presented as a frequency table called a contingency table or crosstabulation table. From the observed frequencies, chi-square analysis tells us whether or not there is a significant association between the two variables studied, or whether or not there is a significant difference in frequencies among the categories studied.
•From the table we can examine whether there is a relationship or association between the two variables.
•Next, a hypothesis test must be carried out to test whether the association between the two variables is significant.
•This hypothesis test is the chi-square test.
•If there is a significant association, the next step is to determine the degree or magnitude of the relationship.
•For this analysis the data are frequencies, so the distribution of scores is not normal.
•The distribution is therefore called distribution-free.
•This test is also called a non-parametric test because it does not assume a normal distribution.
•As a rule of thumb, parametric tests are preferred because of their power; however, if the data are nominal or the data are not normally distributed, non-parametric tests are used.
•Non-parametric tests: sign test, Mann-Whitney U test, Wilcoxon matched-pairs signed ranks, Kruskal-Wallis, chi-square.

Test yourself! Which statistical test is required? Then carry out the required analysis.

EXAMPLE DATA
Subject   Parents' marital satisfaction   Children's marital satisfaction   Performance
1         1                               3                                 70
2         3                               2                                 80
3         7                               6                                 40
4         9                               7                                 35
5         8                               8                                 50
6         4                               6                                 40
7         5                               3                                 30

Subject   Aggression rank   Aggression rank
1         8                 14
2         10                12
3         4                 9
4         1                 4
5         5                 11
6         6                 10
7         3                 1
8         9                 12
9         7                 10
10        2                 4

EXAMPLE DATA 3
Gender   Leadership level   Leadership style   Performance as perceived by teachers
1        18                 Autocratic         20
1        20                 Autocratic         30
1        24                 Autocratic         40
1        11                 Democratic         85
1        15                 Democratic         70
2        16                 Democratic         30
2        12                 Democratic         80
2        19                 Autocratic         40
2        17                 Democratic         25
2        22                 Autocratic         75
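As one possible way to start on the third data set, the sketch below cross-tabulates gender against leadership style and computes the chi-square statistic for a test of independence. It is an illustration only: the function name `chi_square_independence` is hypothetical, the style labels are written in English, and 3.841 is the standard table value for df = 1 at α = 0.05. With only 10 cases the expected count in every cell is 2.5, which is smaller than the chi-square approximation is usually considered comfortable with, so the result should be read as a demonstration of the arithmetic rather than a finding.

```python
from collections import Counter


def chi_square_independence(rows, cols):
    """Chi-square statistic and df for a two-way table built from paired labels."""
    n = len(rows)
    observed = Counter(zip(rows, cols))  # cell counts
    row_tot = Counter(rows)              # row totals
    col_tot = Counter(cols)              # column totals
    chi2 = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return chi2, df


# Gender and leadership style from Example Data 3 (labels: Autokratik/Demokratik).
gender = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
style = ["autocratic", "autocratic", "autocratic", "democratic", "democratic",
         "democratic", "democratic", "autocratic", "democratic", "autocratic"]

chi2, df = chi_square_independence(gender, style)
CRITICAL_05_DF1 = 3.841  # table value for df = 1, alpha = 0.05

print(round(chi2, 2), df)      # 0.4 1
print(chi2 > CRITICAL_05_DF1)  # False
```

Because the statistic falls below the critical value, this sketch would not reject the null hypothesis of no association between gender and leadership style for these ten cases.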