Trend of Saudi Arabia Students Taking Higher Education Abroad By Majed Saeed Alghamdi

Trend of Saudi Arabia Students Taking
Higher Education Abroad
A THESIS
SUBMITTED TO THE GRADUATE EDUCATIONAL COUNCIL
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
For the degree
MASTER OF SCIENCE
By
Majed Saeed Alghamdi
Advisor Dr. Rahmatullah Imon
Ball State University
Muncie, Indiana
May 2016
Trend of Saudi Arabia Students Taking Higher Education Abroad
A THESIS
SUBMITTED TO THE GRADUATE EDUCATIONAL COUNCIL
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE
MASTER OF SCIENCE
By
Majed Saeed Alghamdi
Committee Approval:
…………………………………………………………………………………………….
Committee Chairman
Date
……………………………………………………………………………………………
Committee Member
Date
…………………………………………………………………………………………….
Committee Member
Date
Department Head Approval:
……………………………………………………………………………………………
Head of Department
Date
Graduate office Check:
……………………………………………………………………………………………
Dean of Graduate School
Date
Ball State University
Muncie, Indiana
May, 2016
ACKNOWLEDGEMENTS
I would like to express my special appreciation and thanks to my advisor, Professor Dr.
Rahmatullah Imon, who has been a tremendous mentor for me, for his patience, motivation,
enthusiasm, and immense knowledge. His guidance helped me throughout my analysis
and the writing of this report. I could not have imagined having a better advisor and mentor for
my thesis. I would also like to thank my committee members, Professor Dr. Munni Begum
and Dr. Yayuan Xiao, for their encouragement, insightful comments and patience. I am thankful to
all my classmates for their kind support. Last but not least, I would like to thank my family:
my parents, my brothers and sisters, for supporting me throughout my life.
Majed Alghamdi
May 7, 2016
ABSTRACT
In this study our prime objective was to investigate the trend of Saudi Arabian students who are
studying abroad for higher education. We find that student enrolment has grown almost exponentially
over the years. The most popular programs are Engineering and Medical Science, and the least
popular programs are Agriculture and Fine Arts. We also find evidence of gender discrimination
against women among the Saudi Arabian students studying abroad. To identify which factors
influence the number of students studying abroad, we use regression analysis and find that
the budget in higher education and the oil price are the most important variables in explaining
student enrolment. Both the regression and the cross validation study reveal that the robust
reweighted least squares (RLS) method fits the data better than the other models and yields better
forecasts.
Table of Contents
CHAPTER 1
INTRODUCTION
1.1 Objective of the Study
1.2 Sources of Data
1.3 Methodology
CHAPTER 2
Trend of Saudi Arabia Students Studying abroad
2.1 Trend Analysis
2.2 Trend Analysis of Nine Major Programs
2.3 Trend Analysis of Some Other Relevant Variables
2.4 Summary Results of Trend Analysis
CHAPTER 3
Comparison between Genders and Different Programs
3.1 Comparison between Genders
3.2 Tests for the Equality of Means between Male and Female Students
3.3 Comparison of the Individual Treatment Means
3.4 Result Summary
CHAPTER 4
Modeling and Fitting of Data Using Regression Diagnostics and Robust Regression
4.1 Classical Regression Analysis
4.2 Regression Diagnostics
4.3 Robust Regression
4.4 Regression Results
4.5 Results Comparisons
CHAPTER 5
Cross Validation of Forecasts
5.1 Evaluation of Forecasts by Cross Validation
5.2 Cross Validation Results
CHAPTER 6
Conclusions and Areas of Further Research
6.1 Conclusions
6.2 Areas of Further Research
References
APPENDIX A
APPENDIX B
List of Tables
Chapter 2
Table 2.1: Trend Summary of the Total Number of Students
Table 2.2: Trend Summary of the Total Number of Social Science Students
Table 2.3: Trend Summary of the Total Number of Natural Science Students
Table 2.4: Trend Summary of the Total Number of Medical Science Students
Table 2.5: Trend Summary of the Total Number of Law Students
Table 2.6: Trend Summary of the Total Number of Humanities Students
Table 2.7: Trend Summary of the Total Number of Fine Arts
Table 2.8: Trend Summary of the Total Number of Engineering Students
Table 2.9: Trend Summary of the Total Number of Education Students
Table 2.10: Trend Summary of the Total Number of Agriculture Students
Table 2.11: Trend Summary of Oil Revenue
Table 2.12: Trend Summary of Budget in Higher Education
Table 2.13: Trend Summary of Oil Price
Table 2.14: Trend Summary
Chapter 3
Table 3.1: Summary Test Results for the Equality of Means between Male and Female Students
Table 3.2: Average Number of Students in Different Programs
Table 3.3: ANOVA Table for the Equality of Mean Test of Nine Programs
Chapter 4
Table 4.1: Regression Results Summary
Chapter 5
Table 5.1: Original and Forecasted Values for 2011-2014
Table 5.2: Cross Validation Result Summary
List of Figures
Chapter 2
Figure 2.1: Time Series Plot of the Total Number of Students
Figure 2.2: Trend Analysis of the Total Number of Students
Figure 2.3: Time Series Plot of Total Number of Students in Different Programs
Figure 2.4: Time Series Plot of Total Number of Students (in ln) in Different Programs
Figure 2.5: Trend Analysis Plot of the Total Number of Social Science Students
Figure 2.6: Trend Analysis Plot of the Total Number of Students for Natural Science
Figure 2.7: Trend Analysis Plot of the Total Number of Students for Medical Science
Figure 2.8: Trend Analysis Plot of the Total Number of Students for Law
Figure 2.9: Trend Analysis Plot of the Total Number of Students for Humanities
Figure 2.10: Trend Analysis Plot of the Total Number of Students for Fine Arts
Figure 2.11: Trend Analysis Plot of the Total Number of Students for Engineering
Figure 2.12: Trend Analysis Plot of the Total Number of Students for Education
Figure 2.13: Trend Analysis Plot of the Total Number of Students for Agriculture
Figure 2.14: Time Series Plot of the Budget in Higher Education
Figure 2.15: Time Series Plot of Oil Price
Figure 2.16: Time Series Plot of Oil Revenue
Figure 2.17: Trend Analysis of Oil Revenue
Figure 2.18: Trend Analysis of Budget in Higher Education
Figure 2.19: Trend Analysis of Oil Price
Chapter 3
Figure 3.1: Time Series Plot of Male and Female Students in Social Science
Figure 3.2: Time Series Plot of Male and Female Students in Natural Science
Figure 3.3: Time Series Plot of Male and Female Students in Medical Science
Figure 3.4: Time Series Plot of Male and Female Students in Law
Figure 3.5: Time Series Plot of Male and Female Students in Humanities
Figure 3.6: Time Series Plot of Male and Female Students in Engineering
Figure 3.7: Time Series Plot of Male and Female Students in Education
Figure 3.8: Time Series Plot of Male and Female Students in Fine Arts
Figure 3.9: Time Series Plot of Male and Female Students in Agriculture
Figure 3.10: Box Plot of Number of Students in Different Programs
Chapter 4
Figure 4.1: Scatter Plot of the Total Number of Students vs Budget in Higher Education
Figure 4.2: Scatter Plot of the Total Number of Students vs Oil Price
Figure 4.3: RLS and OLS Fit of the Total Number of Students vs Oil Price
Figure 4.4: Scatter Plot of the Total Number of Students vs Oil Revenue
Figure 4.5: Normal Probability Plot of the Residuals for Model A
Figure 4.6: Normal Probability Plot of the Residuals for Model B
Figure 4.7: Normal Probability Plot of the Residuals for Model C
Chapter 5
Figure 5.1: Scatterplot of RLS, OLS, Exponential Forecasts vs Original Values
CHAPTER 1
INTRODUCTION
As early as the reign of King Abdulaziz, the founding king of Saudi Arabia, students were being
sponsored to study abroad. Early programs were limited to Arab countries such as Egypt and
Lebanon, where students pursued Arabic and Islamic studies. The number of Saudi Arabian students
studying abroad has increased dramatically during the past decade. This explosive growth can be
attributed to an educational agreement brokered between former U.S. president George W. Bush and
Saudi King Abdullah bin Abdulaziz Al Saud in 2005. The agreement opened the doors for Saudi
students to pursue their higher educational degrees in the U.S. with their government paying all
of their educational expenses. As a result, over 100,000 Saudi students were enrolled in American
colleges and universities in 2013-14, making Saudi Arabia the fourth largest sponsor of
international students to the U.S.
Saudi enrollments overseas have been growing exponentially since the 2005 introduction of
the King Abdullah bin Abdulaziz Scholarship Program (KASP). In 2012, the KASP was extended
with the aim of helping a further 50,000 Saudis graduate from the world’s top 500 universities by
2020. According to data from the Institute for International Education, in the 2012/13 academic
year there were a total of 44,586 tertiary-level Saudi students in the United States, an almost 100
percent increase from 2010/11 and a 12-fold increase from 2005.
The most recent data from the Student and Exchange Visitor Program’s SEVIS database show that
there were a total of 70,366 active nonimmigrant Saudi students (including dependents) in the
United States in July 2014 on F, J or M visas. This compares to 61,944 at the same time in
2013. Saudi government data peg the 2013/14 number of Saudi students and dependents in the
United States at a significantly larger 106,858. Of those, 89,423 were reported to be on government
scholarships. The same data show that there were 20,252 students in the United Kingdom, 18,926
in Canada, and 13,002 in Australia, with just under 200,000 Saudi students in total (75% male)
at institutions abroad across the world.
By level of study, 120,000 students are at the undergraduate level, 47,500 at the master’s level and
10,400 at the doctoral level. The KASP will continue to prioritize fields designated as important
to progressing the Saudi “knowledge economy,” such as medicine, engineering and science.
Approximately 70 percent of scholarship students currently study in subjects related to Business
Administration, Engineering, Information Technology and Medicine. The top fields of study for
Saudi students in the United States last year were: Intensive English (27.2%), Engineering
(21.1%), Business/Management (17.1%), Math and Computer Science (7.4%), and Health
Professions (5.6%).
The Saudi government is projected to invest over 10% of its annual budget in higher education for
the foreseeable future. Currently it invests nearly $2.4 billion in the KASP initiative annually,
which includes academic funding as well as living expenses for over 100,000 students enrolled in
graduate and undergraduate programs in the U.S. If the Saudi government continues to support
KASP at the current level, it will soon surpass South Korea in the number of students sent
abroad to study.
1.1 Objective of the Study
In this study our prime objective is to investigate the trend of Saudi Arabian students who are
studying abroad for higher education. We investigate both the overall trend and the trends of
individual programs, and we examine whether there is any special preference for any particular
program. Another point of interest is whether there is any gender discrimination among the
students. We would also like to identify the factors that most influence the number of students
studying abroad. We employ regression analysis for this purpose and check the validity of the
model with recent diagnostics. If the conventionally used least squares method fails, we either
use robust regression or choose some other model. To confirm which method fits the data best, we
apply cross validation.
1.2 Sources of Data
The most important data set I need for my study is the number of Saudi Arabian students studying
abroad for higher education. It is taken from the official website of the Ministry of Higher
Education of Saudi Arabia, given below.
https://www.mohe.gov.sa/ar/Ministry/Deputy-Ministry-for-Planning-and-Informationaffairs/HESC/Ehsaat/Pages/default.aspx
We have data for both male and female students in nine programs from 1981-2014. The nine
programs are Social Science, Natural Science, Medical Science, Law, Humanities, Fine Arts,
Engineering, Education, and Agriculture.
We believe that the Budget in Higher Education is a key factor in understanding the number of
Saudi Arabian students studying abroad. The Budget in Higher Education data set from 1981 to 2014
is taken from the official website of the Ministry of Finance of Saudi Arabia. The data are
available at:
https://www.mof.gov.sa/english/DownloadsCenter/Pages/Budget.aspx
We know Saudi Arabia relies heavily on oil. We feel Oil Revenue and Oil Price could be very
important variables for our study. We collect these data for 1981-2014 from the official website
of the Saudi Arabian Monetary Agency (SAMA). The data are available at:
http://www.sama.gov.sa/en-US/EconomicReports/Pages/YearlyStatistics.aspx
All these data are presented in Appendix A of my thesis.
1.3 Methodology
In this study we have employed a number of modern and sophisticated statistical techniques. We
have used linear, quadratic and exponential trend models to investigate both the overall trend and
the trends of individual programs. We have used experimental design techniques to see whether
there is any special preference for any particular program and to investigate whether there is any
gender discrimination among the students; Fisher’s LSD and Tukey’s test are employed in this
regard. To find the factors that most influence the number of students studying abroad, we employ
regression analysis, with recent diagnostics such as the Jarque-Bera and Rescaled Moments tests
for normality and the robust reweighted least squares (RLS) technique for fitting.
Finally, we employ a cross validation study based on the mean squared percentage error (MSPE)
to confirm which method fits the data best.
CHAPTER 2
Trend of Saudi Arabia Students Studying abroad
In this chapter we introduce the different time series models that we are going to use in our
study, together with their estimation procedures and properties. An excellent review of different
aspects of time series models and their estimation is available in Pindyck and Rubinfeld (1998),
Bowerman et al. (2005), and Montgomery et al. (2008). A time series is a chronological sequence of
observations on a particular variable. A time series model accounts for patterns in the past
movement of a variable and uses that information to predict its future movements, i.e., it is a
sophisticated method of extrapolating data. There are two different approaches to modeling time
series data: deterministic and stochastic.
2.1 Trend Analysis
We begin with simple models that can be used to forecast a time series on the basis of its past
behavior. Most of the series we encounter are not continuous in time; instead, they consist of
discrete observations made at regular intervals of time. We denote the values of a time series by
$\{y_t\}$, $t = 1, 2, \ldots, T$. Our objective is to model the series $y_t$ and use that model to
forecast $y_t$ beyond the last observation $y_T$. We denote the forecast $l$ periods ahead by
$\hat{y}_{T+l}$.
We can sometimes describe a time series $y_t$ by using a trend model defined as
\[ y_t = TR_t + \varepsilon_t \tag{2.1} \]
where $TR_t$ is the trend in time period $t$ and $\varepsilon_t$ is the error term in time period $t$.
2.1.1 Linear Trend Model:
\[ TR_t = \beta_0 + \beta_1 t \tag{2.2} \]
We can predict $y_t$ by
\[ \hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 t \tag{2.3} \]
Then the forecast $l$ periods ahead is given by
\[ \hat{y}_{T+l} = \hat{\beta}_0 + \hat{\beta}_1 (T + l) \tag{2.4} \]
For this particular model the distance value is
\[ DV = \frac{1}{T} + \frac{(T + l - \bar{t})^2}{\sum_{t=1}^{T} (t - \bar{t})^2}. \]
Hence the $100(1-\alpha)\%$ prediction interval for an individual value of the dependent variable is
\[ \hat{y}_{T+l} \pm t_{T-2,\,\alpha/2}\, s \sqrt{1 + DV}. \]
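The linear trend forecast and its prediction interval can be illustrated with a minimal sketch. It is an illustration under stated assumptions rather than the computation used in the thesis: the function name `linear_trend_forecast`, the synthetic series `y_demo`, and the choice to pass the t quantile in by hand (2.306 for 8 degrees of freedom at the 95% level) are all hypothetical.

```python
import math

def linear_trend_forecast(y, l, t_crit):
    """Fit TR_t = b0 + b1*t by least squares (t = 1..T) and forecast l periods ahead.

    t_crit is the quantile t_{T-2, alpha/2}; it is supplied by the caller
    (an assumption of this sketch, to avoid depending on a statistics library).
    """
    T = len(y)
    t = list(range(1, T + 1))
    t_bar = sum(t) / T
    y_bar = sum(y) / T
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b1 = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sxx
    b0 = y_bar - b1 * t_bar
    # Residual standard error s with T - 2 degrees of freedom
    sse = sum((yi - (b0 + b1 * ti)) ** 2 for ti, yi in zip(t, y))
    s = math.sqrt(sse / (T - 2))
    point = b0 + b1 * (T + l)                      # point forecast, equation (2.4)
    dv = 1 / T + (T + l - t_bar) ** 2 / sxx        # distance value
    half_width = t_crit * s * math.sqrt(1 + dv)    # half-width of the prediction interval
    return b0, b1, point, (point - half_width, point + half_width)

# Synthetic data for illustration only (not the thesis data)
y_demo = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 15.2, 16.9, 19.1, 21.0]
b0, b1, point, (lo, hi) = linear_trend_forecast(y_demo, l=2, t_crit=2.306)
print(f"b0={b0:.3f}, b1={b1:.3f}, forecast={point:.2f}, 95% PI=({lo:.2f}, {hi:.2f})")
```

The interval widens as $l$ grows because the distance value increases with the gap between $T + l$ and the mean time index.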
2.1.2 Polynomial Trend Model of Order p
\[ TR_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_p t^p \tag{2.5} \]
If the number of observations is not too large, we can predict $y_t$ by
\[ \hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 t + \hat{\beta}_2 t^2 + \cdots + \hat{\beta}_p t^p \tag{2.6} \]
Then the forecast $l$ periods ahead is given by
\[ \hat{y}_{T+l} = \hat{\beta}_0 + \hat{\beta}_1 (T + l) + \hat{\beta}_2 (T + l)^2 + \cdots + \hat{\beta}_p (T + l)^p \tag{2.7} \]
The $100(1-\alpha)\%$ prediction interval for an individual value of the dependent variable is
\[ \hat{y}_{T+l} \pm t_{T-p-1,\,\alpha/2}\, s \sqrt{1 + DV} \tag{2.8} \]
Quadratic Trend Model:
It is a special case of the polynomial trend model with order p = 2. Hence from the above results we
have
\[ TR_t = \beta_0 + \beta_1 t + \beta_2 t^2 \tag{2.9} \]
If the number of observations is not too large, we can predict $y_t$ by
\[ \hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 t + \hat{\beta}_2 t^2 \tag{2.10} \]
Then the forecast $l$ periods ahead is given by
\[ \hat{y}_{T+l} = \hat{\beta}_0 + \hat{\beta}_1 (T + l) + \hat{\beta}_2 (T + l)^2 \tag{2.11} \]
The $100(1-\alpha)\%$ prediction interval for an individual value of the dependent variable is
\[ \hat{y}_{T+l} \pm t_{T-3,\,\alpha/2}\, s \sqrt{1 + DV} \tag{2.12} \]
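As a rough illustration of how a polynomial (and hence quadratic) trend can be estimated and extrapolated, the sketch below solves the least squares normal equations directly. The function names `fit_polynomial_trend` and `forecast` and the synthetic series are hypothetical; statistical packages typically use numerically more robust solvers than this plain Gaussian elimination.

```python
def fit_polynomial_trend(y, p):
    """Least squares coefficients [b0, ..., bp] of TR_t = b0 + b1*t + ... + bp*t**p, t = 1..T."""
    T = len(y)
    n = p + 1
    X = [[float(t) ** j for j in range(n)] for t in range(1, T + 1)]
    # Augmented matrix of the normal equations (X'X) b = X'y
    A = [[sum(X[r][i] * X[r][j] for r in range(T)) for j in range(n)]
         + [sum(X[r][i] * y[r] for r in range(T))]
         for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * n
    for i in reversed(range(n)):
        b[i] = (A[i][n] - sum(A[i][j] * b[j] for j in range(i + 1, n))) / A[i][i]
    return b

def forecast(b, T, l):
    """Point forecast l periods beyond the last observation T, as in equation (2.7)."""
    return sum(bj * (T + l) ** j for j, bj in enumerate(b))

# Synthetic quadratic series for illustration only (not the thesis data)
y_demo = [2 + 3 * t + 0.5 * t * t for t in range(1, 9)]
b = fit_polynomial_trend(y_demo, p=2)
print("coefficients:", [round(v, 4) for v in b])
print("forecast two periods ahead:", round(forecast(b, T=len(y_demo), l=2), 4))
```

Setting `p=1` reduces the same routine to the linear trend model of Section 2.1.1.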
2.1.3 Comparisons of Different Methods
Minitab computes three measures of accuracy of the fitted model: MAPE, MAD, and MSD for
each of the simple forecasting and smoothing methods. For all three measures, the smaller the
value, the better the fit of the model. Use these statistics to compare the fits of the different
methods.
MAPE, or Mean Absolute Percentage Error, measures the accuracy of fitted time series values. It
expresses accuracy as a percentage.
\[ \text{MAPE} = \frac{\sum_{t=1}^{T} \left| (y_t - \hat{y}_t) / y_t \right|}{T} \times 100 \tag{2.13} \]
where $y_t$ equals the actual value, $\hat{y}_t$ equals the fitted value, and $T$ equals the number of
observations.
MAD, which stands for Mean Absolute Deviation, measures the accuracy of fitted time
series values. It expresses accuracy in the same units as the data, which helps conceptualize the
amount of error.
\[ \text{MAD} = \frac{\sum_{t=1}^{T} \left| y_t - \hat{y}_t \right|}{T} \tag{2.14} \]
where $y_t$ equals the actual value, $\hat{y}_t$ equals the fitted value, and $T$ equals the number of
observations.
MSD stands for Mean Squared Deviation. MSD is always computed using the same denominator,
T, regardless of the model, so MSD values can be compared across models. MSD is a more
sensitive measure of an unusually large forecast error than MAD.
\[ \text{MSD} = \frac{\sum_{t=1}^{T} (y_t - \hat{y}_t)^2}{T} \tag{2.15} \]
where $y_t$ equals the actual value, $\hat{y}_t$ equals the fitted value, and $T$ equals the number of
observations.
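The three accuracy measures are simple to compute by hand, which the following sketch illustrates. The function names and the three-point example are hypothetical and are not taken from Minitab or from the thesis data.

```python
def mape(actual, fitted):
    """Mean absolute percentage error, equation (2.13); actual values must be non-zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, fitted)) / len(actual)

def mad(actual, fitted):
    """Mean absolute deviation, equation (2.14)."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)

def msd(actual, fitted):
    """Mean squared deviation, equation (2.15)."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)

# Made-up actual and fitted values for illustration only
actual = [100, 200, 400]
fitted = [110, 190, 380]
print(f"MAPE={mape(actual, fitted):.2f}%  MAD={mad(actual, fitted):.2f}  MSD={msd(actual, fitted):.2f}")
```

Because MAPE divides each error by the actual value, errors in years with small counts weigh heavily, whereas MSD is dominated by the largest absolute errors. The measures can therefore rank the same set of models differently.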
2.1.4 Exponential smoothing
Exponential smoothing provides a forecasting method that is most effective when the components
of the time series may be changing over time. It is often more reasonable to have more recent
values of y t play a greater role than do earlier values. In such a case recent values should be
weighted more heavily in the moving average.
Suppose that the time series $y_t$ has a level (or mean) that may slowly change over time but has no
trend or seasonal pattern. This series can be described as
\[ y_t = \beta_0 + \varepsilon_t \tag{2.16} \]
Then the estimate $\ell_T$ for the level of the series in time period $T$ is given by the smoothing equation
\[ \ell_T = \alpha y_T + (1 - \alpha)\, \ell_{T-1} \tag{2.17} \]
where $\alpha$ is a smoothing constant between 0 and 1, and $\ell_{T-1}$ is the estimate of the level in
time period $T - 1$.
A point forecast for one period ahead is given by
\[ \hat{y}_{T+1} = \ell_T \tag{2.18} \]
which implies
\[ \hat{y}_{T+1} = \alpha y_T + \alpha (1 - \alpha) y_{T-1} + \alpha (1 - \alpha)^2 y_{T-2} + \cdots = \alpha \sum_{\tau=0}^{\infty} (1 - \alpha)^{\tau} y_{T-\tau} \tag{2.19} \]
It is easy to show that the $l$-period-ahead forecast $\hat{y}_{T+l}$ can be given by
\[ \hat{y}_{T+l} = \alpha \sum_{\tau=0}^{\infty} (1 - \alpha)^{\tau} y_{T-\tau} \tag{2.20} \]
There are several methods to choose the appropriate value of $\alpha$. The most popular method is to
choose the $\alpha$ which minimizes the mean squared deviation (MSD) between the actual and
forecasted values. Other measures of accuracy are the mean absolute percentage error (MAPE)
and the mean absolute deviation (MAD).
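The smoothing recursion and the choice of the smoothing constant by minimizing MSD can be sketched as follows. Two choices here are assumptions of the sketch and not taken from the thesis: the initial level is set to the first observation, and the smoothing constant is searched over a grid from 0.01 to 0.99. The function names and the demo series are hypothetical.

```python
def ses_fit(y, alpha):
    """Simple exponential smoothing; returns (final level, one-step-ahead forecasts)."""
    level = y[0]                      # assumption: initial level = first observation
    forecasts = []
    for obs in y:
        forecasts.append(level)       # forecast of obs from the previous level, eq. (2.18)
        level = alpha * obs + (1 - alpha) * level   # smoothing equation (2.17)
    return level, forecasts

def msd(actual, fitted):
    """Mean squared deviation between actual values and forecasts."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)

def choose_alpha(y, grid=None):
    """Pick the smoothing constant on a grid that minimizes the one-step-ahead MSD."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda a: msd(y, ses_fit(y, a)[1]))

# Synthetic level-only series for illustration (not the thesis data)
y_demo = [12.0, 11.5, 12.8, 12.1, 13.0, 12.4, 12.9, 13.3, 12.7, 13.1]
alpha = choose_alpha(y_demo)
level, _ = ses_fit(y_demo, alpha)
print(f"alpha={alpha:.2f}, forecast for every future period={level:.3f}")
```

Since the model has no trend term, the same final level serves as the forecast for every horizon. On a strongly trending series the search is pushed toward the upper end of the grid, a sign that a trend model is more appropriate.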
2.2 Trend Analysis of Nine Major Programs
In this section we investigate the trend of the total number of students studying abroad in
nine major programs. For each program we consider three different trend models: linear, quadratic,
and exponential. We also compute MAPE, MAD and MSD to evaluate which model fits the data
better.
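The exponential (growth curve) trend model has the form $Y_t = a \cdot g^t$. One common way to estimate it, sketched below, is ordinary least squares on the logarithm of the series. Whether the software used in this study estimates the growth curve in exactly this way is an assumption of the sketch. The function name `fit_growth_curve` and the synthetic data are hypothetical.

```python
import math

def fit_growth_curve(y):
    """Fit Y_t = a * g**t (t = 1..T) by least squares on ln(y_t) = ln(a) + t*ln(g)."""
    T = len(y)
    t = list(range(1, T + 1))
    z = [math.log(v) for v in y]      # requires strictly positive observations
    t_bar = sum(t) / T
    z_bar = sum(z) / T
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    slope = sum((ti - t_bar) * (zi - z_bar) for ti, zi in zip(t, z)) / sxx
    intercept = z_bar - slope * t_bar
    return math.exp(intercept), math.exp(slope)

# Synthetic series growing 10% per period (not the thesis data)
y_demo = [500 * 1.10 ** t for t in range(1, 13)]
a, g = fit_growth_curve(y_demo)
print(f"Yt = {a:.2f} * ({g:.4f}**t)")
```

A fit on the log scale minimizes relative rather than absolute errors. Under this assumption, an exponential fit would tend to score well on MAPE while possibly scoring worse on MSD than a quadratic fit.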
2.2.1 All Programs
We first consider the total number of students studying abroad in all programs. Figure 2.1 gives
the time series plot of the total number of students from 1981 to 2014. From this figure it is clear
that the number of students studying abroad has an increasing trend. It seems to us that this increase
is not linear but exponential.
[Figure: time series plot of the total number of students; horizontal axis: Year, vertical axis: Total (0 to 100,000)]
Figure 2.1: Time Series Plot of the Total Number of Students
Now we fit these data with three trend models (linear, quadratic and exponential); the fitted
plots are presented in Figure 2.2.
[Figure: three trend analysis plots for Total, showing actual and fitted values against the time index]
Linear Trend Model: Yt = -17044 + 2139*t (MAPE 208, MAD 17431, MSD 439097288)
Quadratic Trend Model: Yt = 27234 - 5241*t + 210.8*t**2 (MAPE 119, MAD 8259, MSD 110484665)
Growth Curve Model: Yt = 2054.90 * (1.0895**t) (MAPE 77, MAD 12364, MSD 505301336)
Figure 2.2: Trend Analysis of the Total Number of Students
From Figure 2.2 it is clear that the number of Saudi Arabian students studying abroad has an
increasing trend. It seems to us that an exponential model may fit the data better. But graphical
summaries are very subjective in nature, so for more convincing conclusions we need to look at
numerical quantities. The following table gives a summary to compare the three trend
models.
Table 2.1: Trend Summary of the Total Number of Students
Model          MAPE    MAD      MSD
Linear         208     17431    439097288
Quadratic      119     8259     110484665
Exponential    77      12364    505301336
Results presented in Table 2.1 show that the quadratic trend model has the smallest MAD and MSD,
whereas the exponential trend model has the smallest MAPE (77, against 119 and 208). The exponential
model also improves on the linear model in MAD, although its MSD is larger. In terms of MAPE the
exponential trend model is better than the other two models.
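As a rough reading of the growth curve model reported in Figure 2.2, $Y_t = 2054.90 \times 1.0895^t$, the fitted series is multiplied by a constant factor each year, so the implied annual growth rate and doubling time follow directly from the fitted coefficient. This is only arithmetic on the reported fit, not an additional result of the study:

```latex
\[
\frac{\hat{Y}_{t+1}}{\hat{Y}_t} = 1.0895
\quad\Rightarrow\quad \text{fitted growth of about } 8.95\% \text{ per year},
\qquad
t_{\text{double}} = \frac{\ln 2}{\ln 1.0895} \approx \frac{0.6931}{0.0857} \approx 8.1 \text{ years}.
\]
```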
Now we will investigate trend models for nine separate programs.
[Figure: time series plot of the number of students in each of the nine programs (Agriculture, Education, Engineering, Fine Arts, Humanities, Law, Medical Science, Natural Science, Social Science); horizontal axis: Year, vertical axis: number of students (0 to 35,000)]
Figure 2.3: Time Series Plot of Total Number of Students in Different Programs
Figure 2.3 shows that the number of Saudi Arabian students studying abroad in each of the
programs has an overall increasing trend. But there are huge differences in the number of students,
so when they are plotted together some programs are not distinguishable at all. As a remedy to this
problem we plot the same graph on the natural log scale; the graph is presented in Figure 2.4.
[Figure: time series plot of the natural log of the number of students in each of the nine programs (Agriculture, Education, Engineering, Fine Arts, Humanities, Law, Medical Science, Natural Science, Social Science); horizontal axis: Year, vertical axis: ln of the number of students]
Figure 2.4: Time Series Plot of Total Number of Students (in ln) in Different Programs
It is clear from Figure 2.4 that the number of students differs considerably from one program to
another. The programs with the highest enrollment are Engineering, Natural Science, Medical Science
and Social Science, although the number of students in Social Science has dropped in the last few
years. The programs with relatively few students are Agriculture and Fine Arts.
2.2.2 Social Sciences
Among the nine programs we first consider the total number of students studying abroad in the
Social Science program. Figure 2.5 gives linear, quadratic and exponential trend fits for the Social
Science program.
From the figure it is clear that the number of students studying abroad in the Social Science program
shows an increasing trend. It seems to us that an exponential model may fit the data. Table 2.2
gives a summary to compare the three trend models.
[Figure: three trend analysis plots for the total of Social Sciences, showing actual and fitted values against the time index]
Linear Trend Model: Yt = -1599 + 289*t (MAPE 234, MAD 3537, MSD 34029670)
Quadratic Trend Model: Yt = 3155 - 503*t + 22.6*t**2 (MAPE 112, MAD 2595, MSD 30241525)
Growth Curve Model: Yt = 658.094 * (1.0530**t) (MAPE 93, MAD 2552, MSD 39771799)
Figure 2.5: Trend Analysis Plot of the Total Number of Social Science Students
Table 2.2: Trend Summary of the Total Number of Social Science Students
Model          MAPE    MAD     MSD
Linear         234     3537    34029670
Quadratic      112     2595    30241525
Exponential    93      2552    39771799
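Results presented in Table 2.2 show that the exponential trend model has the smallest MAPE (93)
and MAD (2552), although its MSD is the largest of the three. On the MAPE and MAD criteria it fits
the data better than the other two models.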
2.2.3 Natural Sciences
Our next example is the total number of students studying abroad in the Natural Science program.
Figure 2.6 gives linear, quadratic and exponential trend fits for the Natural Science program. From
the figure it is clear that the number of students studying abroad in the Natural Science program
has an increasing trend and that an exponential model may better fit the data.
[Figure 2.6 panels: Trend Analysis Plot for the Total of Natural Sciences (vertical axis: Total; horizontal axis: Index)
Linear Trend Model: Yt = -4613 + 508*t
Quadratic Trend Model: Yt = 5952 - 1252*t + 50.31*t**2
Growth Curve Model: Yt = 279.595 * (1.1053**t)]
Figure 2.6: Trend Analysis Plot of the Total Number of Students for Natural Science
Table 2.3: Trend Summary of the Total Number of Natural Science Students
Model        MAPE  MAD   MSD
Linear       278   4086  27110563
Quadratic    193   2392  8401020
Exponential  72    2666  30860217
The results in Table 2.3 show that the exponential trend model has by far the smallest MAPE,
although the quadratic model has the smallest MAD and MSD. Judged by MAPE, the exponential trend
model fits the data better than the other two models.
2.2.4 Medical Science
Our next example is the total number of students studying abroad in the Medical Science program.
Figure 2.7 gives linear, quadratic and exponential trend fits of these data. From the figure it is
clear that the number of students studying abroad in the Medical Science program has an increasing
trend and that an exponential model may better fit the data.
[Figure 2.7 panels: Trend Analysis Plot for the Total of Medical Science (vertical axis: Total; horizontal axis: Index)
Linear Trend Model: Yt = -4742 + 528*t
Quadratic Trend Model: Yt = 5652 - 1205*t + 49.50*t**2
Growth Curve Model: Yt = 259.904 * (1.1148**t)]
Figure 2.7: Trend Analysis Plot of the Total Number of Students for Medical Science
Table 2.4: Trend Summary of the Total Number of Medical Science Students
Model        MAPE  MAD   MSD
Linear       249   4015  25461692
Quadratic    165   2250  7351186
Exponential  61    2408  25015184
The results in Table 2.4 show that the exponential trend model has by far the smallest MAPE,
although the quadratic model has the smallest MAD and MSD. Judged by MAPE, the exponential trend
model fits the data better than the other two models.
2.2.5 Law
Here we consider the total number of students studying abroad in the Law program. Figure 2.8 gives
linear, quadratic and exponential trend fits of these data. From the figure it is clear that the
number of students studying abroad in the Law program has an increasing trend and that an
exponential model may better fit the data.
Figure 2.8: Trend Analysis Plot of the Total Number of Students for Law
Table 2.5: Trend Summary of the Total Number of Law Students
Model        MAPE  MAD  MSD
Linear       563   657  644213
Quadratic    357   338  174624
Exponential  96    419  755189
The results in Table 2.5 show that the exponential trend model has by far the smallest MAPE,
although the quadratic model has the smallest MAD and MSD. Judged by MAPE, the exponential trend
model fits the data better than the other two models.
2.2.6 Humanities
Now we consider the total number of students studying abroad in the Humanities program. Figure 2.9
gives linear, quadratic and exponential trend fits of these data. From the figure it is clear that
the number of students studying abroad in the Humanities program has an increasing trend. We also
observe from this plot that both quadratic and exponential models adequately fit the data.
Figure 2.9: Trend Analysis Plot of the Total Number of Students for Humanities
Table 2.6: Trend Summary of the Total Number of Humanities Students
Model        MAPE  MAD   MSD
Linear       167   1179  2573862
Quadratic    58    752   1348197
Exponential  87    880   2475024
Results presented in Table 2.6 clearly show that the quadratic trend model fits the data better than
the other two models.
2.2.7 Fine Arts
Now we consider the total number of students studying abroad in the Fine Arts program. Figure 2.10
gives linear, quadratic and exponential trend fits of these data. From the figure it is clear that
the number of students studying abroad in the Fine Arts program has an increasing trend and that an
exponential model may better fit the data.
Figure 2.10: Trend Analysis Plot of the Total Number of Students for Fine Arts
Table 2.7: Trend Summary of the Total Number of Fine Arts Students
Model        MAPE   MAD    MSD
Linear       224.2  194.6  71151.6
Quadratic    180.2  132.1  29439.3
Exponential  69.5   126.9  84233.2
The results in Table 2.7 show that the exponential trend model has the smallest MAPE and MAD,
although the quadratic model has the smallest MSD. Judged by MAPE, the exponential trend model
fits the data better than the other two models.
2.2.8 Engineering
Now we consider the total number of students studying abroad in the Engineering program. Figure
2.11 gives linear, quadratic and exponential trend fits of these data. From the figure it is clear
that the number of students studying abroad in the Engineering program has an increasing trend. We
also observe from this plot that an exponential model may better fit the data.
Figure 2.11: Trend Analysis Plot of the Total Number of Students for Engineering
Table 2.8: Trend Summary of the Total Number of Engineering Students
Model        MAPE  MAD   MSD
Linear       397   4738  36869030
Quadratic    258   2724  11068847
Exponential  119   3466  50802116
The results in Table 2.8 show that the exponential trend model has by far the smallest MAPE,
although the quadratic model has the smallest MAD and MSD. Judged by MAPE, the exponential trend
model fits the data better than the other two models.
2.2.9 Education
Now we consider the total number of students studying abroad in the Education program. Figure 2.12
gives linear, quadratic and exponential trend fits of these data. From the figure it is clear that
the number of students studying abroad in the Education program has an increasing trend. We also
observe from this plot that both quadratic and exponential models adequately fit the data.
Figure 2.12: Trend Analysis Plot of the Total Number of Students for Education
Table 2.9: Trend Summary of the Total Number of Education Students
Model        MAPE  MAD  MSD
Linear       134   577  464455
Quadratic    48    301  214264
Exponential  82    506  523959
Results presented in Table 2.9 clearly show that the quadratic trend model fits the data better than
the other two models.
2.2.10 Agriculture
Finally we consider the total number of students studying abroad in Agriculture. Figure 2.13 gives
linear, quadratic and exponential trend fits of these data. From the figure it is clear that the
number of students studying abroad in the Agriculture program has an increasing trend. We also
observe from this plot that both quadratic and exponential models adequately fit the data.
Figure 2.13: Trend Analysis Plot of the Total Number of Students for Agriculture
Table 2.10: Trend Summary of the Total Number of Agriculture Students
Model        MAPE    MAD     MSD
Linear       36.53   26.68   1190.99
Quadratic    28.773  20.265  610.926
Exponential  33.25   25.90   1214.57
Results presented in Table 2.10 clearly show that the quadratic trend model fits the data better than
the other two models.
2.3 Trend Analysis of Some Other Relevant Variables
Here we consider some other variables which we believe may have a significant impact on the
number of students studying abroad. These variables are the budget in higher education, oil price
and oil revenue. Oil is the key factor in the Saudi Arabian economy, so oil price and oil revenue
should affect almost all major policies of the government.
First we would like to see the trend of these variables. Time series plots of these three variables
are presented in Figures 2.14 to 2.16.
[Figure 2.14 plot: Time Series Plot of Budget in HE (vertical axis: Budget in HE; horizontal axis: Year, 1981 to 2011)]
Figure 2.14: Time Series Plot of the Budget in Higher Education
Figure 2.14 shows that the budget in higher education has grown steadily over the years and
clearly has an increasing trend. Oil price (Figure 2.15) dropped once but recovered later and thus
shows an upward trend overall. Oil revenue (Figure 2.16) also shows an increasing pattern.
[Figure 2.15 plot: Time Series Plot of Oil Price (vertical axis: Oil Price; horizontal axis: Year, 1981 to 2011)]
Figure 2.15: Time Series Plot of Oil Price
[Figure 2.16 plot: Time Series Plot of Oil Revenue (vertical axis: Oil Revenue; horizontal axis: Year, 1981 to 2011)]
Figure 2.16: Time Series Plot of Oil Revenue
Now we fit three different trend models to each of these three variables.
2.3.1 Oil Revenue
First we consider oil revenue over the years. Figure 2.17 gives linear, quadratic and exponential
trend fits of these data. From the figure it is clear that oil revenue has an increasing trend. We
also observe from this plot that both quadratic and exponential models adequately fit the data.
[Figure 2.17 panels: Trend Analysis Plot for Oil Revenue (vertical axis: Oil Revenue; horizontal axis: Index)
Linear Trend Model: Yt = -127953 + 26267*t
Quadratic Trend Model: Yt = 294817 - 44194*t + 2013*t**2
Growth Curve Model: Yt = 55445.0 * (1.0792**t)]
Figure 2.17: Trend Analysis of Oil Revenue
Table 2.11: Trend Summary of Oil Revenue
Model        MAPE         MAD          MSD
Linear       8.35439E+01  1.64688E+05  4.23297E+10
Quadratic    3.55741E+01  7.94309E+04  1.23704E+10
Exponential  4.79737E+01  1.26068E+05  3.40103E+10
Results presented in Table 2.11 clearly show that the quadratic trend model fits the data better than
the other two models.
2.3.2 Budget in Higher Education
Next we consider the budget in higher education. Figure 2.18 gives linear, quadratic and
exponential trend fits of these data. From the figure it is clear that the budget in higher education
shows an increasing trend. We also observe from this plot that both quadratic and exponential
models adequately fit the data.
[Figure 2.18 panels: Trend Analysis Plot for Budget in HE (vertical axis: Budget in HE; horizontal axis: Index)
Linear Trend Model: Yt = -38718871627 + 5497524487*t
Quadratic Trend Model: Yt = 12499933066 - 3038942962*t + 243899070*t**2
Growth Curve Model: Yt = 102994932 * (1.3105**t)]
Figure 2.18: Trend Analysis of Budget in Higher Education
Table 2.12: Trend Summary of Budget in Higher Education
Model        MAPE         MAD          MSD
Linear       5.86496E+04  1.89828E+10  5.58537E+20
Quadratic    1.64690E+04  8.29748E+09  1.18811E+20
Exponential  5.35190E+02  8.71668E+10  3.87341E+22
The results in Table 2.12 show that the exponential trend model has by far the smallest MAPE,
although its MAD and MSD are much larger than those of the quadratic model. Judged by MAPE, the
exponential trend model fits the data better than the other two models.
2.3.3 Oil Price
Next we consider oil price. Figure 2.19 gives linear, quadratic and exponential trend fits of this
data. From the figure it is clear that oil price shows an increasing trend. We also observe from this
plot that both quadratic and exponential models adequately fit the data.
[Figure 2.19 panels: Trend Analysis Plot for Oil Price (vertical axis: Oil Price; horizontal axis: Index)
Linear Trend Model: Yt = 30.67 + 0.877*t
Quadratic Trend Model: Yt = 82.96 - 7.838*t + 0.2490*t**2
Growth Curve Model: Yt = 28.291 * (1.01911**t)]
Figure 2.19: Trend Analysis of Oil Price
Table 2.13: Trend Summary of Oil Price
Model        MAPE     MAD     MSD
Linear       59.160   20.980  554.086
Quadratic    18.8959  7.6090  95.7177
Exponential  48.344   19.889  565.457
Results presented in Table 2.13 clearly show that the quadratic trend model fits the data better than
the other two models.
2.4 Summary Results of Trend Analysis
In this section we summarize the above trend results. Altogether we have considered 13 variables.
Table 2.14 gives a quick view regarding which model is appropriate for which variable.
Table 2.14: Trend Summary
Variable                     Model        Direction
Total Number of Students     Exponential  Increasing
Students in Social Science   Exponential  Increasing
Students in Natural Science  Exponential  Increasing
Students in Medical Science  Exponential  Increasing
Students in Law              Exponential  Increasing
Students in Humanities       Quadratic    Increasing
Students in Fine Arts        Exponential  Increasing
Students in Engineering      Exponential  Increasing
Students in Education        Quadratic    Increasing
Students in Agriculture      Quadratic    Increasing
Oil Revenue                  Quadratic    Increasing
Budget in Higher Education   Exponential  Increasing
Oil Price                    Quadratic    Increasing
The above results show that none of the 13 variables is best described by a linear trend model. For
most of the variables the quadratic and exponential models perform similarly, but in 8 cases the
exponential model fits the data better and in the remaining 5 cases the quadratic model performs
better. All 13 variables show an increasing trend.
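The model choices in this chapter can be checked mechanically. The sketch below assumes that the selection rule is "smallest MAPE", which is our reading of Sections 2.2.2 to 2.3.3 rather than a rule stated explicitly in the text. The MAPE values are copied from Tables 2.2 to 2.13. The total number of students is omitted because its accuracy measures are reported in an earlier table.

```python
# MAPE values (linear, quadratic, exponential) copied from Tables 2.2 to 2.13.
mape = {
    "Social Science": (234, 112, 93),
    "Natural Science": (278, 193, 72),
    "Medical Science": (249, 165, 61),
    "Law": (563, 357, 96),
    "Humanities": (167, 58, 87),
    "Fine Arts": (224.2, 180.2, 69.5),
    "Engineering": (397, 258, 119),
    "Education": (134, 48, 82),
    "Agriculture": (36.53, 28.773, 33.25),
    "Oil Revenue": (83.5439, 35.5741, 47.9737),
    "Budget in Higher Education": (58649.6, 16469.0, 535.190),
    "Oil Price": (59.160, 18.8959, 48.344),
}
MODELS = ("Linear", "Quadratic", "Exponential")

# Assumed rule: pick the model with the smallest MAPE for each variable.
best = {name: MODELS[values.index(min(values))] for name, values in mape.items()}
counts = {m: sum(1 for b in best.values() if b == m) for m in MODELS}

for name, model in best.items():
    print(f"{name:28s} {model}")
print(counts)
```

Under this rule no variable selects the linear model, and adding the exponential choice for the total number of students gives the 8 exponential and 5 quadratic cases counted above.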
CHAPTER 3
Comparison between Genders and Different Programs
We have separate information on male and female Saudi Arabian students who are studying
abroad. In this chapter we would like to see whether there is any gender discrimination. We would
also like to see whether there is a significant difference among the numbers of students studying
different programs.
3.1 Comparison between Genders
We first investigate whether there is any gender discrimination by looking at the number of male
and female students in different programs.
3.1.1 Social Science
Figure 3.1 gives a time series plot of the number of male and female students in Social Science
program.
Figure 3.1: Time Series Plot of Male and Female Students in Social Science
It is clear from this figure that the number of male students is consistently higher, and the gap
has become very large in recent years.
3.1.2 Natural Science
Figure 3.2 gives time series plot of the number of male and female students in Natural Science
program.
Figure 3.2: Time Series Plot of Male and Female Students in Natural Science
It is clear from this figure that the number of male students is consistently higher, and the gap
has become very large in recent years.
3.1.3 Medical Science
Figure 3.3 gives a time series plot of the number of male and female students in Medical Science
program.
Figure 3.3: Time Series Plot of Male and Female Students in Medical Science
It is clear from this figure that the number of male students is consistently higher, and the gap
has become very large in recent years.
3.1.4 Law
Figure 3.4 gives a time series plot of the number of male and female students in Law program.
Figure 3.4: Time Series Plot of Male and Female Students in Law
It is clear from this figure that the number of male students is consistently higher, and the gap
has become very large in recent years.
3.1.5 Humanities
Figure 3.5 gives a time series plot of the number of male and female students in Humanities
program.
Figure 3.5: Time Series Plot of Male and Female Students in Humanities
It is clear from this figure that the number of female students was higher initially. Then the gap
between male and female students narrowed. However, in recent years the number of male students
has increased and it currently exceeds the number of female students.
3.1.6 Engineering
Figure 3.6 gives a time series plot of the number of male and female students in Engineering
program.
Figure 3.6: Time Series Plot of Male and Female Students in Engineering
It is clear from this figure that the number of male students is consistently higher, and the gap
has grown dramatically in recent years.
3.1.7 Education
Figure 3.7 gives a time series plot of the number of male and female students in Education
program.
Figure 3.7: Time Series Plot of Male and Female Students in Education
It is clear from this figure that the number of male students was higher in earlier years, but the
gap narrowed and the number of female students has now overtaken the number of male students.
3.1.8 Fine Arts
Figure 3.8 gives a time series plot of the number of male and female students in Fine Arts program.
Figure 3.8: Time Series Plot of Male and Female Students in Fine Arts
This is probably the only program in which the number of female students is consistently higher
than the number of male students, and the gap has widened in recent years.
3.1.9 Agriculture
Figure 3.9 gives a time series plot of the number of male and female students in Agriculture
program.
Figure 3.9: Time Series Plot of Male and Female Students in Agriculture
Figure 3.9 shows that the number of male students was much higher in the early years. The gap
narrowed gradually, but the number of male students is consistently higher than the number of
female students.
3.2 Tests for the Equality of Means between Male and Female Students
In the previous section we have seen that in most programs the number of male students is
higher than that of the female students. Since graphs can be subjective, here we formally test the
difference between the mean numbers of male and female students. Let us denote the number of male
students by X and the number of female students by Y. We are interested in testing the hypothesis
\[ H_0 : \mu_X = \mu_Y \]
against
\[ H_1 : \mu_X \neq \mu_Y . \]
Under $H_0$, the test statistic becomes
\[ Z = \frac{\bar{X} - \bar{Y}}{\sqrt{(\sigma_X^2 / n) + (\sigma_Y^2 / m)}} . \]
Assuming further normality and large sample sizes, the critical region for the test becomes
\[ |\bar{x} - \bar{y}| \ge z_{\alpha/2} \sqrt{(S_X^2 / n) + (S_Y^2 / m)} . \]
We test the equality of mean of male and female students for all nine programs and the results are
presented below. We present the average number of male and female students, z-value and its
corresponding p-value, whether the difference is significant or not, and if so, to which gender it is
biased. It is worth mentioning that * stands for significant at the 10% level, ** stands for significant
at the 5% level and *** stands for significant at the 1% level.
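The test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the software run behind the reported results: the function names and the two small samples are hypothetical, the sample variances replace the unknown population variances, and the p-value uses the standard normal distribution. Packages that use a t reference distribution instead will report slightly larger p-values.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def two_sample_z(x, y):
    """Large-sample z statistic and two-sided p-value for H0: mu_X = mu_Y."""
    n, m = len(x), len(y)
    se = sqrt(variance(x) / n + variance(y) / m)   # sample variances S_X^2 and S_Y^2
    z = (mean(x) - mean(y)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

def stars(p):
    """Significance marks as used in this chapter: * 10%, ** 5%, *** 1%."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""

# Hypothetical yearly counts for two groups (not the thesis data).
x = [12, 15, 11, 14, 13, 16, 15, 12]
y = [9, 10, 8, 11, 10, 9, 12, 10]
z, p = two_sample_z(x, y)
print(f"z = {z:.2f}, p = {p:.4f} {stars(p)}")
```

A positive z indicates that the first group has the larger mean, which is how the "biased to" column can be read.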
Table 3.1: Summary Test Results for the Equality of Means between Male and Female Students
Program          Male (Ave)  Female (Ave)  z-value  p-value  Difference      Biased to
Social Science   2737        722           2.20     0.032    **Significant   Male
Natural Science  3146        1137          2.09     0.040    **Significant   Male
Medical Science  3102        1388          1.86     0.068    *Significant    Male
Law              546         109           2.65     0.010    **Significant   Male
Humanities       890         957           -0.25    0.807    Insignificant
Fine Arts        57          127           -1.57    0.121    Insignificant
Engineering      4374        150           3.17     0.002    ***Significant  Male
Education        421         438           -0.14    0.887    Insignificant
Agriculture      79.6        5.74          11.40    0.000    ***Significant  Male
It is clear from this table that the number of male students is significantly higher than the number
of female students in 6 out of 9 programs (at the 10% level or better). Female students outnumber
male students in only three programs, but the differences are not statistically significant. So we
can say that male students are in an advantageous position relative to female students.
3.2.1 Comparison among All Programs
Now we would like to see whether there is any difference among the numbers of students studying
different programs.
Table 3.2: Average Number of Students in Different Programs
Program          Average Number of Students
Social Science   3459
Natural Science  4284
Medical Science  4490
Law              655
Humanities       1847
Fine Arts        184.6
Engineering      4524
Education        859
Agriculture      85.32
[Figure 3.10 plot: box plots of Agriculture, Education, Engineering, Fine Arts, Humanities, Law, Medical Science, Natural Science and Social Science (vertical axis: Data)]
Figure 3.10: Box Plot of Number of Students in Different Programs
The above table and figure clearly show differences in the average number of students, but
we also need to know whether these differences are statistically significant.
3.2.2 Tests for the Equality of Means among All Programs
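Frequently, experimenters want to compare more than two groups. We will be comparing the
means of m normal distributions under the assumption that the variances are all the same. Let us
now consider m normal distributions with unknown means $\mu_1, \mu_2, \ldots, \mu_m$ and an unknown
but common variance $\sigma^2$. We wish to test the null hypothesis
\[ H_0 : \mu_1 = \mu_2 = \cdots = \mu_m = \mu . \]
The observations and their means can be arranged as
\[
\begin{array}{cccccc|c}
X_{11} & X_{12} & \cdots & X_{1j} & \cdots & X_{1n_1} & \bar{X}_{1.} \\
X_{21} & X_{22} & \cdots & X_{2j} & \cdots & X_{2n_2} & \bar{X}_{2.} \\
\vdots & \vdots &        & \vdots &        & \vdots   & \vdots \\
X_{i1} & X_{i2} & \cdots & X_{ij} & \cdots & X_{in_i} & \bar{X}_{i.} \\
\vdots & \vdots &        & \vdots &        & \vdots   & \vdots \\
X_{m1} & X_{m2} & \cdots & X_{mj} & \cdots & X_{mn_m} & \bar{X}_{m.} \\
\hline
       &        &        &        &        &          & \bar{X}_{..}
\end{array}
\]
The i-th group mean is
\[ \bar{X}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}, \quad i = 1, 2, \ldots, m, \]
and the grand mean is
\[ \bar{X}_{..} = \frac{1}{n} \sum_{i=1}^{m} \sum_{j=1}^{n_i} X_{ij} = \frac{1}{n} \sum_{i=1}^{m} n_i \bar{X}_{i.} , \]
where $n = n_1 + n_2 + \cdots + n_m$.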
To determine a critical region for a test of $H_0$, we partition the total sum of squares as
\[
SS(TO) = \sum_{i=1}^{m} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{..})^2
       = \sum_{i=1}^{m} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i.} + \bar{X}_{i.} - \bar{X}_{..})^2
       = \sum_{i=1}^{m} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i.})^2 + \sum_{i=1}^{m} n_i (\bar{X}_{i.} - \bar{X}_{..})^2 .
\]
Let
\[ \sum_{i=1}^{m} n_i (\bar{X}_{i.} - \bar{X}_{..})^2 = SS(\text{Programs}), \]
the sum of squares among the different programs, and
\[ \sum_{i=1}^{m} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i.})^2 = SS(\text{Error}), \]
the sum of squares within programs (often called the error sum of squares).
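It is easy to show that
\[ \frac{1}{\sigma^2} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i.})^2 \sim \chi^2(n_i - 1), \quad i = 1, 2, \ldots, m, \]
and, under $H_0$,
\[ \frac{1}{\sigma^2} \sum_{i=1}^{m} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{..})^2 \sim \chi^2(n - 1) . \]
Hence
\[
\frac{1}{\sigma^2} \sum_{i=1}^{m} n_i (\bar{X}_{i.} - \bar{X}_{..})^2 \sim \chi^2(m - 1)
\quad \text{and} \quad
\frac{1}{\sigma^2} \sum_{i=1}^{m} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i.})^2 \sim \chi^2(n - m) .
\]
Thus
\[ \frac{SS(\text{Programs}) / (m - 1)}{SS(\text{Error}) / (n - m)} \sim F_{m-1,\, n-m} . \]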
The information used for the tests of the equality of several means is often summarized in an
analysis of variance (ANOVA) table.
Source    Sum of Squares (SS)  Degrees of Freedom  Mean Squares (MS)      F Ratio
Programs  SS(P)                m - 1               MS(P) = SS(P)/(m - 1)  MS(P)/MS(E)
Error     SS(E)                n - m               MS(E) = SS(E)/(n - m)
Total     SS(TO)               n - 1
We would reject $H_0$ if the observed value of F is too large. Thus the critical region is of the form
\[ F \ge F_{\alpha;\, m-1,\, n-m} . \]
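The ANOVA quantities above can be computed directly from grouped data. The following Python sketch is an illustration under assumptions: the function name and the three small groups are hypothetical, and no p-value is computed because that needs an F distribution table or a statistics package. The last lines recompute the F ratio from the two mean squares reported in Table 3.3.

```python
def one_way_anova(groups):
    """Sums of squares, mean squares and F ratio for a one-way layout."""
    n = sum(len(g) for g in groups)          # total number of observations
    m = len(groups)                          # number of groups (programs)
    grand = sum(sum(g) for g in groups) / n  # grand mean
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    ms_between = ss_between / (m - 1)
    ms_within = ss_within / (n - m)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df": (m - 1, n - m),
        "f": ms_between / ms_within,
    }

# Hypothetical data: three groups of three observations each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)

# F ratio implied by the mean squares reported in Table 3.3.
f_table = 124852628 / 24654401
print(round(f_table, 2))
```

For the hypothetical groups the between-group sum of squares is 26 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, so F = 13.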
3.3 Comparison of the Individual Treatment Means
There are several methods by which we can compare treatment means.
3.3.1 The Least Significant Difference (Fisher’s LSD) Method
Suppose that following an analysis of variance F test where the null hypothesis is rejected, we
wish to test
\[ H_0 : \mu_i = \mu_j \quad \text{for all } i \neq j . \]
This could be done by using the t statistic
\[ t = \frac{\bar{y}_{i.} - \bar{y}_{j.}}{\sqrt{EMS \, (1/n_i + 1/n_j)}} , \]
where EMS is the error mean square, N is the total number of observations and p is the number of
treatments. The pair of means $\mu_i$ and $\mu_j$ would be declared significantly different if
\[ |\bar{y}_{i.} - \bar{y}_{j.}| > t_{(1-\alpha/2),\, N-p} \sqrt{EMS \, (1/n_i + 1/n_j)} . \]
The quantity
\[ LSD = t_{(1-\alpha/2),\, N-p} \sqrt{EMS \, (1/n_i + 1/n_j)} \]
is called the least significant difference.
A design is called balanced when $n_1 = n_2 = \cdots = n_p = n$, and then
\[ LSD = t_{(1-\alpha/2),\, N-p} \sqrt{2 \, EMS / n} . \]
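As a worked example of the balanced formula, the Python sketch below evaluates the LSD with the error mean square from Table 3.3 and n = 34 observations per program. The critical value 1.968 is our approximation to the upper 2.5% point of the t distribution with 297 degrees of freedom and should be replaced by an exact table or software value. The variable and function names are illustrative only.

```python
from math import sqrt

EMS = 24654401.0   # error mean square reported in Table 3.3
n = 34             # observations per program (balanced design)
T_CRIT = 1.968     # ASSUMED approximate t(0.975, 297); replace with an exact value

se_diff = sqrt(2.0 * EMS / n)   # standard error of a difference of two means
lsd = T_CRIT * se_diff          # balanced-design least significant difference

# Program means as reported in the grouping tables of Section 3.4.
means = {
    "Engineering": 4524, "Medical Science": 4490, "Natural Science": 4284,
    "Social Science": 3459, "Humanities": 1847, "Education": 859,
    "Law": 655, "Fine Arts": 185, "Agriculture": 85,
}

def differ(a, b):
    """True if the two program means differ by more than the LSD."""
    return abs(means[a] - means[b]) > lsd

print(f"standard error of a difference = {se_diff:.2f}, LSD = {lsd:.0f}")
print(differ("Engineering", "Humanities"), differ("Social Science", "Humanities"))
```

Under these assumptions the LSD is roughly 2370 students, so Engineering and Humanities (difference 2677) are declared different while Social Science and Humanities (difference 1612) are not.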
3.3.2 Duncan’s Multiple Range Test
A widely used procedure for comparing all pairs of means is the multiple range test proposed by
Duncan. We first arrange the p treatment means in ascending order and compute the standard error
of each average as
\[ s_{\bar{y}_{i.}} = \sqrt{EMS / n_h} , \quad \text{where} \quad n_h = \frac{p}{\sum_{i=1}^{p} (1/n_i)} . \]
If $n_1 = n_2 = \cdots = n_p = n$, we have $n_h = n$, and hence $s_{\bar{y}_{i.}} = \sqrt{EMS / n}$.
The significant ranges are calculated as
\[ R_k = r_\alpha(k, N-p) \, s_{\bar{y}_{i.}} , \quad k = 2, 3, \ldots, p, \]
where the values of $r_\alpha(k, N-p)$ are obtained from a table given by Duncan. Then the observed
differences between means are tested, beginning with the largest versus the smallest, which is
compared with the least significant range $R_p$. Next, the difference between the largest and the
second smallest is computed and compared with the least significant range $R_{p-1}$. Then the
difference between the second largest and the smallest is computed and compared with the least
significant range $R_{p-1}$. This process is continued until the differences of all possible
p(p-1)/2 pairs of means have been considered. If an observed difference is greater than the
corresponding least significant range, then we conclude that the pair of means in question is
significantly different.
3.3.3 The Newman-Keuls Test
This test is similar to Duncan’s multiple range test, except that the critical differences between
means are calculated differently. Here we compute a set of critical values
\[ K_k = q_\alpha(k, N-p) \, s_{\bar{y}_{i.}} , \quad k = 2, 3, \ldots, p, \]
where $q_\alpha(k, N-p)$ is the upper $\alpha$ percentage point of the Studentized range for groups
of means of size k and N - p error degrees of freedom.
The Studentized range is defined as
\[ q = \frac{\bar{y}_{\max} - \bar{y}_{\min}}{\sqrt{EMS / n}} . \]
3.3.4 Tukey’s Test
Tukey proposed a multiple comparison procedure based on the Studentized range statistic. His
procedure requires the use of $q_\alpha(p, N-p)$ to determine the critical value of all pairwise
comparisons, regardless of how many means are in the group. Thus, Tukey’s test declares two
means significantly different if the absolute value of their sample difference exceeds
\[ T_\alpha = q_\alpha(p, N-p) \, s_{\bar{y}_{i.}} . \]
3.4 Result Summary
We first test the equality of the mean number of students in the nine programs. The
summary results are presented in Table 3.3.
Table 3.3: ANOVA Table for the Equality of Mean Test of Nine Programs
Source    SS          DF   MS         F Ratio  p-value
Programs  998821022   8    124852628  5.06     0.000
Error     7322357160  297  24654401
Total     8321178183  305
Table 3.3 clearly shows that the program effect is highly significant. So we must reject the
hypothesis of equal means for the nine programs.
To find out which programs differ significantly from the others, we report Tukey’s
test and Fisher’s LSD, as they are very effective and readily available in MINITAB. Here we
present only the summary results; the detailed results are presented in the Appendix.
Grouping Information Using Tukey Method
Program          N   Mean  Grouping
Engineering      34  4524  A
Medical Science  34  4490  A
Natural Science  34  4284  A B
Social Science   34  3459  A B C
Humanities       34  1847  A B C
Education        34  859   A B C
Law              34  655     B C
Fine Arts        34  185       C
Agriculture      34  85        C
Tukey’s test shows that the largest numbers of Saudi Arabian students go abroad to study
Engineering and Medical Science, and the smallest numbers study Agriculture and Fine Arts.
Grouping Information Using Fisher Method
Program          N   Mean  Grouping
Engineering      34  4524  A
Medical Science  34  4490  A
Natural Science  34  4284  A
Social Science   34  3459  A B
Humanities       34  1847    B C
Education        34  859       C
Law              34  655       C
Fine Arts        34  185       C
Agriculture      34  85        C
However, Fisher’s LSD shows that the largest numbers of Saudi Arabian students go abroad to study
Engineering, Medical Science and Natural Science, and the least popular programs are Agriculture,
Fine Arts, Law and Education.
CHAPTER 4
Modeling and Fitting of Data Using Regression Diagnostics and Robust Regression
In this chapter we first discuss the classical regression method with diagnostics and then discuss
some robust methods that are commonly used in regression. We will employ these methods to
investigate which variables have a significant impact on the number of Saudi Arabian students
studying abroad.
4.1 Classical Regression Analysis
Regression is probably the most popular and commonly used statistical method in all branches of
knowledge. It is a conceptually simple method for investigating functional relationships among
variables. The user of regression analysis attempts to discern the relationship between a dependent
(response) variable and one or more independent (explanatory/predictor/regressor) variables.
Regression can be used to predict the value of a response variable from knowledge of the values
of one or more explanatory variables.
We write the multiple regression model as
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i , \quad i = 1, 2, \ldots, n \tag{4.1} \]
where Y is the dependent variable, the X’s are the independent variables, and $\varepsilon$ is the
error term. Here we have a dependent variable and k explanatory variables excluding the intercept
term. This model is also called a k + 1 variable regression model.
The assumptions of the multiple regression model are quite similar to those of the two-variable
linear regression model:
• The relationship between Y and X is linear. But no exact linear relationship exists between
  two or more X’s.
• The X’s are nonstochastic variables whose values are fixed.
• The error has zero expected value: $E(\varepsilon) = 0$.
• The error term has constant variance for all observations, i.e.,
  $E(\varepsilon_i^2) = \sigma^2$, i = 1, 2, …, n.
• The random variables $\varepsilon_i$ are statistically independent. Thus,
  $E(\varepsilon_i \varepsilon_j) = 0$ for all $i \neq j$.
• The error term is normally distributed.
4.1.1 Estimation Technique
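We can express the multiple regression model in matrix notation as
\[ Y = X\beta + \varepsilon \tag{4.2} \]
where
\[
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & x_{11} & \cdots & x_{k1} \\
1 & x_{12} & \cdots & x_{k2} \\
\vdots & \vdots & & \vdots \\
1 & x_{1n} & \cdots & x_{kn}
\end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} .
\]
We obtain the OLS estimates of the k + 1 unknown parameters $\beta_0, \beta_1, \ldots, \beta_k$ in
such a way that the sum of squares (SS)
\[ \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon'\varepsilon = (Y - X\beta)'(Y - X\beta) \]
is minimized.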
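The value of $\beta$ that minimizes $\varepsilon'\varepsilon$ is given by the solution to
\[ \frac{\partial (\varepsilon'\varepsilon)}{\partial \beta} = 0 . \]
We get
\[ \frac{\partial (\varepsilon'\varepsilon)}{\partial \beta} = -2X'Y + 2X'X\beta = 0 \;\Rightarrow\; \hat{\beta} = (X'X)^{-1} X'Y \tag{4.3} \]
We also have
\[ V(\hat{\beta}) = \sigma^2 (X'X)^{-1} \tag{4.4} \]
For this model, the residuals are
\[ \hat{\varepsilon}_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \cdots - \hat{\beta}_k X_{ki} , \quad i = 1, 2, \ldots, n \tag{4.5} \]
An unbiased and consistent estimate of $\sigma^2$ is $s^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2 / (n - k - 1)$.
The estimated standard error of $\hat{\beta}_j$ is $s_{\hat{\beta}_j} = \sqrt{s^2 V_j}$, where $V_j$
is the j-th diagonal element of $(X'X)^{-1}$. When the errors are normally distributed, then
\[ \frac{\hat{\beta}_j - \beta_j}{s_{\hat{\beta}_j}} \sim t_{n-k-1} . \]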
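The estimation steps of this section can be sketched in plain Python. The sketch assumes small, well-conditioned problems, where forming and inverting X'X directly is acceptable. The helper names and the data are hypothetical and are not taken from this study.

```python
from math import sqrt

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(a):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def ols(x_rows, y):
    """OLS estimates, standard errors, t ratios (for beta_j = 0) and residuals."""
    X = [[1.0] + list(r) for r in x_rows]            # prepend the intercept column
    n, p = len(X), len(X[0])                         # p = k + 1 parameters
    xtx_inv = inverse(matmul(transpose(X), X))       # (X'X)^{-1}
    xty = matmul(transpose(X), [[v] for v in y])     # X'Y
    beta = [row[0] for row in matmul(xtx_inv, xty)]  # (X'X)^{-1} X'Y
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    s2 = sum(e * e for e in resid) / (n - p)         # estimate of sigma^2
    se = [sqrt(s2 * xtx_inv[j][j]) for j in range(p)]
    t = [b / s for b, s in zip(beta, se)]
    return beta, se, t, resid

# Hypothetical data with two regressors (not the thesis data).
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [5.1, 5.9, 11.2, 11.8, 17.1, 18.0]
beta, se, t, resid = ols(x_rows, y)
print([round(b, 3) for b in beta], [round(v, 2) for v in t])
```

With an intercept in the model, the residuals sum to zero and are orthogonal to every regressor, which gives a quick check on any implementation.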
4.1.2 Checking for Goodness of Fit
We can use the $R^2$ statistic as a measure of goodness of fit for the multiple regression model.
We know that
\[ R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS} = 1 - \frac{\sum_{i=1}^{n} \hat{\varepsilon}_i^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \tag{4.6} \]
where RSS, ESS and TSS denote the regression, error and total sums of squares, respectively.
$R^2$ is the proportion of the total variation in Y explained by the regression of Y on X. It is easy
to show that $R^2$ ranges in value between 0 and 1. But it is only a descriptive statistic. Roughly
speaking, we associate a high value of $R^2$ (close to 1) with a good fit of the model by the
regression line and associate a low value of $R^2$ (close to 0) with a poor fit. How large must
$R^2$ be for the regression equation to be useful? That depends upon the area of application. If we
could develop a regression equation to predict the stock market, we would be ecstatic if
$R^2 = 0.50$. On the other hand, if we were predicting deaths in road accidents, we would want the
prediction equation to have strong predictive ability, since the consequences of poor prediction
could be quite serious.
But the difficulty with $R^2$ as a measure of goodness of fit is that it does not account for the number
of degrees of freedom. A natural solution is to use variances rather than variations, which leads to
a corrected (adjusted) $R^2$, defined as
$$\bar{R}^2 = 1 - \frac{\text{Estimated } V(\epsilon)}{\text{Estimated } V(Y)}$$
Now
$$\text{Estimated } V(\epsilon) = s^2 = \sum_{i=1}^{n} \hat{\epsilon}_i^2 / (n - k - 1)$$
and
$$\text{Estimated } V(Y) = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 / (n - 1)$$
Thus the corrected $R^2$ becomes
$$\bar{R}^2 = 1 - \frac{\sum_{i=1}^{n} \hat{\epsilon}_i^2 / (n - k - 1)}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 / (n - 1)} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} \qquad (4.7)$$
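Equation (4.7) can be checked against the MINITAB summaries reported in Section 4.4. The short Python sketch below is only such a check; the function name `adjusted_r2` is our own, and the inputs are the R-Sq values printed in the stepwise output (N = 34), so small rounding differences are possible.

```python
def adjusted_r2(r2, n, k):
    """Equation (4.7): n observations, k explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# R-Sq values as printed in the stepwise output of Section 4.4 (N = 34).
one_var = adjusted_r2(0.8295, n=34, k=1)   # budget in higher education only
two_var = adjusted_r2(0.8872, n=34, k=2)   # budget in higher education and oil price
print(round(100 * one_var, 2), round(100 * two_var, 2))   # prints 82.42 87.99
```

Both values agree with the R-Sq(adj) entries of the stepwise output.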
4.1.3 Tests of Regression Coefficients
We often like to establish that the explanatory variable X has a significant effect on Y, that is, that
the coefficient of X (which is $\beta$) is significant. In this situation the null hypothesis is constructed
in a way that makes its rejection possible. We begin with a null hypothesis, which usually states that a
certain effect is not present, i.e., $\beta = 0$. We estimate $\hat{\beta}$ and its standard error from the data and
compute the statistic
$$t = \frac{\hat{\beta}}{s_{\hat{\beta}}} \sim t_{n-k-1} \qquad (4.8)$$
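As a worked instance of (4.8), take the oil price coefficient of the two-variable fit reported in Section 4.4 (Coef = 350.09, SE Coef = 87.97, with N = 34 observations and k = 2 explanatory variables):

```latex
t = \frac{\hat{\beta}}{s_{\hat{\beta}}} = \frac{350.09}{87.97} \approx 3.98,
\qquad
\text{degrees of freedom} = n - k - 1 = 34 - 2 - 1 = 31 .
```

This agrees with the T value printed by MINITAB for that coefficient, for which the p-value is reported as 0.000.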
4.2 Regression Diagnostics
Diagnostics are designed to find problems with the assumptions of any statistical procedure. In the
diagnostic approach we estimate the parameters (in regression, fit the model) by the classical
method (the OLS) and then check whether there is any violation of assumptions and/or irregularity
in the results regarding the six standard assumptions mentioned at the beginning of this section.
Among them, the assumption of normality is the most important.
4.2.1 Test for Normality
The normality assumption means that the errors are normally distributed. The simplest graphical
display for checking normality in regression analysis is the normal probability plot. This method
is based on the fact that if the ordered residuals are plotted against their cumulative probabilities
on normal probability paper, the resulting points should lie approximately on a straight line. An
excellent review of different analytical tests for normality is available in Imon (2003). A test based
on the correlation of true observations and the expectation of normalized order statistics is known
as the Shapiro-Wilk test. A test based on the empirical distribution function is known as the
Anderson-Darling test. It is often very useful to test whether a given data set approximates a normal
distribution. This can be evaluated informally by checking whether the mean and the median
are nearly equal, whether the skewness is approximately zero, and whether the kurtosis is close to
3. A more formal test for normality is given by the Jarque-Bera statistic:
$$JB = \frac{n}{6}\left[S^2 + \frac{(K - 3)^2}{4}\right] \qquad (4.9)$$
where S and K are the sample skewness and kurtosis. Imon (2003) suggests a slight adjustment to the
JB statistic to make it more suitable for regression problems. His proposed statistic, based on
rescaled moments (RM) of ordinary least squares residuals, is defined as
$$RM = \frac{n c^3}{6}\left[S^2 + c\,\frac{(K - 3)^2}{4}\right] \qquad (4.10)$$
where c = n/(n – k) and k is the number of independent variables in the regression model. Both the JB
and the RM statistics follow a chi-square distribution with 2 degrees of freedom. If the values of
these statistics are greater than the critical value of the chi-square distribution, we reject the null
hypothesis of normality.
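The two statistics in (4.9) and (4.10) are simple to compute from a set of residuals. The sketch below is a minimal illustration, assuming moment-based skewness and kurtosis with divisor n (the thesis does not spell out which version was used); the function names are our own. For a chi-square distribution with 2 degrees of freedom the upper-tail probability is exactly exp(-x/2), which is used here to recover the p-values quoted for Model A in Section 4.4.

```python
import math

def skewness_kurtosis(values):
    """Moment-based sample skewness S and kurtosis K (divisor n, an assumption)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def jb_statistic(residuals):
    """Jarque-Bera statistic, equation (4.9)."""
    n = len(residuals)
    s, k = skewness_kurtosis(residuals)
    return (n / 6.0) * (s ** 2 + (k - 3.0) ** 2 / 4.0)

def rm_statistic(residuals, k_regressors):
    """Rescaled moments statistic, equation (4.10), with c = n / (n - k)."""
    n = len(residuals)
    c = n / (n - k_regressors)
    s, k = skewness_kurtosis(residuals)
    return (n * c ** 3 / 6.0) * (s ** 2 + c * (k - 3.0) ** 2 / 4.0)

def chi2_2df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)

# Statistics reported for Model A in Section 4.4 (JB = 6.72, RM = 8.37).
print(round(chi2_2df_pvalue(6.72), 4), round(chi2_2df_pvalue(8.37), 4))  # prints 0.0347 0.0152
```

Because c is greater than 1 whenever k is at least 1, the RM statistic is never smaller than the JB statistic for the same residuals, so it rejects normality at least as readily.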
4.2.2 Outliers
In statistics we often observe that the values of descriptive measures are heavily influenced
by a few extreme observations, which are commonly known as outliers. According to Barnett and
Lewis (1993), ‘Observations which stand apart from the bulk of the data are called outliers.’
Different aspects of outliers and their consequences are discussed by Hadi, Imon and Werner (2009).
Hampel et al. (1986) claim that a routine data set typically contains about 1-10% outliers, and even
the highest quality data set cannot be guaranteed free of outliers. Barnett and Lewis (1993)
commented, ‘Any outliers, however, are always extreme values in the sample.’ But this statement
is not always true, especially in regression analysis.
In a regression problem, observations are judged as outliers on the basis of how unsuccessful the
fitted regression equation is in accommodating them, and that is why observations corresponding
to excessively large residuals are treated as outliers.
Types of Outliers
X – Outlier: This is a point that is outlying in regard to the x–coordinate. In the literature an X–
outlier is more popularly known as a high leverage point.
Y – Outlier: This is a point that is outlying only because its y–coordinate is extreme.
X – and Y – Outlier: A point that is outlying in both x and y coordinates is known as x – and y –
outlier.
Residual Outlier: This is a point that has a large standardized (deletion) residual. Most of the
commonly used outlier detection methods are based on this approach where an observation is
judged as outlier on the basis of how unsuccessful the fitted regression equation is in
accommodating it.
Detection of Outliers
We often use the following three types of residuals for the identification of outliers.
Standardized residuals
$$d_i = \frac{y_i - x_i^T \hat{\beta}}{\hat{\sigma}}, \quad i = 1, 2, \ldots, n \qquad (4.11)$$
Studentized residuals
$$r_i = \frac{y_i - x_i^T \hat{\beta}}{\hat{\sigma} \sqrt{1 - w_{ii}}}, \quad i = 1, 2, \ldots, n \qquad (4.12)$$
Deletion Studentized (Externally Studentized or R-Student) residuals
$$t_i = \frac{y_i - x_i^T \hat{\beta}}{\hat{\sigma}_{(-i)} \sqrt{1 - w_{ii}}}, \quad i = 1, 2, \ldots, n \qquad (4.13)$$
where $w_{ii}$ is the i-th diagonal element of the weight (hat) matrix $W = X (X^T X)^{-1} X^T$ and
$\hat{\sigma}^2_{(-i)}$ is the OLS estimate of the mean squared error (MSE) based on the data set with the i-th
observation deleted.
As a rule of thumb we call an observation an outlier when its corresponding residual exceeds 3 in
absolute value. A good review of recent outlier detection techniques in linear regression is
available in Imon (2008), and Hadi, Imon and Werner (2009).
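The three residuals in (4.11) to (4.13) can be sketched in a few lines for the special case of a single regressor. The Python code below is an illustration under stated assumptions: the data are made up, the function name `outlier_residuals` is our own, the leverage formula used is the one for a simple regression with intercept, and the deleted-case MSE is obtained from the standard identity that avoids refitting the model n times.

```python
import math

def outlier_residuals(x, y):
    """Standardized, Studentized and deletion Studentized residuals for a
    simple linear regression y = b0 + b1 * x (p = 2 parameters)."""
    n, p = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(ei ** 2 for ei in e)
    sigma = math.sqrt(sse / (n - p))
    # Leverage w_ii for one regressor plus intercept.
    w = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    d = [ei / sigma for ei in e]                                     # (4.11)
    r = [ei / (sigma * math.sqrt(1 - wi)) for ei, wi in zip(e, w)]   # (4.12)
    t = []
    for ei, wi in zip(e, w):
        # MSE with observation i deleted, computed without refitting.
        s2_del = (sse - ei ** 2 / (1 - wi)) / (n - p - 1)
        t.append(ei / (math.sqrt(s2_del) * math.sqrt(1 - wi)))       # (4.13)
    return d, r, t

# Hypothetical data: an almost exact line with one gross error planted at x = 5.
x = list(range(1, 11))
y = [2 * xi + 1 + (0.1 if xi % 2 == 0 else -0.1) for xi in x]
y[4] += 20.0
d, r, t = outlier_residuals(x, y)
flagged = [i for i, ti in enumerate(t) if abs(ti) > 3]
print(flagged)   # prints [4]
```

In this made-up example only the deletion residual crosses the threshold of 3. With ten observations and two parameters a Studentized residual cannot exceed the square root of 8 (about 2.83) in absolute value, because the outlier itself inflates the estimate of the error variance. This is one reason the deletion version is often preferred in small samples.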
4.2.3 Multicollinearity
One basic assumption of the multiple regression model is that there is no exact linear relationship
between any of the independent variables in the model. If such an exact linear relationship does
exist, we say that the independent variables are perfectly collinear or that perfect collinearity exists.
Multicollinearity arises when two or more variables (or combinations of variables) are highly
correlated with each other.
Effects of Multicollinearity
• Wrong interpretation of the regression coefficients
• Large variances and covariances for the OLS estimators of the regression parameters
• Unduly large (in absolute value) estimates of the regression parameters
Indications of Multicollinearity
High Correlation Values
Calculate correlation coefficients between all pairs of explanatory variables and test the maximum (in
absolute value) correlation coefficient $r_{ij}$ by the statistic
$$t = \frac{r_{ij} \sqrt{n - 2}}{\sqrt{1 - r_{ij}^2}} \sim t_{n-2}$$
There is evidence of multicollinearity at the 5% level of significance if
$$|t| > t_{n-2,\,0.975}$$
Large Variance Inflation Factor
We know that the variance of $\hat{\beta}_j$ is $\sigma^2 V_j$, where $V_j$ is the j-th diagonal element of $(X^T X)^{-1}$.
Consequently $V(\hat{\beta}_j)$ is large if $V_j$ is large. Hence $V_j$ is called the variance inflation
factor (VIF) of the explanatory variable $X_j$. One or more large VIFs indicate multicollinearity.
Thumb rule:
VIF < 5              No multicollinearity
5 ≤ VIF ≤ 10         Moderate multicollinearity
VIF > 10             Severe multicollinearity
Large Condition Number
A condition number is associated with the characteristic roots (eigenvalues) of the matrix $X^T X$.
The condition number of $X^T X$ is defined as
$$\kappa = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}$$
A large condition number indicates the existence of multicollinearity.
Thumb rule:
$\kappa$ < 10               No multicollinearity
10 ≤ $\kappa$ ≤ 30          Moderate multicollinearity
$\kappa$ > 30               Severe multicollinearity
Low Tolerance Value
Tolerance values are defined as the inverse of VIF values. In other words, we can define
Tolerance value = 1/VIF
Since tolerance values are the inverse of VIFs, low tolerance values indicate a multicollinearity
problem.
Thumb rule:
Tolerance > 0.2            No multicollinearity
0.1 ≤ Tolerance ≤ 0.2      Moderate multicollinearity
Tolerance < 0.1            Severe multicollinearity
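The VIF and tolerance thumb rules can be applied mechanically. The following Python sketch classifies the three VIF values printed by MINITAB for the three-variable model in Section 4.4. The function name `classify_vif` is our own, and the pairing of each VIF with a predictor assumes that MINITAB lists the VIF values in the same order as the predictors.

```python
def classify_vif(vif):
    """Thumb rule quoted in the text for the variance inflation factor."""
    if vif < 5:
        return "no multicollinearity"
    if vif <= 10:
        return "moderate multicollinearity"
    return "severe multicollinearity"

# VIF values printed for the three-variable model in Section 4.4
# (assumed to be listed in predictor order).
reported_vif = {"Budget in HE": 6.812, "Oil Revenue": 12.003, "Oil Price": 3.471}
for name, vif in reported_vif.items():
    tolerance = 1.0 / vif
    print(f"{name}: VIF = {vif}, tolerance = {tolerance:.3f}, {classify_vif(vif)}")
```

The tolerance values (about 0.147, 0.083 and 0.288) lead to the same three classifications under the tolerance thumb rule, as they must, since one rule is simply the inverse of the other.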
4.2.4 Variable Selection
In some applications theoretical considerations or prior experience can be helpful in selecting the
regressors to be used in the model. Building a regression model that includes only a subset of
available regressors involves two conflicting objectives.
1. We would like the model to include as many regressors as possible so that the information
content in these factors can influence the fitted value of the response.
2. We want the model to include as few regressors as possible because the variance of the fitted
response increases as the number of regressors increases. Also the more regressors there are in a
model, the greater the cost of data collection and model maintenance.
Finding an appropriate subset of regressors for the model is called the variable selection problem.
Graphical Methods
A number of graphical displays are used for variable selection. Here is a list of a few of them:
• Added Variable Plot
• Partial Residual (PR) plot (Ezekiel, 1924)
• Component and Component-plus-residual (CCPR) plot (Wood, 1973)
• Augmented Partial Residual (APR) plot (Mallows, 1986)
• Conditional Expectation and Residual (CERES) plot (Cook, 1993)
• Robust Added Variable Plot (Imon, 2003)
Model Selection Criteria
Minimum Residual Mean Square (RMS)
$$\hat{\sigma}^2 = \frac{SSE}{n - k - 1}$$
where $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the residual sum of squares, n is the number of observations, and k is the
number of explanatory variables.
Maximum R-Square
$$R^2 = 1 - \frac{SSE}{SST},$$
where $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares.
Maximum Adjusted R-Square
$$R_a^2 = 1 - \frac{SSE / (n - k - 1)}{SST / (n - 1)}$$
Akaike Information Criterion
For a model with p = k + 1 predictors including the intercept, the Akaike information criterion
suggests choosing the p for which the statistic
$$AIC(p) = \ln\left(\frac{1}{n} \sum_{i=1}^{n} \hat{\epsilon}_i^2\right) + \frac{2p}{n}$$
is minimized. This statistic imposes a penalty for including insignificant variables.
Mallows Cp
For a model with p predictors,
$$C_p = \frac{Y^T (I - W) Y}{\hat{\sigma}^2} + (2p - n),$$
where $\hat{\sigma}^2$ is a good estimate of $\sigma^2$ (usually obtained from the full model). The above expression
can be re-expressed as
$$C_p = \frac{(n - p)\,\hat{\sigma}_p^2}{\hat{\sigma}^2} + (2p - n),$$
where $\hat{\sigma}_p^2$ is the MSE from the submodel. It is straightforward to show that for the full model
$C_p = p$. But here we search for a submodel where $C_p \approx p$ for a value of p which is less than the
value of p for the full model.
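The second form of the Mallows statistic can be verified with the S values (square roots of the MSE) that MINITAB prints in Section 4.4. The sketch below is only a numerical check; the function name `mallows_cp` is our own, and the inputs are the rounded S values from the output, so agreement is expected only to the printed precision.

```python
def mallows_cp(s_sub, s_full, n, p):
    """C_p = (n - p) * sigma_p^2 / sigma^2 + (2p - n), with both sigmas given as S."""
    return (n - p) * (s_sub / s_full) ** 2 + (2 * p - n)

n = 34
s_full = 10526.9                                        # S of the three-variable model (p = 4)
cp_budget = mallows_cp(12621.3, s_full, n, p=2)         # budget in higher education only
cp_budget_oil = mallows_cp(10432.5, s_full, n, p=3)     # budget and oil price
cp_full = mallows_cp(s_full, s_full, n, p=4)            # full model, equals p
print(round(cp_budget, 1), round(cp_budget_oil, 1), round(cp_full, 1))  # prints 16.0 2.4 4.0
```

These reproduce the Mallows Cp column of the stepwise output. The two-variable submodel has a value of 2.4, below its own p = 3, while the one-variable model has a value far above its p = 2.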
Other Model Selection Criteria
• Schwarz Criterion (SC)
• Bayesian Information Criterion (BIC)
• Final Prediction Error (FPE) or Prediction Criterion (PC)
• Hannan-Quinn Criterion (HQC)
Variable Selection Methods
Forward Selection
Start with the empty model, then add the most significant variable (the one with the largest absolute
t-value or smallest p-value). Repeat until all candidate variables to enter the model have insignificant
regression coefficients.
Backward Elimination
Start with the full model, then delete the least significant variable (the one with the smallest absolute
t-value or largest p-value). Repeat until all regression coefficients in the model are significant.
Stepwise Method
This is a combination of the forward selection and backward elimination methods.
4.3 Robust Regression
Robustness is now playing a key role in time series. According to Kadane (1984), ‘Robustness is a
fundamental issue for all statistical analyses; in fact it might be argued that robustness is the
subject of statistics.’ The term robustness signifies insensitivity to small deviations from the
assumptions. That means a robust procedure is nearly as efficient as the classical procedure when
the classical assumptions hold strictly, but is considerably more efficient overall when there is a small
departure from them. The main application of robust techniques in a time series problem is to try
to devise estimators that are not strongly affected by outliers or departures from the assumed
model. In time series, robust techniques grew up in parallel with diagnostics [see Hampel et al.
(1986)], and initially they were used to estimate parameters and to construct confidence intervals
in such a way that outliers or departures from the assumptions do not affect them. A large body of
literature is now available [Rousseeuw and Leroy (1987), Maronna, Martin, and Yohai (2006), Hadi, Imon
and Werner (2009)] on robust techniques that are readily applicable in linear regression or in time series.
4.3.1. L – estimator
A first step toward a more robust time series estimator was the consideration of the least absolute values
estimator (often referred to as the L-estimator). In the OLS method, outliers may have a very large influence
since the parameters are estimated by minimizing the sum of squared residuals
$$\sum_{t=1}^{n} u_t^2$$
L estimates are considered to be less sensitive since they are determined by minimizing the sum of
absolute residuals
$$\sum_{t=1}^{n} |u_t|$$
The L estimator was first introduced by Edgeworth in 1887, who argued that the OLS method is overly
influenced by outliers, but because of computational difficulties it was not popular and not much used
until quite recently. Sometimes we consider the L-estimator as a special case of the $L_p$-norm estimator in
the literature, where the estimators are obtained by minimizing
$$\sum_{t=1}^{n} |u_t|^p$$
The $L_2$-norm estimator is the OLS, while the $L_1$-norm estimator is the L-estimator. But unfortunately
a single erroneous observation (high leverage point) can still totally offset the L-estimator.
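The difference between the two criteria is easiest to see in the simplest possible model, a constant fitted to a sample, where minimizing the sum of squared deviations gives the mean and minimizing the sum of absolute deviations gives the median. The Python sketch below uses made-up numbers and is meant only as an illustration of that special case, not of the regression fits in this thesis.

```python
import statistics

clean = [10.0, 11.0, 9.0, 10.5, 9.5]
contaminated = clean[:-1] + [95.0]   # one gross error replaces the last value

results = {}
for label, data in (("clean", clean), ("contaminated", contaminated)):
    l2_fit = statistics.mean(data)     # minimizes the sum of squared deviations
    l1_fit = statistics.median(data)   # minimizes the sum of absolute deviations
    results[label] = (l2_fit, l1_fit)
    print(label, round(l2_fit, 2), l1_fit)
```

A single bad value moves the least squares fit from 10 to about 27, while the least absolute values fit moves only from 10 to 10.5. This protection applies to outlying responses; as noted above, it does not extend to high leverage points in a regression.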
4.3.2. Least Median of Squares
Rousseeuw (1984) proposed the Least Median of Squares (LMS) method, which is a fitting technique less
sensitive to outliers than the OLS. In OLS, we estimate parameters by minimizing the sum of squared
residuals $\sum_{t=1}^{n} u_t^2$, which is obviously the same as minimizing the mean of squared residuals
$\frac{1}{n} \sum_{t=1}^{n} u_t^2$.
Sample means are sensitive to outliers, but medians are not. Hence to make it less sensitive we can replace
the mean by a median to obtain the median of squared residuals
$$MSR(\hat{\beta}) = \text{Median}\{\hat{u}_t^2\} \qquad (4.14)$$
Then the LMS estimate of $\beta$ is the value that minimizes MSR($\hat{\beta}$). Rousseeuw and Leroy (1987) have
shown that LMS estimates are very robust with respect to outliers and have the highest possible
breakdown point of 50%.
4.3.3. Least Trimmed Squares
The least trimmed (sum of) squares (LTS) estimator was proposed by Rousseeuw (1984). In this method
we try to estimate $\beta$ in such a way that
$$LTS(\hat{\beta}) = \text{minimize} \sum_{t=1}^{h} \hat{u}_{(t)}^2 \qquad (4.15)$$
Here $\hat{u}_{(t)}$ is the t-th ordered residual (ordered by absolute value). For a trimming percentage of $\alpha$,
Rousseeuw and Leroy (1987) suggested choosing the number of observations h on which the model is
fitted as h = [n (1 – $\alpha$)] + 1. The advantage of using LTS over LMS is that in the LMS we always fit
the regression line based on roughly 50% of the data, but in the LTS we can control the level of trimming.
When we suspect that the data contain nearly 10% outliers, the LTS with 10% trimming can be expected to
produce a better result than the LMS. We can increase the level of trimming if we suspect there are more
outliers in the data.
4.3.4 Reweighted Least Squares
Another way to obtain a set of results based on a robust fit is the method of Reweighted Least
Squares (RLS) proposed by Rousseeuw and Leroy (1987). In this method, the parameters are
estimated by the LMS method and the outliers are identified. After that the final model is fitted by
the least squares without the potential outliers. Since this fitting does not involve any outliers this
method is claimed to be more appropriate for the majority of the observations. However, the
residuals of the deleted points are reestimated from the robust fit to produce a full set of residuals.
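The two-stage idea behind RLS can be sketched for a single regressor as follows. This is a simplified illustration, not the routine used for the results in this thesis. The LMS fit is approximated by trying the line through every pair of points. The preliminary scale estimate with the factor 1.4826 (1 + 5/(n - p)) and the cutoff of 2.5 follow the usual description in Rousseeuw and Leroy (1987), and are assumptions as far as this thesis is concerned. The data and all function names are hypothetical.

```python
import itertools
import math
import statistics

def ols_line(x, y):
    """Intercept and slope of the least squares line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def lms_line(x, y):
    """Approximate LMS: try the line through every pair of points and keep
    the one with the smallest median squared residual."""
    best = None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        crit = statistics.median((yk - b0 - b1 * xk) ** 2 for xk, yk in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, b0, b1)
    return best

def rls_line(x, y, cutoff=2.5):
    """Reweighted least squares: drop points far from the LMS line, then refit by OLS."""
    n, p = len(x), 2
    crit, b0, b1 = lms_line(x, y)
    # Preliminary robust scale estimate (assumed finite-sample factor).
    s0 = 1.4826 * (1 + 5 / (n - p)) * math.sqrt(crit)
    keep = [k for k in range(n) if abs(y[k] - b0 - b1 * x[k]) <= cutoff * s0]
    return ols_line([x[k] for k in keep], [y[k] for k in keep]), keep

# Hypothetical data: a noisy line y = 2x + 1 with two bad points at the right end.
x = list(range(1, 13))
y = [2 * xi + 1 + (0.2 if xi % 2 == 0 else -0.2) for xi in x]
y[10] -= 30.0
y[11] -= 30.0
ols_b0, ols_b1 = ols_line(x, y)
(rls_b0, rls_b1), kept = rls_line(x, y)
print("OLS slope:", round(ols_b1, 2), "RLS slope:", round(rls_b1, 2), "kept:", kept)
```

In this toy example the two contaminated points drag the OLS slope far away from 2, while the RLS fit, which discards them before the final least squares step, stays close to the slope of the clean data.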
4.4 Regression Results
Here we employ regression methods to understand which variables have a significant impact on the
number of Saudi Arabia students studying abroad. The budget in higher education is an immediate
choice. The Saudi Arabian economy relies heavily on oil, so the two other variables one can consider
are oil price and oil revenue. We begin with simple linear regressions of the number of
Saudi Arabia students studying abroad on the three explanatory variables, one at a time.
Figure 4.1 gives a scatter plot of the total number of students versus budget in higher education.
We observe a strong upward linear relationship between these two variables. The attached
MINITAB output shows that the value of $R^2$ is 0.83 and that the coefficient of budget in higher
education is highly significant (p-value 0.000).
[Figure: scatter plot; horizontal axis: Budget in HE, vertical axis: Total No. of Students]
Figure 4.1: Scatter Plot of the Total Number of Students vs Budget in Higher Education
Regression Analysis: Total No. of Students versus Budget in HE
The regression equation is
Total No. of Students = - 5982 + 0.000000 Budget in HE

Predictor             Coef     SE Coef       T      P    VIF
Constant             -5982        3025   -1.98  0.057
Budget in HE    0.00000046  0.00000004   12.48  0.000  1.000

S = 12621.3   R-Sq = 83.0%   R-Sq(adj) = 82.4%
Figure 4.2 gives a scatter plot of the total number of students versus oil price.
We observe an upward linear relationship between these two variables. The attached
MINITAB output shows that the value of $R^2$ is 0.529, which is not great. The graph also suggests
that there are probably a few outliers in this data, so we think it is a good idea to employ a
robust regression here. We fit the data by the reweighted least squares (RLS) method and the fitted
plot is presented in Figure 4.3.
[Figure: scatter plot; horizontal axis: Oil Price, vertical axis: Total No. of Students]
Figure 4.2: Scatter Plot of the Total Number of Students vs Oil Price
Regression Analysis: Total No. of Students versus Oil Price
The regression equation is
Total No. of Students = - 19210 + 860 Oil Price

Predictor       Coef   SE Coef       T      P    VIF
Constant      -19210      7525   -2.55  0.016
Oil Price      860.3     143.6    5.99  0.000  1.000

S = 20985.0   R-Sq = 52.9%   R-Sq(adj) = 51.4%
[Figure: OLS and RLS fitted lines over the scatter plot; horizontal axis: Oil Price, vertical axis: No. of Students]
Figure 4.3: RLS and OLS Fit of the Total Number of Students vs Oil Price
Regression Analysis: Total No. of Students_1 versus Oil Price_1
The regression equation is
Total No. of Students_1 = - 29017 + 1363 Oil Price_1

Predictor         Coef   SE Coef        T      P
Constant        -29017      2561   -11.33  0.000
Oil Price_1    1362.94     56.54    24.10  0.000

S = 6886.77   R-Sq = 96.2%   R-Sq(adj) = 96.0%
We observe from Figure 4.3 that the robust RLS fits the data much better than the traditionally used
OLS. Now we observe an upward and very nearly linear relationship between these two variables. The
attached MINITAB output shows that the value of $R^2$ increases from 0.529 to 0.962, which
is a huge improvement. So we can say that robust regression performs much better than the classical
regression method here.
[Figure: scatter plot; horizontal axis: Oil Revenue, vertical axis: Total No. of Students]
Figure 4.4: Scatter Plot of the Total Number of Students vs Oil Revenue
Figure 4.4 gives a scatter plot of the total number of students versus oil revenue. We observe an
upward linear relationship between these two variables. The attached MINITAB output shows
that the value of $R^2$ is 0.786, which is good.
Regression Analysis: Total No. of Students versus Oil Revenue
The regression equation is
Total No. of Students = - 6054 + 0.0797 Oil Revenue

Predictor          Coef    SE Coef       T      P
Constant          -6054       3443   -1.76  0.088
Oil Revenue    0.079707   0.007362   10.83  0.000

S = 14154.7   R-Sq = 78.6%   R-Sq(adj) = 77.9%
Since each of the three explanatory variables shows a linear relationship with the total number of
students studying abroad, now we fit a multiple linear regression model.
Response variable: The total number of students studying abroad
Explanatory variables: Budget in higher education, Oil price, and Oil revenue.
Regression Analysis: Total No. of versus Budget in HE, Oil Revenue, Oil Price
The regression equation is
Total No. of Students = - 18688 + 0.000000 Budget in HE - 0.0127 Oil Revenue
+ 417 Oil Price

Predictor             Coef     SE Coef       T      P     VIF
Constant            -18688        4476   -4.18  0.000
Budget in HE    0.00000042  0.00000008    5.23  0.000   6.812
Oil Revenue       -0.01267     0.01897   -0.67  0.509  12.003
Oil Price            417.3       134.2    3.11  0.004   3.471

S = 10526.9   R-Sq = 88.9%   R-Sq(adj) = 87.8%
The attached MINITAB output for the multiple regression is quite confusing. Here the value of $R^2$ is
0.889, which is good, but we observe that the effect of oil revenue is negative, which completely
conflicts with our findings in Figure 4.4. It may be a clear case of the wrong sign problem, which is
caused by multicollinearity. We checked the VIF values and found the largest one to be 12.003, which
shows that this model is severely affected by multicollinearity.
The above results suggest that we cannot keep all three explanatory variables in the model.
To decide which of the explanatory variables should remain in the model, we apply the forward
selection, backward elimination and stepwise regression methods; the MINITAB results
are reported below.
Stepwise Regression: Total No. of versus Oil Revenue, Budget in HE, ...
Forward selection.
Alpha-to-Enter: 0.05
Response is Total No. of Students on 3 predictors, with N = 34

Step                  1         2
Constant          -5982    -17088

Budget in HE    0.00000   0.00000
T-Value           12.48      9.92
P-Value           0.000     0.000

Oil Price                     350
T-Value                      3.98
P-Value                     0.000

S                 12621     10432
R-Sq              82.95     88.72
R-Sq(adj)         82.42     87.99
Mallows Cp         16.0       2.4
Stepwise Regression: Total No. of versus Oil Revenue, Budget in HE, ...
Backward elimination.
Alpha-to-Remove: 0.05
Response is Total No. of Students on 3 predictors, with N = 34

Step                  1         2
Constant         -18688    -17088

Oil Revenue      -0.013
T-Value           -0.67
P-Value           0.509

Budget in HE    0.00000   0.00000
T-Value            5.23      9.92
P-Value           0.000     0.000

Oil Price           417       350
T-Value            3.11      3.98
P-Value           0.004     0.000

S                 10527     10432
R-Sq              88.88     88.72
R-Sq(adj)         87.77     87.99
Mallows Cp          4.0       2.4
Stepwise Regression: Total No. of versus Oil Revenue, Budget in HE, ...
Alpha-to-Enter: 0.05
Alpha-to-Remove: 0.05
Response is Total No. of Students on 3 predictors, with N = 34

Step                  1         2
Constant          -5982    -17088

Budget in HE    0.00000   0.00000
T-Value           12.48      9.92
P-Value           0.000     0.000

Oil Price                     350
T-Value                      3.98
P-Value                     0.000

S                 12621     10432
R-Sq              82.95     88.72
R-Sq(adj)         82.42     87.99
Mallows Cp         16.0       2.4
All three methods come to exactly the same conclusion, i.e. the explanatory variables
that we should keep in our study are budget in higher education and oil price. Let us denote this as
Model A.
Regression Analysis: Model A: Total No. of Stu versus Budget in HE, Oil Price
The regression equation is
Total No. of Students = - 17088 + 0.000000 Budget in HE + 350 Oil Price

Predictor             Coef     SE Coef       T      P    VIF
Constant            -17088        3747   -4.56  0.000
Budget in HE    0.00000037  0.00000004    9.92  0.000  1.519
Oil Price           350.09       87.97    3.98  0.000  1.519

S = 10432.5   R-Sq = 88.7%   R-Sq(adj) = 88.0%
The attached MINITAB output for Model A looks better now. Here the value of $R^2$ is 0.887, which
is good, but more importantly we see that the effects of both explanatory variables are
positive and statistically significant.
[Figure: Probability Plot of Residuals, Normal - 95% CI; horizontal axis: Residuals, vertical axis: Percent.
Mean = 0, StDev = 10111, N = 34, AD = 0.906, P-Value = 0.019]
Figure 4.5: Normal Probability Plot of the Residuals for Model A
But when we look at the normal probability plot of the residuals in Figure 4.5, Model A no longer
looks satisfactory. For this particular case the value of the Jarque-Bera statistic is 6.72 (p-value
0.0347) and that of the RM statistic is 8.37 (p-value 0.0152). So both tests reject the assumption of
normality of errors at the 5% level, and thus the model looks questionable. As an alternative we fit the
same model by the robust reweighted least squares (RLS) method and call it Model B.
Regression Analysis: Model B: Total No. of Stu versus Budget in HE_1, Oil Price_1
The regression equation is
Total No. of Students_1 = - 24848 + 0.000000 Budget in HE_1 + 992 Oil Price_1

Predictor               Coef     SE Coef       T      P
Constant              -24848        2647   -9.39  0.000
Budget in HE_1    0.00000016  0.00000005    2.91  0.008
Oil Price_1            991.7       136.8    7.25  0.000

S = 5984.88   R-Sq = 97.2%   R-Sq(adj) = 97.0%
The attached MINITAB output shows that Model B produces an even better fit in terms of $R^2$, as its
value goes up to 0.972 from 0.887 for the OLS fit. Here the effects of both
explanatory variables are positive and statistically significant.
[Figure: Probability Plot of RLS, Normal - 95% CI; horizontal axis: RLS, vertical axis: Percent.
Mean = -7.56700E-12, StDev = 5730, N = 25, AD = 0.532, P-Value = 0.157]
Figure 4.6: Normal Probability Plot of the Residuals for Model B
For Model B, the normal probability plot of residuals shown in Figure 4.6 looks much better than what
we saw for Model A. For confirmation we compute the Jarque-Bera and RM values for Model
B. The value of the Jarque-Bera statistic is 1.56 (p-value 0.4584) and that of the RM statistic is 1.69
(p-value 0.4296). So neither test rejects the assumption of normality of errors, and thus
the model can be considered a valid one.
In the previous chapter we have seen that most of the variables we consider here in our regression
model show exponential growth. So it may be a good idea to fit the model using a log
transformation on the response, as suggested by Montgomery et al. (2013). This third model will
be denoted as Model C.
Regression Analysis: Model C:
The regression equation is
Total No. of Students_2 = 7.44 + 0.000000 Budget in HE_2 + 0.0217 Oil Price_2

Predictor               Coef     SE Coef       T      P
Constant              7.4370      0.1189   62.53  0.000
Budget in HE_2    0.00000000  0.00000000   10.12  0.000
Oil Price_2         0.021717    0.002792    7.78  0.000

S = 0.331114   R-Sq = 92.6%   R-Sq(adj) = 92.1%
[Figure: Probability Plot of Residuals_1, Normal - 95% CI; horizontal axis: Residuals_1, vertical axis: Percent.
Mean = -4.44089E-15, StDev = 0.3209, N = 34, AD = 0.459, P-Value = 0.247]
Figure 4.7: Normal Probability Plot of the Residuals for Model C
The attached MINITAB output shows that Model C falls in between Model A and Model B in
terms of $R^2$. For this model the value of $R^2$ is 0.926, compared with 0.972 for Model
B and 0.887 for Model A. Here the effects of both explanatory variables are positive and
statistically significant.
The normal probability plot of residuals for Model C looks good, as shown in Figure 4.7. Now we compute
the Jarque-Bera and RM values for Model C. The value of the Jarque-Bera statistic is
1.86 (p-value 0.3946) and that of the RM statistic is 1.97 (p-value 0.3734). So neither test rejects
the assumption of normality of errors, and thus the model can be considered a valid one.
4.5 Results Comparisons
In this section we summarize our above findings. To explain the number of students studying
abroad we began with three explanatory variables but this model failed the multicollinearity check.
After that we employed the variable selection procedure to select the best set of regressors. After
this selection was made we fit the data with three different models and the result summaries are
presented in Table 4.1.
Table 4.1: Regression Results Summary
Model             $R^2$     JB (p-value)   RM (p-value)   Normality
A: OLS            0.887     0.0347         0.0152         Rejected
B: RLS            0.972     0.4584         0.4296         Accepted
C: Exponential    0.926     0.3946         0.3734         Accepted
The above results suggest that the traditional least squares method performs worst among the three
models considered here. It not only has the lowest $R^2$, it fails the normality test as well.
Both the robust fit and the exponential model pass the normality test, but we put the robust
RLS ahead of the exponential model because it has both a higher $R^2$ and higher p-values in the tests
of normality.
CHAPTER 5
Cross Validation of Forecasts
In this chapter our main objective is to evaluate forecasts made by different regression methods
and models. We employ the cross validation method for this purpose.
5.1 Evaluation of Forecasts by Cross Validation
Cross-validation is a technique for assessing how the results of a statistical analysis will generalize
to an independent data set. It is mainly used in settings where the goal is prediction, and one wants
to estimate how accurately a predictive model will perform in practice. One round of cross-validation
involves partitioning a sample of data into complementary subsets, performing the
analysis on one subset (called the training set), and validating the analysis on the other subset
(called the validation set or testing set). An excellent review of different types of cross validation
techniques is available in Izenman (2008). Picard and Cook (1984) developed the basic
fundamentals of applying the cross validation technique in regression and time series.
According to Montgomery et al. (2013), three types of procedures are useful for validating a
regression or time series model.
(i) Analysis of the model coefficients and predicted values, including comparisons with prior
experience, physical theory, and other analytical models or simulation results,
(ii) Collection of new data with which to investigate the model’s predictive performance,
(iii) Data splitting, that is, setting aside some of the original data and using these observations to
investigate the model’s predictive performance.
Since we have a large number of observations, we prefer the data splitting technique for
cross-validation of the fitted model.
In order to find the best prediction model we usually set aside, say, l observations as the
holdback period. The size of l is usually 10% to 20% of the original data. Suppose that we
tentatively select two models, namely A and B. We fit both models using the remaining (T – l)
observations. Then we compute
$$MSPE_A = \frac{1}{l} \sum_{t=1}^{l} e_{At}^2 \qquad (5.1)$$
for model A and
$$MSPE_B = \frac{1}{l} \sum_{t=1}^{l} e_{Bt}^2 \qquad (5.2)$$
for model B, where $e_{At}$ and $e_{Bt}$ are the forecast errors of the two models in the holdback period.
Several methods have been devised to determine whether one MSPE is statistically
different from the other. One such popular method of testing is the F-test approach, where the
F-statistic is constructed as the ratio of the two MSPEs, keeping the larger MSPE in the numerator.
If the MSPE for model A is larger, this statistic takes the form:
$$F = \frac{MSPE_A}{MSPE_B} \qquad (5.3)$$
This statistic follows an F distribution with (l, l) degrees of freedom under the null hypothesis of
equal forecasting performance. If the F-test is significant we choose model B for this data;
otherwise, we conclude that there is little difference between these two models.
5.2 Cross Validation Results
In this section we employ linear regression with the OLS and RLS methods and an exponential
model for cross validation. Since we have 34 years of data, we use roughly the first 90% of our data (30
years) for fitting the models, and the last 10% of observations (4 years) are
forecasted by these three different methods.
Table 5.1: Original and Forecasted Values for 2011-2014
Year    Original    RLS        OLS        Exponential
2011    95991       89716.3    69734.2    70962
2012    86030       90866.4    78140.5    97382
2013    102302      95339.7    89855.8    136358
2014    90925       87741.9    89071.0    121570
[Figure: scatter plot of the Original, RLS, OLS and Exponential series; horizontal axis: Original, vertical axis: Forecast]
Figure 5.1: Scatterplot of RLS, OLS, Exponential Forecasts vs Original Values
Table 5.1 presents the total number of students studying abroad for the years 2011-2014 together
with the three sets of forecasted values.
Figure 5.1 gives a graphical display to show which forecasted values are closer to their
corresponding original ones. The original values are plotted as black dots, and the RLS forecasts,
plotted as red dots, are quite close to the black ones. This graph clearly shows that the RLS forecasts
are much better than the OLS forecasts. Although the exponential model performed better than the
OLS in terms of fit, in terms of forecasts it seems to perform even worse than the OLS.
Table 5.2: Cross Validation Result Summary
Model          MSPE         F          p-value
OLS            227502579
RLS            30342093     7.49791    0.0383
Exponential    713559588    0.525061   0.7260
Since graphical summaries are subjective, we carry out an analytical test to evaluate the
forecasts, as set out in (5.1) to (5.3); the results are presented in Table 5.2. We observe from
this table that the MSPE for RLS is much smaller than that of OLS and of the exponential model.
We also observe that the F test for RLS against OLS is significant at the 5% level (p = 0.0383).
The exponential forecasts, however, give a clearly non-significant p-value (0.7260). Thus we
conclude that RLS produces the best set of forecasts, followed by OLS. The exponential forecasts
are the worst in this study.
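The MSPE comparison can be checked directly from the values in Table 5.1. The sketch below assumes that the MSPE is the mean of the squared forecast errors over the four holdout years and that the F statistic for RLS is the ratio of the OLS MSPE to the RLS MSPE; these definitions are assumptions made for illustration, and the function name `mspe` is hypothetical.

```python
# Original and forecasted values for 2011-2014, copied from Table 5.1.
original = [95991, 86030, 102302, 90925]
forecasts = {
    "RLS": [89716.3, 90866.4, 95339.7, 87741.9],
    "OLS": [69734.2, 78140.5, 89855.8, 89071.0],
    "Exponential": [70962, 97382, 136358, 121570],
}


def mspe(actual, predicted):
    """Mean squared prediction error over the holdout period."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


mspe_by_model = {name: mspe(original, f) for name, f in forecasts.items()}

# Assumed form of the statistic: MSPE of the OLS benchmark divided by the
# MSPE of the competing model (here RLS).
f_rls = mspe_by_model["OLS"] / mspe_by_model["RLS"]

for name, value in mspe_by_model.items():
    print(f"{name:12s} MSPE = {value:,.0f}")
print(f"F (OLS vs RLS) = {f_rls:.3f}")
```

Under these assumptions the RLS entries of Table 5.2 are recovered, and the OLS and exponential MSPE values agree with the table up to rounding of the forecasts.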
CHAPTER 6
Conclusions and Areas of Further Research
In this chapter we will summarize the findings of our research to draw some conclusions and
outline ideas for our future research.
6.1 Conclusions
In this study our prime objective was to investigate the trend of Saudi Arabia students who are
studying abroad for higher education. We investigate both the overall trend and the trends of nine
individual programs. We observe that not a single variable fits a linear trend model; all of them
fit either quadratic or exponential models. We then investigate the trends of some other variables,
such as the budget in higher education, oil price, and oil revenue, which should influence the
number of students studying abroad. We observe similar trends for these variables as well.
We also observe that most of the Saudi Arabia students go abroad to study Engineering and
Medical Science, and the smallest numbers study Agriculture and Fine Arts. We also found
that the number of male students is significantly higher than the number of female students in 6
out of 9 programs. Female students outnumber male students in only three programs, and there the
differences are not statistically significant. So we find evidence of gender discrimination among
the Saudi Arabia students studying abroad.
In our search for the factors that influence the number of students studying abroad, we used
regression analysis, and the two variables we found to have the greatest effect are the budget in
higher education and oil price. We also observe that the commonly used least squares method has
several limitations in this case, so we finally used robust reweighted least squares (RLS) to fit
the data. To verify how good the fit is, we used cross validation to generate forecasts for the
last four years of data, and we found that the RLS fit produces much better forecasts than the
other methods.
Our findings raise some concern about the future of the programs under which Saudi students
go abroad for higher studies. Since oil price has a significant positive impact on the
number of students, we suspect that the recent fall in oil price might affect these programs adversely.
6.2 Areas of Further Research
Although our data sets are time series, we were not able to consider a variety of time series
methods due to time constraints. We considered only deterministic models in fitting the data. In
future we would like to extend our research by considering stochastic ARIMA models. Volatility
could be an essential feature of these data, so we would also like to consider ARCH/GARCH or
ARFIMA/GARFIMA models in future.
81
References
1.
Bowerman, B. L., O’Connell, R. T., and Koehler, A. B. (2005). Forecasting, Time
Series, and Regression: An Applied Approach, 4th Ed., Duxbury Publishing, Thomson
Books/Cole, New Jersey.
2.
Hadi, A.S., Imon, A.H.M.R. and Werner, M. (2009). Detection of outliers, Wiley
Interdisciplinary Reviews: Computational Statistics, 1, pp. 57 – 70.
3.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and Stahel, W. (1986). Robust
Statistics: The Approach Based on Influence Function, Wiley, New York.
4.
Imon, A. H. M. R. (2003). Residuals from Deletion in Added Variable Plots, Journal of
Applied Statistics, 30, 841– 855.
5.
Imon, A. H. M. R. (2003). Regression Residuals, Moments, and Their Use in Tests for
Normality, Communications in Statistics—Theory and Methods, 32, pp. 1021 – 1034.
6.
Imon, A. H. M. R. (2008). Diagnostic Robust Approach of Outlier Detection in Regression,
Journal of Statistical Research, 42, 105 – 120.
7.
Izenman, A.J. (2008), Modern Multivariate Statistical Techniques: Regression,
Classification, and Manifold Learning, Springer, New York.
8.
Kadane, J.B. (1984). Robustness of Bayesian Analysis, Elsevier North-Holland,
Amsterdam.
9.
Maronna, R.A., Martin, R.D. and Yohai, V.J. (2006), Robust Statistics: Theory and
Methods, Wiley, New York.
82
10
Montgomery, D., Jennings, C., and Kulachi, M. (2008), Introduction to Time Series
Analysis and Forecasting, Wiley, New York.
11. Montgomery, D., Peck, E., and Vining, G. (2013), An Introduction to Regression
Analysis, 5th Ed., Wiley, New York.
12. Pindyck, R. S. and Rubenfeld, D. L. (1998), Econometric Models and Economic
Forecasts, 4th Ed. Irwin/McGraw-Hill Boston.
13
Rousseeuw, P.J. (1984). Least Median of Squares Regression, Journal of the American
Statistical Association, 79, pp. 871 – 880.
14. Rousseeuw, P.J. and Leroy, A.M. (1987). Robust Regression and Outlier Detection, Wiley,
New York.
15. Rousseeuw, P.J. and Leroy, A.M. (1987). A Fast Algorithm for S-Regression Estimates,
Journal of Computational and Graphical Statistics, 15, pp. 414–427.
16. Saudi Arabian Moneytary Agency (SAMA).
http://www.sama.gov.sa/en-US/EconomicReports/Pages/YearlyStatistics.aspx
17. Saudi Arabia Cultural Mission to the U.S.
http://www.sacm.org/ArabicSACM/pdf/Posters_Sacm_schlorship.pdf
19. The Ministry of Education
https://www.mohe.gov.sa/ar/Ministry/Deputy-Ministry-for-Planning-andInformation-affairs/HESC/Ehsaat/Pages/default.aspx
20. The Ministry of Education
https://www.mof.gov.sa/english/DownloadsCenter/Pages/Budget.aspx
83
APPENDIX A
Table: A1. Number of Saudi Students Studying Abroad for Higher Education

              Social Science         Natural Science         Medical Science
Year    Male  Female   Total    Male  Female   Total    Male  Female   Total
1981    2015      84    2099    1124      48    1172    1312     235    1547
1982    2061     213    2274    1117      78    1195     758     130     888
1983    1735     156    1891     974      72    1046     666     110     776
1984    1356     141    1497     673      47     720     508      81     589
1985    1540     164    1704     611      53     664     621      86     707
1986    1199     161    1360     647      65     712     637      82     719
1987    1062     138    1200     645      61     706     654      86     740
1988     939      92    1031     597      71     668     578      64     642
1989     685     112     797     555     125     680     542      59     601
1990     570      79     649     462     100     562     448      58     506
1991     598      82     680     423      88     511     361      46     407
1992     628      81     709     430      79     509     431      60     491
1993     605      89     694     424      76     500     508      59     567
1994     647      88     735     428      73     501     552      60     612
1995     475      51     526     425      89     514     559      50     609
1996     531      58     589     481     133     614     550      62     612
1997     598     151     749     536     372     908     673     110     783
1998     107      75     182     535     424     959     860     206    1066
1999     676     254     930     595     388     983     966     248    1214
2000    1759     534    2293     974     537    1511    1361     313    1674
2001    1917     568    2485    1072     570    1642    1626     398    2024
2002     687     244     931     730     436    1166    1171     307    1478
2003     764     296    1060     788     392    1180    1214     362    1576
2004     754     333    1087     776     407    1183    1376     398    1774
2005     591     241     832     597     282     879    1709     467    2176
2006    2267     510    2777    2823     607    3430    3895     986    4881
2007    4663     968    5631    3136     720    3856    4983    1380    6363
2008    5424    1273    6697    5130    1262    6392    3652    1674    5326
2009    9462    2045   11507    7118    1715    8833    6173    2340    8513
2010   16318    4132   20450    8584    2567   11151    7524    3736   11260
2011   26043    7702   33745   11945    4481   16426   11589    6287   17876
2012    1547    1093    2640   16331    6306   22637   14717    7913   22630
2013    1542    1269    2811   19047    8230   27277   17208    9881   27089
2014    1287    1068    2355   16245    7711   23956   15097    8847   23944
Table: A1 (continued)

                         Law              Humanities               Fine Arts
Year    Male  Female   Total    Male  Female   Total    Male  Female   Total
1981     123       2     125     408     117     525      98       2     100
1982     313      19     332     327    2363    2690      45      33      78
1983      42       4      46     236    2203    2439      47      32      79
1984      32       9      41     190     274     464      27      11      38
1985      39       6      45     287     252     539      29      23      52
1986      43       6      49     321     261     582      24      26      50
1987      41       2      43     228     260     488      17      35      52
1988      39       1      40     191     168     359      13      18      31
1989      44       2      46     116     110     226      12      26      38
1990      39       1      40      97      49     146      10      11      21
1991      36       3      39     107      34     141       9      16      25
1992      35       1      36     107      44     151      10      22      32
1993      55       1      56     129      57     186      10      21      31
1994      54       1      55     108      61     169      12      21      33
1995      29       0      29     111      90     201       5      26      31
1996      29       0      29     335     501     836       3      22      25
1997      31       8      39     441     735    1176      13      35      48
1998      39       8      47     533     549    1082       9      37      46
1999      78       8      86     481     816    1297       6      31      37
2000     183      17     200     711    1048    1759      14      38      52
2001     292      25     317     754    1119    1873      18      53      71
2002      24      56      80     653    1018    1671      24      56      80
2003     105       5     110     568    1010    1578      14      58      72
2004     127      10     137     567    1030    1597      20      50      70
2005     240      25     265     268     744    1012      21      62      83
2006     506      37     543     677     977    1654      28      64      92
2007     625      58     683     949    1495    2444      27     119     146
2008     756      82     838     522     408     930      17      52      69
2009    1744     208    1952    4336    2820    7156      68     178     246
2010    1729     260    1989    1920    1786    3706      77     266     343
2011    2289     475    2764    1998    1455    3453     143     406     549
2012    2989     629    3618    5370    3800    9170     269     621     890
2013    3096     902    3998    3161    2646    5807     331     868    1199
2014    2715     827    3542    3050    2231    5281     474     994    1468
Table: A1 (continued)

                 Engineering               Education             Agriculture
Year    Male  Female   Total    Male  Female   Total    Male  Female   Total
1981    1490      20    1510     382      25     407     219       1     220
1982    1137      14    1151     516     212     728     176       4     180
1983    1026      68    1094     514     265     779     138       3     141
1984     849      12     861     339     202     541     107       2     109
1985     737       9     746     473     309     782      99       3     102
1986     537      17     554     296     344     640      81       4      85
1987     499       6     505     174     351     525      95       3      98
1988     449      10     459     157     192     349      82       0      82
1989     451       9     460     148     106     254      82       1      83
1990     428      10     438     120      68     188      52       1      53
1991     467      18     485     123      87     210      49       1      50
1992     362       3     365     120      93     213      50       0      50
1993     407       2     409     104      93     197      55       1      56
1994     411       6     417     109      88     197      62       1      63
1995     419      37     456      62      60     122      61       2      63
1996     544      15     559      74      55     129      54       2      56
1997    1123      34    1157     118      88     206      66       2      68
1998    1435     100    1535     107      75     182      58       6      64
1999     542      46     588     228     353     581      82       3      85
2000     498      43     541     459     631    1090      82       4      86
2001     516      44     560     458     560    1018      83       8      91
2002     681      88     769     193     311     504      79       4      83
2003    2711     162    2873     176     276     452      74      10      84
2004    5481     292    5773     177     167     344      54      13      67
2005    5080     130    5210     171     224     395      34       5      39
2006    6665     317    6982     300     323     623      81      15      96
2007   10647     360   11007    2144    1019    3163      80      31     111
2008   18104     692   18796     319     216     535      29       0      29
2009   21461     672   22133    1254     710    1964      44       0      44
2010   30164     968   31132     610     716    1326      74       2      76
2011   26255     860   27115     955    1341    2296      74      12      86
2012    1490      20    1510     863    1342    2205      88      19     107
2013    1137      14    1151    1016    1867    2883      88      18     106
2014    1026      68    1094    1059    2117    3176      74      14      88
Table: A2. Saudi Arabia Oil Revenue, Oil Price and Budget in Higher Education

Year  Oil Revenue  Budget in HE  Oil Price
1981       328594   2.76845E+06      77.80
1982       186006   9.35426E+06      74.58
1983       145123   1.03608E+07      68.43
1984       121348   9.30524E+06      69.36
1985        88425   1.10786E+07      67.16
1986        42464   7.13496E+09      26.21
1987        67405   6.00293E+09      28.38
1988        48400   6.15068E+09      20.45
1989        75900   5.73860E+09      25.20
1990        96800   5.75337E+09      28.40
1991       149497   6.09730E+09      23.50
1992       128790   3.18550E+10      22.64
1993       105976   3.41000E+10      20.52
1994        95505   3.51000E+10      19.31
1995       105728   2.69120E+10      19.24
1996       135982   2.76267E+10      23.07
1997       159985   4.17000E+10      23.04
1998        79998   4.31000E+10      15.08
1999       104447   4.41000E+10      21.60
2000       214424   4.92840E+10      35.64
2001       183915   5.43000E+10      31.14
2002       166100   4.70370E+10      31.27
2003       231000   6.75000E+10      30.92
2004       330000   6.36500E+10      35.14
2005       504540   7.01000E+10      50.21
2006       604470   8.73000E+10      59.94
2007       562186   9.67000E+10      62.59
2008       983369   1.05000E+11      80.38
2009       434420   1.22100E+11      53.89
2010       670265   1.37600E+11      68.60
2011      1034360   1.50000E+11      88.79
2012      1144818   1.68600E+11      93.06
2013      1035046   2.04000E+11      88.95
2014       913346   2.10000E+11      80.34
APPENDIX B
One-way ANOVA: Agriculture, Education, Engineering, Fine Arts, Humanities, ...

Source   DF          SS         MS     F      P
Factor    8   998821022  124852628  5.06  0.000
Error   297  7322357160   24654401
Total   305  8321178183

S = 4965   R-Sq = 12.00%   R-Sq(adj) = 9.63%

Level              N   Mean  StDev
Agriculture       34     85     38
Education         34    859    895
Engineering       34   4524   8019
Fine Arts         34    185    340
Humanities        34   1847   2142
Law               34    655   1159
Medical Science   34   4490   7337
Natural Science   34   4284   7319
Social Science    34   3459   6584

[Character plot: Individual 95% CIs For Mean Based on Pooled StDev; axis ticks 0, 2000, 4000, 6000]

Pooled StDev = 4965
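The summary quantities in the ANOVA output above follow from the sums of squares and degrees of freedom by standard formulas. The short sketch below recomputes them as a consistency check; the variable names are illustrative and are not part of the Minitab output.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_factor, df_factor = 998821022, 8
ss_error, df_error = 7322357160, 297
ss_total, df_total = 8321178183, 305

ms_factor = ss_factor / df_factor        # mean square between programs
ms_error = ss_error / df_error           # mean square within programs
f_stat = ms_factor / ms_error            # one-way ANOVA F statistic
pooled_sd = math.sqrt(ms_error)          # reported as S and Pooled StDev
r_sq = ss_factor / ss_total              # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)  # R-Sq(adj)

print(f"F = {f_stat:.2f}")
print(f"Pooled StDev = {pooled_sd:.0f}")
print(f"R-Sq = {100 * r_sq:.2f}%  R-Sq(adj) = {100 * r_sq_adj:.2f}%")
```

The recomputed values agree with the reported F = 5.06, S = 4965, R-Sq = 12.00% and R-Sq(adj) = 9.63%.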
Grouping Information Using Tukey Method

                   N   Mean  Grouping
Engineering       34   4524  A
Medical Science   34   4490  A
Natural Science   34   4284  A B
Social Science    34   3459  A B C
Humanities        34   1847  A B C
Education         34    859  A B C
Law               34    655    B C
Fine Arts         34    185      C
Agriculture       34     85      C

Means that do not share a letter are significantly different.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 99.79%

Agriculture subtracted from:
                   Lower  Center   Upper
Education          -2965     774    4512
Engineering          700    4439    8177
Fine Arts          -3639      99    3838
Humanities         -1977    1761    5500
Law                -3169     569    4308
Medical Science      666    4405    8143
Natural Science      460    4198    7937
Social Science      -365    3373    7112

Education subtracted from:
                   Lower  Center   Upper
Engineering          -73    3665    7403
Fine Arts          -4413    -674    3064
Humanities         -2751     988    4726
Law                -3943    -204    3534
Medical Science     -107    3631    7369
Natural Science     -314    3425    7163
Social Science     -1138    2600    6338

Engineering subtracted from:
                   Lower  Center   Upper
Fine Arts          -8078   -4339    -601
Humanities         -6415   -2677    1061
Law                -7607   -3869    -131
Medical Science    -3772     -34    3704
Natural Science    -3979    -240    3498
Social Science     -4803   -1065    2673

Fine Arts subtracted from:
                   Lower  Center   Upper
Humanities         -2076    1662    5400
Law                -3268     470    4208
Medical Science      567    4305    8044
Natural Science      361    4099    7837
Social Science      -464    3274    7012

Humanities subtracted from:
                   Lower  Center   Upper
Law                -4930   -1192    2546
Medical Science    -1095    2643    6382
Natural Science    -1301    2437    6175
Social Science     -2126    1612    5350

Law subtracted from:
                   Lower  Center   Upper
Medical Science       97    3835    7574
Natural Science     -109    3629    7367
Social Science      -934    2804    6542

Medical Science subtracted from:
                   Lower  Center   Upper
Natural Science    -3945    -206    3532
Social Science     -4770   -1031    2707

Natural Science subtracted from:
                   Lower  Center   Upper
Social Science     -4563    -825    2913
Grouping Information Using Fisher Method

                   N   Mean  Grouping
Engineering       34   4524  A
Medical Science   34   4490  A
Natural Science   34   4284  A
Social Science    34   3459  A B
Humanities        34   1847    B C
Education         34    859      C
Law               34    655      C
Fine Arts         34    185      C
Agriculture       34     85      C

Means that do not share a letter are significantly different.

Fisher 95% Individual Confidence Intervals
All Pairwise Comparisons
Simultaneous confidence level = 43.41%

Agriculture subtracted from:
                   Lower  Center   Upper
Education          -1596     774    3144
Engineering         2069    4439    6809
Fine Arts          -2271      99    2469
Humanities          -609    1761    4131
Law                -1801     569    2939
Medical Science     2035    4405    6775
Natural Science     1828    4198    6568
Social Science      1003    3373    5743

Education subtracted from:
                   Lower  Center   Upper
Engineering         1295    3665    6035
Fine Arts          -3044    -674    1696
Humanities         -1382     988    3358
Law                -2574    -204    2166
Medical Science     1261    3631    6001
Natural Science     1055    3425    5795
Social Science       230    2600    4970

Engineering subtracted from:
                   Lower  Center   Upper
Fine Arts          -6709   -4339   -1969
Humanities         -5047   -2677    -307
Law                -6239   -3869   -1499
Medical Science    -2404     -34    2336
Natural Science    -2610    -240    2130
Social Science     -3435   -1065    1305

Fine Arts subtracted from:
                   Lower  Center   Upper
Humanities          -708    1662    4032
Law                -1900     470    2840
Medical Science     1935    4305    6675
Natural Science     1729    4099    6469
Social Science       904    3274    5644

Humanities subtracted from:
                   Lower  Center   Upper
Law                -3562   -1192    1178
Medical Science      273    2643    5013
Natural Science       67    2437    4807
Social Science      -758    1612    3982

Law subtracted from:
                   Lower  Center   Upper
Medical Science     1465    3835    6205
Natural Science     1259    3629    5999
Social Science       434    2804    5174

Medical Science subtracted from:
                   Lower  Center   Upper
Natural Science    -2576    -206    2164
Social Science     -3401   -1031    1339

Natural Science subtracted from:
                   Lower  Center   Upper
Social Science     -3195    -825    1545
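The common half-width of the Fisher intervals can be recovered from the ANOVA error mean square. The sketch below assumes the usual least significant difference form, t * sqrt(2 * MSE / n), with equal group sizes; the critical value 1.968 for t(0.975, 297) is an approximation supplied here and does not appear in the output.

```python
import math

ms_error = 24654401   # error mean square from the one-way ANOVA table
n_per_group = 34      # observations (years) per program
t_crit = 1.968        # assumed approximate value of t(0.975, 297)

# Half-width of a Fisher 95% individual interval for a difference of two means.
half_width = t_crit * math.sqrt(2 * ms_error / n_per_group)

# Example: Education minus Agriculture, whose centre is 774 in the output.
center = 774
lower, upper = center - half_width, center + half_width

print(f"half-width = {half_width:.1f}")
print(f"interval   = ({lower:.1f}, {upper:.1f})")
```

This gives a half-width of about 2370, consistent with the reported interval from -1596 to 3144 for Education minus Agriculture.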