Student Enrollment Prediction with Multiple Linear Regression

Development of Predictive Modelling Applications for Historical Student Enrolment Data with a Multiple Linear Regression Approach

Abstract: The digitization of services has increased the complexity of higher education administration, raising challenges related to resource allocation, stakeholder support, and privacy. This research develops a predictive application that uses Multiple Linear Regression (MLR) to forecast student enrollment numbers, an important task for optimizing resource management. Historical enrollment data was collected and refined to build the MLR model, focusing on variables such as academic year, number of programs, and number of applicants per program. The model was evaluated with standard metrics, including Mean Absolute Percentage Error (MAPE). Under 10-Fold Cross Validation it achieved an average prediction error of 0.15% for Informatics Engineering and 0.47% for Information Systems, indicating high accuracy. By providing accurate and efficient predictions of new student admissions, the developed application can support strategic decision-making in higher education.

Keywords: Multiple Linear Regression, Student Enrolment Prediction, Mean Absolute Percentage Error, Higher Education Administration.

I. INTRODUCTION

The digitization of education services has brought significant changes to higher education administration, increasing its complexity and expanding the management options available [1]. However, the application of learning analytics in this sector faces a number of challenges, including resource limitations, the need for stakeholder buy-in, and pressing ethical and privacy issues [2]. Understanding the dynamics at play in higher education also requires attention to the path dependencies and political changes that complicate the situation [3]. The critical role of higher education data infrastructures, often shaped by political objectives, is increasingly apparent in this process of education sector reform [4]. This complexity is compounded by changes in the structure of the higher education workforce, including an increase in workers in the third space. In managing this growing complexity, researchers emphasize the importance of adaptation in leadership, overcoming existing barriers, and being aware of changes in the world of work that affect higher education.

Prediction is a modelling approach that produces a forecast of future conditions from past data through a mathematical calculation process. Prediction plays an important role in anticipating the outcome of a future event, so that what will be needed can be prepared in advance [5]. Prediction includes classification and regression. Classification assigns an entity to a group according to certain standards. Regression makes predictions based on the relationship between two or more parameters, yielding a value that describes future conditions based on the influencing parameters [6].

Regression is a model-building technique used to predict a value from given input data. It is also a statistical measure of the strength of the relationship between the dependent variable and the independent variables. The main method for making predictions is to build a regression model by finding the relationship between one or more independent (predictor) variables (X) and the dependent (response) variable (Y). Linear regression models the relationship between a scalar response variable and one or more explanatory variables [7]. Multiple Linear Regression (MLR) can be used for prediction or forecasting based on patterns of relationships in relevant past data. In the regression method, the predicted variable, such as sales or demand for a product, is generally stated as the dependent variable, and this variable is influenced by the independent variables. There are basically two kinds of relationship analysis in forecasting, namely cross-sectional analysis (causal models) and time series analysis, which will be discussed in this study.

Past research suggests that increased enrollment in higher education poses significant challenges in maintaining the quality of education, which requires careful planning and strong infrastructure [8].
This concern is further exacerbated by the projected decline in high school graduates and by specific challenges such as high dropout rates among Black males, which require appropriate interventions [9]. These findings confirm the importance of more in-depth research and thorough analysis in formulating effective strategies to increase college participation.

Another study [10], titled Undergraduate International Student Enrollment Forecasting Model: An Application of Time Series Analysis, builds a SARIMA model to estimate the number of foreign students enrolling in undergraduate programs at a university in the Midwest. Elements considered in this model include enrollment trends, visa policy changes, and tuition rates. The findings show that visa policy changes and increased Chinese enrollment have a significant influence. The influence of tuition fees is very low but still significant. These insights provide useful direction for policy formulation, enrollment strategies, and student support services for undergraduate students from other countries.

Based on the above, the Faculty of Information Technology at KH. A. Wahab Hasbullah University, as an example of a higher education institution, must conduct annual planning that includes estimating the number of student enrollments in order to optimize resource allocation in academic administration. Accurate enrollment prediction helps management plan class capacity, budget allocation, and staffing, and allows institutions to respond proactively to changes in program demand [11]. Therefore, the development of a reliable predictive model is key to supporting the strategic decisions taken by the Faculty of Information Technology, in order to improve the effectiveness of educational services and future plans.

II. RESEARCH METHODOLOGY

This research consists of several stages, as shown in the framework of Figure 1. Following data collection and processing, the model development stage tests the processed data using the Multiple Linear Regression method. Whereas simple regression relates two variables with a straight line, multiple linear regression assumes that the relationship among more than two variables can be expressed linearly [12]. Multiple Linear Regression can be written as:

$$ y = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n + \varepsilon \tag{1} $$

where $y$ is the dependent variable, $X_1, \ldots, X_n$ are the independent variables, $\beta_0$ is the intercept (the value of $y$ when all independent variables are 0), $\beta_1, \ldots, \beta_n$ are the MLR coefficients to be estimated, and $\varepsilon$ is the error term.

If the model demonstrates optimal performance, development of the application prototype proceeds. If not, the preprocessing is adjusted and the model is tested again.

The validation stage employs K-Fold Cross Validation, a robust method for evaluating the model's performance. In K-Fold Cross Validation, the dataset is split into k subsets, or folds [13]. The model is trained on k−1 folds and tested on the remaining fold. This process is repeated k times, with each fold serving as the test set exactly once. The model's performance is averaged over all k trials to check that it generalizes well across different subsets of the data. This method helps detect overfitting and provides a more reliable estimate of the model's effectiveness [14], with MAPE being the primary metric used for evaluating the error rate.

Once the model is validated, the focus turns to implementation: developing the application prototype around the validated model. This stage involves designing, coding, and testing the application according to user requirements.
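The model development and validation stages described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes NumPy is available, the function names (fit_mlr, predict_mlr, mape, k_fold_mape) are hypothetical, and the data is synthetic, generated only to exercise the code. It fits Equation (1) by ordinary least squares, then reports the MAPE of each held-out fold under K-Fold Cross Validation.

```python
import numpy as np


def fit_mlr(X, y):
    """Estimate [beta_0, beta_1, ..., beta_n] of Eq. (1) by ordinary least squares."""
    design = np.column_stack([np.ones(len(X)), X])  # column of 1s carries the intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict_mlr(beta, X):
    """Apply the fitted coefficients: beta_0 + beta_1*X_1 + ... + beta_n*X_n."""
    return beta[0] + X @ beta[1:]


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (actual values must be non-zero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)


def k_fold_mape(X, y, k=10, seed=0):
    """Train on k-1 folds, test on the remaining fold, and return one MAPE per fold."""
    indices = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(indices, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta = fit_mlr(X[train_idx], y[train_idx])
        scores.append(mape(y[test_idx], predict_mlr(beta, X[test_idx])))
    return scores


# Synthetic, hypothetical data (NOT the faculty's enrollment records).
rng = np.random.default_rng(42)
n = 40
year = np.arange(n, dtype=float)                     # academic year index
programs = rng.integers(2, 6, size=n).astype(float)  # number of study programs
applicants = rng.uniform(100, 600, size=n)           # number of applicants
X = np.column_stack([year, programs, applicants])
y = 20.0 + 1.5 * year + 4.0 * programs + 0.6 * applicants + rng.normal(0, 2.0, size=n)

fold_scores = k_fold_mape(X, y, k=10)
print("MAPE per fold (%):", [round(s, 2) for s in fold_scores])
print("Mean MAPE (%):", round(float(np.mean(fold_scores)), 2))
```

Averaging the per-fold values gives a single summary figure of the kind reported for 10-Fold Cross Validation. Shuffling the indices before splitting is a design choice of this sketch; for strictly time-ordered data, a split that respects chronology may be more appropriate.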
Finally, in the analysis and interpretation stage, the prediction results are analyzed to understand the factors influencing student enrollment trends and to evaluate whether the model meets the research needs. The findings are further analyzed to understand their practical implications and to develop recommendations that can improve processes, policies, or technology development.

Figure 1. Research Methodology

The earlier stages of the framework in Figure 1 proceed as follows. The method begins with problem identification, which includes interviews to understand the student enrollment problem and observation of the hardware, software, and operators involved. A literature study then reviews methods of predicting student enrollment and multiple linear regression techniques. In the data collection stage, data related to student enrollment and planning from the Faculty of Information Technology was collected and described to ensure clear integration. The collected data is then processed in the data processing stage, which includes selecting relevant data, completing missing data, correcting incorrect or inconsistent data, and formatting the data according to the needs of the model.

III. RESULT AND DISCUSSION

A. Performance Analysis

The following are the performance analysis results of the Multiple Linear Regression model, analyzed for its ability to handle linear relationships between input and output variables.

Table 1. Prediction of the Number of New IT Students

Year    Actual    Predicted
2013    346       346.00
2014    177       177.27
2015    202       201.78
2016    171       171.30
2017    293       293.47
2018    507       506.56
2019    223       222.76
2020    204       203.58
2021    279       279.91
2022    239       238.80
2023    248       247.49

Table 2. Prediction of the Number of New SI Students

Year    Actual    Predicted
2013    209       209.00
2014    76        77.00
2015    161       161.51
2016    177       176.86
2017    157       156.49
2018    191       191.41
2019    143       142.02
2020    92        92.71
2021    102       101.69
2022    127       126.76
2023    61        60.28

Figure 2. Correlogram Heatmap

The Correlogram Heatmap shows that all the variables used in the model are very strongly correlated with each other. This can be seen from the dark red color that dominates the heatmap, which indicates correlation values close to 1. This very strong correlation indicates that the variables have a strong linear relationship, which can contribute to the high accuracy of the model in predicting the number of applicants. A strong correlation also indicates that a change in one input variable is likely to be accompanied by a change in the other input variables, thus providing more consistent predictions.

Table 3. K-Fold Validation Evaluation Results (MAPE)

Fold       TI        SI
1          0.08%     0.66%
2          0.11%     0.32%
3          0.18%     0.07%
4          0.16%     0.32%
5          0.09%     0.22%
6          0.11%     0.68%
7          0.20%     0.77%
8          0.33%     0.29%
9          0.08%     0.18%
10         0.20%     1.18%
Average    0.15%     0.47%

The results show that the Multiple Linear Regression (MLR) model, evaluated using Mean Absolute Percentage Error (MAPE), performed very well, producing accurate predictions as indicated by the small MAPE values [15]. Further analysis identified the predictor variables that influenced the prediction results: academic year, number of study programs, and number of applicants in each program. The influence of these variables on the predicted number of student enrollments became clear, providing greater insight into the contribution of each variable to the accuracy of the model. These results show that the MLR model not only provides predictions that are close to the actual values, but also clarifies how each variable influences the prediction results.

B. Implementation

The research produced a web-based application prototype that successfully predicts the number of new students at the Faculty of Information Technology, KH. A. Wahab Hasbullah University.

Figure 3. Dashboard Page

The Dashboard page is the initial page shown when the user accesses the website, before entering the prediction page. On this page, the user uploads the registrant data for each year as a CSV file. If the uploaded data is valid, the system redirects to the prediction page.

Figure 4. Prediction Page

The prediction page displays a graph of the predicted number of new students alongside the actual data on the number of applicants.

IV. CONCLUSION

The results demonstrate that the Multiple Linear Regression (MLR) model effectively predicts the number of new students enrolling in the Information Technology (TI) and Information Systems (SI) programs. Evaluated using 10-Fold Cross-Validation, the model for TI shows a low Mean Absolute Percentage Error (MAPE) of 0.15%, indicating high prediction accuracy. For the SI program, the MAPE is 0.47%, slightly higher but still within acceptable limits for practical application. The successful implementation of the model in a prototype web-based application further validates its utility, enabling accurate enrollment predictions for the Faculty of Information Technology. These findings suggest that the MLR model is a reliable tool for forecasting student enrollment, contributing valuable insights for institutional planning and resource allocation.

ACKNOWLEDGMENT

The authors are grateful to …. for providing the funding needed for this research and its preparation for publication. The authors assume all responsibility for any errors and omissions in this research.

REFERENCES

[1] Harper, D. A., Muñoz, F.-F., & Vázquez, F. J. (2021). Innovation in online higher-education services: Building complex systems. Economics of Innovation and New Technology, 30(4), 412–431. https://doi.org/10.1080/10438599.2020.1716508
[2] Tsai, Y., Poquet, O., Gašević, D., Dawson, S., & Pardo, A. (2019). Complexity leadership in learning analytics: Drivers, challenges and opportunities. British Journal of Educational Technology, 50(6), 2839–2854. https://doi.org/10.1111/bjet.12846
[3] Kauko, J. (2014). Complexity in higher education politics: Bifurcations, choices and irreversibility. Studies in Higher Education, 39(9), 1683–1699. https://doi.org/10.1080/03075079.2013.801435
[4] Williamson, B. (2018). The hidden architecture of higher education: Building a big data infrastructure for the ‘smarter university.’ International Journal of Educational Technology in Higher Education, 15(1), 12. https://doi.org/10.1186/s41239-018-0094-1
[5] Chen, Y., Li, R., & Hagedorn, L. S. (2019). Undergraduate International Student Enrollment Forecasting Model: An Application of Time Series Analysis. Journal of International Students, 9(1), 242–261. https://doi.org/10.32674/jis.v9i1.266
[6] Cho, J., & Lee, J. (2018). Multiple Linear Regression Models for Predicting Nonpoint-Source Pollutant Discharge from a Highland Agricultural Region. Water, 10(9), 1156. https://doi.org/10.3390/w10091156
[7] Hartaka, I. M., Eka Suadnyana, I. B. P., & Somawati, A. V. (2021). Tantangan Dan Solusi Penerimaan Mahasiswa Baru Prodi Filsafat Hindu Stahn Mpu Kuturan Singaraja. Jurnal Penjaminan Mutu, 7(2). https://doi.org/10.25078/jpm.v7i2.2778
[8] Grip, R. S., & Grip, M. L. (2020). Using Multiple Methods to Provide Prediction Bands of K-12 Enrollment Projections. Population Research and Policy Review, 39(1), 1–22. https://doi.org/10.1007/s11113-019-09533-2
[9] Scott, J. A., Taylor, K. J., & Palmer, R. T. (2013). Challenges to Success in Higher Education: An Examination of Educational Challenges from the Voices of College-Bound Black Males. The Journal of Negro Education, 82(3), 288. https://doi.org/10.7709/jnegroeducation.82.3.0288
[10] Johnson, D. M. (2019). Student Demographics: The Coming Changes and Challenges for Higher Education. In D. M. Johnson, The Uncertain Future of American Public Higher Education (pp. 141–156). Springer International Publishing. https://doi.org/10.1007/978-3-030-01794-1_10
[11] Jozaghi, A., Shen, H., Ghazvinian, M., Seo, D.-J., Zhang, Y., Welles, E., & Reed, S. (2021). Multi-model streamflow prediction using conditional bias-penalized multiple linear regression. Stochastic Environmental Research and Risk Assessment, 35(11), 2355–2373. https://doi.org/10.1007/s00477-021-02048-3
[12] Rossi, E., Pecorini, I., & Iannelli, R. (2022). Multilinear Regression Model for Biogas Production Prediction from Dry Anaerobic Digestion of OFMSW. Sustainability, 14(8), 4393. https://doi.org/10.3390/su14084393
[13] Werth, J., & Sigman, M. S. (2021). Linear Regression Model Development for Analysis of Asymmetric Copper-Bisoxazoline Catalysis. ACS Catalysis, 11(7), 3916–3922. https://doi.org/10.1021/acscatal.1c00531
[14] Li, X. (2022). Sequence Model and Prediction for Sustainable Enrollments in Chinese Universities. Sustainability, 15(1), 214. https://doi.org/10.3390/su15010214
[15] Doresdiana, H., Badawi Saluy, A., & Author, C. (2021). Spare Parts Demand Forecasting During Covid 19 pandemic (Automotive Company Case Study). 2(2). https://doi.org/10.38035/dijefa.v2i2
