Chemosphere 299 (2022) 134250

Chemosphere

journal homepage: www.elsevier.com/locate/chemosphere

Modelling and investigating the impacts of climatic variables on ozone concentration in Malaysia using correlation analysis with random forest, decision tree regression, linear regression, and support vector regression

Abdul-Lateef Balogun a, c, Abdulwaheed Tella b, c, *

a Professional Services Department (Resources), Esri Australia, 613 King Street, West Melbourne, VIC, 3003, Australia
b Earth, Environment and Space Division, Foresight Institute of Research and Translation, Ibadan, Nigeria
c Geospatial Analysis and Modelling (GAM) Research Laboratory, Department of Civil and Environmental Engineering, Universiti Teknologi PETRONAS (UTP), 32610, Seri Iskandar, Perak, Malaysia

HIGHLIGHTS

• Climate change has the potential to influence air pollution.
• Temperature is an essential factor for ozone pollution.
• Wind serves as a medium for transboundary pollution.
• Random forest outperformed the other machine learning algorithms for surface ozone prediction.
• Industrialized and urbanized areas are hotspots for bad air quality.

ARTICLE INFO

Handling Editor: Kyusik Yun

Keywords: Air quality; Machine learning; Ozone; Sustainable cities; Climate change

ABSTRACT

Climate change is generally known to impact ozone concentration globally. However, the intensity varies across regions and countries. Therefore, local studies are essential to accurately assess the correlation of climate change and ozone concentration in different countries. This study investigates the effects of climatic variables on ozone concentration in Malaysia in order to understand the nexus between climate change and ozone concentration. The selected data was obtained from ten (10) air monitoring stations strategically mounted in urban-industrial and residential areas with significant emissions of pollutants. Correlation analysis and four machine learning algorithms (random forest, decision tree regression, linear regression, and support vector regression) were used to analyze the ozone and meteorological datasets in the study area. The analysis was carried out during the southwest monsoon due to the rise of ozone in the dry season. The results show a very strong correlation between temperature and ozone. Wind speed also exhibits a moderate to strong correlation with ozone, while relative humidity is negatively correlated. The highest correlation values were obtained at Bukit Rambai, Nilai, Jaya II Perai, Ipoh, Klang and Petaling Jaya. These locations are heavily industrialized and highly urbanized. The four machine learning algorithms exhibit high predictive performances, generally ascertaining the predictive accuracy of the climatic variables. The random forest outperformed the other algorithms with a very high R² of 0.970, a low RMSE of 2.737 and an MAE of 1.824, followed by linear regression, support vector regression and decision tree regression, respectively. This study's outcome indicates a linkage between temperature and wind speed and ozone concentration in the study area. An increase in these variables will likely increase the ozone concentration, posing threats to lives and the environment. Therefore, this study provides data-driven insights for decision-makers and other stakeholders in ensuring good air quality for sustainable cities and communities. It also serves as a guide for the government on the climate actions needed to reduce the effect of climate change on air pollution and to enable sustainable cities, in accordance with the UN's SDGs 13 and 11, respectively.

* Corresponding author. Earth, Environment and Space Division, Foresight Institute of Research and Translation, Ibadan, Nigeria.
E-mail address: tellaabdulwaheed01@gmail.com (A. Tella).
https://doi.org/10.1016/j.chemosphere.2022.134250
Received 21 March 2021; Received in revised form 1 December 2021; Accepted 5 March 2022
Available online 19 March 2022
0045-6535/© 2022 Elsevier Ltd. All rights reserved.

Author contributions

Abdul-Lateef Balogun: conceptualization, data curation, investigation, methodology, project administration, resources, supervision, validation, writing – review & editing. Abdulwaheed Tella: conceptualization, data curation, formal analysis, investigation, methodology, resources, software, validation, visualization, writing – original draft, writing – review & editing.

1. Introduction

Air pollution is one of the most hazardous environmental problems locally, regionally, and globally (Bayat et al., 2019; Lee et al., 2019). Its effects transcend the ecosystem (De Marco et al., 2019; Bayat et al., 2019), affecting human health (Wu et al., 2019; Rovira et al., 2020), the economy and environmental sustainability (Bayat et al., 2019; Tella et al., 2021b). Exposure to indoor and ambient air pollutants such as ozone has negatively impacted human health, causing complications linked to pulmonary disease, lung cancer, heart disease, health failure, asthmatic condition, and infertility (Rovira et al., 2020; Wang et al., 2019; An and Yu, 2018). Over 85% of the world's population live in polluted environments (World Health Organization, 2016), causing approximately 3 million deaths per annum.

The Southeast Asian region experiences significant air pollution due to rapid industrialization, forest degradation, burning of fossil fuels, and an increase in the use of automobiles (Hamid and Long, 2017). Trans-boundary pollution is a common phenomenon in the region, with Malaysia being the most impacted (Latif et al., 2018) due to the emission of air pollutants from neighbouring countries. This has a great negative impact on people's well-being (Wong et al., 2017) and the economy (Khan et al., 2016). A study by Azid et al. (2015) shows that ozone (O3), a major pollutant in Malaysia, greatly affects air quality, thereby impacting public health. It is noteworthy that tropospheric ozone is potentially dangerous if its concentration rises beyond the threshold (Tang et al., 2020).

Ozone is not primarily emitted into the air; rather, a high concentration of tropospheric ozone originates as a secondary pollutant from the photochemical reactions of sunlight and two precursor chemicals, that is, volatile organic compounds (VOCs) and nitrogen oxides (NOx ≡ NO + NO2) (Tang et al., 2020; Nassikas et al., 2020; Wang et al., 2020). Tropospheric ozone, which helps to absorb ultraviolet radiation and protects the earth from harmful radiation, doubles as a greenhouse gas and an urban air pollutant. It is a triggering factor for asthma (Nassikas et al., 2020), affects environmental sustainability (Yang et al., 2020), contributes to climate change (Wang et al., 2020) and causes respiratory problems even after short exposure (World Health Organization, 2013; Dimakopoulou et al., 2020). High ozone concentration also affects vegetation and exacerbates the asthmatic condition, particularly in children due to the tenderness of their respiratory tract (Yusoff et al., 2019). According to Mabahwi et al. (2015), 10.36% of respiratory diseases in Malaysia could be linked to air pollutants' effect, and 19.48% of mortality is due to respiratory problems caused by air pollutants, including ozone. Thus, predicting and understanding the formation and emission rate is essential for alerting the public for appropriate intervention.

1.1. Causes and enablers of air pollution

Atmospheric pollution is caused mainly by emissions from natural sources (e.g. volcanic eruption and methane emission) or anthropic sources (e.g. gases emitted from the burning of fossil fuels, smoking, industrial activities, vehicular movement, and open burning) (How and Ling, 2016). However, studies have ascertained the impact of climate change on the formation, dispersion, and transportation of ozone and other air pollutants (Tong et al., 2018; Kalisa et al., 2018; Orru et al., 2017). According to Nguyen et al. (2019) and Tang et al. (2020), there is a correlation between climatic factors and air quality, suggesting that weather conditions influence airborne pollution. Climatic factors like temperature, wind speed, and relative humidity significantly impact the atmospheric balance (Tong et al., 2018; Tang, 2019; Hanaoka and Masui, 2019), thereby affecting the spread, scale, and magnitude of air pollutants. Variations in the climatic conditions also affect the air pollution emission strength and rate, dispersion, atmospheric reaction, and deposition (Xie et al., 2017; Zhao et al., 2016). Temperature enhances the rate of photochemical reactions in the atmosphere and ozone formation (Hassan et al., 2015). Wind speed influences the dispersion rate of pollutants, while relative humidity impacts the pollutants' life cycle in the atmosphere and their deposition (He, 2017; Plocoste et al., 2019).

Understanding air pollution enablers such as changes in climatic factors is crucial to air pollution monitoring and mitigation studies because the sources, composition, and potential risks of pollutants vary across the year (Kim et al., 2017). The study of Althuwaynee et al. (2020) highlighted the significance of assessing the inter-correlation between pollutants and climatic variables in modelling the future concentration of air pollutants. Similarly, Manimaran and Narayana (2018) showed that understanding the nexus between ozone and climatic factors (e.g. temperature, wind speed and relative humidity) will facilitate effective air quality monitoring and control. Thus, this study attempts to model the correlation between ozone and climatic factors in major Malaysian cities. This study has the potential to provide new insights into the formation, emission, accumulation, and behaviour of the pollutant, which will aid mitigation initiatives to improve the deteriorating air quality in Malaysia, thereby supporting climate actions to reduce climate change effects (SDG13) and enabling sustainable cities and communities (SDG11).

1.2. Modelling of ozone concentration

Ozone concentration can be predicted using deterministic models or statistical models (Wen et al., 2019; Wang et al., 2020). Deterministic models such as the Weather Research and Forecasting (WRF) and Community Multiscale Air Quality (CMAQ) models use physical and chemical mechanisms associated with air pollutant emission, transportation, and dispersion (Sharma et al., 2016; Djalalova et al., 2015). This approach has some limitations which affect the predictive performance of the model. For instance, the deterministic approach exhibits very low accuracy for micro-urban settlements (Wang et al., 2020). It also has a high computational cost and depends heavily on scarce air pollutant source and emission data (Tella and Balogun, 2021).

Statistical models and emerging artificial intelligence (e.g. machine learning and deep learning) algorithms, which do not depend on a physical and chemical mechanism (Bai et al., 2018), are being used to forecast air pollution. So far, these recent models have shown a higher predictive accuracy for air pollution modelling compared to the deterministic models (Choubin et al., 2020). Machine learning is an effective technique for understanding the inter-dependence of climatic data and air pollution since it supports exploratory analysis of data without using an empirical model (Tong, 2020; Dou et al., 2019). Further, machine learning addresses the non-linearity problem, enhancing the model's predictive performance (Ma et al., 2020; Li et al., 2019).

Although some studies have been undertaken to model ozone (Yusoff et al., 2019; Ahamad et al., 2014) and other air pollutants in Malaysia, most of the adopted models are Multiple Linear Regression (MLR) models (Nazif et al., 2018; Tan et al., 2016; Abdullah et al., 2019). According to the review study conducted by Nur Shaziayani et al. (2020), 72% of studies used a linear regression model compared to other non-linear statistical models. The MLR model, which has shown better predictive performance than the deterministic approach, is constrained by model performance (Ma et al., 2020) when compared to machine learning models. Also, it suffers from multicollinearity problems between predictors (Li et al., 2019). That is, the predictors and the response need to be linearly correlated, which conflicts with real-life circumstances, where relationships are often not linear.

Attempts to leverage the high predictive capabilities of machine learning algorithms for modelling air pollution are limited in Malaysia. A comparative assessment of different machine learning algorithms for this purpose is also lacking. Comparing the efficacy of multiple algorithms is vital to identify the optimal algorithm for ozone prediction. Further, there is little consideration of the impacts of atmospheric conditions on ozone in the country. Consideration of climatic parameters is pertinent to air pollution management because the pollutants' behaviour can be better understood by taking into consideration the impacts of the climatic variables. Moreover, this study will evince the potential of climate change to increase air pollution in the future, thereby creating insights for decision-makers and stakeholders.
Based on the foregoing, it is hypothesized that ML algorithms will produce good results in predicting ozone concentration in Malaysia, although model performance will likely vary based on the strengths and weaknesses of the ML algorithms. Also, it is hypothesized that climatic variables will correlate with ozone, particularly in locations with a large concentration of industries.

Thus, this study investigates the impact of climatic variables on ozone while considering the variations in seasons. In order to determine the predictive capability of the climatic variables for ozone concentration in the study area, four machine learning algorithms (random forest, linear regression, support vector regression, and decision tree regression) are used. Leveraging the effectiveness and efficiency of open-source software and programming languages in overcoming the limitation of uncertainty in statistical models (Althuwaynee et al., 2020), the following objectives are pursued in this study:

(i) To model the correlation between climatic variables and ozone at ten air pollution monitoring stations;
(ii) To determine the climatic variables' potential to accurately predict the surface ozone trend using the decision tree regression, support vector regression, random forest, and linear regression algorithms;
(iii) To carry out a comparative accuracy assessment of the four machine learning models.

2. Materials and methods

This study's methodology is classified into four sections. The first section discusses the data acquisition, preparation, and analysis. We subsequently classified and analyzed the hourly concentration of the atmospheric pollution data and the climatic data in Malaysia's southwest monsoon season. The southwest monsoon is one of the monsoons in Malaysia; it extends from May to September and denotes the temperate weather (Tella et al., 2021b). The southwest monsoon is characterized by a rise in temperature and a more prolonged warm climate (Andaya and Andaya, 2016), which makes it suitable for ozone studies because ozone concentration is higher during the warm and temperate season (APIMS, 2021). The study was done using ten monitoring stations in Peninsular Malaysia from 2012 to 2016.

The second phase of the research implemented Pearson's correlation to examine the relationship between the climatic criteria and ozone concentration at each monitoring station, following the approach of Tella et al. (2021b) and Jumin et al. (2020). Compared to Spearman correlation, which shows a monotonic association between variables, Pearson's correlation depicts the linear interrelationship between variables. Thus, Pearson's correlation is more widely used, especially for air pollution studies. GIS was used to produce ozone source density maps to visualize the land use around the monitoring stations with the highest correlation indices. This is important in order to identify sources of ozone in the most vulnerable areas. Machine learning algorithms are used to model the predictive capabilities of the predictors (climatic variables). The accuracy of the models in predicting surface ozone was compared using statistical indices, including Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²).

2.1. Study area

Malaysia (Fig. 1) is a Southeast Asian nation with latitude and longitude 4.2105° N and 101.9758° E, respectively. It has thirteen states and three federal territories. The country is split into Peninsular Malaysia and Malaysian Borneo by the South China Sea, with a population of over 30 million and a land area of over 320,000 km² (Ab Rahman et al., 2013). The country has a tropical climate with high humidity throughout the year. The surrounding seas subdue warming. It has a mean yearly rainfall of 250 cm. The annual seasonal variation is based on the northeast monsoon and the southwest monsoon. The northeast monsoon experiences intense precipitation, while the southwest monsoon is defined by drier weather (Tang, 2019; Kwan et al., 2013).

Air pollution in the country is driven by urbanization and industrialization (Chin et al., 2019). Malaysia's economic advancement has contributed to the degradation of its air quality, especially in agglomerated regions. The most common air pollution source is emissions from vehicles and trucks, which contribute over 70% of urban air pollution (Afroz et al., 2003; Chin et al., 2019). The country is also exposed to frequent haze pollution caused by open fires in neighbouring Indonesia (Gaveau et al., 2014; Althuwaynee et al., 2020).

Fig. 1. The study area map.

2.2. Sampling stations

Ten air pollution monitoring stations spread across five states in Peninsular Malaysia were used. Two monitoring stations were selected from each state for even distribution. The selection comprises two states in the northern region (Perak and Penang), two states in the central region (Selangor and Negeri Sembilan), and a state in the southern region (Melaka), as shown in Fig. 2. Most of these stations are strategically mounted in urban-industrialized areas such as Petaling Jaya in Selangor, Bukit Rambai in Melaka, and Nilai in Negeri Sembilan (Ahmat et al., 2015; Ahamad et al., 2014). Some monitoring stations are in Penang, the most populous state in Malaysia after Selangor (Ahamad et al., 2014). Selangor's monitoring station (Petaling Jaya) is characterized as an urban area due to its proximity to the central business district of Kuala Lumpur. The hourly concentration of ground-level ozone (O3) and climatic variables was acquired from the Malaysian Department of Environment (DoE). After that, an analysis of the hourly O3 data was carried out.

Fig. 2. Location of monitoring stations.

2.3. Correlation analysis

The correlation coefficient measures the extent of the linear dependency between temperature and ozone. The coefficient ranges from −1 to +1. A coefficient closer to −1 shows a stronger negative correlation, while a coefficient closer to +1 shows a stronger positive correlation. Values of −1 or +1 depict a completely negative or completely positive relationship, respectively, while 0 indicates no linear relationship. The most widely used correlation coefficient, Pearson's correlation coefficient denoted by r (Tang et al., 2020), is adopted for this study as shown in Eq. (1).

$$P(ab) = \frac{\mathrm{Cov}(a, b)}{\delta_a \delta_b} \qquad (1)$$

where P(ab) is the Pearson correlation coefficient, Cov(a, b) is the covariance of the two parameters and δaδb is the product of the standard deviations of the two parameters. A classification index was adopted (Mukaka, 2012; Hinkle et al., 2003) for the interpretation of the correlation coefficient, as shown in Table 1.

Table 1. Description of correlation index.

| Correlation index | Description |
| --- | --- |
| 0.90–1.00 (−0.90 to −1.00) | Very strong +ve (−ve) correlation |
| 0.70–0.90 (−0.70 to −0.90) | Strong +ve (−ve) correlation |
| 0.50–0.70 (−0.50 to −0.70) | Moderate +ve (−ve) correlation |
| 0.30–0.50 (−0.30 to −0.50) | Weak +ve (−ve) correlation |
| 0.00–0.30 (0.00 to −0.30) | Negligible correlation |

The correlation coefficient value indicates the level of influence of the independent variable on the dependent variable. The pandas DataFrame.corr() method in Python was used for Pearson's correlation analysis. The analysis covered five years (2012–2016) to investigate the correlation trend.

2.4. Machine learning modelling

Machine learning (ML) is an algorithm-based computational study for deriving knowledge from data with implicit instructions (Ma and Cheng, 2016). ML trains algorithms to accept data and predict new data using statistical analysis (Sayad et al., 2019). For this study, the data from the fixed monitoring stations were divided into two sets, for training and testing. Four machine learning models, random forest, linear regression, support vector regression, and decision tree regression, were used to predict the hourly ozone concentration. The models are used to ascertain the potential of the independent variables (climatic variables) to predict the dependent variable (O3). The models were developed using scikit-learn (Buitinck et al., 2013) within the Python programming environment. 80% of the dataset was used for model training and the rest of the dataset was used to test the model. A similar training-testing ratio was adopted by Bhalgat et al. (2019) and Zeinalnezhad et al. (2020).

Model validation was done using the coefficient of determination (R²), which tests the models' fitness using values between 0 and 1. Values nearer to 1 depict a stronger association, while values closer to 0 indicate a weaker association. The mean absolute error (MAE), which measures the mean absolute distance between predicted and true values, and the root mean square error (RMSE), which shows the possibility of considerable mispredictions, were also adopted for model validation. These indicators are commonly used in validating ML algorithms (Iskandaryan et al., 2020). Eqs. (1)–(3) show the formulae for calculating the R², RMSE, and MAE, respectively.

$$R^2 = \frac{\sum_{i=1}^{n} (X_i - X_m)(Y_i - Y_m)}{\sqrt{\left(\sum_{i=1}^{n} (X_i - X_m)^2\right)\left(\sum_{i=1}^{n} (Y_i - Y_m)^2\right)}} \qquad (1)$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - X_i)^2}{n}} \qquad (2)$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |Y_i - X_i| \qquad (3)$$

where n is the total number of data points or instances, Xi and Yi are the actual and predicted values, respectively, and Xm and Ym are the means of the actual and predicted values, respectively.

2.5. Model selection

Linear regression is a statistics-based machine learning model used for quantitative analysis and prediction of numerical variables based on correlation, and it is widely applied in air pollution studies (Ezimand and Kakroodi, 2019). It is used to ascertain how well an explanatory variable can linearly predict the response variable. In multiple linear regression, more than one predictor is used to predict a single dependent variable. That is, it is used in examining the association between multiple predictors and an observed variable (Tella et al., 2021b). Equation (4) shows the multiple linear regression model.

$$P = b_0 + b_1 R_1 + b_2 R_2 + \dots + b_n R_n + \epsilon \qquad (4)$$

where P is the predicted or observed variable, while R is the regressor or explanatory variable. b0 is the y-intercept (constant), b1, b2, …, bn are the slope coefficients for the regressors, and ϵ is the residual. For this study, P represents the predicted ozone concentration, while R represents the climatic variables.

Support vector regression (SVR) is a supervised algorithm and an application of support vectors (Bai et al., 2018) used for regression modelling by researchers (Bishop, 2006; Steinwart and Christmann, 2008). The SVR model relies on a subset of the training data, ignoring any data that is close to the model's prediction (within a threshold ε) (Suárez Sánchez et al., 2011). SVR depends on the choice of kernel and relevant parameters to solve the regression problem. The kernel used for this study is the radial basis function (RBF). One of the strengths of SVR is its high-dimensional feature space, which does not rely on the input space dimensionality (Bai et al., 2018). SVR uses a linear function, also called the SVR equation, for non-linear mapping of the imported data into higher dimensionality. The SVR equation is presented in Equation (5) (Bai et al., 2018).

$$P(x) = (\omega \times \varphi(x)) + t \qquad (5)$$

where P(x) is the predicted value, ω is the weight vector of the feature space dimension, and t is the threshold.

Decision Trees (DT) is a non-parametric supervised learning model used for both classification and regression analysis. It is based on a binary tree that splits one or more nodes to make up a decision tree (Kadavi et al., 2019). The decision tree algorithm splits the dataset into smaller classes and represents the result in a leaf node. Basically, the decision tree trains the dataset in the form of a tree structure for prediction. That is why it is sometimes called tree structure regression. DT has three different kinds of nodes, namely, root nodes, interior nodes, and leaf nodes. The root node is the first node that gets split up into more nodes, called the interior nodes. The interior nodes represent the model's data features and decision rules, while the leaf nodes stand for the final result of the decision. The DecisionTreeRegressor estimator from scikit-learn was used for training the model. Fig. 3 shows the decision tree structure.

Fig. 3. Architecture of decision tree model.

Random forest, invented by Breiman (2001), is an ensemble learning model that can perform classification, regression, clustering, interaction detection, and variable selection (Rahmati et al., 2017; Belgiu and Drăguţ, 2016). The random forest learning method is based on a combination of decision trees that split the input data based on the parameters, like a tree structure (Ma and Cheng, 2016; Breiman, 2001) (Fig. 4). Each tree is constructed using a bootstrapped sample of the data, splitting each node in the tree according to the best of a subset of predictors chosen randomly at each point (Araki et al., 2018; Rahmati et al., 2017). The final class is predicted, and the output is resolved based on the number of the decision trees' votes (Micheletti et al., 2014; Rahmati et al., 2017). Random forest is resistant to overfitting and outliers and has been established to have high performance (Balogun et al., 2021; Tella et al., 2021a).

Fig. 4. Random forest model architecture.
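The bootstrap-and-average idea behind random forest regression can be illustrated with a minimal, dependency-free sketch. It is not the scikit-learn implementation used in this study: each "tree" here is only a one-split regression stump, predictions are averaged rather than voted, and the synthetic data, the function names (fit_stump, fit_forest, predict_forest) and all parameter values are assumptions made purely for illustration.

```python
import random


def fit_stump(rows, targets, n_candidates, rng):
    """Fit a one-split regression 'tree' on a random subset of the features."""
    features = rng.sample(range(len(rows[0])), n_candidates)
    mean_all = sum(targets) / len(targets)
    # Fallback: no split, always predict the mean of the sample.
    best = (float("inf"), 0, float("inf"), mean_all, mean_all)
    for f in features:
        for threshold in {row[f] for row in rows}:
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            if not right:  # threshold is the maximum value: nothing to split off
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((t - left_mean) ** 2 for t in left)
                   + sum((t - right_mean) ** 2 for t in right))
            if sse < best[0]:
                best = (sse, f, threshold, left_mean, right_mean)
    return best[1:]  # (feature, threshold, left_mean, right_mean)


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean


def fit_forest(rows, targets, n_trees, n_candidates, seed):
    """Grow each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx],
                                n_candidates, rng))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all stumps (regression)."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Synthetic, made-up data: temperature (°C), wind speed (m/s), humidity (%).
data_rng = random.Random(0)
rows = [[data_rng.uniform(24, 36), data_rng.uniform(0, 5), data_rng.uniform(50, 95)]
        for _ in range(120)]
ozone = [3.0 * t + 1.0 * w - 0.1 * rh + data_rng.gauss(0, 1) for t, w, rh in rows]

forest = fit_forest(rows, ozone, n_trees=30, n_candidates=2, seed=1)
predictions = [predict_forest(forest, row) for row in rows]

mean_ozone = sum(ozone) / len(ozone)
mse_forest = sum((p - o) ** 2 for p, o in zip(predictions, ozone)) / len(ozone)
mse_baseline = sum((mean_ozone - o) ** 2 for o in ozone) / len(ozone)
print(f"forest MSE: {mse_forest:.2f}, mean-only baseline MSE: {mse_baseline:.2f}")
```

Restricting each stump to a random subset of the features is what distinguishes this from plain bagging; a full random forest applies the same restriction at every node of much deeper trees.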
RandomForestRegressor from scikit-learn was used for this model in Python, with the number of estimators (that is, trees grown) set to 500.

3. Results

This section presents the analysis of the correlation study of the air pollutant (O3) and climatic variables based on seasonal variation (southwest monsoon and northeast monsoon) from 2012 to 2016. The outcome of the predictions and the evaluation of the machine learning models' performance is also presented. The influence of temperature, wind speed and relative humidity on O3 vis-a-vis seasonal variations is discussed, and recommended strategies for mitigating atmospheric pollution in the context of a changing climate are offered.

3.1. Correlation analysis of climatic variables and O3

The ozone concentration and climatic variables such as temperature, wind speed and relative humidity were analyzed. This was done for all the monitoring stations. The ozone concentration is evaluated against temperature, wind speed and relative humidity to get the correlation index, presented in Table 2.

Table 2. Correlation analysis of climatic variables and ozone concentration.

| States | Stations | Wind Speed (m/s) | Relative Humidity (%) | Temperature (°C) |
| --- | --- | --- | --- | --- |
| Melaka | CA0006 | 0.736 | −0.822 | 0.870 |
| Melaka | CA0043 | 0.619 | −0.736 | 0.759 |
| Negeri Sembilan | CA0047 | 0.763 | −0.793 | 0.819 |
| Negeri Sembilan | CA0010 | 0.677 | −0.662 | 0.747 |
| Penang | CA0009 | 0.696 | −0.844 | 0.851 |
| Penang | CA0003 | 0.415 | −0.788 | 0.794 |
| Perak | CA0008 | 0.537 | −0.832 | 0.841 |
| Perak | CA0020 | 0.564 | −0.778 | 0.811 |
| Selangor | CA0011 | 0.663 | −0.790 | 0.842 |
| Selangor | CA0016 | 0.611 | −0.795 | 0.820 |

Fig. 5. Air monitoring stations and industries density map.

From the correlation analysis, we observed that the correlation index between temperature and ozone for all monitoring stations is > 0.7 and < 0.9. According to Table 2, there exists a strong positive correlation between temperature and ozone. This study's outcome aligns with some previous studies that established a connection between ozone and temperature. Studies carried out by Ueno and Tsunematsu (2019) in Japan, Melkonyan and Wagner (2013) in Germany, and Tang et al. (2020) and Pu et al. (2017) in China all indicated a significant effect and correlation of temperature with ozone, concluding that a rise in warming causes a higher concentration of ozone, especially during hot periods. Also, Fu and Tian (2019) concluded from a systematic review of literature that tropospheric ozone is produced as an outcome of solar radiation.

There exists a positive correlation between wind speed and ozone concentration at all ten monitoring stations. The correlation index ranges from 0.415 to 0.763. Using Table 2 as a baseline, ozone and wind speed are moderately to strongly correlated. This result aligns with Jasaitis et al. (2016), in which an increase in wind speed influenced the rise in the ozone concentration at the Baltic Sea in Lithuania. The authors discovered a higher ozone level as the wind speed rises, while the lowest ozone concentration was recorded in the absence of wind. Awang et al. (2018) observed a positive association between wind speed and ozone in Malaysia, which further validates this research result. According to Kompalli et al. (2014) and Xu et al. (2015), wind speed influences the air masses, pollutants' concentration, and dilution due to transboundary pollution from neighbouring countries. That is, the inflow of pollutants from a distant place can contribute to an increased concentration in a location where pollutant levels were lower (Oleniacz et al., 2016). In the presence of nitrogen oxides in the tropospheric air, a rise in wind speed can affect the inflow of precursors that aid the formation of ozone in the atmosphere (Seinfeld and Pandis, 2016; Gorai et al., 2015; Oleniacz et al., 2016).

From Table 2, there exists a moderate to strong negative correlation between relative humidity and ozone. The correlation ranges from −0.662 to −0.844. The inverse relationship between ozone and relative humidity was also discovered by Jasaitis et al. (2016) and is related to the association of humidity with rainfall (Tella et al., 2021b). Thus, ozone reduces during the rainy period (Jasaitis et al., 2016). Chen et al. (2019), who examined the influence of climatic variables on ozone concentration in Beijing, discovered that low humidity is a suitable climatic condition for the photochemical reactions involved in ozone production.

From Table 2, it is observed that the strongest correlation between climatic variables and ozone concentration occurs at monitoring stations such as CA0006 (Bukit Rambai), CA0010 (Nilai), CA0009 (Jaya II Perai), CA0008 (Ipoh), CA0011 (Klang) and CA0016 (Petaling Jaya). These areas are thus more vulnerable to air pollution than other areas. This is because most of these stations are located in residential, industrial, and urban regions (Ahmat et al., 2015; Ahamad et al., 2014). For instance, Tasek Ipoh, situated in Perak, is recognized for its historical industrial establishments such as cement factories and stone quarrying. In late 2019, readings at the monitoring stations rose beyond the good and moderate Air Pollution Index (API) levels to an unhealthy level (Aqilah, 2019), putting individuals' lives in danger and disrupting daily activities. The API shows the different air quality levels, ranging from good, moderate, unhealthy and very unhealthy to hazardous. The API of Malaysia is calculated from the hourly data of air pollutants obtained from the Air Quality Monitoring Network in the country (APIMS, 2021). The pollutant with the highest concentration is used to represent the API value. According to the APIMS (2021), ground-level ozone is sometimes used to determine the air pollution index value in some areas in Malaysia. Also, the reading of the ozone concentration is usually high in the afternoon (APIMS, 2021).
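As an illustration of how the coefficients in Table 2 map onto the classes in Table 1, the following sketch implements Pearson's coefficient (covariance divided by the product of the standard deviations) and the classification bands. The function names are hypothetical, the treatment of band edges (lower bound inclusive) is an assumption because the published bands share their end points, and the short temperature and ozone series are made-up values used only to exercise the formula. The study itself used the pandas DataFrame.corr() method.

```python
import math


def pearson(a, b):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    # The 1/n factors of the covariance and standard deviations cancel out.
    return cov / math.sqrt(ss_a * ss_b)


def classify(r):
    """Map a coefficient to the Table 1 classes (lower bound inclusive: an assumption)."""
    size = abs(r)
    sign = "positive" if r > 0 else "negative"
    if size >= 0.90:
        return f"Very strong {sign}"
    if size >= 0.70:
        return f"Strong {sign}"
    if size >= 0.50:
        return f"Moderate {sign}"
    if size >= 0.30:
        return f"Weak {sign}"
    return "Negligible"


# Made-up hourly values, only to show the calculation.
temperature = [26.1, 28.4, 30.2, 31.5, 33.0]
ozone = [18.0, 25.5, 31.0, 36.2, 40.1]
r_example = pearson(temperature, ozone)

# Temperature-ozone coefficients as listed in Table 2.
temperature_r = {
    "CA0006": 0.870, "CA0043": 0.759, "CA0047": 0.819, "CA0010": 0.747,
    "CA0009": 0.851, "CA0003": 0.794, "CA0008": 0.841, "CA0020": 0.811,
    "CA0011": 0.842, "CA0016": 0.820,
}
labels = {station: classify(r) for station, r in temperature_r.items()}
for station, label in labels.items():
    print(station, temperature_r[station], label)
```

Because every temperature coefficient lies between 0.70 and 0.90, all ten stations fall into the same band under these thresholds.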
Also, Bukit Rambai is located in an industrialized region recognized for its rapid growth (Ahmat et al., 2015), as shown in Fig. 5, while the Petaling Jaya and Kelang monitoring stations are located in a densely built environment surrounded by many large-scale industries, in the most populous state in Malaysia (Ahamad et al., 2014). Moreover, Petaling Jaya's monitoring station is close to the city centre of Kuala Lumpur, surrounded by commercial, residential and industrial features (Azmi et al., 2010). The dense concentration of industries (Fig. 5) explains why these regions are vulnerable to poor air quality, exacerbated by variations in climatic factors (e.g. temperature), particularly in the southwest monsoon.

It is noteworthy that there is a strong correlation between the variables, as seen in Table 2. Nevertheless, this is not conclusive justification for establishing a relationship between the variables. That is, correlation does not necessarily imply causation (Buchanan, 2012; Akoglu, 2018). Thus, correlation analysis alone is insufficient to explain whether an increase in temperature due to climate change could drive a rise in ozone concentration. This necessitates adopting four machine learning algorithms to determine the reliability of these climatic variables as predictors of the ozone variation trend.

3.2. Assessment of models' performance

Table 3 presents the model validation outcome. The predictions give relatively high accuracy for ozone concentration for all models. The coefficient of determination (R2) ranges from 0.216 to 0.970, which indicates a good fit. Notably, over 95% of the coefficient of determination values are above 0.5. The low MAE and RMSE values are generally acceptable because values closer to 0 indicate a high predictive performance (Fong et al., 2018). The MAE value ranges from 2.344 to 10.390, while the RMSE value ranges from 2.737 to 10.964. Virtually all the algorithms exhibit a good fit, which establishes a mutual association between these variables. The RMSE and MAE of all the algorithms are low, which reflects the accuracy of the predictions.

In related studies that used linear regression, Suhaimi et al. (2019) obtained R2 of 0.68 and RMSE of 8.67, while Moustris et al. (2012) obtained R2 of 0.65 and RMSE of 25.5. Similar to the predictive performance of SVR in this study, Chaiyakhan et al. (2017) obtained a high SVR predictive performance compared with linear regression in air quality prediction in Thailand, while the air pollution prediction of Ishak et al. (2017) in Tunisia showed that random forest outperformed SVR. This implies that the predictive performance of all the algorithms is significant for ozone concentration prediction. However, the models' performances differ.

A comparative assessment of the ML algorithms in this study reveals that the random forest (RF) has the highest predictive performance, followed by linear regression, support vector regression and decision tree regression. A similar outcome was obtained in the Watson et al. (2019) study predicting ozone exposure during a California wildfire, in which ten machine learning algorithms were used and random forest exhibited the highest predictive performance. Also, Zhan et al. (2018) used random forest to accurately predict China's ozone concentration. RF is known for its powerful prediction accuracy and performance, which can model the non-linear relationship between the predictors and output, unlike other algorithms such as support vector machines and neural networks (Zhan et al., 2018; Rahmati et al., 2017; Li et al., 2019). The SVR algorithm required more processing time for both the training and testing phases. Although it has good generalization performance, it tends to be very slow during the testing stage. This may be due to the bulkiness of the data used for this study, which supports the findings of Ye et al. (2020). Despite variations in the algorithms' performance, the results indicate the climatic variables' capability as predictors of ozone in the study area, validating the outcome of the correlation analysis.

Table 3
Validation outcome of ML models.

Station ID   Metric   MLR (ppm)   DT (ppm)   RF (ppm)   SVR (ppm)
CA0006       R2       0.805       0.739      0.958      0.676
CA0006       MAE      5.826       6.623      2.344      7.093
CA0006       RMSE     7.374       8.538      3.439      9.058
CA0043       R2       0.668       0.342      0.936      0.510
CA0043       MAE      6.048       8.003      2.379      6.631
CA0043       RMSE     7.787       10.964     3.423      8.626
CA0047       R2       0.769       0.524      0.952      0.670
CA0047       MAE      7.198       10.398     3.136      8.802
CA0047       RMSE     10.311      14.790     4.676      11.953
CA0010       R2       0.657       0.566      0.918      0.480
CA0010       MAE      4.814       5.333      2.840      5.241
CA0010       RMSE     6.743       7.586      3.297      8.641
CA0009       R2       0.763       0.642      0.970      0.732
CA0009       MAE      5.837       6.478      1.824      5.676
CA0009       RMSE     7.706       9.464      2.737      7.683
CA0003       R2       0.663       0.620      0.934      0.530
CA0003       MAE      8.247       7.950      2.861      9.711
CA0003       RMSE     10.679      11.341     4.722      13.158
CA0008       R2       0.722       0.647      0.963      0.616
CA0008       MAE      5.810       5.898      2.007      6.777
CA0008       RMSE     7.872       8.870      2.881      10.823
CA0020       R2       0.656       0.216      0.912      0.231
CA0020       MAE      2.380       7.683      2.790      5.905
CA0020       RMSE     2.971       11.462     4.284      9.335
CA0011       R2       0.755       0.663      0.959      0.639
CA0011       MAE      6.327       6.993      2.097      6.847
CA0011       RMSE     8.520       10.001     3.491      9.233
CA0016       R2       0.666       0.630      0.949      0.662
CA0016       MAE      8.049       7.225      2.342      6.547
CA0016       RMSE     11.282      11.878     4.412      10.978

4. Discussion

An extensive study of the climatic impact on ozone concentration in Malaysia from 2012 to 2016 was carried out, as stated in the objectives. The ozone concentration in Malaysia can be attributed to human-induced forest fires (APIMS, 2021). Temperature is a significant factor in the occurrence of forest fires, and wind serves as a transport medium. Notably, smoke emitted from forest fires and transported to Malaysia as transboundary pollution (Tella et al., 2021b) can increase the ozone concentration. For instance, wildfires were found to be one of the sources of ozone in Canada (Moeini et al., 2020). Also, according to the United States Environmental Protection Agency (EPA), an increase in ozone concentration was observed from monitored wildfires in the United States (EPA, 2021).
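The three validation metrics reported in Table 3 (R2, MAE and RMSE) can be illustrated with a short sketch. It assumes the common definition R2 = 1 − SS_res/SS_tot; the paper's exact computation is not restated here, and some studies report a squared correlation instead. The function names and the observed and predicted values are hypothetical and are not taken from the study.

```python
import math


def mae(observed, predicted):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error: penalises large errors more than MAE does."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the observations are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted ozone values (illustrative only)
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(f"R2   = {r_squared(observed, predicted):.3f}")
print(f"MAE  = {mae(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

MAE and RMSE are expressed in the units of the predicted variable, and RMSE is never smaller than MAE for the same predictions. R2 is unitless, with values nearer 1 indicating that the predictions explain more of the variance in the observations.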
Relative humidity acts as a mitigating factor against surges in ozone concentration because it is linked to rainfall, which aids in clearing pollutants from the atmosphere. Of these three climatic factors, temperature has the strongest nexus with ozone concentration. Notably, temperature is one of the climatic parameters that contribute to air pollution, especially ozone concentration (Dawson et al., 2007; Hedegaard et al., 2008; Ng and Awang, 2018). According to Christensen et al. (2007), a warming climate can worsen ozone pollution in densely populated areas, thereby increasing ozone concentration and lengthening the ozone season (Wu et al., 2008; Nolte et al., 2008; Bloomer et al., 2009; Hong et al., 2019).

The Intergovernmental Panel on Climate Change (IPCC) forecast indicates a 1 °C rise in the global temperature by 2025 and a 3 °C rise before the end of the 21st century (EEA, 2016). This suggests a potential increase in the impacts of temperature on air quality in the future, considering the inter-link between climate variation and airborne pollution (Orru et al., 2017; Fuzzi et al., 2015; Bond et al., 2013). Moreover, temperature is considered a crucial factor influencing open burning in Malaysia, which emits pollutants. According to Ahamad et al. (2014), a high concentration of ozone exceeding the Recommended Malaysian Air Quality Guideline (RMAQG) of 100 ppb is caused by biomass combustion. Ozone greatly impacts air quality during the dry season (Rani et al., 2018), thereby affecting human health. The results suggest a potential spike in ozone-induced health challenges, particularly during the warm southwest monsoon season. This could be mitigated by adopting measures that reduce ozone concentration, such as refuelling cars when the weather is cool, conserving electricity, and boosting public transport while managing private car use (EPA, 2018).

Ahmat, H., Yahaya, A.S., Ramli, N.A., 2015.
’PM10 analysis for three industrialized areas using extreme value. Sains Malays. 44, 175–185. Akoglu, H., 2018. ’User’s guide to correlation coefficients. Turk. J. Emerg. Med. 18, 91–93. Althuwaynee, O.F., Balogun, A.L., Al Madhoun, W., 2020. ’Air pollution hazard assessment using decision tree algorithms and bivariate probability cluster polar function: evaluating inter-correlation clusters of PM10 and other air pollutants. GIScience Remote Sens. 57, 207–226. An, R., Yu, H., 2018. Impact of ambient fine particulate matter air pollution on health behaviors: a longitudinal study of university students in Beijing, China. Publ. Health 159, 107–115. Andaya, B.W., Andaya, L.Y., 2016. A History of Malaysia. Macmillan International Higher Education. APIMS, 2021. ’Information about API’. Department of Environmnet, Malaysia. http://a pims.doe.gov.my/public_v2/aboutapi.html. (Accessed 28 November 2021). Aqilah, I., 2019. More schools closed in Perak as air quality deteriorates’, star media group Berhad. https://www.thestar.com.my/news/nation/2019/09/18/more-schoo ls-closed-in-perak-as-air-quality-deteriorates. (Accessed 22 July 2020). Araki, S., Shima, M., Yamamoto, K., 2018. Spatiotemporal land use random forest model for estimating metropolitan NO2 exposure in Japan. Sci. Total Environ. 634, 1269–1277. Awang, N.R., Ramli, N.A., Shith, S., Zainordin, N.S., Manogaran, H., 2018. ’Transformational characteristics of ground-level ozone during high particulate events in urban area of Malaysia. Air Qual. Atmos. Health 11, 715–727. Azid, A., Juahir, H., Toriman, M.E., Endut, A., Kamarudin, M.K.A., Rahman, M.N.A., Hasnam, C.N.C., Saudi, A.S.M., Yunus, K., 2015. ’Source Apportionment of Air Pollution: a Case Study in Malaysia. Jurnal Teknologi, p. 72. Azmi, S.Z., Latif, M.T., Ismail, A.S., Juneng, L., Jemain, A.A., 2010. Trend and status of air quality at three different monitoring stations in the Klang Valley, Malaysia. Air Qual. Atmos. Health 3, 53–64. 
Bai, L., Wang, J., Ma, X., Lu, H., 2018. Air pollution forecasts: an overview. Int. J. Environ. Res. Publ. Health 15, 780. Balogun, A.-L., Tella, A., Baloo, L., Adebisi, N., 2021. ’A review of the inter-correlation of climate change, air pollution and urban sustainability using novel machine learning algorithms and spatial information science. Urban Clim. 40, 100989. Bayat, R., Ashrafi, K., Shafiepour Motlagh, M., Hassanvand, M.S., Daroudi, R., Fink, G., Künzli, N., 2019. ’Health impact and related cost of ambient air pollution in Tehran. Environ. Res. 176, 108547. Belgiu, M., Drăguţ, L., 2016. ’Random forest in remote sensing: a review of applications and future directions. ISPRS J. Photogrammetry Remote Sens. 114, 24–31. Bhalgat, P., Pitale, S., Bhoite, S., 2019. Air quality prediction using machine learning algorithm. Int. J. Comput. Appl. Technol. Res. 8, 367–370. Bishop, C.M., 2006. Pattern Recognition and Machine Learning. springer. Bloomer, B.J., Stehr, J.W., Piety, C.A., Salawitch, R.J., Dickerson, R.R., 2009. ’Observed relationships of ozone air pollution with temperature and emissions. Geophys. Res. Lett. 36. Bond, T.C., Doherty, S.J., Fahey, D.W., Forster, P.M., Berntsen, T., DeAngelo, B.J., Flanner, M.G., Ghan, S., Kärcher, B., Koch, D., 2013. Bounding the role of black carbon in the climate system: a scientific assessment. J. Geophys. Res. Atmos. 118, 5380–5552. Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32. Buchanan, M., 2012. ’Cause and correlation, 852-52 Nat. Phys. 8. Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., Niculae, V., Prettenhofer, P., Gramfort, A., Grobler, J., 2013. API Design for Machine Learning Software: Experiences from the Scikit-Learn Project. arXiv preprint arXiv:1309.0238. Chaiyakhan, K., Chujai, P., Kerdprasop, N., Kerdprasop, K., 2017. Hourly ground-level ozone concentration prediction using support vector regression. 
In: International MultiConference of Engineers and Computer Scientists. Chen, Z., Zhuang, Y., Xie, X., Chen, D., Cheng, N., Yang, L., Li, R., 2019. ’Understanding long-term variations of meteorological influences on ground ozone concentrations in Beijing during 2006–2016. Environ. Pollut. 245, 29–37. Chin, Y.S.J., De Pretto, L., Thuppil, V., Ashfold, M.J., 2019. ’Public awareness and support for environmental protection—a focus on air pollution in peninsular Malaysia. PLoS One 14. Choubin, B., Abdolshahnejad, M., Moradi, E., Querol, X., Mosavi, A., Shamshirband, S., Ghamisi, P., 2020. Spatial Hazard Assessment of the PM10 Using Machine Learning Models in Barcelona, Spain, vol. 701. Science of The Total Environment, p. 134474. Christensen, J.H., Hewitson, B., Busuioc, A., Chen, A., Gao, X., Held, R., Jones, R., Kolli, R.K., Kwon, W., Laprise, R., 2007. ’Regional climate projections. In: Climate Change, 2007: the Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. University Press, Cambridge (Chapter 11). Dawson, J.P., Adams, P.J., Pandis, S.N., 2007. ’Sensitivity of ozone to summertime climate in the eastern USA: a modeling case study. Atmos. Environ. 41, 1494–1511. De Marco, A., Proietti, C., Anav, A., Ciancarella, L., D’Elia, I., Fares, S., Fornasier, M.F., Fusaro, L., Gualtieri, M., Manes, F., Marchetto, A., Mircea, M., Paoletti, E., Piersanti, A., Rogora, M., Salvati, L., Salvatori, E., Screpanti, A., Vialetto, G., Vitale, M., Leonardi, C., 2019. ’Impacts of air pollution on human and ecosystem health, and implications for the National Emission Ceilings Directive: insights from Italy. Environ. Int. 125, 320–333. Dimakopoulou, K., Douros, J., Samoli, E., Karakatsani, A., Rodopoulou, S., Papakosta, D., Grivas, G., Tsilingiridis, G., Mudway, I., Moussiopoulos, N., Katsouyanni, K., 2020. ’Long-term exposure to ozone and children’s respiratory health: results from the RESPOZE study. 
Environ. Res. 182, 109002.

5. Conclusion

This study investigated the influence of variations in climatic variables on ozone. Correlation analysis and four machine learning algorithms (random forest, support vector regression, decision tree regression, and linear regression) were used to investigate the relationship between the predictors and ozone (O3). The correlation analysis shows a very strong relationship between temperature and ozone at all ten air pollution monitoring stations. There is a moderate to strong correlation between wind speed and ozone, while relative humidity showed an inverse relationship.

Also, climatic variables from six stations, Tasek Ipoh, Bukit Rambai, Nilai, Jaya II Perai, Petaling Jaya, and Kelang, exhibit a very high correlation with ozone, indicating their vulnerability to the air pollutant. The machine learning algorithms also confirmed the reliability of climatic variables for ozone concentration prediction. The random forest exhibits the highest performance, followed by linear regression, support vector regression, and decision tree regression.

The study concludes that climate change exerts considerable influence on air quality in urban centres due to variations in climatic factors such as temperature, wind speed and relative humidity. Residents of the six most vulnerable locations are at risk of respiratory and cardiovascular problems. Therefore, this study's outcome provides a sound basis for implementing evidence-based interventions in the most susceptible areas considering climatic variations. Future work should focus on trend and time-series analysis of climate variables and ozone to better understand whether a future rise or fall of the climatic variables correlates with the rise and fall of the ozone concentration.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The authors gratefully acknowledge the Department of Environment (DoE) Malaysia for providing the air quality data used in this study.

References

Ab Rahman, A.K., Abdullah, R., Balu, N., Shariff, F.M., 2013. The impact of La Niña and El Niño events on crude palm oil prices: an econometric analysis. Oil Palm Ind. Econ. J. 13, 38–51.
Abdullah, S., Nasir, N.H.A., Ismail, M., Ahmed, A.N., Jarkoni, M.N.K., 2019. Development of ozone prediction model in urban area. Int. J. Innovative Technol. Explor. Eng. 8, 2263–2267.
Afroz, R., Hassan, M.N., Ibrahim, N.A., 2003. Review of air pollution and health impacts in Malaysia. Environ. Res. 92, 71–77.
Ahamad, F., Latif, M.T., Tang, R., Juneng, L., Dominick, D., Juahir, H., 2014. Variation of surface ozone exceedance around Klang Valley, Malaysia. Atmos. Res. 139, 116–127.
Djalalova, I., Delle Monache, L., Wilczak, J., 2015. PM2.5 analog forecast and Kalman filter post-processing for the Community Multiscale Air Quality (CMAQ) model. Atmos. Environ. 108, 76–87.
Dou, J., Yunus, A.P., Tien Bui, D., Merghadi, A., Sahana, M., Zhu, Z., Chen, C.-W., Khosravi, K., Yang, Y., Pham, B.T., 2019. Assessment of advanced random forest and decision tree algorithms for modeling rainfall-induced landslide susceptibility in the Izu-Oshima Volcanic Island, Japan. Sci. Total Environ. 662, 332–346.
EEA, 2016. Air Pollutants and Global Effects. European Environment. https://www.eea.europa.eu/publications/2599XXX/page009.html.
EPA, 2018. Actions You Can Take to Reduce Air Pollution. United States Environmental Protection Agency. https://www3.epa.gov/region1/airquality/reducepollution.html#:~:text=On%20Days%20when%20High%20Ozone,Walk%20to%20errands%20when%20possible.
(Accessed 22 June 2020).
EPA, 2021. Study Provides New Insights into Impacts of Wildland Fires on Ozone Monitoring Equipment. https://www.epa.gov/sciencematters/study-provides-new-insights-impacts-wildland-fires-ozone-monitoring-equipment#:~:text=Study%20Provides%20New%20Insights%20Into%20Impacts%20of%20Wildland%20Fires%20on%20Ozone%20Monitoring%20Equipment,-EPA%20research%20team&text=States%20have%20observed%20unexplained%20increases,active%20wildfires%20or%20prescribed%20burns. (Accessed 28 November 2021).
Ezimand, K., Kakroodi, A., 2019. Prediction and spatio-temporal analysis of ozone concentration in a metropolitan area. Ecol. Indicat. 103, 589–598.
Fong, S., Abdullah, S., Ismail, M., 2018. Forecasting of particulate matter (PM10) concentration based on gaseous pollutants and meteorological factors for different monsoon of urban coastal area in Terengganu. J. Sustain. Sci. Manag. 5, 3–17.
Fu, T.-M., Tian, H., 2019. Climate change penalty to ozone air quality: review of current understandings and knowledge gaps. Curr. Pollut. Rep. 5, 159–171.
Fuzzi, S., Baltensperger, U., Carslaw, K., Decesari, S., Denier van der Gon, H., Facchini, M.C., Fowler, D., Koren, I., Langford, B., Lohmann, U., 2015. Particulate matter, air quality and climate: lessons learned and future needs. Atmos. Chem. Phys. 15, 8217–8299.
Gaveau, D.L., Salim, M.A., Hergoualc'h, K., Locatelli, B., Sloan, S., Wooster, M., Marlier, M.E., Molidena, E., Yaen, H., DeFries, R., 2014. Major atmospheric emissions from peat fires in Southeast Asia during non-drought years: evidence from the 2013 Sumatran fires. Sci. Rep. 4, 6112.
Gorai, A., Tuluri, F., Tchounwou, P., Ambinakudige, S., 2015. Influence of local meteorology and NO2 conditions on ground-level ozone concentrations in the eastern part of Texas, USA. Air Qual. Atmos. Health 8, 81–96.
Hamid, M.A., Long, K.Q., 2017. An assessment of environmental impacts assessment (EIA) in Malaysia. In: SHS Web of Conferences, 00018. EDP Sciences.
Hanaoka, T., Masui, T., 2019. ’Exploring Effective Short-Lived Climate Pollutant Mitigation Scenarios by Considering Synergies and Trade-Offs of Combinations of Air Pollutant Measures and Low Carbon Measures towards the Level of the 2◦ C Target in Asia. Environmental Pollution, p. 113650. Hassan, N., Hashim, Z., Hashim, J., 2015. Impact of Climate Change on Air Quality and Public Health in Urban Areas. Asia-Pacific journal of public health/Asia-Pacific Academic Consortium for Public Health. He, H.-d., 2017. ’Multifractal Analysis of Interactive Patterns between Meteorological Factors and Pollutants in Urban and Rural Areas. Atmospheric Environment. Hedegaard, G.B., Brandt, J., Christensen, J.H., Frohn, L.M., Geels, C., Hansen, K.M., Stendel, M., 2008. ’Impacts of climate change on air pollution levels in the Northern Hemisphere with special focus on Europe and the Arctic. In: Air Pollution Modeling and its Application XIX. Springer. Hinkle, D.E., Wiersma, W., Jurs, S.G., 2003. Applied Statistics for the Behavioral Sciences. Houghton Mifflin College Division. Hong, C., Zhang, Q., Zhang, Y., Davis, S.J., Tong, D., Zheng, Y., Liu, Z., Guan, D., He, K., Schellnhuber, H.J., 2019. ’Impacts of climate change on future air quality and human health in China. Proc. Natl. Acad. Sci. Unit. States Am. 116, 17193–17200. How, C.Y., Ling, Y.E., 2016. The influence of PM2. 5 and PM10 on air pollution index (API). Environ. Eng. Hydraul. Hydrol.: Proc. Civil Eng. Univ. Teknol. Malays. Johor, Malays. 3, 132. Ishak, A.B., Daoud, M.B., Trabelsi, A., 2017. ’Ozone concentration forecasting using statistical learning approaches. J. Mater. Environ. Sci. 8, 4532–4543. Iskandaryan, D., Ramos, F., Trilles Oliver, S., 2020. Air quality prediction in smart cities using machine learning technologies based on sensor data: a review. Appl. Sci. 10, 2401. Jasaitis, D., Vasiliauskienė, V., Chadyšienė, R., Pečiulienė, M., 2016. 
Surface ozone concentration and its relationship with UV radiation, meteorological parameters and radon on the eastern coast of the Baltic sea. Atmosphere 7. Jumin, E., Zaini, N., Ahmed, A.N., Abdullah, S., Ismail, M., Sherif, M., Sefelnasr, A., ElShafie, A., 2020. ’Machine learning versus linear regression modelling approach for accurate ozone concentrations prediction. Eng. Appl. Computat. Fluid Mech. 14, 713–725. Kadavi, P.R., Lee, C.-W., Lee, S., 2019. ’Landslide-susceptibility mapping in Gangwon-do, South Korea, using logistic regression and decision tree models. Environ. Earth Sci. 78, 116. Kalisa, E., Fadlallah, S., Amani, M., Nahayo, L., Habiyaremye, G., 2018. ’Temperature and air pollution relationship during heatwaves in Birmingham, UK. Sustain. Cities Soc. 43, 111–120. Khan, M.F., Sulong, N.A., Latif, M.T., Nadzir, M.S.M., Amil, N., Hussain, D.F.M., Lee, V., Hosaini, P.N., Shaharom, S., Yusoff, N.A.Y.M., Hoque, H.M.S., Chung, J.X., Sahani, M., Mohd Tahir, N., Juneng, L., Maulud, K.N.A., Abdullah, S.M.S., Fujii, Y., Tohno, S., Mizohata, A., 2016. ’Comprehensive assessment of PM2.5 physicochemical properties during the Southeast Asia dry season (southwest monsoon), 589-14,611 J. Geophys. Res. Atmos. 121 (14). Kim, S.E., Honda, Y., Hashizume, M., Kan, H., Lim, Y.-H., Lee, H., Kim, C.T., Yi, S.-M., Kim, H., 2017. Seasonal analysis of the short-term effects of air pollution on daily mortality in Northeast Asia. Sci. Total Environ. 576, 850–857. Kompalli, S.K., Babu, S.S., Moorthy, K.K., Manoj, M., Kumar, N.K., Shaeb, K.H.B., Joshi, A.K., 2014. Aerosol black carbon characteristics over Central India: temporal variation and its dependence on mixed layer height. Atmos. Res. 147, 27–37. Kwan, M.S., Tangang, F.T., Juneng, L., 2013. ’Projected changes of future climate extremes in Malaysia. Sains Malays. 42, 1051–1059. 
Latif, M.T., Othman, M., Idris, N., Juneng, L., Abdullah, A.M., Hamzah, W.P., Khan, M.F., Nik Sulaiman, N.M., Jewaratnam, J., Aghamohammadi, N., Sahani, M., Xiang, C.J., Ahamad, F., Amil, N., Darus, M., Varkkey, H., Tangang, F., Jaafar, A.B., 2018. Impact of regional haze towards air quality in Malaysia: a review. Atmos. Environ. 177, 28–44. Lee, D., Robertson, C., Ramsay, C., Gillespie, C., Napier, G., 2019. Estimating the health impact of air pollution in Scotland, and the resulting benefits of reducing concentrations in city centres. Spatial Spatio-temp. Epidemiol. 29, 85–96. Li, R., Cui, L., Meng, Y., Zhao, Y., Fu, H., 2019. ’Satellite-based prediction of daily SO2 exposure across China using a high-quality random forest-spatiotemporal Kriging (RF-STK) model for health risk assessment. Atmos. Environ. 208, 10–19. Ma, J., Cheng, J.C.P., 2016. Identifying the influential features on the regional energy use intensity of residential buildings based on Random Forests. Appl. Energy 183, 193–201. Ma, J., Ding, Y., Cheng, J.C.P., Jiang, F., Tan, Y., Gan, V.J.L., Wan, Z., 2020. ’Identification of high impact factors of air quality on a national scale using big data and machine learning techniques. J. Clean. Prod. 244, 118955. Mabahwi, N.A., Leh, O.L.H., Omar, D., 2015. ’Urban air quality and human health effects in Selangor, Malaysia. Procedia-Soc. Behav. Sci. 170, 282–291. Manimaran, P., Narayana, A., 2018. ’Multifractal detrended cross-correlation analysis on air pollutants of University of Hyderabad Campus, India. Phys. Stat. Mech. Appl. 502, 228–235. Melkonyan, A., Wagner, P., 2013. ’Ozone and its projection in regard to climate change. Atmos. Environ. 67, 287–295. Micheletti, N., Foresti, L., Robert, S., Leuenberger, M., Pedrazzini, A., Jaboyedoff, M., Kanevski, M., 2014. ’Machine learning feature selection methods for landslide susceptibility mapping. Math. Geosci. 46, 33–57. 
Moeini, O., Tarasick, D.W., McElroy, C.T., Liu, J., Osman, M.K., Thompson, A.M., Parrington, M., Palmer, P.I., Johnson, B., Oltmans, S.J., 2020. Estimating wildfiregenerated ozone over North America using ozonesonde profiles and a differential back trajectory technique. Atmos. Environ. X 7, 100078. Moustris, K., Nastos, P., Larissi, I., Paliatsos, A., 2012. Application of multiple linear regression models and artificial neural networks on the surface ozone forecast in the greater Athens area, Greece. Adv. Meteorol. 2012. Mukaka, M.M., 2012. Statistics corner: a guide to appropriate use of correlation coefficient in medical research. Malawi Med. J. : J. Med. Assoc. Malawi 24, 69–71. Nassikas, N., Spangler, K., Fann, N., Nolte, C.G., Dolwick, P., Spero, T.L., Sheffield, P., Wellenius, G.A., 2020. Ozone-related asthma emergency department visits in the US in a warming climate. Environ. Res. 183, 109206. Nazif, A., Mohammed, I., Malakahmad, A., Abualqumboz, M., 2018. ’Multivariate analysis of monsoon seasonal variation and prediction of particulate matter episode using regression and hybrid models. Int. J. Environ. Sci. Technol. 16. Ng, K.Y., Awang, N., 2018. ’Multiple linear regression and regression with time series error models in forecasting PM10 concentrations in Peninsular Malaysia. Environ. Monit. Assess. 190, 63. Nguyen, G.T.H., Shimadera, H., Uranishi, K., Matsuo, T., Kondo, A., 2019. Numerical assessment of PM2. 5 and O3 air quality in Continental Southeast Asia: impacts of potential future climate change. Atmos. Environ. 215, 116901. Nolte, C.G., Gilliland, A.B., Hogrefe, C., Mickley, L.J., 2008. Linking global to regional models to assess future climate impacts on surface ozone levels in the United States. J. Geophys. Res. Atmos. 113. Nur Shaziayani, W., Zia Ul-Saufie, A., Libasin, Z., Norsyiha Ahmad Shukri, F., Sarimah Syed Abdullah, S., Mohamed Noor, N., 2020. A review of PM10 concentrations modelling in Malaysia. IOP Conf. Ser. Earth Environ. Sci. 
616, 012008.
Oleniacz, R., Bogacki, M., Szulecka, A., Rzeszutek, M., Mazur, M., 2016. Assessing the impact of wind speed and mixing-layer height on air quality in Krakow (Poland) in the years 2014–2015. J. Civil Eng. Environ. Arch. 63, 315–342.
Organization, W. H., 2013. Review of Evidence on Health Aspects of Air Pollution–REVIHAAP Project: Final Technical Report. WHO European Centre for Environment and Health, Bonn.
Organization, W. H., 2016. Ambient Air Pollution: A Global Assessment of Exposure and Burden of Disease.
Orru, H., Ebi, K., Forsberg, B., 2017. The interplay of climate change and air pollution on health. Curr. Environ. Health Rep. 4, 504–513.
Plocoste, T., Calif, R., Jacoby-Koaly, S., 2019. Multi-scale time dependent correlation between synchronous measurements of ground-level ozone and meteorological parameters in the Caribbean Basin. Atmos. Environ. 211, 234–246.
Pu, X., Wang, T., Huang, X., Melas, D., Zanis, P., Papanastasiou, D., Poupkou, A., 2017. Enhanced surface ozone during the heat wave of 2013 in Yangtze River Delta region, China. Sci. Total Environ. 603, 807–816.
Rahmati, O., Tahmasebipour, N., Haghizadeh, A., Pourghasemi, H.R., Feizizadeh, B., 2017. Evaluation of different machine learning models for predicting and mapping the susceptibility of gully erosion. Geomorphology 298, 118–137.
Rani, N.L.A., Azid, A., Khalit, S.I., Juahir, H., Samsudin, M.S., 2018. Air pollution index trend analysis in Malaysia, 2010-15. Pol. J. Environ. Stud. 27.
Rovira, J., Domingo, J.L., Schuhmacher, M., 2020. Air quality, health impacts and burden of disease due to air pollution (PM10, PM2.5, NO2 and O3): application of AirQ+ model to the Camp de Tarragona County (Catalonia, Spain). Sci. Total Environ. 703, 135538.
Sayad, Y.O., Mousannif, H., Al Moatassime, H., 2019. Predictive modeling of wildfires: a new dataset and machine learning approach. Fire Saf. J. 104, 130–146.
Seinfeld, J.H., Pandis, S.N., 2016. Atmospheric Chemistry and Physics: from Air Pollution to Climate Change. John Wiley & Sons.
Sharma, S., Chatani, S., Mahtta, R., Goel, A., Kumar, A., 2016. Sensitivity analysis of ground level ozone in India using WRF-CMAQ models. Atmos. Environ. 131, 29–40.
Steinwart, I., Christmann, A., 2008. Support Vector Machines. Springer Science & Business Media.
Suárez Sánchez, A., García Nieto, P.J., Riesgo Fernández, P., del Coz Díaz, J.J., Iglesias-Rodríguez, F.J., 2011. Application of an SVM-based regression model to the air quality study at local scale in the Avilés urban area (Spain). Math. Comput. Model. 54, 1453–1466.
Suhaimi, N., Ghazali, N.A., Nasir, M.Y., Mokhtar, M.I.Z., Ramli, N.A., Yusof, N.F.F.M., Ul-Saufie, A.Z., 2019. Daytime ozone concentration prediction using statistical models. J. Sustain. Sci. Manag. 14 (3), 7–11.
Tan, K.C., San Lim, H., Jafri, M.Z.M., 2016. Prediction of column ozone concentrations using multiple regression analysis and principal component analysis techniques: a case study in peninsular Malaysia. Atmos. Pollut. Res. 7, 533–546.
Tang, K.H.D., 2019. Climate change in Malaysia: trends, contributors, impacts, mitigation and adaptations. Sci. Total Environ. 650, 1858–1871.
Tang, X., Gao, X., Li, C., Zhou, Q., Ren, C., Feng, Z., 2020. Study on spatiotemporal distribution of airborne ozone pollution in subtropical region considering socioeconomic driving impacts: a case study in Guangzhou, China. Sustain. Cities Soc. 54, 101989.
Tella, A., Balogun, A.-L., 2021. GIS-based air quality modelling: spatial prediction of PM10 for Selangor State, Malaysia using machine learning algorithms. Environ. Sci. Pollut. Control Ser. 1–17.
Tella, A., Balogun, A.-L., Adebisi, N., Abdullah, S., 2021a. Spatial assessment of PM10 hotspots using random forest, K-nearest neighbour and Naïve Bayes. Atmos. Pollut. Res. 12, 101202.
Tella, A., Balogun, A.-L., Faye, I., 2021b. Spatio-temporal modelling of the influence of climatic variables and seasonal variation on PM10 in Malaysia using multivariate regression (MVR) and GIS. Geomatics, Nat. Hazards Risk 12, 443–468.
Tong, C.H.M., Yim, S.H.L., Rothenberg, D., Wang, C., Lin, C.-Y., Chen, Y.D., Lau, N.C., 2018. Projecting the impacts of atmospheric conditions under climate change on air quality over the Pearl River Delta region. Atmos. Environ. 193, 79–87.
Tong, W., 2020. Chapter 5 - machine learning for spatiotemporal big data in air pollution. In: Li, Lixin, Zhou, Xiaolu, Tong, Weitian (Eds.), Spatiotemporal Analysis of Air Pollution and its Application in Public Health. Elsevier.
Ueno, H., Tsunematsu, N., 2019. Sensitivity of ozone production to increasing temperature and reduction of precursors estimated from observation data. Atmos. Environ. 214, 116818.
Wang, H.-W., Li, X.-B., Wang, D., Zhao, J., He, H.-d., Peng, Z.-R., 2020. Regional prediction of ground-level ozone using a hybrid sequence-to-sequence deep learning approach. J. Clean. Prod. 253, 119841.
Wang, W., Liu, C., Ying, Z., Lei, X., Wang, C., Huo, J., Zhao, Q., Zhang, Y., Duan, Y., Chen, R., 2019. Particulate air pollution and ischemic stroke hospitalization: how the associations vary by constituents in Shanghai, China. Sci. Total Environ. 695, 133780.
Watson, Gregory L., Telesca, Donatello, Reid, Colleen E., Pfister, Gabriele G., Jerrett, Michael, 2019. Machine learning models accurately predict ozone exposure during wildfire events. Environ. Pollut. 254, 112792.
Wen, C., Liu, S., Yao, X., Peng, L., Li, X., Hu, Y., Chi, T., 2019. A novel spatiotemporal convolutional long short-term neural network for air pollution prediction. Sci. Total Environ. 654, 1091–1099.
Wong, L.P., Alias, H., Aghamohammadi, N., Ghadimi, A., Sulaiman, N.M.N., 2017. Control measures and health effects of air pollution: a survey among public transportation commuters in Malaysia. Sustainability 9, 1616.
Wu, B., Li, T., Baležentis, T., Štreimikienė, D., 2019. Impacts of income growth on air pollution-related health risk: exploiting objective and subjective measures. Resour. Conserv. Recycl. 146, 98–105.
Wu, S., Mickley, L.J., Leibensperger, E.M., Jacob, D.J., Rind, D., Streets, D.G., 2008. Effects of 2000–2050 global change on ozone air quality in the United States. J. Geophys. Res. Atmos. 113.
Xie, M., Shu, L., Wang, T.-j., Liu, Q., Gao, D., Li, S., Zhuang, B.-l., Han, Y., Li, M.-m., Chen, P.-l., 2017. Natural emissions under future climate condition and their effects on surface ozone in the Yangtze River Delta region, China. Atmos. Environ. 150, 162–180.
Xu, J., Yan, F., Xie, Y., Wang, F., Wu, J., Fu, Q., 2015. Impact of meteorological conditions on a nine-day particulate matter pollution event observed in December 2013, Shanghai, China. Particuology 20, 69–79.
Yang, J., Shi, B., Shi, Y., Marvin, S., Zheng, Y., Xia, G., 2020. Air pollution dispersal in high density urban areas: research on the triadic relation of wind, air pollution, and urban form. Sustain. Cities Soc. 54, 101941.
Ye, Z., Yang, J., Zhong, N., Tu, X., Jia, J., Wang, J., 2020. Tackling environmental challenges in pollution controls using artificial intelligence: a review. Sci. Total Environ. 699, 134279.
Yusoff, M.F., Latif, M.T., Juneng, L., Khan, M.F., Ahamad, F., Chung, J.X., Mohtar, A.A.A., 2019. Spatio-temporal assessment of nocturnal surface ozone in Malaysia. Atmos. Environ. 207, 105–116.
Zeinalnezhad, M., Chofreh, A.G., Goni, F.A., Klemeš, J.J., 2020. Air pollution prediction using semi-experimental regression model and Adaptive Neuro-Fuzzy Inference System. J. Clean. Prod. 261, 121218.
Zhan, Y., Luo, Y., Deng, X., Grieneisen, M.L., Zhang, M., Di, B., 2018. Spatiotemporal prediction of daily ambient ozone levels across China using random forest for human exposure assessment. Environ. Pollut. 233, 464–473.
Zhao, W., Fan, S., Guo, H., Gao, B., Sun, J., Chen, L., 2016. Assessing the impact of local meteorological variables on surface ozone in Hong Kong during 2000–2015 using quantile and multiple line regression models. Atmos. Environ. 144, 182–193.