Materials Today Communications 35 (2023) 105793

Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/mtcomm

Robust extreme gradient boosting regression model for compressive strength prediction of blast furnace slag and fly ash concrete

M. Iqbal Khan *, Yassir M. Abbas
Department of Civil Engineering, King Saud University, Riyadh 800–11421, Saudi Arabia

Keywords: Concrete; Compressive strength; Supplementary cementitious materials; Machine learning; Regression; Gradient boosting

A B S T R A C T

In this study, a modern machine learning (ML) technique, eXtreme Gradient Boosting (XG Boost), was employed to train a highly precise ML model. The developed XG Boost model was highly interpretable, filling a gap in the literature by opening the usual black box. The study further provides a simple and free user interface to support the design of normal- and high-strength blast furnace slag (BFS) and fly ash (FA) concrete. The compressive strengths of 1030 concrete mixes containing cement (C), BFS, and FA were collected and analyzed. The baseline model tended to overfit, with R² values of 0.996 and 0.919 for the training and testing datasets, respectively. The hyperparameters of the model were therefore optimized using vector multi-objective optimization to maximize its predictive capability. The optimized XG Boost model exhibited superior prediction performance, with R² of 0.992 and 0.949 for the training and testing datasets. Based on Gini indexes and SHAP values, C, FA, water, and aggregate were the most significant model parameters. According to this study, the best BFS, FA, sand, and superplasticizer contents for concrete strength optimization were 100–200, 100–200, 600–800, and 7–13 kg/m³, respectively. The superplasticizer has a negligible effect on concrete's compressive strength at low water contents (less than 180 kg/m³) but a stochastic effect at high contents.
The various chemical properties of high-range water reducers may have resulted in the randomly generated response observed in the current study.

1. Introduction

1.1. Concrete – an overview

Due to the availability of its raw materials, its cost-effectiveness, and its advantageous mechanical properties, concrete is the most consumed structural material. Every year, around 32 billion tons of concrete are produced worldwide (more than 4 tons per capita), and demand is expected to grow. A growing urban population (the urban share of the global population is projected to rise from 30% to 54% by 2050) will drive further demand for concrete production in the coming decades [1,2]. However, about 8% of all anthropogenic greenhouse gas emissions can be attributed to concrete's current ecological footprint [3]. A variety of supplementary cementitious materials [SCMs, such as fly ash (FA), blast furnace slag (BFS), and silica fume (SF)] have been shown to enhance concrete's strength and durability and to scale down its environmental impact [4–6].

Due to its robust relationship with structural reliability, concrete's compressive strength is probably its most significant characteristic [7,8]. A concrete's compressive strength is crucial to the design of structures, and it is practically correlated with various other properties. The compressive strength of concrete depends not solely on the water–binder ratio but also on the cementitious types and contents [9,10]. Overall, this mechanical property is extremely nonlinear and responsive to all constituents of concrete and to age. It was shown that the strength of concrete is enhanced by increasing the sodium hydroxide molarity and the sodium hydroxide-to-sodium silicate ratio, whereas it decreases with increasing curing temperature [11]. Key factors affecting a concrete's strength are the percentages of FA and BFS and the ratio of water to binders [12].

1.2.
Machine learning – an overview

ML is a form of artificial intelligence that enables systems to learn, much as humans do, without explicit programming. In general, ML models are built on wide-ranging datasets (training data) [13,14]. It is typical to categorize ML methodologies, according to the feedback supplied to the learning algorithm, into supervised, unsupervised, and reinforcement learning. A list of the common supervised and unsupervised ML systems is given in Table 1. It is noteworthy that XG Boost and t-Distributed Stochastic Neighbor Embedding (t-SNE) [15] are among the most modern and promising ML approaches, have shown superior accuracy, and merit special mention.

Nomenclature
n — sample size
(x_i, y_i) — a sample point indexed with i
x̄, ȳ — sample means of the variables x and y, respectively
a_i — targeted (observed) strength of the concrete
â_i — strength predicted by the ML model
ā — mean of the targeted (observed) strength
L(y, f(x)) — differentiable loss function
M — number of weak (base) learners
α — learning rate

Table 1. The popular algorithms of supervised and unsupervised ML.
Supervised learning (classification and regression): artificial neural networks (ANNs), support vector machine (SVM), decision trees (DT), random forest, Lasso regression, multiple regression (MR), multiple additives (MA), light gradient boosting machine (LGBM), gradient boosting, and extreme gradient boosting (XG Boost).
Unsupervised learning (clustering and data visualization): K-means, mean-shift clustering, DBSCAN clustering, Gaussian mixture, spectral clustering, agglomerative clustering, and the interactive high-dimensional data visualization technique "t-SNE" [15].

XG Boost is a decision tree-based ensemble ML system that routinely uses gradient boosting to develop regression or classification models [16]. The algorithm was initially developed by Chen et al. [17] as an efficient implementation of the gradient boosting methodology introduced by Friedman et al. [18]. The XG Boost algorithm is characterized by various advantages over plain gradient boosting, such as smart splitting of trees, short leaf nodes, randomization, Newton–Raphson boosting, and out-of-core computation [19]. As a result of its integration into the Python ecosystem and its use in various Kaggle [20] competitions, it has become increasingly popular. In recent years, the XG Boost algorithm has been the basis for a wide range of pioneering applications, including, but not limited to, diagnosing human health problems [21], modeling the COVID-19 pandemic [22], and forecasting financial bankruptcy in businesses [23].

Moreover, the t-SNE approach is an unsupervised ML technique originally proposed by Hinton and Roweis [24] and regarded as a prototype visualization method [25]. It has been gaining traction in real-life problems over the past decade; ML of quantum states, for example, has proven beneficial using this technique, and a study [26] concluded that t-SNE is one of the most promising techniques for this type of application. As a state-of-the-art dimensionality reduction algorithm, it represents usable data distributions in a nonlinear manner, resulting in a low-dimensional representation (map) [27,28]. SNE is conducted in two steps. First, the distance between two data points is converted into a probability based on their similarity in the multidimensional space. Second, SNE combines the conditional probability of a point in high dimensions with the conditional probabilities of the corresponding map points in low dimensions [29]. Because of the large number of observations that t-SNE must accommodate, it has the disadvantage of poor scalability [28,30].

The literature has well documented successful ML applications in structural engineering analysis and design [31–33]. The popularity of ML methods has led scientists to focus increasingly on predicting concrete properties with these approaches. Table 2 summarizes the reported studies on predicting concrete's compressive strength with ML-based regression models. Although a variety of ML methods have been used, ANN-based approaches are the most common; because they are considered black boxes, however, they cannot be widely applied in practice. Furthermore, the performance of the modern gradient boosting ensemble methods (such as AdaBoost, LGBM, and XG Boost) was the highest. Moreover, reliable models are still lacking that can accurately predict the compressive strength of concrete with different classes (for example, NC and HPC, or HPC and UHPC).

Table 2. Summary of some previous studies.
Ref.  Concrete type                      Data size  Inputs  ML method (R² on test data)
[34]  High-performance concrete (HPC)    727        8       ANN (0.914)
[35]  Normal concrete (NC)               864        9       ANN (0.578)
[9]   HPC                                1030       8       ANN (0.909), MR (0.611), SVM (0.886), MA (0.911), DT (0.890)
[36]  HPC                                1030       8       ANN (0.909), bagged ANN (0.928), gradient-boosted ANN (0.927), wavelet bagged ANN (0.940), wavelet gradient-boosted ANN (0.953)
[37]  HPC                                152        8       LGBM (0.950), CatBoost regressor (0.950), gradient boosting regressor (0.960), AdaBoost regressor (0.900), XG Boost (0.940)
[38]  Ultrahigh-performance concrete     931        17      XG Boost (0.892)

The generality of the training data plays a significant role in the efficiency of an ML model [47]. As such, establishing a model of concrete compressive strength requires the existence of vast amounts of data from diverse real-world environments. In the current study, the raw experimental data were collected from the University of California, Irvine (UCI) repository [48,49]. A total of 1030 concrete mixes containing OPC (type I), BFS, and FA under normal moisture curing were gathered from 11 different experimental sources [50–60]. The data attributes consist of eight inputs and one quantitative output (the compressive strength of concrete); the units and coding system of these variables are displayed in Table 3. Among the characteristics of the collected dataset are a maximum aggregate size of 10 mm and naphthalene-based superplasticizers. Additionally, concrete strength was determined by standard methods using 150 × 300 mm cylinders. Table 4 presents the statistical analysis of the variables in the ML model, while Fig. 1 displays their frequency distributions. A majority of the model's variables have reasonable frequency distributions for use in ML regression. Further, the developed model will probably apply to ordinary and high-strength concrete (with strengths ranging from 2.3 to 83.6 MPa) containing cement, BFS, and FA contents of 102–540, 0–359, and 0–200 kg/m³, respectively.
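For illustration, the kind of descriptive summary reported in Table 4 can be reproduced with pandas. The three mix records below are illustrative values only (the full 1030-record dataset resides in the UCI repository), and the column names follow the coding system of Table 3.

```python
import pandas as pd

# Three illustrative mix records (values for demonstration only; the full
# 1030-record dataset is available from the UCI repository). Columns follow
# the coding of Table 3: predictors C, BFS, FA, W, SP, CA, S, A and label CS.
mixes = pd.DataFrame({
    "C":   [540.0, 332.5, 198.6],
    "BFS": [0.0,   142.5, 132.4],
    "FA":  [0.0,   0.0,   0.0],
    "W":   [162.0, 228.0, 192.0],
    "SP":  [2.5,   0.0,   0.0],
    "CA":  [1040.0, 932.0, 978.4],
    "S":   [676.0, 594.0, 825.5],
    "A":   [28,    270,   90],
    "CS":  [79.99, 40.27, 38.07],
})

summary = mixes.describe().T            # mean, std, min, quartiles, max per variable
print(summary[["mean", "std", "min", "50%", "max"]])
```

Running `describe()` on the full dataset would yield the mean, standard deviation, quartile, and extreme values listed in Table 4.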
In this research, the linear correlation between the model variables was evaluated during data preprocessing through Pearson's correlation coefficient (r_xy, Eq. (1)). This coefficient is the covariance of two variables normalized by the product of their standard deviations, with values ranging from −1 to +1 [61]:

r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}    (1)

where n is the sample size, (x_i, y_i) is a sample point indexed with i, and x̄ and ȳ are the sample means of the variables x and y, respectively.

The result of this analysis is presented in Fig. 2. The strongest positive effects on the model's label (CS) were those of C, SP, and age, whereas increases in FA, CA, and water are likely to reduce the compressive strength. The results are fairly predictable, since FA has a favorable effect on the durability properties of concrete but a negative effect on its strength.

Table 3. The database coding system.
Variable              Coded variable  Unit   Variable type
Portland cement       C               kg/m³  Predictor (feature)
Blast furnace slag    BFS             kg/m³  Predictor (feature)
Fly ash               FA              kg/m³  Predictor (feature)
Water                 W               kg/m³  Predictor (feature)
Superplasticizer      SP              kg/m³  Predictor (feature)
Coarse aggregate      CA              kg/m³  Predictor (feature)
Sand                  S               kg/m³  Predictor (feature)
Age                   A               days   Predictor (feature)
Compressive strength  CS              MPa    Target (label)

The standard non-optimized ML model generally converges to a local optimum or overtrains, has slow calculation speeds, and incorporates no optimization [39–41]. In the absence of regular optimization of these models, forecasting precision often remains low because of the subjective criteria used for determining the parameters. Thus, optimization algorithms [e.g., the particle swarm optimization algorithm (PSO) or the genetic algorithm (GA)] are commonly used to tune parameters and improve prediction accuracy [42]. It is important to recognize, however, that these optimization algorithms have a number of inherent limitations, among them insufficient calculation speed and an inability to reach a global optimum [43]. The Mind Evolutionary Algorithm (MEA) was proposed by Chengyi et al. [44] as a way to overcome the shortcomings of existing algorithms. It is noteworthy that hyperparameters refer to a set of parameters that can significantly impact forecasting accuracy [45]; prior to the modeling process, it is important to optimize the hyperparameters of an ML model to ensure that it will work successfully.

1.3. Research objectives, significance, and rationale

A substantial amount of time and effort is required to optimize the strength of concrete experimentally; such studies would be more cost-efficient and time-saving if a rational and robust prediction model were available. Typically, linear and nonlinear empirical models are less accurate, since they rely on a limited sample of data, specific ambient and curing conditions, and particular testing norms [46]. The shortcomings of empirical models can be addressed by implementing ML tools. This study aimed to efficiently model and deploy the compressive strength of NC and HPC containing BFS and FA. For this purpose, Python code with the XG Boost regression algorithm was developed and fine-tuned. The model was eventually made more practical through the development of a graphical user interface (GUI).

2.2. Feature engineering

Another phase of data preprocessing was performed to identify outliers in the database. Accordingly, boxplots of the data features were analyzed, as shown in Fig. 3. There were several outliers for the age feature because relatively few discrete data were available for long-term strength measurements (180, 270, and 365 days). It is well reported that the XG Boost method can handle outliers without notable sensitivity [38]; thus, no outliers were removed from the database in the present study. In addition, removing outliers would reduce the generalization of the model.
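The Pearson coefficient of Eq. (1), used in the preprocessing above, translates directly into code; a minimal, dependency-free sketch:

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient r_xy of Eq. (1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - x_bar) ** 2 for xi in x)
                    * sum((yi - y_bar) ** 2 for yi in y))
    return num / den

# A perfectly linear positive relation gives r = +1, a perfectly inverse one r = -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))    # → 1.0
print(pearson_r([1, 2, 3], [3, 2, 1]))          # → -1.0
```

Applied column-wise to the feature matrix and the CS label, this produces the correlation matrix visualized in Fig. 2.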
Feature engineering ultimately culminates in randomly separating the entire database into training and testing sets. In the current study, the data were split 75%/25%: a total of 772 records were used to train the model, and the remaining 258 records were used to test its validity.

2. Data population

2.1. Data collection, description, and statistical analysis

Table 4. Descriptive statistics of the variables.
Variable  Mean   Std. Dev.  Median  Minimum  Maximum  Q1     Q3
C         281.2  104.5      272.9   102.0    540.0    192.0  350.0
BFS       73.9   86.3       22.0    0.0      359.4    0.0    143.0
FA        54.2   64.0       0.0     0.0      200.1    0.0    118.3
Water     181.6  21.4       185.0   121.8    247.0    164.9  192.0
SP        6.2    6.0        6.4     0.0      32.2     0.0    10.2
CA        972.9  77.8       968.0   801.0    1145.0   932.0  1029.4
Sand      773.6  80.2       779.5   594.0    992.6    730.3  824.3
Age       45.7   63.2       28.0    1.0      365.0    7.0    56.0
CS        35.8   16.7       34.4    2.3      82.6     23.7   46.2

Fig. 1. Graphical summary of the variables of the model.
Fig. 2. (a) Pearson's encoded matrix, and (b) linear correlations between features and label.
Fig. 3. Boxplots of the datasets.

3. XG Boost regression

3.1. Model formulation and development

As a general rule, XG Boost models are prepared using the following steps.

Step 1. Set the model's initial value, \hat{f}^{(0)}, to a constant:

\hat{f}^{(0)}(x) = \arg\min_{\theta} \sum_{i=1}^{N} L(y_i, \theta)    (2)

Step 2. For m = 1 to M (the weak base learners):

- Calculate the gradients (\hat{g}_m) and Hessians (\hat{h}_m):

\hat{g}_m(x_i) = \left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x) = \hat{f}^{(m-1)}(x)}    (3)

\hat{h}_m(x_i) = \left[ \frac{\partial^2 L(y_i, f(x_i))}{\partial f(x_i)^2} \right]_{f(x) = \hat{f}^{(m-1)}(x)}    (4)

- Solve the following optimization problem using the training set \{x_i, -\hat{g}_m(x_i)/\hat{h}_m(x_i)\} to fit the base learner (tree):

\hat{\phi}_m = \arg\min_{\phi \in \Phi} \sum_{i=1}^{N} \frac{1}{2}\, \hat{h}_m(x_i) \left[ -\frac{\hat{g}_m(x_i)}{\hat{h}_m(x_i)} - \phi(x_i) \right]^2    (5)

\hat{f}_m(x) = \alpha\, \hat{\phi}_m(x)    (6)

- Modify the model as follows:

\hat{f}^{(m)}(x) = \hat{f}^{(m-1)}(x) + \hat{f}_m(x)

Step 3. The final model is

\hat{f}(x) = \hat{f}^{(M)}(x) = \sum_{m=0}^{M} \hat{f}_m(x)    (7)

In Eqs. (2)–(4), L(y, f(x)) denotes the differentiable loss function, and in Eq. (6), α represents the learning rate. For XG Boost single trees, the model continually evaluates the loss function in order to choose the leaf node with the highest gain. Feature splitting allows the algorithm to add regression trees (i.e., to introduce a new predictor, \hat{f}_m(x), that eliminates the residuals of previous iterations). Finally, the model prediction is evaluated by summing the scores of all predictors. The current investigation was coded in Python [62], following the flow chart shown in Fig. 4.

Fig. 4. XG Boost algorithm flowchart.

3.2. Model performance indicators

Normally, the coefficient of determination (R²) is used to test the results of a regression model. Owing to its vulnerability to ML averaging procedures, however, it cannot be used alone to evaluate model output [63]. Therefore, the root mean squared error (RMSE), mean absolute percentage error (MAPE), and normalized mean bias error (NMBE) were evaluated as well. The performance indicators used in this study are listed in Eqs. (9)–(12):

MAPE = \frac{1}{n} \sum_{i=1}^{n} \frac{|a_i - \hat{a}_i|}{|\hat{a}_i|}    (9)

NMBE = \frac{1}{\bar{a}} \left( \frac{1}{n} \sum_{i=1}^{n} (a_i - \hat{a}_i)^2 \right)    (10)

RMSE = \sqrt{ \frac{\sum_{i=1}^{n} (a_i - \hat{a}_i)^2}{n} }    (11)

R^2 = 1 - \frac{\sum_{i=1}^{n} (a_i - \hat{a}_i)^2}{\sum_{i=1}^{n} (a_i - \bar{a})^2}    (12)

where a_i is the targeted (observed) strength of concrete, \hat{a}_i is the strength predicted by the ML model, and \bar{a} is the mean of the targeted (observed) strength.

4. Results and discussion

4.1. The baseline model

This research began by developing a baseline model based on standardized hyperparameters (Table 5). Table 6 and Fig. 5 summarize the prediction capacity of this baseline model. The performance indicators of the test data were inferior to those of the training data, which indicated that the baseline model exhibited a tendency toward overfitting. These hyperparameters have thus been tuned to maximize the model's prediction abilities.

Table 5. Hyperparameters of the main developed models.
Hyperparameter                              Baseline model  Optimized model
Maximum tree depth for base learners        3               4
Subsample ratio of the training instance    1               0.8
Number of gradient boosted trees            100             400
Boosting learning rate                      0.1             0.175
L1 regularization term on weights           0               0.1
Subsample ratio of columns for each level   0.3             1

Table 6. Performance indicators of the main developed models.
                       Baseline model          Optimized model
Performance indicator  Training    Testing     Training    Testing
MAPE                   0.381       2.945       0.932       2.428
NMBE                   0.999       21.857      2.321       13.780
RMSE                   0.999       4.675       1.524       3.712
R²                     0.996       0.919       0.992       0.949

Fig. 5. Model vs. target for the baseline model: (a) training, and (b) testing datasets.

4.2. The optimized model

4.2.2. Important features

Two methods were used in this study to evaluate the feature importance of the developed ML model. The first approach was based on the Gini index, calculated by evaluating the total gain of all splits in which the feature was employed [65]. Fig. 8(a) shows the feature importance obtained by this method; C, FA, water, and both aggregate types were the most important features of the database. According to Gini-based feature analysis, significant features with unique values are more likely to be detected [66]. Likewise, the model's sensitivity to the various features has been investigated using the SHAP (an acronym for SHapley Additive exPlanations) value-based approach.
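As a rough sketch of the gain-based ("Gini") importance computation behind Fig. 8(a), the snippet below uses scikit-learn's `GradientBoostingRegressor` as a stand-in for the paper's XG Boost model, on synthetic data in which the first feature dominates the response; all names and values are illustrative, not the study's.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data: the target depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
rng = np.random.default_rng(42)
X = rng.uniform(size=(400, 3))
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=400)

# Gradient boosting stand-in for the trained XG Boost model.
model = GradientBoostingRegressor(n_estimators=100, max_depth=3).fit(X, y)

# feature_importances_ holds each feature's share of the total split gain,
# the same kind of quantity plotted in Fig. 8(a); the shares sum to 1.
importance = model.feature_importances_
print(importance)   # feature 0 should dominate
```

The xgboost package exposes the same notion through `XGBRegressor.feature_importances_` and `Booster.get_score(importance_type="gain")`.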
A feature significance analysis reveals that low aggregate and cement contents have a significant negative impact on the model's prediction, while high values have a strong positive impact, as shown in Fig. 8(b). In contrast, low water content had a substantial positive impact on the model response, whereas high water content had a negative effect. It is interesting to note that some SHAP values mirror Pearson's coefficients, especially for features with negative Pearson's values (i.e., FA, W, CA, and S).

4.2.1. Properties and performance

During this study, the most significant hyperparameters (Table 5) were optimized by trial and error to achieve the highest R² values. This was achieved using vector-based (Pareto [64]) optimization. The red point in Fig. 6 marks the optimum on the obtained Pareto frontier; the optimized parameters were identified at this point. The resulting model demonstrated predictive performance superior to most ML models reported in the literature (Table 2), with scores of 0.992 and 0.949 (Table 6) for the training and testing sets, respectively. As shown in Fig. 7, the predicted–target data were within an accuracy range of ± 85% for the test data, which shows that the optimized model is a powerful prediction tool. The following sections present the feature importance and establish partial dependence plots (PDPs) with the aid of the calibrated model.

4.2.3. Partial dependence analysis

In this study, partial dependence analysis was carried out for each of the independent variables (i.e., C, BFS, FA, W, SP, CA, S, and A) employed in the ML model. Fig. 9 shows the PDPs of concrete's CS in response to the different predictors, which exhibit varying ranges of compressive strength. In terms of strength difference, A, C, and W were the most influential parameters, while CA, SP, and FA were the least influential. This finding is consistent with that obtained from SHAP values [Fig.
8 (b)], and that reported in [67]. Fig. 9 also illustrates that concrete's strength increases as its cement content increases, whereas an increase in water content results in a significant decline in strength. Further, the strength of concrete increases significantly up to about 30 days but remains relatively stable afterward. These findings are widely known, which strengthens the reliability of the developed model. Optimum values of the concrete constituent materials are also given in Fig. 9; as a general rule of thumb, the ideal FA and SP contents are 100–200 and 7–13 kg/m³, respectively.

Fig. 6. Pareto frontier results for hyperparameter optimization.
Fig. 7. Model vs. target for the optimized model: (a) training, and (b) testing datasets.
Fig. 8. Feature importance by: (a) Gini index-, and (b) SHAP value-based methods.
Fig. 9. CS PDPs of: (a) C, (b) BFS, (c) FA, (d) W, (e) SP, (f) CA, (g) S, (h) A.

The concrete compressive strength PDPs shown in Figs. 10–17 illustrate some of the most important mutual relationships (in the presence of C and W). These plots can guide the precise selection of concrete's constituent materials. The 2D and 3D isoresponses of C and BFS are provided in Fig. 10. A slight decrease in concrete's compressive strength was associated with an increase in BFS content. In a similar observation, Türkmen et al. [68] concluded that BFS incorporation led to strength reductions in concrete, especially at early concrete ages.

Fig. 10. PDPs of C and BFS: (a) isoresponse contours, and (b) response surface.
Fig. 11. PDPs of C and FA: (a) isoresponse contours, and (b) response surface.
Fig. 12. PDPs of C and W: (a) isoresponse contours, and (b) response surface.
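The 1-D partial dependence behind plots like those of Fig. 9 can be sketched without any plotting library: sweep one feature over a grid, clamp it for every record, and average the model's predictions. The linear "model" below is a toy stand-in for the trained XG Boost model, and all names are illustrative.

```python
import numpy as np

def partial_dependence_1d(predict, X, feature, grid):
    """Average prediction over the data while clamping one feature to each grid value."""
    pdp = []
    for value in grid:
        X_clamped = X.copy()
        X_clamped[:, feature] = value      # hold the feature of interest fixed
        pdp.append(predict(X_clamped).mean())
    return np.array(pdp)

# Toy stand-in model: the response rises with feature 0 ("cement")
# and falls with feature 1 ("water").
predict = lambda X: 0.1 * X[:, 0] - 0.2 * X[:, 1]
X = np.random.default_rng(0).uniform(100, 500, size=(50, 2))
grid = np.array([100.0, 300.0, 500.0])
pdp_cement = partial_dependence_1d(predict, X, feature=0, grid=grid)
# For this linear stand-in, the PDP rises by 0.1 per unit of the swept feature.
```

scikit-learn provides the same computation for fitted estimators through `sklearn.inspection.partial_dependence`, which is one plausible route to figures such as Fig. 9.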
As a general guideline, the best contents of C and BFS can be found between 400–450 and 100–200 kg/m³, respectively. The sensitivity of the concrete's strength to the C–FA combination is shown in Fig. 11; FA had a relatively small effect on the compressive strength for cement contents below 350 kg/m³, as no noticeable strength changes were observed with FA addition. The C and W PDPs are presented in Fig. 12. As expected, the maximum strength was attained at the lowest water content (less than 160 kg/m³) and the highest cement content (more than 380 kg/m³).

Fig. 13 depicts the compressive strength of concrete as dependent on C and S. The figure indicates that a higher ratio of sand to cement results in concrete with higher strength; the improvement in fracture resistance is likely caused by the increased interlocking at higher sand contents [69]. It appears from Fig. 13 that 600–800 kg/m³ of sand is the optimal amount for concrete.

Fig. 13. PDPs of C and S: (a) isoresponse contours, and (b) response surface.

Fig. 14 shows the development of the concrete's compressive strength with age for different cement dosages. It is evident from the figure that strength increases significantly at early ages (up to 30 days), but little strength gain occurs at older ages. In general, it is known that concrete materials increase significantly in strength over time, with the characteristic strength assessed at 28 days.

Fig. 14. PDPs of C and A: (a) isoresponse contours, and (b) response surface.

Fig. 15 shows the combined effect of water and BFS on concrete's compressive strength. This figure illustrates, as discussed earlier, that increasing BFS content marginally decreases strength at a constant amount of water; the increase in water, however, significantly reduces the strength. Normally, a water–binder ratio below 0.2 (about 150 kg/m³ of water) is required for hydration [70]. As more water is added, hardened cement will break away from the aggregate surface (as a result of water lubrication at the molecular level).

Fig. 15. PDPs of W and BFS: (a) isoresponse contours, and (b) response surface.

The concrete's response to water–FA interactions is illustrated in Fig. 16. No significant change in compressive strength was noted for FA replacement of cement. This agrees with the findings of various investigators [71–73] that FA only insignificantly decreases concrete's compressive strength at early ages (1–7 days) and does not alter its long-term strength.

Fig. 16. PDPs of W and FA: (a) isoresponse contours, and (b) response surface.

Fig. 17 shows how water and superplasticizer affect the concrete's compressive strength. As shown in the figure, SP has a negligible effect on compressive strength at low water contents (less than 180 kg/m³) but a stochastic effect at high water contents. According to the current study's data, this random response is likely caused by the differing chemical properties of high-range water reducers.

Fig. 17. PDPs of W and SP: (a) isoresponse contours, and (b) response surface.

4.3. Deployment of the model

This study offers a free and easy-to-use graphical user interface (GUI) to facilitate user interaction with the developed XG Boost model. A slider control system has been implemented in Python with Gradio [74], allowing input values to be limited between the minima and maxima of Table 4. The developed GUI assists in optimizing and predicting the strength of concrete containing BFS and FA.

Fig. 18. GUI for XG Boost model-based prediction of the compressive strength of concrete.

According to Fig. 18, this GUI consists of four main components,
namely input features with slider controls, output results, explanations, and examples. The model outputs the concrete's strength (in MPa) as well as its class ("normal concrete" if the strength is lower than 60 MPa, otherwise "high strength concrete"). The GUI explanation is based on SHAP values [Fig. 8(b)], where the user can see how concrete's compressive strength can be affected by varying the amounts of the input variables. The GUI also displays three examples of input variables that can be chosen and submitted to view the model's output. Through the use of the GUI developed in this study, normal- and high-strength concrete can be optimized in a shorter time period, at a lower cost, and with less effort.

5. Implications, recommendations, and outlook

This study developed an XG Boost model that accurately predicted concrete compressive strength. A total of 1030 concrete mixes containing OPC (type I), BFS, and FA, cured under normal moisture conditions, were collected from 11 different laboratories. Additionally, the study provides a simple and free GUI to support the design of normal- and high-strength concrete containing BFS and FA, together with 1D, 2D, and 3D PDPs. As a result of this study, the following implications were drawn:

• Using the developed model, it is likely possible to predict the strength of concrete containing up to 360 kg/m³ BFS and 200 kg/m³ FA at ages of up to one year.
• As a starting point, a baseline model based on standardized hyperparameters was developed. The baseline model exhibited a tendency toward overfitting, with R² values of 0.996 and 0.919 for the training and testing datasets, respectively.
• The hyperparameters of the model were optimized using vector multi-objective optimization to maximize the prediction capability of the model. According to this study, the number of gradient boosted trees, the boosting learning rate, and the subsample ratio were the most influential hyperparameters on model performance.
• The optimized XG Boost model exhibited superior prediction performance, with R² of 0.992 and 0.949 for the training and testing datasets. The optimized model's predicted–target data for the test dataset fell within an accuracy range of ± 85%.
• Based on Gini indexes, C, FA, water, and both aggregate types were the most significant model parameters. An analysis of SHAP values yielded consistent findings.
• According to this study, the best BFS, FA, S, and SP contents for concrete strength optimization were 100–200, 100–200, 600–800, and 7–13 kg/m³, respectively.
• The SP has a negligible effect on concrete's compressive strength at low water contents (less than 180 kg/m³) but a stochastic effect at high contents; the differing chemical properties of high-range water reducers may explain this randomly generated response.

In future studies, the scope will be expanded to include concrete with a wide variety of SCMs (e.g., SF, metakaolin, rice husk ash, etc.) and fiber reinforcement systems. The reliability of the model will be further examined using unseen data.

CRediT authorship contribution statement

It is confirmed that neither the manuscript nor any part of its content is currently under consideration or published in another journal. All authors have approved the manuscript and agree with its submission to Materials Today Communications.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.
Acknowledgments

The authors extend their appreciation to Researcher Supporting Project number (RSPD2023R692), King Saud University, Riyadh, Kingdom of Saudi Arabia.