pubs.acs.org/IC Article

Data Mining and Graph Network Deep Learning for Band Gap Prediction in Crystalline Borate Materials

Ruihan Wang,∥ Yeshuang Zhong,∥ Xuehua Dong, Meng Du, Haolun Yuan, Yurong Zou, Xin Wang, Zhien Lin,* and Dingguo Xu*

Cite This: Inorg. Chem. 2023, 62, 4716−4726

ABSTRACT: Crystalline borates are an important class of functional materials with wide applications in photocatalysis and laser technologies. Obtaining their band gap values in a timely and precise manner is a great challenge in material design because of the limited accuracy and high cost of first-principles methods. Although machine learning (ML) techniques have shown great success in predicting the versatile properties of materials, their practicality is often limited by the data set quality. Here, by using a combination of natural language processing searches and domain knowledge, we built an experimental database of inorganic borates, including their chemical compositions, band gaps, and crystal structures. We performed graph network deep learning to accurately predict the band gaps of borates, and the results agreed favorably with experimental measurements from the visible-light to the deep-ultraviolet (DUV) region. For a realistic screening problem, our ML model could correctly identify most of the investigated DUV borates. Furthermore, the extrapolative ability of the model was validated against our newly synthesized borate crystal Ag3B6O10NO3, supplemented by the discussion of an ML-based material design for structural analogues. The applications and interpretability of the ML model were also evaluated extensively.
Finally, we implemented a web-based application, which could be utilized conveniently in material engineering for the desired band gap. The philosophy behind this study is to use cost-effective data mining techniques to build high-quality ML models, which can provide useful clues for further material design.

© 2023 American Chemical Society

■ INTRODUCTION

Borates, the largest group of oxide minerals, have been widely employed as non-linear optical (NLO) crystals, photocatalysts, flame retardants, and luminescent materials.1−4 Understanding and manipulating the band structures of borate crystals is important for designing novel advanced functional materials. For instance, deep-ultraviolet (DUV; wavelength, λ, < 200 nm) NLO crystals require a band gap of >6.2 eV,5 whereas visible-light photocatalysts require a band gap of <3.2 eV.6 Therefore, the rapid and precise prediction of band gaps can accelerate the discovery of new borate materials with favorable properties, and chemists can benefit from predicting band gaps before synthesis.

Supervised machine learning (ML) is a powerful and efficient tool for predicting the band gap,7−9 adsorption volume,10,11 formation energy,12,13 and stability.14,15 In the context of band gap predictions, Omprakash et al.16 applied graph neural networks (GNNs) to predict varying perovskite band gaps in a few milliseconds. The GNN model was trained using a database of 24,501 perovskites created based on density functional theory (DFT) calculations. Xie and Grossman17 developed a generalized crystal graph convolutional neural network (CGCNN) to predict the band gaps of 46,744 inorganic crystals. Once trained, the properties of thousands of materials could be predicted within seconds. The relatively low computational demand makes ML technology an extremely promising option for novel material discovery paradigms.

The foundation of a successful ML project depends on the data set.
Most public databases have been generated based on DFT calculations at the less accurate generalized gradient approximation (GGA) level; these include the Automatic Flow (AFLOW),18 Open Quantum Materials Database (OQMD),19 and Materials Project.20 However, a well-known drawback is that the band gaps determined by GGA severely underestimate the actual band gaps.21,22 This underestimation is even more pronounced for ultrawide band gap compounds, particularly for some DUV borates. He et al.5 compared the calculated band gap obtained at the GGA level and the experimental band gap for 10 DUV borate crystals, such as β-BaB2O4 (ref 23; 4.30 vs 6.57 eV), KBe2BO3F2 (ref 24; 5.79 vs 8.27 eV), and KAl2B2O7 (ref 25; 3.98 vs 6.89 eV). Therefore, if we employ the GGA level in DFT calculations to generate the data set for subsequent ML model construction, we could miss some candidate structures in the screening for target properties. A practical remedy for such GGA underestimation is to use the many-body-perturbation-theory-based GW approach26 or a hybrid exchange−correlation (XC) functional that includes exact Hartree−Fock exchange27 to generate data sets. However, these theoretical methods consume substantial computational resources and are currently unsuitable for high-throughput calculation research.

Received: January 19, 2023
Published: March 8, 2023
https://doi.org/10.1021/acs.inorgchem.3c00233
Inorg. Chem. 2023, 62, 4716−4726

Figure 1. (a) Data mining procedure in this study. (b) Schematic diagram to illustrate the CGCNN model.

Directly using experimental values as the data sets for the ML training process has emerged as a more efficient approach to bypass the band gap underestimation problem of the Perdew−Burke−Ernzerhof (PBE)28 functional or the time-consuming problem of the hybrid XC functional.
For example, Zhuo et al.29 recorded 3896 experimental band gap samples for inorganic solids into a highly useful database and constructed an ML model based on the properties of the constituent elements. However, no structural information was provided in this database, limiting the development of structure-based algorithms and the discussion of structure−function relationships. To improve the accuracy of predictions for the experimental band gap, Chen et al.30 developed multi-fidelity graph networks (MF-GNNs) that allowed for the efficient learning of latent structural characteristics from large amounts of low-fidelity computed data. When these previously published ML models are used to predict specific systems, such as the band gaps of borates, some inaccuracies still exist (see Figure S1). Possible reasons include the limited number of borates in the training set and the lack of a widely accessible borate database of experimental band gaps. Therefore, we propose that frequent updates of databases and system-specific ML models are essential to fulfill diverse prediction purposes.

In this study, we constructed a borate database of experimental band gaps by employing data mining techniques to extract text from scientific literature. Several state-of-the-art CGCNN models were applied to predict the band gaps of borate crystals. More importantly, the predictions agreed reasonably well with experimental measurements, achieving a mean absolute error (MAE) of 0.40 eV for the test set. A new borate crystal, Ag3B6O10NO3, was synthesized to validate the robustness of the workflow. For a realistic screening of borates with a short ultraviolet (UV) cutoff edge (λcutoff < 200 nm), our prediction accuracy was higher than that of previously published ML models and DFT calculations. This excellent extrapolation capability also motivated us to extend the trained model to a larger computational database consisting of approximately 2000 borates.
Furthermore, the resulting learned characteristics allowed us to conduct interesting investigations into the interpretability of crystal-graph networks. Finally, we developed a web tool that enables users to conveniently upload borate crystallographic information files (CIFs) to obtain experiment-based band gaps.

■ METHODOLOGY

Data Set Collation. Numerous studies31−34 have focused on text extraction from scientific literature based on a combination of the natural language processing (NLP)35−37 toolkit and full-text publisher application programming interfaces (APIs).38 Details on the mining procedure can be found in ref 36, and only a brief summary is provided herein. From four publishers, namely, Elsevier, the American Chemical Society, the Royal Society of Chemistry, and Springer, we compiled a corpus of borates including approximately 1000 papers. The chemical names and band gaps were automatically mined from the paragraphs of each article in this highly specialized borate corpus using keywords [band gap, ultraviolet−visible (UV−vis) spectroscopy, and cutoff edges].

Figure 2. (a) Heat map of the periodic table showing the elements present in the borates in the database. (b) Density of the distribution of band gaps for all the structures contained in the database.

Despite the efficiency of automated extraction techniques, additional steps must be considered to acquire a complete data set. First, several studies have only provided theoretical band gaps based on DFT calculations. Consequently, in our analysis, each extracted piece of data was manually examined to ensure that the results were obtained from experimental measurements. Second, the NLP approach is unable to capture some of the band gap data from the UV−vis images. Thus, the band gap must be extracted using domain knowledge.
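As an illustration of this manual step, a cutoff wavelength read from a UV−vis spectrum can be turned into a band gap estimate with the absorption-edge relation Eg ≈ 1240/λ (Eg in eV, λ in nm) used in this section. The following is a minimal Python sketch, not the authors' code; the function names and the rounded constant are assumptions made for illustration.

```python
# Rounded value of h*c/e in eV·nm, as used in the text (more precisely ~1239.84).
HC_EV_NM = 1240.0


def band_gap_from_cutoff(wavelength_nm: float) -> float:
    """Estimate the optical band gap (eV) from an absorption cutoff edge (nm)."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm


def is_duv(wavelength_nm: float, threshold_ev: float = 6.2) -> bool:
    """True if the cutoff edge implies a gap at or above the DUV threshold."""
    return band_gap_from_cutoff(wavelength_nm) >= threshold_ev


if __name__ == "__main__":
    # Cutoff edges typical of the DUV borates discussed in this paper.
    for nm in (200.0, 190.0, 185.0, 180.0):
        print(f"{nm:.0f} nm -> {band_gap_from_cutoff(nm):.2f} eV")
```

Because a reported cutoff edge is usually an upper limit on the wavelength (for example, λcutoff < 185 nm), the value returned here should be read as a lower bound on the band gap rather than a measured gap.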
The band gap corresponds to the absorption limit, which was roughly estimated by the following equation.39

$$E_g = \frac{hc}{\lambda} \approx \frac{1240}{\lambda}\ \mathrm{eV}$$

where Eg is the band gap (eV), h is Planck's constant (6.626 × 10^−34 J s), c is the speed of light (3 × 10^8 m/s), and λ is the wavelength (nm).

Most band gaps derived from the publications were not accompanied by adequate crystallographic data; only compositional information was provided.29 Thus, to build a more comprehensive database, we needed to obtain the structure files corresponding to each chemical formula of the borates. We matched the chemical names against entries obtained from the Cambridge Crystallographic Data Centre to determine the structural details of each borate compound. Another difficulty is that some downloaded crystal structures have partial atomic occupancy issues. After the data cleaning operation, the data set was left with 276 usable data points with valid CIF files, as shown in Figure 1a. CIF formats were selected because of their superiority in representing material information.

DFT Calculation. The electronic structures of some borates were determined using the Vienna ab initio simulation package (VASP).40 GGA with the PBE functional was applied.28 The core−valence electron interactions were described using the projector augmented wave method.41 The kinetic energy cutoff for the plane-wave basis set expansion was set to 520 eV. A Gamma-centered grid of k-points was used for integration in the reciprocal space. During relaxation, the thresholds for the forces on the atoms were set to less than 0.05 eV/Å. A convergence criterion of 10^−5 eV was chosen for the electronic optimization. The quickly use VASP (qvasp)42 program was used to obtain the band gaps of all borates.

CGCNN Model and Training. A diagram of the CGCNN calculation procedure is shown in Figure 1b; more details can be found elsewhere.17 Essentially, nodes and edges are characterized by vectors for atoms and bonds, respectively. The overall architecture of the CGCNN model consists of four steps. The first step, named the preprocessing layer, encodes the node and edge features into vectors of specified dimensions. Next are the graph convolution layers, which perform the convolution operation for iterative updates. Once all atom properties have been aggregated, the pooling layer offers a vector for the overall representation of the crystal graph. Finally, a fully connected layer and an output layer are used to predict the target. To avoid any bias introduced by overfitting, we used L2 regularization with a weight decay of 0.0001.

Synthesis and Characterization of Ag3B6O10NO3. Colorless crystals of Ag3B6O10NO3 were obtained by heating a mixture of AgNO3 (0.340 g), LiNO3 (0.069 g), and H3BO3 (0.927 g) in a Teflon-lined stainless-steel vessel at 230 °C for 4 d (76% yield based on silver). Single-crystal X-ray diffraction (XRD) data for this compound were collected using a New Gemini, Dual, Cu at zero, and EosS2 diffractometer at room temperature. The crystal structure was determined using direct methods. The structure was refined on F^2 by full-matrix least-squares methods using the SHELXTL program package.43,44 The powder XRD pattern of the as-synthesized compound obtained using a Shimadzu XRD-6100 diffractometer with Cu Kα radiation (λ = 1.5418 Å) was in agreement with the simulated pattern, confirming its phase purity (Figure S2). Furthermore, thermogravimetric analysis, performed using a Mettler Toledo TGA/DSC 2/1600 thermal analyzer, showed that it remained stable up to 500 °C (Figure S3). The weight loss between 500 and 650 °C is caused by the decomposition of NO3 groups, assuming the solid residue to be a mixture of Ag2O and B2O3 (observed 91.7%; calculated 91.2%).
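The calculated residue fraction quoted for the thermogravimetric analysis can be checked with a short mass balance. The sketch below is an illustration only: it assumes that all silver and boron are retained, so that one formula unit of Ag3B6O10NO3 leaves 1.5 Ag2O + 3 B2O3, and it uses rounded standard atomic masses. The names `ATOMIC_MASS`, `molar_mass`, and `residue_percent` are hypothetical.

```python
# Rounded standard atomic masses in g/mol (an assumption of this sketch).
ATOMIC_MASS = {"Ag": 107.868, "B": 10.811, "N": 14.007, "O": 15.999}


def molar_mass(composition: dict) -> float:
    """Molar mass (g/mol) of a composition given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


# Ag3B6O10NO3 written as element counts (oxygen: 10 + 3 = 13).
COMPOUND = {"Ag": 3, "B": 6, "N": 1, "O": 13}

# Assumed residue per formula unit: 1.5 Ag2O + 3 B2O3 (oxygen: 1.5 + 9 = 10.5).
RESIDUE = {"Ag": 3, "B": 6, "O": 10.5}


def residue_percent() -> float:
    """Mass of the assumed oxide residue as a percentage of the starting mass."""
    return 100.0 * molar_mass(RESIDUE) / molar_mass(COMPOUND)


if __name__ == "__main__":
    print(f"calculated residue: {residue_percent():.1f}%")
```

Under these assumptions the calculated residue rounds to the 91.2% stated in the text, which supports the assignment of the weight loss to the nitrate groups.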
The IR spectrum was measured using a Nicolet Impact 410 FTIR spectrometer, indicating the coexistence of NO3, BO3, and BO4 groups in the structure (Figure S4). The strong band at 1379 cm^−1 can be attributed to the stretching vibration of the NO3 groups. The bands at 1339 and 1173 cm^−1 are attributed to asymmetric stretching and symmetric stretching of the BO3 groups, respectively. The strong band at 1013 cm^−1 is attributed to the asymmetric stretching of B−O in the BO4 tetrahedra. The bands at approximately 810−890 cm^−1 arise from the bending vibration of NO3 groups and the B−O symmetric stretching in the BO4 tetrahedra.

■ RESULTS AND DISCUSSION

Extracted Data Set. Using both manual and NLP searches, we constructed a database that is as representative as possible. The database was first analyzed to capture variations in data dispersion. The data set covers 48 different elements. Figure 2a demonstrates that Ba (24%) is the element most likely to appear in the database, and alkali metal elements (Li, Na, and K) also account for a relatively high proportion. However, the proportion of some elements (Nd, Pd, and Sm) is less than 1%, indicating that the database may have some degree of imbalance, which could further limit the effectiveness of subsequent applications of the developed ML model. The heatmap displayed in Figure 2a can be used to check whether a new borate to be tested contains elements that have previously been identified in the training set. Band gap predictions are expected to be more reliable for borates containing the well-represented elements mentioned above. However, it is essential that more samples be added for the other types of borates by employing data mining technology. We believe that this will be an important direction for future research.

We must also evaluate the distribution of label values in the regression task. Figure 2b shows that the band gaps range from 2.43 eV (Pb3OBO3F, ref 45) to 9.92 eV (SrB4O7, ref 46), and the highest density regions are located at 4.05 and 5.50 eV. In particular, 20% of the samples have band gaps greater than 6.20 eV (wavelength λ < 200 nm), which implies that training the ML model on this part of the data can help overcome the underestimation of PBE calculations. In general, all information in the database, including chemical formulas, band gaps, structure files (CIF), and references, is available in the Supporting Information (Table S1).

Model Training and Evaluation. The CGCNN17 architecture was selected for predicting the borate band gaps. The entire database was partitioned randomly into two parts, an 85% training set and a 15% test set, using 10 random seeds, all of which resulted in similar performance. For a maximum of 1000 epochs, two graph convolution layers and a fully connected layer after the pooling operation were trained using the relevant training data. The hyperparameters were determined using a Bayesian optimization search strategy, and all optimized hyperparameters are summarized in Table S2.

The Pearson product−moment correlation coefficient (PCC), root-mean-squared error (RMSE), and MAE were calculated to evaluate the performance of the CGCNN models. Notably, higher PCC values and lower RMSE and MAE values indicate better prediction accuracies. To visualize the model performance, the predictions of the CGCNN model were obtained by selecting the best among the 10 random seeds. The accuracy of the CGCNN model can be further verified by visualizing scatter plots, as shown in Figure 3, in which most of the points are observed to be distributed close to the diagonal line.

Figure 3. Accuracy representations for the CGCNN model. The red diagonal line indicates a perfect correlation between experimental band gaps and the values predicted by regression models.

The statistical PCC, RMSE, and MAE for the test set were 0.90, 0.54, and 0.40 eV, respectively.
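For readers who want to reproduce the three evaluation metrics, the following minimal Python sketch implements PCC, RMSE, and MAE with the standard library only. It is not the authors' evaluation script. The four example pairs are the experimental and predicted band gaps that this paper quotes for BBO, BMBO, BCBO, and BZBO in the interpretability discussion; they serve only as toy input and are not the test set.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(y_true, y_pred):
    """Pearson product-moment correlation coefficient."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Toy input (eV): experimental vs predicted gaps for BBO, BMBO, BCBO, BZBO.
EXPERIMENTAL = [6.57, 6.97, 6.96, 3.75]
PREDICTED = [6.56, 7.39, 6.97, 3.34]

if __name__ == "__main__":
    print(f"PCC  = {pearson(EXPERIMENTAL, PREDICTED):.3f}")
    print(f"RMSE = {rmse(EXPERIMENTAL, PREDICTED):.3f} eV")
    print(f"MAE  = {mae(EXPERIMENTAL, PREDICTED):.3f} eV")
```

RMSE penalizes large individual errors more strongly than MAE, which is why the two are reported together.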
Previously, Zhuo et al.29 obtained an MAE of 0.75 eV and an RMSE of 1.46 eV for experimental band gaps using a support vector regression model. Chen et al.30 developed multi-fidelity graph networks to train with disordered and ordered crystals for experimental band gaps, and the corresponding MAEs were 0.51 and 0.37 eV, respectively. Compared to these reports, we believe that our ML model has adequate accuracy.

Robustness of the Model. To demonstrate the robustness of our predictive band gap models, we predicted the band gaps of 24 borates that were not included in the data set, as shown in Table 1. Notably, only a range of UV cutoff edges was provided for all 24 borates, rather than the band gap values. For example, BaBe2BO3F3 (ref 47) has a short UV cutoff edge (λcutoff < 185 nm), indicating that the band gap is larger than 6.70 eV. The band gaps of these borates are all greater than 6.2 eV, belonging to the DUV type, according to the results of UV spectroscopy.

Table 1. Some DUV-Type Borate Crystals without Specific Values of Band Gaps(a)

borate material           exp     PBE    MEGNet  MF-GNN  this work  ref
Cd3LiNa4Be4B10O24F        >6.2    3.55   3.70    4.30    6.37       48
Na3B4O7Br                 >6.2    4.16   4.29    5.65    7.19       49
Li5Rb2B7O14               >6.53   4.27   4.69    6.88    6.64       50
Na3B4O7Cl                 >6.2    4.53   4.34    6.37    6.34       49
BaBe2BO3F3                >6.7    6.37   6.39    8.72    8.20       47
K2Ba[B4O5(OH)4]2·10H2O    >6.2    0.14   3.98    5.42    7.07       51
Li3Ba4Sc3(BO3)4(B2O5)2    >6.2    4.44   4.00    5.03    6.47       52
K2B4O11H8                 >6.89   5.46   4.04    5.50    5.87*      53
KBe4B3O9                  >6.2    5.76   5.82    7.85    8.34       54
CsBe4(BO3)3               >6.2    5.98   5.31    7.69    8.53       55
BaB8O12F2                 >6.89   6.72   6.08    6.36    10.07      56
Li3KB4O8                  >6.53   5.48   4.98    7.15    6.71       57
K5B19O31                  >6.89   5.43   5.45    6.52    5.96*      58
K7BaY2(B5O10)3            >6.53   4.47   4.61    5.47    6.05*      59
Li6CuB4O10                >6.2    <0     0.52    0.52    6.37       2
LiBeBO3                   >6.2    6.31   6.06    8.20    8.51       60
K2B5O8(OH)·2H2O           >6.22   4.72   4.00    6.16    6.86       61
BaB2O3F2                  >6.89   6.90   6.26    7.71    8.58       62
GdBe2B5O11                >6.2    5.45   3.34    3.98    6.52       63
YBe2B5O11                 >6.2    6.72   5.65    6.07    6.33       63
Sr3LiNa4Be4B10O24F        >6.2    4.69   4.87    6.92    6.36       48
K7SrY2(B5O10)3            >6.52   4.51   4.78    5.62    5.79*      59
LiRbB5O8(OH)·H2O          >6.22   4.58   4.89    6.89    5.85*      61
K7CaY2(B5O10)3            >6.52   4.53   5.03    5.45    5.88*      59

(a) All values are in eV. Exp represents the range experimentally calculated from the cutoff edges. PBE stands for DFT calculation using the PBE functional. MEGNet and MF-GNN are GNNs trained by Chen et al. and released in the GitHub repository. Values marked with an asterisk represent failed predictions in this work.

For a more comprehensive evaluation, we selected the commonly used PBE functional in this high-throughput screening task for comparison. In addition, we selected two previously reported GNN models, the MEGNet (ref 64) and MF-GNN (ref 30) models. Although MEGNet performed well on the DFT computational database,64 and the MF-GNN model performed well on experimental data sets,30 the performance of neither model on DUV-type crystals has been discussed. For instance, experimental measurements indicate that the band gap of GdBe2B5O11 (ref 63) is larger than 6.20 eV. However, the calculated value based on the PBE functional was only 5.45 eV. The values predicted based on the MEGNet and MF-GNN models were even worse, with corresponding results of 3.34 and 3.98 eV, respectively. The prediction errors were clearly too large. However, although this crystal was not included in our data set, our ML model could provide a predicted value of 6.52 eV, which is in good agreement with the experimental estimation. A further example is Li6CuB4O10 (ref 2), for which the PBE calculation and the MEGNet and MF-GNN models all predict metallic character. However, such a prediction is inconsistent with the experimental fact that this material shows insulator characteristics, and the band gap of Li6CuB4O10 is measured to be greater than 6.2 eV.
Impressively, the result predicted by our model is 6.37 eV, indicating that our model is accurate enough for DUV-type crystals. Note that we do not emphasize the merits of the various GNN model algorithms themselves but only evaluate the practical application of these trained models.

Upon further analysis of Table 1, we infer that the predicted values obtained from MEGNet are generally close to the PBE calculation results, which is not unexpected considering that the best model presented in MEGNet was trained using the band gap data set constituted by the PBE calculated values obtained from the Materials Project.20 To evaluate the prediction models using the compounds listed in Table 1, for which the band gaps were all greater than 6.2 eV, we calculated the identification accuracy of each method. The identification accuracy of the PBE calculation was 13%, whereas the MEGNet model failed for all 24 borates. Compared with the PBE calculation and MEGNet model, the prediction of the MF-GNN model was improved to a certain extent (identification accuracy of 42%). In addition to training using low-precision PBE functionals, the MF-GNN includes an additional step by considering higher-level computational methods and incorporating experimental values for inorganic crystalline materials. This clearly demonstrates that the main bottlenecks in ML are the data quality and quantity constraints. Therefore, in this study, we focused on the quality of the data, particularly for borates, by constructing a complete experimental band gap database using human and NLP searches; the resulting model reached an identification accuracy of 75%. At the algorithm level, the CGCNN model still has potential for improvement, as shown in our previous study11 involving the addition of domain knowledge of material information to the graph network.
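The identification accuracies quoted above can be recomputed from Table 1. The sketch below is illustrative rather than the authors' script, and it rests on two assumptions: a prediction counts as a success when it exceeds the experimental lower bound of that borate, and the "<0" PBE entry for Li6CuB4O10 is represented as 0.0 (metallic). The rows follow the order of Table 1.

```python
# Experimental lower bounds (eV), in the row order of Table 1.
EXP_LOWER = [6.2, 6.2, 6.53, 6.2, 6.7, 6.2, 6.2, 6.89, 6.2, 6.2, 6.89, 6.53,
             6.89, 6.53, 6.2, 6.2, 6.22, 6.89, 6.2, 6.2, 6.2, 6.52, 6.22, 6.52]

# Predicted band gaps (eV) transcribed from Table 1; "<0" is stored as 0.0.
PREDICTIONS = {
    "PBE": [3.55, 4.16, 4.27, 4.53, 6.37, 0.14, 4.44, 5.46, 5.76, 5.98, 6.72, 5.48,
            5.43, 4.47, 0.0, 6.31, 4.72, 6.90, 5.45, 6.72, 4.69, 4.51, 4.58, 4.53],
    "MEGNet": [3.70, 4.29, 4.69, 4.34, 6.39, 3.98, 4.00, 4.04, 5.82, 5.31, 6.08, 4.98,
               5.45, 4.61, 0.52, 6.06, 4.00, 6.26, 3.34, 5.65, 4.87, 4.78, 4.89, 5.03],
    "MF-GNN": [4.30, 5.65, 6.88, 6.37, 8.72, 5.42, 5.03, 5.50, 7.85, 7.69, 6.36, 7.15,
               6.52, 5.47, 0.52, 8.20, 6.16, 7.71, 3.98, 6.07, 6.92, 5.62, 6.89, 5.45],
    "this work": [6.37, 7.19, 6.64, 6.34, 8.20, 7.07, 6.47, 5.87, 8.34, 8.53, 10.07, 6.71,
                  5.96, 6.05, 6.37, 8.51, 6.86, 8.58, 6.52, 6.33, 6.36, 5.79, 5.85, 5.88],
}


def identification_accuracy(lower_bounds, predictions):
    """Fraction of predictions that exceed the experimental lower bound."""
    hits = sum(pred > bound for bound, pred in zip(lower_bounds, predictions))
    return hits / len(lower_bounds)


if __name__ == "__main__":
    for method, values in PREDICTIONS.items():
        print(f"{method:>9}: {100 * identification_accuracy(EXP_LOWER, values):.1f}%")
```

With this criterion the counts are 3, 0, 10, and 18 successes out of 24, which is consistent with the 13%, 0%, 42%, and 75% reported in the text.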
To assess the degree of deviation of the predicted values, we used a more expensive hybrid functional, Heyd−Scuseria−Ernzerhof (HSE06),65,66 to calculate the band gaps; this approach is commonly adopted to obtain a more accurate band gap.67 Considering the computational expense, we chose BaBe2BO3F3 (ref 47) with 60 atoms in the unit cell as a template compound (Figure S5). Previous experimental research has indicated that the band gap value is larger than 6.70 eV. On a workstation with 192 Intel Xeon E5-2692 CPU cores, the band gap calculation requires a total of 279,153 s, demonstrating that this computationally resource-intensive theoretical approach is currently unsuitable for high-throughput computing research. The calculated result based on the HSE06 hybrid functional is 8.18 eV, which agrees well with our predicted value of 8.20 eV. This verifies the reliability of the proposed model. Impressively, less than 1 s is required to obtain this value on our local server. In general, the robustness and low computing costs of our model are adequate for the application of high-throughput virtual screening in practical situations.

Experimental Verification. Notably, the 24 selected borates contained elements that were previously identified in the database, as illustrated in Figure 2a. To assess the extrapolative ability of our model, we examined the band gap of a new borate crystal containing an element (silver) that was not included in the original database. Based on this idea, we introduced AgNO3 into a borate reaction system and successfully obtained a new borate crystal Ag3B6O10NO3 as the target compound. Single-crystal XRD analysis revealed that the compound crystallized in the orthorhombic space group, Pnma (no. 62).
Note that the compound has a three-dimensional structure containing a B6O13 cluster as the building unit. The center of the oxo-boron cluster is occupied by one oxygen atom, which is bonded to three tetrahedral boron atoms. The remaining three boron atoms in the cluster are surrounded by three oxygen atoms to provide triangular coordination. Each B6O13 cluster shares six corner oxygen atoms with the adjacent clusters, forming a three-dimensional open-framework structure. Silver atoms and nitrate groups were encapsulated within their free voids (Figure 4a). By considering the B6O13 cluster as a six-connected node, the borate framework can be simplified as a pcu net (Figure S6). Such a framework topology has also been observed in several porous materials such as MOF-5 (ref 68).

Figure 4. (a) View of the framework structure of Ag3B6O10NO3 encapsulated with Ag+ ions and NO3− groups. (b) UV−vis diffuse reflectance spectrum for Ag3B6O10NO3.

The experimental band gap of Ag3B6O10NO3 is approximately 4.16 eV according to the absorption spectrum converted from the diffuse reflectance spectrum using the Kubelka−Munk function (Figure 4b). We employed previously published GNN models to evaluate the band gap of Ag3B6O10NO3, and the MEGNet and MF-GNN models resulted in predicted values of 1.57 and 3.26 eV, respectively. This demonstrates that these GNN models, without special consideration of the borate database, cannot accurately predict the band gaps of newly synthesized borate crystals. Next, we performed first-principles calculations. The DFT calculated values using the PBE and HSE06 functionals were 2.21 and 3.91 eV, respectively. It is clear that the DFT calculated values for the HSE06 functional closely match the experimental values, as discussed in the previous section. Impressively, our prediction of 3.88 eV is better than the PBE functional calculation result, and a difference of only 0.03 eV can be observed compared to the result of the expensive HSE06 functional. Thus, when presented with newly synthesized crystals, our model can also provide good prediction accuracy.

Notably, our ML model could clearly distinguish the band gap difference between Ag3B6O10NO3 and its structural analogue K3B6O10Cl (ref 69). The two borates had the same pcu-type network constructed from B6O13 clusters, except for their extra-framework species. UV−vis diffuse reflectance spectra showed that Ag3B6O10NO3 had a band gap of 4.16 eV, whereas K3B6O10Cl has a band gap of 6.89 eV. The band gap difference between the two compounds could arise from their extra-framework species. Our ML model can recognize the effect of extra-framework species on their band gaps, and it accurately predicts a value of 3.88 eV for Ag3B6O10NO3 and 6.27 eV for K3B6O10Cl.

Applications and Interpretability. Chemical substitution has been widely explored in molecular engineering design for the discovery of many NLO materials. Considerable efforts have been made to replace transition metal ions with main group elements to avoid d−d electronic transitions and increase the band gap of NLO materials. Lin et al. reported the design and synthesis of a selenite-based NLO material Pb2GaF2(SeO3)2Cl with a larger band gap (Eg = 4.32 eV) than its transition metal analogue Pb2TiOF(SeO3)2Cl (Eg = 3.34 eV).70,71 The extrapolation tests revealed that the predicted band gap values obtained from our model were closer to the experimental values than those predicted by the PBE calculations and other GNN models. In this regard, our results could improve the exploration of new borate materials with tunable band gaps using an ML-based band engineering strategy. For example, our model can quickly show that the replacement of Ag+ by Na+ in Ag3B6O10NO3 will cause the band gap to increase by approximately 0.55 eV, which is consistent with the trend calculated by the HSE06 hybrid functional (an increase of 0.94 eV). The dual-site substitution of Ag+ and NO3− with Na+ and Br− in Ag3B6O10NO3 could produce a hypothetical borate crystal Na3B6O10Br with a larger band gap of 5.71 eV, which agrees well with the HSE06 calculated value of 6.06 eV. This satisfactory quantitative prediction further illustrates the feasibility of ML-based metal substitution strategies.

The next step is to utilize the resulting ML model to predict more borate structures from a larger database to obtain experiment-based band gaps and further enrich the field of optical materials. The Materials Project (www.materialsproject.org) employs high-throughput computing to uncover the properties of all known inorganic materials and has built a sizable materials database that contains computed structural and electronic data for over 33,000 compounds.20 The Materials Project API allows anyone to have direct access to current, up-to-date information from the Materials Project database in a structured manner. We collected 1673 borates in the chemical form MxByOz (M denotes a metal) using the Materials Project API. The downloaded data also contained 1673 DFT calculated band gaps (PBE level). The green violin plot in Figure 5a clearly represents the distribution of these PBE calculated values, and the median of the data is 2.90 eV. The purple violin plot in Figure 5a shows 276 experimental band gaps of borates obtained via data mining. The median of the violin plot for the experimental database was 5.09 eV. Following this, we applied the resulting ML model to the 1673 borates and obtained the predicted ML results, as illustrated in the yellow violin plot. Evidently, the distribution of the ML prediction results was closer to the experimental database than the distribution of the PBE calculation results.

Figure 5. (a) Probability densities of data distributions shown via violin plots. The black line spans the data range, with the maximum and minimum values at its ends, and the white point is the median. (b) Comparison of DFT calculation based on the PBE functional and experimentally measured values. (c) Comparison of DFT calculation based on the PBE functional and our CGCNN model prediction values. The colors of the heatmaps correspond to the number of samples, where red means high density and blue denotes low density.

It is interesting to compare the band gaps predicted by the ML model with those obtained using the PBE calculation. Figure 5b shows that the band gaps calculated by the PBE functional severely underestimate the experimental band gaps for most borates. If the ML predictions are close to the experimental values, then the PBE calculated values should be lower than the ML predictions. As expected, most points in Figure 5c are below the red diagonal line, which indicates that most of the PBE calculation results are lower than the ML predictions. To a certain extent, the ML model trained on the experimental values can correct the previous high-throughput PBE level DFT calculation values to experiment-based prediction values. This avoids missing key candidates for future screening. Further investigations should be conducted to verify this aspect. In this way, we obtained experiment-based band gaps for the 1673 borates, which were openly released in the GitHub repository as our initial contribution to the borate field. Depending on the value of interest, chemists may filter the chemical data based on certain values, such as selecting a band gap > 6.2 eV or < 3.2 eV.

Another interesting aspect of ML is the exploration of the interpretability of neural networks.
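The threshold-based filtering of predicted band gaps mentioned above can be sketched in a few lines of Python. This is a hypothetical illustration: the list-of-tuples layout is an assumption and does not reflect the file format of the released data, and the five entries are ML-predicted values quoted elsewhere in this paper (Na3B6O10Br is the hypothetical crystal from the substitution example).

```python
# (formula, ML-predicted band gap in eV); values quoted in this paper.
PREDICTED_GAPS = [
    ("BaBe2BO3F3", 8.20),
    ("K3B6O10Cl", 6.27),
    ("Na3B6O10Br", 5.71),
    ("Ag3B6O10NO3", 3.88),
    ("Bi2ZnOB2O6", 3.34),
]

DUV_MIN_EV = 6.2      # DUV NLO candidates: band gap > 6.2 eV
VISIBLE_MAX_EV = 3.2  # visible-light photocatalyst candidates: band gap < 3.2 eV


def select(records, min_ev=None, max_ev=None):
    """Return formulas whose gap is above min_ev and/or below max_ev."""
    selected = []
    for formula, gap in records:
        if min_ev is not None and gap <= min_ev:
            continue
        if max_ev is not None and gap >= max_ev:
            continue
        selected.append(formula)
    return selected


if __name__ == "__main__":
    print("DUV candidates:", select(PREDICTED_GAPS, min_ev=DUV_MIN_EV))
    print("Visible-light candidates:", select(PREDICTED_GAPS, max_ev=VISIBLE_MAX_EV))
```

The same function can be reused for any other window of interest by passing both `min_ev` and `max_ev`.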
After the graph convolution and pooling operations, each borate crystal graph is finally represented by a 698-dimensional vector in our CGCNN model. Because 698 dimensions are difficult to analyze visually, we used the principal component analysis (PCA) algorithm72 to reduce the features to three dimensions for visualization. PCA is an effective multivariate technique for dimensionality reduction. After PCA, the original [276 × 698] matrix of the 276 borates was reduced to a [276 × 3] matrix. The experimental band gap of each borate is shown in Figure 6 on a color scale, with the points changing from red to purple as the band gap increases. Evidently, data points that lie close together have similar band gaps. For example, the values predicted by our ML model for β-BaB2O4 (BBO),5 Ba2Mg(B3O6)2 (BMBO),5 Ba2Ca(B3O6)2 (BCBO),5 and Bi2ZnOB2O6 (BZBO)73 are 6.56, 7.39, 6.97, and 3.34 eV, whereas the corresponding experimental band gaps are 6.57, 6.97, 6.96, and 3.75 eV, respectively. Figure 6 clearly shows that BBO, BMBO, and BCBO are clustered together, whereas BZBO lies far away owing to its low band gap (predicted value, 3.34 eV). Interestingly, the former three materials are well-known DUV NLO materials,5 whereas the latter is a polar material capable of the photocatalytic degradation of Rhodamine B.74 This analysis supports the conclusion that the ML model can distinguish between different types of borates and learn the characteristics of different atoms and the bond information in the crystal graphs.

Figure 6. Visualization of the crystal-graph feature space: PCA plot of the crystal-graph-level features from the pooling layer of the CGCNN trained on the experimental data set of 276 borates. Each point represents an individual borate crystal and is color-coded according to its band gap.

Web Server.
Our group has created a web-based prediction tool (http://www.predborate.com/) that includes a regression model based on the CGCNN algorithm. The online tool enables researchers to submit borate structures (in CIF format) and obtain the predicted band gaps. Furthermore, if millions of structures are to be predicted, the structure files can be uploaded as a RAR-compressed archive. The predicted results can be viewed on the webpage or saved directly to a local workstation for further screening.

Code Availability. All source codes for this study are freely available under the MIT license. The source code and database are available at https://github.com/ruihwang/bandgapboraterpred. The ChemDataExtractor version 2.0 code is available at http://www.chemdataextractor2.org/download.

■ CONCLUSIONS

In summary, we used a data-driven framework to develop a band gap prediction approach for borates whose predictions approach experimental accuracy on a time scale of seconds. A database of inorganic borates including 276 experimental band gaps was built by extraction from the scientific literature. We then trained an ML model to predict the experimentally measured band gaps of borates. The ML model achieved an MAE of 0.40 eV and a PCC of 0.90 on the test set, which supports its use in practical applications. For a realistic screening problem of DUV borates, our model demonstrated a high extrapolation capacity and a low computing cost. To examine the robustness of the ML model, a new borate crystal, Ag3B6O10NO3, was synthesized in this study. The prediction error relative to the experimental value was only 0.28 eV (3.88 vs 4.16 eV), which is much smaller than that of the PBE functional and close to that of the expensive HSE06 functional calculation. The resulting ML model was used to obtain the experiment-based band gaps of approximately 2000 borates extracted from the Materials Project database. The newly developed web application may serve as a powerful tool for chemists who wish to estimate the band gaps of borates. More importantly, the entire procedure can be readily applied to other properties or material systems. Therefore, future material design research will benefit from the computational techniques described in this paper to predict more properties, such as birefringence, the second-harmonic generation coefficient, or carrier mobility, and to accelerate the discovery of advanced functional materials.

■ ASSOCIATED CONTENT

Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.inorgchem.3c00233.
Performance of the previous GNN model, experimental band gap database for borates, characterizations, and optimized hyperparameters (PDF)

Accession Codes
CCDC 2237125 contains the supplementary crystallographic data for this paper. These data can be obtained free of charge via www.ccdc.cam.ac.uk/data_request/cif, or by emailing data_request@ccdc.cam.ac.uk, or by contacting The Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, UK; fax: +44 1223 336033.

■ AUTHOR INFORMATION

Corresponding Authors
Zhien Lin − MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China; orcid.org/0000-0002-5897-9114; Email: zhienlin@scu.edu.cn
Dingguo Xu − MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China; Research Center for Materials Genome Engineering, Sichuan University, Chengdu, Sichuan 610065, PR China; orcid.org/0000-0002-9834-8296; Email: dgxu@scu.edu.cn

Authors
Ruihan Wang − MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
Yeshuang Zhong − Department of Physics, School of Biology and Engineering, Guizhou Medical University, Guiyang, Guizhou 550025, PR China
Xuehua Dong − MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
Meng Du − MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
Haolun Yuan − MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
Yurong Zou − MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
Xin Wang − MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
Complete contact information is available at: https://pubs.acs.org/10.1021/acs.inorgchem.3c00233

Author Contributions
∥ R.W. and Y.Z. contributed equally.

Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS

This study was sponsored by the Natural Science Foundation of Sichuan, China (no. 2022NSFSC0029), and supported by the National Natural Science Foundation of China (grant no. 21973064 to D.X. and no. 21971164 to Z.L.) and the Guizhou Provincial Natural Science Foundation (2022-406). Some of the results described in this paper were obtained from the National Supercomputing Center of Guangzhou and the Supercomputing Center of Sichuan University. We also thank Leming Bi for web app development.

■ REFERENCES

(1) (a) Becker, P. Borate Materials in Nonlinear Optics. Adv. Mater. 1998, 10, 979−992. (b) Tran, T. T.; Yu, H.; Rondinelli, J. M.; Poeppelmeier, K. R.; Halasyamani, P. S. Deep Ultraviolet Nonlinear Optical Materials. Chem. Mater. 2016, 28, 5238−5258.
(2) (a) Mutailipu, M.; Poeppelmeier, K. R.; Pan, S. Borates: A Rich Source for Optical Materials. Chem. Rev. 2021, 121, 1130−1202. (b) Shen, Y. G.; Zhao, S. G.; Luo, J. H. The Role of Cations in Second-Order Nonlinear Optical Materials Based on π-conjugated [BO3]3‑ Groups. Coord. Chem. Rev. 2018, 366, 1−28.
(3) Wang, G. J.; Jing, Y.; Ju, J.; Yang, D. F.; Yang, J.; Gao, W. L.; Cong, R. H.; Yang, T. Ga4B2O9: An Efficient Borate Photocatalyst for Overall Water Splitting without Cocatalyst. Inorg. Chem. 2015, 54, 2945−2949.
(4) Szczeszak, A.; Grzyb, T.; Barszcz, B.; Nagirnyi, V.; Kotlov, A.; Lis, S. Hydrothermal Synthesis and Structural and Spectroscopic Properties of the New Triclinic Form of GdBO3:Eu3+ Nanocrystals. Inorg. Chem. 2013, 52, 4934−4940.
(5) (a) He, R.; Huang, H. W.; Kang, L.; Yao, W. J.; Jiang, X. X.; Lin, Z. S.; Qin, J. G.; Chen, C. T. Bandgaps in the Deep Ultraviolet Borate Crystals: Prediction and Improvement. Appl. Phys. Lett. 2013, 102, 231904. (b) Zhao, W. Z.; Zhang, Y. N.; Lan, Y. Z.; Cheng, J. W.; Yang, G. Y. Ba2B10O16(OH)2·(H3BO3)(H2O): A Possible Deep-Ultraviolet Nonlinear-Optical Barium Borate. Inorg. Chem. 2022, 61, 4246−4250. (c) Zhao, S. G.; Gong, P. F.; Bai, L.; Xu, X.; Zhang, S. Q.; Sun, Z. H.; Lin, Z. S.; Hong, M. C.; Chen, C. T.; Luo, J. H. Beryllium-free Li4Sr(BO3)2 for Deep-Ultraviolet Nonlinear Optical Applications. Nat. Commun. 2014, 5, 4019.
(6) Horiuchi, Y.; Toyao, T.; Saito, M.; Mochizuki, K.; Iwata, M.; Higashimura, H.; Anpo, M.; Matsuoka, M. Visible-Light-Promoted Photocatalytic Hydrogen Production by Using an Amino-Functionalized Ti(IV) Metal-Organic Framework. J. Phys. Chem. C 2012, 116, 20848−20853.
(7) Stanley, J.; Gagliardi, A. Machine Learning Bandgaps of Inorganic Mixed Halide Perovskites. IEEE 18th International Conference on Nanotechnology; IEEE 2018, 1−4.
(8) Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T. Machine Learning Bandgaps of Double Perovskites. Sci. Rep. 2016, 6, 19375.
(9) Knøsgaard, N. R.; Thygesen, K. S. Representing Individual Electronic States for Machine Learning GW Band Structures of 2D Materials. Nat. Commun. 2022, 13, 468.
(10) Wang, R. H.; Zhong, Y. S.; Bi, L. M.; Yang, M. L.; Xu, D. G. Accelerating Discovery of Metal-Organic Frameworks for Methane Adsorption with Hierarchical Screening and Deep Learning. ACS Appl. Mater. Interfaces 2020, 12, 52797−52807.
(11) Wang, R. H.; Zou, Y. R.; Zhang, C. C.; Wang, X.; Yang, M. L.; Xu, D. G. Combining Crystal Graphs and Domain Knowledge in Machine Learning to Predict Metal-Organic Frameworks Performance in Methane Adsorption. Microporous Mesoporous Mater. 2022, 331, 111666.
(12) Liang, Y. Z.; Chen, M. W.; Wang, Y. A.; Jia, H. X.; Lu, T. L.; Xie, F. K.; Cai, G. H.; Wang, Z. G.; Meng, S.; Liu, M. A Universal Model for Accurately Predicting the Formation Energy of Inorganic Compounds. Sci. China Mater. 2022, 66, 343−351.
(13) Ward, L.; Agrawal, A.; Choudhary, A.; Wolverton, C. A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials. npj Comput. Mater. 2016, 2, 16028.
(14) Stanley, J. C.; Mayr, F.; Gagliardi, A. Machine Learning Stability and Bandgaps of Lead-Free Perovskites for Photovoltaics. Adv. Theory Simul. 2020, 3, 1900178.
(15) Schmidt, J.; Shi, J. M.; Borlido, P.; Chen, L. M.; Botti, S.; Marques, M. A. L. Predicting the Thermodynamic Stability of Solids Combining Density Functional Theory and Machine Learning. Chem. Mater. 2017, 29, 5090−5103.
(16) Omprakash, P.; Manikandan, B.; Sandeep, A.; Shrivastava, R.; Viswesh, P.; Panemangalore, D. B. Graph Representational Learning for Bandgap Prediction in Varied Perovskite Crystals. Comput. Mater. Sci. 2021, 196, 110530.
(17) Xie, T.; Grossman, J. C. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett. 2018, 120, 145301.
(18) Curtarolo, S.; Setyawan, W.; Hart, G. L. W.; Jahnatek, M.; Chepulskii, R. V.; Taylor, R. H.; Wang, S. D.; Xue, J. K.; Yang, K. S.; Levy, O.; Mehl, M. J.; Stokes, H. T.; Demchenko, D. O.; Morgan, D. AFLOW: An Automatic Framework for High-Throughput Materials Discovery. Comput. Mater. Sci. 2012, 58, 218−226.
(19) Kirklin, S.; Saal, J. E.; Meredig, B.; Thompson, A.; Doak, J. W.; Aykol, M.; Rühl, S.; Wolverton, C. The Open Quantum Materials Database (OQMD): Assessing the Accuracy of DFT Formation Energies. npj Comput. Mater. 2015, 1, 15010.
(20) Jain, A.; Ong, S. P.; Hautier, G.; Chen, W.; Richards, W. D.; Dacek, S.; Cholia, S.; Gunter, D.; Skinner, D.; Ceder, G.; Persson, K. A. Commentary: The Materials Project: A Materials Genome Approach to Accelerating Materials Innovation. APL Mater. 2013, 1, 011002.
(21) Crowley, J. M.; Tahir-Kheli, J.; Goddard, W. A. Resolution of the Band Gap Prediction Problem for Materials Design. J. Phys. Chem. Lett. 2016, 7, 1198−1203.
(22) Lee, J.; Seko, A.; Shitara, K.; Nakayama, K.; Tanaka, I. Prediction Model of Band Gap for Inorganic Compounds by Combination of Density Functional Theory Calculations and Machine Learning Techniques. Phys. Rev. B 2016, 93, 115104.
(23) Chuangtian, C.; Bochang, W.; Aidong, J.; Guiming, Y. A New-Type Ultraviolet SHG Crystal - β-BaB2O4. Sci. China, Ser. B 1985, 28, 235−243.
(24) Chen, C. T.; Wang, G. L.; Wang, X. Y.; Xu, Z. Y.
Deep-UV Nonlinear Optical Crystal KBe2BO3F2-Discovery, Growth, Optical Properties and Applications. Appl. Phys. B: Lasers Opt. 2009, 97, 9−25.
(25) Ye, N.; Zeng, W. R.; Jiang, J.; Wu, B. C.; Chen, C. T.; Feng, B. H.; Zhang, X. L. New Nonlinear Optical Crystal K2Al2B2O7. J. Opt. Soc. Am. B 2000, 17, 764−768.
(26) Aryasetiawan, F.; Gunnarsson, O. The GW Method. Rep. Prog. Phys. 1998, 61, 237−312.
(27) Perdew, J. P.; Ernzerhof, M.; Burke, K. Rationale for Mixing Exact Exchange with Density Functional Approximations. J. Chem. Phys. 1996, 105, 9982−9985.
(28) Perdew, J. P.; Burke, K.; Ernzerhof, M. Generalized Gradient Approximation Made Simple. Phys. Rev. Lett. 1996, 77, 3865−3868.
(29) Zhuo, Y.; Mansouri Tehrani, A. M.; Brgoch, J. Predicting the Band Gaps of Inorganic Solids by Machine Learning. J. Phys. Chem. Lett. 2018, 9, 1668−1673.
(30) Chen, C.; Zuo, Y.; Ye, W.; Li, X.; Ong, S. P. Learning Properties of Ordered and Disordered Materials from Multi-Fidelity Data. Nat. Comput. Sci. 2021, 1, 46−53.
(31) Kim, E.; Huang, K.; Saunders, A.; McCallum, A.; Ceder, G.; Olivetti, E. Materials Synthesis Insights from Scientific Literature Via Text Extraction and Machine Learning. Chem. Mater. 2017, 29, 9436−9444.
(32) Georgescu, A. B.; Ren, P. W.; Toland, A. R.; Zhang, S. T.; Miller, K. D.; Apley, D. W.; Olivetti, E. A.; Wagner, N.; Rondinelli, J. M. Database, Features, and Machine Learning Model to Identify Thermally Driven Metal-Insulator Transition Compounds. Chem. Mater. 2021, 33, 5591−5605.
(33) Jensen, Z.; Kwon, S.; Schwalbe-Koda, D.; Paris, C.; Gómez-Bombarelli, R.; Román-Leshkov, Y.; Corma, A.; Moliner, M.; Olivetti, E. A. Discovering Relationships between OSDAs and Zeolites through Data Mining and Generative Neural Networks. ACS Cent. Sci. 2021, 7, 858−867.
(34) Zhang, Y.; Wang, C.; Soukaseum, M.; Vlachos, D. G.; Fang, H. Unleashing the Power of Knowledge Extraction from Scientific Literature in Catalysis. J. Chem. Inf. Model. 2022, 62, 3316−3330.
(35) Hawizy, L.; Jessop, D. M.; Adams, N.; Murray-Rust, P. ChemicalTagger: A Tool for Semantic Text-Mining in Chemistry. J. Cheminf. 2011, 3, 17.
(36) Swain, M. C.; Cole, J. M. ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature. J. Chem. Inf. Model. 2016, 56, 1894−1904.
(37) Zhu, M.; Cole, J. M. PDFDataExtractor: A Tool for Reading Scientific Text and Interpreting Metadata from the Typeset Literature in the Portable Document Format. J. Chem. Inf. Model. 2022, 62, 1633−1643.
(38) Lammey, R. Crossref’s Text and Data Mining Services. Learn. Publ. 2014, 27, 245−250.
(39) Saif, S.; Tahir, A.; Asim, T.; Chen, Y. S.; Khan, M.; Adil, S. F. Green Synthesis of ZnO Hierarchical Microstructures by Cordia Myxa and Their Antibacterial Activity. Saudi J. Biol. Sci. 2019, 26, 1364−1371.
(40) Kresse, G.; Furthmüller, J. Efficient Iterative Schemes for Ab Initio Total-Energy Calculations Using a Plane-Wave Basis Set. Phys. Rev. B: Condens. Matter Mater. Phys. 1996, 54, 11169−11186.
(41) Kresse, G.; Joubert, D. From Ultrasoft Pseudopotentials to the Projector Augmented-Wave Method. Phys. Rev. B: Condens. Matter Mater. Phys. 1999, 59, 1758−1775.
(42) Yi, W. C.; Tang, G.; Chen, X.; Yang, B. C.; Liu, X. B. Qvasp: A Flexible Toolkit for VASP Users in Materials Simulations. Comput. Phys. Commun. 2020, 257, 107535.
(43) Sheldrick, G. M. A Short History of SHELX. Acta Crystallogr., Sect. A: Found. Adv. 2008, 64, 112−122.
(44) Sheldrick, G. M. SHELXTL-97, Program for Crystal Structure Solution; University of Göttingen: Germany, 1997.
(45) Zhao, W. W.; Pan, S. L.; Dong, X. Y.; Li, J. J.; Tian, X. L.; Fan, X. Y.; Chen, Z. H.; Zhang, F. F. Synthesis, Crystal Structure and Properties of a New Lead Fluoride Borate, Pb3OBO3F. Mater. Res. Bull. 2012, 47, 947−951.
(46) Pan, F.; Shen, G. Q.; Wang, R. J.; Wang, X. Q.; Shen, D. Z. Growth, Characterization and Nonlinear Optical Properties of SrB4O7 Crystals. J. Cryst. Growth 2002, 241, 108−114.
(47) Guo, S.; Jiang, X. X.; Liu, L. J.; Xia, M. J.; Fang, Z.; Wang, X. Y.; Lin, Z. S.; Chen, C. T. BaBe2BO3F3: A KBBF-Type Deep-Ultraviolet Nonlinear Optical Material with Reinforced [Be2BO3F2]∞ Layers and Short Phase-Matching Wavelength. Chem. Mater. 2016, 28, 8871−8875.
(48) Xiao-Shan, W.; Li-Juan, L.; Ming-Jun, X.; Xiao-Yang, W.; Chuang-Tian, C. Two Isostructural Multi-Metal Borates: Syntheses, Crystal Structures and Characterizations of M3LiNa4Be4B10O24F (M = Sr, Cd). Chin. J. Struct. Chem. 2016, 34, 1617−1625.
(49) Bai, C. Y.; Han, S. J.; Pan, S. L.; Zhang, B. B.; Yang, Y.; Li, L.; Yang, Z. H. Na3B4O7X (X = Cl, Br): Two New Borate Halides with a 1D Na-X (X = Cl, Br) Chain Formed by the Face-Sharing XNa6 Octahedra. RSC Adv. 2015, 5, 12416−12422.
(50) Yang, Y.; Pan, S. L.; Hou, X. L.; Dong, X. Y.; Su, X.; Yang, Z. H.; Zhang, M.; Zhao, W. W.; Chen, Z. H. Li5Rb2B7O14: A New Congruently Melting Compound with Two Kinds of B-O One-Dimensional Chains and Short UV Absorption Edge. CrystEngComm 2012, 14, 6720−6725.
(51) Lin, F.; Dong, Y. P.; Peng, J. Y.; Wang, L. P.; Li, W.; Yang, B. A New Acentric Borate of K2Ba[B4O5(OH)4]2·10H2O: Synthesis, Structure and Nonlinear Optical Property. Phase Transitions 2016, 89, 996−1005.
(52) Meng, X. H.; Xia, M. J.; Li, R. K. Li3Ba4Sc3(BO3)4(B2O5)2: Featuring the Coexistence of Isolated BO3 and B2O5 Units. New J. Chem. 2019, 43, 11469−11472.
(53) Luo, X. C.; Pan, S. L.; Fan, X. Y.; Wang, J. D.; Liu, G. Crystal Growth and Characterization of K2B4O11H8. J. Cryst. Growth 2009, 311, 3517−3521.
(54) Wang, S. C.; Ye, N.; Zou, G. H. A New Alkaline Beryllium Borate KBe4B3O9 with Ribbon Alveolate [Be2BO5]∞ Layers and the Structural Evolution of ABe4B3O9 (A = K, Rb and Cs). CrystEngComm 2014, 16, 3971−3976.
(55) Huang, H. W.; Yao, W. J.; He, R.; Chen, C. T.; Wang, X. Y.; Zhang, Y. H. Synthesis, Crystal Structure and Optical Properties of a New Beryllium Borate, CsBe4(BO3)3. Solid State Sci. 2013, 18, 105−109.
(56) Zhang, Z. Z.; Wang, Y.; Li, H.; Yang, Z. H.; Pan, S. L. BaB8O12F2: A Promising Deep-UV Birefringent Material. Inorg. Chem. Front. 2019, 6, 546−549.
(57) Wu, H. P.; Yu, H. W.; Pan, S. L.; Jiao, A. Q.; Han, J.; Wu, K.; Han, S. J.; Li, H. Y. New Type of Complex Alkali and Alkaline Earth Metal Borates with Isolated (B12O24)12‑ Anionic Group. Dalton Trans. 2014, 43, 4886−4891.
(58) Huang, S. Z.; Zhou, C.; Cheng, S. C.; Yu, F. K5B19O31: A Deep-Ultraviolet Congruent Melting Compound. ChemistrySelect 2019, 4, 10436−10441.
(59) Mutailipu, M.; Xie, Z. Q.; Su, X.; Zhang, M.; Wang, Y.; Yang, Z. H.; Janjua, M. R. S. A.; Pan, S. L. Chemical Cosubstitution-Oriented Design of Rare-Earth Borates as Potential Ultraviolet Nonlinear Optical Materials. J. Am. Chem. Soc. 2017, 139, 18397−18405.
(60) Yao, W. J.; Jiang, X. X.; Huang, R. J.; Li, W.; Huang, C. J.; Lin, Z. S.; Li, L. F.; Chen, C. T. Area Negative Thermal Expansion in a Beryllium Borate LiBeBO3 with Edge Sharing Tetrahedra. Chem. Commun. 2014, 50, 13499−13501.
(61) Shi, Y. T.; Luo, M.; Lin, C. S.; Peng, G.; Ye, N. Two Deep Ultraviolet Hydrated Borate Crystals: Centrosymmetric LiRbB5O8(OH)·H2O and Non-Centrosymmetric K2B5O8(OH)·2H2O. Cryst. Growth Des. 2019, 19, 3052−3059.
(62) Huang, C. M.; Zhang, F. F.; Li, H.; Yang, Z. H.; Yu, H. H.; Pan, S. L. BaB2O3F2: A Barium Fluorooxoborate with a Unique 2∞[B2O3F]‑ Layer and Short Cutoff Edge. Chem. Eur. J. 2019, 25, 6693−6697.
(63) Yan, X.; Luo, S. Y.; Lin, Z. S.; Yao, J. Y.; He, R.; Yue, Y. C.; Chen, C. T. ReBe2B5O11 (Re = Y, Gd): Rare-Earth Beryllium Borates as Deep-Ultraviolet Nonlinear-Optical Materials. Inorg. Chem. 2014, 53, 1952−1954.
(64) Chen, C.; Ye, W. K.; Zuo, Y. X.; Zheng, C.; Ong, S. P. Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals. Chem. Mater. 2019, 31, 3564−3572.
(65) Krukau, A. V.; Vydrov, O. A.; Izmaylov, A. F.; Scuseria, G. E. Influence of the Exchange Screening Parameter on the Performance of Screened Hybrid Functionals. J. Chem. Phys. 2006, 125, 224106.
(66) Heyd, J.; Scuseria, G. E.; Ernzerhof, M. Hybrid Functionals Based on a Screened Coulomb Potential. J. Chem. Phys. 2003, 118, 8207−8215.
(67) Zhang, B. B.; Zhang, X. D.; Yu, J.; Wang, Y.; Wu, K.; Lee, M. H. First-Principles High-Throughput Screening Pipeline for Nonlinear Optical Materials: Application to Borates. Chem. Mater. 2020, 32, 6772−6779.
(68) Li, H.; Eddaoudi, M.; O’Keeffe, M.; Yaghi, O. M. Design and Synthesis of an Exceptionally Stable and Highly Porous Metal-Organic Framework. Nature 1999, 402, 276−279.
(69) Wu, H. P.; Pan, S. L.; Poeppelmeier, K. R.; Li, H. Y.; Jia, D. Z.; Chen, Z. H.; Fan, X. Y.; Yang, Y.; Rondinelli, J. M.; Luo, H. S. K3B6O10Cl: A New Structure Analogous to Perovskite with a Large Second Harmonic Generation Response and Deep UV Absorption Edge. J. Am. Chem. Soc. 2011, 133, 16317.
(70) You, F. G.; Liang, F.; Huang, Q.; Hu, Z. G.; Wu, Y. C.; Lin, Z. S. Pb2GaF2(SeO3)2Cl: Band Engineering Strategy by Aliovalent Substitution for Enlarging Bandgap While Keeping Strong Second Harmonic Generation Response. J. Am. Chem. Soc. 2019, 141, 748−752.
(71) Cao, X. L.; Hu, C. L.; Xu, X.; Kong, F.; Mao, J. G. Pb2TiOF(SeO3)2Cl and Pb2NbO2(SeO3)2Cl: Small Changes in Structure Induced a Very Large SHG Enhancement. Chem. Commun. 2013, 49, 9965−9967.
(72) Wold, S.; Esbensen, K.; Geladi, P. Principal Component Analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37−52.
(73) Li, F.; Hou, X. L.; Pan, S. L.; Wang, X. Growth, Structure, and Optical Properties of a Congruent Melting Oxyborate, Bi2ZnOB2O6. Chem. Mater. 2009, 21, 2846−2850.
(74) Liu, J.; Zhao, W. W.; Wang, B.; Yan, H. Bi2ZnOB2O6: A Polar Material Capable of Photocatalytic Degradation of Rhodamine B. J. Mater. Sci.: Mater. Electron. 2018, 29, 13803−13809.