Jun 05, 2025

Prognostication of advanced CO2 capture using tunable solvents with an ensemble learning-based decision tree model | Scientific Reports

Scientific Reports volume 15, Article number: 19694 (2025)

This study presents a robust method for predicting CO2 solubility in Deep Eutectic Solvents (DESs) using the stochastic gradient boosting (SGB) algorithm. DESs, promising green solvents for CO2 capture, require precise solubility data for practical applications in industrial and environmental settings. The model incorporates key parameters such as temperature, pressure, mole percent of salt and hydrogen bond donor (HBD) compounds, HBD melting points, molecular weights of salts and HBDs, and other critical factors. Using a dataset of 1951 experimental data points spanning temperatures (293.15–343.15 K) and pressures (26.3–12,730 kPa), the SGB model demonstrated excellent predictive accuracy, achieving an R2 of 0.9928 and an AARD% of 2.3107. Variable importance analysis identified pressure as the most influential factor. The model’s applicability domain, assessed with a Williams plot, encompassed 97.5% of the data points, supporting its reliability and broad applicability. Moreover, the SGB model outperformed previous methods, including ANN, RF, and thermodynamic models like PR-EoS and COSMO-RS, as validated by statistical metrics. This research highlights the SGB model’s potential as a superior and practical tool for evaluating CO2 solubility in DESs, advancing the field of green solvent development for sustainable and efficient CO2 capture technologies.

The urgent climate challenge of our time is the release of anthropogenic CO2 emissions into the atmosphere, which has led to the development of materials for capturing and sequestering the CO2 produced by burning fossil fuels1. The atmospheric CO2 concentration, as reported by the U.S. National Oceanic and Atmospheric Administration (NOAA)2, increased from approximately 316 ppm in 1959 to 418 ppm in 2022 and is projected to exceed 500 ppm by 2050, even if emissions are stabilized. The primary source of anthropogenic CO2 emissions is the burning of fossil fuels, including coal, oil, and natural gas.

To mitigate climate change, reducing CO2 emissions from industrial processes is crucial3. Carbon capture and storage (CCS) technologies, including post-combustion, pre-combustion, and oxy-combustion capture, are vital until cleaner energy sources are fully implemented1. Post-combustion methods, such as absorption, membrane separation, adsorption, and microalgal biofixation, are commonly used4. Among these, amine-based absorption–desorption is the most mature and cost-effective4. However, it has drawbacks, including amine reagent loss, water introduction, chemical degradation leading to corrosive byproducts, and high energy consumption during regeneration. Additionally, its CO2 capture capacity is insufficient5.

In the pursuit of safer solvents, scientists have investigated a new category of “green” solvents for CO2 capture6,7. These solvents are both non-volatile and environmentally friendly. The initial work by Bates et al. led to the development of an Ionic Liquid (IL) containing -NH2 groups, which demonstrated a CO2 absorption capacity of 7.4 wt%6,8. Other studies have also identified imidazole ILs as having excellent CO2 solubility and selectivity for CO2 capture6,9. These ILs exhibit favorable characteristics such as high CO2 solubility, thermal and electrochemical stability, and low vapor pressure5,6. However, these solvents have drawbacks, including their high cost and the complexity of their purification process, which generates substantial waste streams6,7. Additionally, they have high viscosity, may be toxic10, and most are derived from fossil resources, making them non-biodegradable6,11. In contrast, the emerging deep eutectic solvents (DESs) are considered a more sophisticated evolution of ILs and show considerable potential as suitable alternatives12.

DESs share many intriguing solvent properties with conventional ILs but offer additional benefits like cost-effectiveness, biodegradability, renewability, and low toxicity13. A DES is prepared simply by mixing a hydrogen bond donor (HBD) and a hydrogen bond acceptor (HBA); the resulting hydrogen-bond network yields a eutectic mixture with a lower melting point than its precursor components14,15. The range of HBDs has expanded to include alcohols, sugars, organic acids, and amides, while HBAs are mainly quaternary ammonium salts and quaternary phosphonium salts16,17,18,19,20,21,22. DESs have been extensively utilized in various fields14, such as extraction23,24, separation25, chemical reactions26, electroplating27, drug delivery28, membranes29, and lignocellulosic biomass processing27. These applications have shown promising results, and CO2 absorption5,30,31,32,33 in particular has garnered significant attention. Haghbakhsh et al.34 and Wang et al.35 conducted comprehensive literature reviews that encompass experimental data on CO2 solubility in DESs under various conditions.

Up until now, the majority of research on CO2 absorption using DESs has been limited to experimental measurements of CO2 solubility in only a few DES candidates35. However, given the vast number of potential combinations of HBA and HBD at various ratios, this approach has only scratched the surface of available options36,37,38,39. The traditional experimental trial-and-error method to explore a wide range of DESs is both costly and time-consuming, making it impractical to investigate all possibilities. Therefore, there is a strong need for a reliable model capable of predicting CO2 absorption capacity in DESs35,40,41.

Recent advancements have adapted traditional thermodynamic models (e.g., Non-Random Two-Liquid (NRTL), Universal Quasi-Chemical (UNIQUAC)) and equations of state (e.g., Perturbed Chain Statistical Associating Fluid Theory (PC-SAFT), Soft-SAFT, and Peng-Robinson Equation of State (PR-EOS)) to accurately predict gas solubility in DESs42,43,44,45,46. These models rely on experimentally determined parameters, limiting their applicability to known systems. In contrast, Conductor-like Screening Model (COSMO)-based models (COSMO-RS, COSMO-SAC) are popular for predicting CO2 solubility in IL solvents with reasonable accuracy47,48,49,50,51,52. When applied to DESs, however, these models often overestimate or underestimate gas solubilities53. Moreover, molecular simulations like Molecular Dynamics (MD) and Monte Carlo (MC) are reliable for predicting thermophysical properties and gas solubility in DESs54,55,56. However, these simulations require significant computational resources, making them impractical for analyzing a wide range of gas-in-DES solubilities35. Consequently, the key challenge is adopting versatile techniques to predict phase behavior across various absorbents and conditions.

It is worth highlighting that the COSMO-RS model has primarily been utilized independently for screening solvents and estimating gas solubilities, achieving a reasonably high level of accuracy35,57. In general, COSMO-RS calculations require only the molecular structure to predict solubility and various thermodynamic properties. However, recent studies have shown that the COSMO-RS model can sometimes overestimate or underestimate gas solubility in DESs57.

Given the complexity of molecular descriptors, such as Sigma profile features derived from COSMO-RS calculations, and the limitations of linear and multilinear models in accurately capturing thermophysical properties, there has been an increasing shift toward the use of Machine Learning (ML) algorithms35,57. These algorithms are increasingly preferred for constructing advanced non-linear Quantitative Structure–Property Relationship (QSPR) models, especially for predicting physicochemical properties and phase equilibrium behavior35,57.

Considering the key factors discussed, a promising strategy is to leverage ML models within a QSPR framework, utilizing COSMO-RS S_(σ-profile) descriptors for enhanced predictive accuracy57. A thorough analysis of recent literature underscores the growing significance of hybrid models in this domain. These models have shown great promise in delivering accurate assessments of physicochemical properties, especially for DESs57. At the same time, they provide crucial insights into the connection between molecular interactions and the macroscopic properties of these substances57. Recent studies highlight the successful application of machine learning techniques in predicting various properties of DESs using inputs from the COSMO-RS model and other molecular descriptors. These efforts have demonstrated significant potential across a range of properties and applications, reinforcing the value of hybrid modeling approaches in this field.

Focusing on CO₂ solubility in DESs, Lemaoui et al.58 leveraged a dataset of 2,327 experimental data points to develop a predictive model, attaining remarkable accuracy with R2 values of 0.998 ± 0.001 for the training set and 0.986 ± 0.002 for the test set. Similarly, Wang et al.35 employed a random forest (RF) algorithm based on descriptors derived from COSMO-RS to estimate CO₂ solubility across various DESs under different conditions, achieving an Average Absolute Relative Deviation (AARD) of 14.6% using a dataset comprising 1,011 solubility measurements. Furthermore, Mohan et al.57 integrated COSMO-RS with Artificial Neural Networks (ANNs) to enhance solubility predictions, achieving a notably low AARD of 2.72%.

These studies collectively highlight the effectiveness of using ML with molecular parameters derived from COSMO-RS models to predict CO₂ solubility in DESs. While COSMO-RS-based ML integrates quantum chemical calculations to enhance predictions—particularly for unexplored DESs—it necessitates COSMO-RS computations, adding complexity. Therefore, more practical and accessible approaches are needed. In contrast, property-based ML relies solely on measurable thermodynamic and molecular properties, offering a more straightforward and feasible alternative.

The literature review reveals limited studies applying ML methods to CO2 capture using DESs. Tatar et al.59 used Adaptive Neuro Fuzzy Inference System (ANFIS) optimized with Combination of Hybrid and Particle Swarm Optimization Method (CHPSO) and Gene Expression Programming (GEP) to predict CO2 solubility in a eutectic mixture of levulinic acid (or furfuryl alcohol) and choline chloride, based on 144 experimental data points. Dashti et al.60 applied ANN, ANFIS optimized with particle swarm optimization (PSO), Least-Squares Support-Vector Machines (LSSVM) optimized with coupled simulated annealing (CSA), and an empirical model called Multivariate Polynomial Regression (MPR) to predict CO2 solubility in choline chloride mixtures with various HBDs, using 333 data points. Sagar and Upadhyayula61 used ANN to forecast CO2 solubility in mixtures of HBA and lactic acid as HBD, considering structural and thermodynamic properties. Nagulapati et al.62 investigated CO2, CO, CH4, H2, and N2 solubilities in ChCl/Urea DES, using SVM and Long Short-term Memory Auto Encoder–based (LSTM-AE) models trained on 15 data points, covering temperatures from 298.15 K to 372.15 K and pressures from 0.01 to 5 MPa.

Despite advancements in developing accurate models, a literature review reveals that a substantial amount of experimental data on CO₂ solubility in DESs remains underutilized in previous studies. Therefore, conducting a comprehensive literature search to compile a complete database of CO₂ solubility in DESs, along with implementing advanced MLs, is crucial for building a predictive model that accounts for a wider range of conditions.

Building on this foundation, this study develops a robust stochastic gradient boosting (SGB) tree algorithm to predict CO₂ solubility in DESs composed of 22 salts and 24 HBDs. The model spans a temperature range of 293.15 to 343.15 K and pressures from 26.3 to 12,730 kPa, utilizing molecular and thermodynamic properties as input features. To ensure consistency and reliability in data analysis, we specifically focused on binary salt + HBD systems. Data related to natural deep eutectic solvents (NADES) and ternary hydrophobic DESs were deliberately excluded, as their distinct solvent characteristics could introduce additional variability, potentially affecting the model’s comparability and accuracy. This targeted approach enabled the development of a more precise and reliable predictive model tailored to binary salt + HBD DESs. The model’s performance was rigorously evaluated through statistical and graphical analyses and benchmarked against alternative methods, including ANNs, RF, and the COSMO-RS model. Results demonstrated that the SGB algorithm outperformed these approaches, delivering superior predictive accuracy.

This study compiles 1951 data points on the solubility of CO2 (measured in g/kg) in different DESs. These data points cover a wide range of temperatures (293.15 to 343.15 K) and pressures (26.3 to 12,730 kPa). The data were gathered from various published studies. The DESs involved in the study consist of 22 salts and 24 HBDs. Table 1 provides detailed information about the collected CO2 solubility, DES compositions, experimental temperatures, pressures, and the corresponding references.

SGB is a modern variation of the conventional Gradient Boosting (GB) technique, introduced by J.H. Friedman77. The main goal behind SGB is to improve the accuracy and speed of GB, with the ultimate objective of enhancing overall performance78,79,80. This improvement is achieved by incorporating randomization into the process, which is inspired by Breiman’s bagging method81. SGB has demonstrated its effectiveness across various fields in numerous studies cited in the literature. These successful applications have been documented in multiple domains, confirming the competence and versatility of this method82,83,84,85,86,87,88,89,90,91,92,93,94.

GB is a powerful ensemble learning technique designed to enhance the predictive performance of weak hypotheses by iteratively minimizing the model’s loss through a gradient descent-like optimization process. The method combines a set of weak learners, typically decision trees, while limiting the risk of overfitting. The trees are constructed in a stage-wise manner: each subsequent tree aims to correct the errors made by its predecessors by focusing on the examples that were predicted poorly. The overall output of the model is strengthened by iteratively adding the predictions of each new tree to those of the existing sequence of trees. This iterative process improves the model’s accuracy and generalization capabilities.

The training methodology employed in SGB can be visualized through the flowchart presented in Fig. 1. This flowchart illustrates a notable departure from the conventional approach, wherein instead of using the entire set of training instances for a tree, only a fraction of these instances is utilized, selected through sampling without replacement. The sampled data is then employed to train a tree, but with a twist—only a randomly sampled subset of the available features is used for making splitting decisions during the training process.

Flowchart of SGB training procedure.

Once a tree is trained, its predictions are generated, and subsequently, the residual errors are computed. These residual errors are then multiplied by a learning rate (denoted as η) and are fed as inputs to the next tree in the ensemble. This sequential process of training and updating the predictions continues until all the trees in the ensemble are fully trained.

To predict the output for a new instance using SGB, a similar procedure is followed as in traditional gradient boosting, involving the collective contribution of all the trees in the ensemble to make the final prediction. This approach of incorporating randomness and selective feature usage during the training process enhances the robustness and generalization capabilities of the stochastic gradient boosting method.
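The training loop described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the implementation used in the study: it assumes one-split regression trees ("stumps") as weak learners, applies the learning rate in the usual shrinkage form (each tree's contribution is scaled by η before it is added to the running prediction), and uses hypothetical names (`fit_stump`, `fit_sgb`, `predict_sgb`), hypothetical sampling fractions, and hypothetical toy data.

```python
import random

def fit_stump(X, r, feats):
    """Best one-split regression tree on the sampled features (least squared error)."""
    best = None
    for j in feats:
        vals = sorted({row[j] for row in X})
        for lo, hi in zip(vals, vals[1:]):
            thr = (lo + hi) / 2.0
            left = [ri for row, ri in zip(X, r) if row[j] <= thr]
            right = [ri for row, ri in zip(X, r) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # no valid split in the sample: fall back to a constant
        m = sum(r) / len(r)
        return (feats[0], float("inf"), m, m)
    return best[1:]

def stump_predict(stump, row):
    j, thr, lm, rm = stump
    return lm if row[j] <= thr else rm

def fit_sgb(X, y, n_trees=300, eta=0.3, row_frac=0.6, n_feats=1, seed=0):
    """Stochastic gradient boosting for squared-error regression."""
    rng = random.Random(seed)
    base = sum(y) / len(y)                    # initial prediction: the mean
    pred = [base] * len(y)
    stumps = []
    n_rows = max(2, int(row_frac * len(y)))
    for _ in range(n_trees):
        rows = rng.sample(range(len(y)), n_rows)       # rows, without replacement
        feats = rng.sample(range(len(X[0])), n_feats)  # random feature subset
        residuals = [y[i] - pred[i] for i in rows]     # errors of the current ensemble
        stump = fit_stump([X[i] for i in rows], residuals, feats)
        stumps.append(stump)
        # Shrink the new tree's contribution by the learning rate eta.
        pred = [p + eta * stump_predict(stump, row) for p, row in zip(pred, X)]
    return base, eta, stumps

def predict_sgb(model, row):
    base, eta, stumps = model
    return base + eta * sum(stump_predict(s, row) for s in stumps)

# Hypothetical toy data: columns are (T, P); the target rises with P and falls with T.
X = [[float(t), float(p)] for t in range(5) for p in range(5)]
y = [3.0 * p - t for t, p in X]
model = fit_sgb(X, y)
mse = sum((predict_sgb(model, row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)
baseline = sum((yi - sum(y) / len(y)) ** 2 for yi in y) / len(y)
```

Comparing `mse` with `baseline` (the error of always predicting the mean) shows how much the ensemble has learned. In practice a library implementation with deeper trees would replace the stumps used here.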

In this investigation, the SGB algorithms were employed, adhering to the guidelines specified in Friedman’s research77,78. Supplementary details concerning the mathematical facets of the SGB model can be accessed in the existing literature77,78,95,96,97.

Besides obtaining a precise and dependable experimental dataset, an essential requirement for developing an accurate model is the careful selection of independent variables as inputs. Previous studies on DESs have shown that the solubility of CO2 in these solvents increases with rising pressure (P) but decreases with increasing temperature (T). Additionally, the type and molar ratio of HBA and HBD significantly influence CO2 solubility under identical conditions35,60. Considering these factors, the input parameters for the SGB model to predict CO2 solubility in DESs include variables such as T, P, mole percent of salt, mole percent of HBD, and the molecular weights of salt and HBD as well as melting point of HBD.

The collected dataset was first divided into two distinct subsets through random selection. One subset, comprising around 85% of the data (1,656 data points), was designated as the training set and was used to develop and fine-tune the model. The remaining 15% (295 data points) was allocated as the test set, reserved solely for evaluating the model’s performance. The training dataset served as the input for the SGB tree model, enabling it to learn the relationship between the predictor variables and the target variable.
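A random split of this kind can be sketched as follows. The sizes (1,951 points, 295 reserved for testing) come from the text; the function name and the random seed are assumptions made for illustration, and the study's actual random selection is not known.

```python
import random

def train_test_split_indices(n_points, n_test, seed=42):
    """Shuffle the row indices and reserve the first n_test of them for testing."""
    idx = list(range(n_points))
    random.Random(seed).shuffle(idx)  # the seed is an arbitrary, hypothetical choice
    return idx[n_test:], idx[:n_test]

train_idx, test_idx = train_test_split_indices(1951, 295)
```

The two index lists are disjoint and together cover the whole dataset, so no test point is seen during training.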

Proper tuning of hyperparameters in the SGB algorithm is crucial for optimizing the model’s ability to generalize effectively. One key parameter, the learning rate (η), plays a significant role in determining the final outcome. Research has shown that the model’s predictive accuracy can be improved by applying a weighting factor of less than 1 to each tree in successive boosting cycles. This factor is commonly known as the “learning rate”89. After a thorough process of experimentation, the most suitable value for η was identified as 0.3. As illustrated in Fig. 2, the model performs best at η = 0.3, which yields an AARD% of 2.3107.

Effect of learning rate on performance of the SGB model in terms of MSE value.

Figure 3 illustrates the variation of MSE values for the training and test datasets with respect to the number of trees. Initially, the error rates stabilize quickly, but as the number of trees increases beyond a certain point, the MSE for the test data begins to rise after reaching its minimum. This trend highlights the ideal tree count needed to prevent overfitting, marked by the horizontal green line. In this study, the optimal tree count was identified as 4,991.
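Selecting the tree count from such a curve amounts to finding the boosting step with the lowest test MSE. A minimal sketch, assuming the per-step test MSE values are already available as a list (the values below are hypothetical, not the study's):

```python
def best_tree_count(test_mse_per_step):
    """Return the 1-based number of trees at which the test MSE is lowest."""
    best_step = min(range(len(test_mse_per_step)), key=test_mse_per_step.__getitem__)
    return best_step + 1

# Hypothetical test-error curve: falls quickly, bottoms out, then rises again.
test_curve = [9.0, 4.0, 2.5, 2.1, 2.0, 2.2, 2.6]
n_opt = best_tree_count(test_curve)  # 5 for this curve
```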

Graph of the MSE over the successive boosting steps for the training data and the testing samples for estimation of CO2 solubility in DESs.

The dependability and precision of the SGB model are assessed using a variety of statistical metrics, including the Mean Square Error (MSE), Root Mean Square Error (RMSE), AARD, Absolute Relative Deviation (ARD), and Coefficient of Determination (R2). The following equations are used to compute these metrics:

Here \(y^{exp}\), \(y^{pre}\), \(N\), and \(\overline{y}\) represent the experimental data, predicted results, total number of observations, and mean value, respectively.
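For reference, the conventional forms of these metrics, written with the symbols defined above, are sketched below. They are assumed to match the definitions used in the study; the index \(i\) running over the \(N\) observations is notation introduced here.

\[
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i^{exp} - y_i^{pre}\right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}
\]

\[
\mathrm{ARD}_i\% = 100 \times \frac{\left|y_i^{exp} - y_i^{pre}\right|}{y_i^{exp}}, \qquad \mathrm{AARD}\% = \frac{1}{N}\sum_{i=1}^{N}\mathrm{ARD}_i\%
\]

\[
R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i^{exp} - y_i^{pre}\right)^2}{\sum_{i=1}^{N}\left(y_i^{exp} - \overline{y}\right)^2}
\]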

Table 2 lists the statistical measures of the SGB model’s performance for the entire, training, and testing datasets. Across the entire dataset, the SGB model achieves an RMSE of 2.82913, an AARD% of 2.31074, and an R2 of 0.99276. Table 2 also shows that the model performs well in both the training and testing stages. In other words, the values predicted by the SGB method agree satisfactorily with the experimental data reported in the literature.

Additionally, the validation process included other standards proposed by researchers86,98,99,100. These standards are outlined as follows:

The calculated values for these statistical parameters are shown in Table 2. These results meet the criteria outlined by the aforementioned equations, demonstrating that the proposed SGB model is highly reliable and can be effectively used to predict CO2 solubility in DESs.

The majority of these statistics are covered in more depth within the literature101. When considering forecasting, various statistics have also been addressed in previous published works102,103,104.

Regression plots are essential for validating models, and Fig. 4 is particularly noteworthy as it presents regression lines, corresponding equations, and a 45° reference line, covering both training and testing datasets. In regression plots, a model is represented by a regression line defined by the equation \(y = ax + b\). For a perfect fit, the slope \(a\), the coefficient that multiplies \(x\), should be close to 1, indicating a direct proportional relationship between the predicted and actual values. Additionally, the intercept \(b\) should be close to 0, signifying minimal deviation when the predicted value is at its lowest. The resulting linear regression equations for the entire, training, and testing datasets are given by Eqs. (15) to (17):

Cross plot of experimental and SGB predicted solubility values.

These equations provide quantitative insight into the relationship between the variables, underscoring the precision of the SGB model’s predictions. With a slope close to 1 and a negligible intercept, the SGB model delivers accurate predictions of CO2 solubility in DESs. This performance is consistently upheld across both training and testing scenarios, as evident from these statistical values, further affirming the model’s robustness and reliability.
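The slope and intercept of such a regression line can be obtained by ordinary least squares. The sketch below is illustrative only: the function name and the four data pairs are hypothetical and are not values from the study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    return a, b

# Hypothetical experimental vs. predicted solubilities (g/kg).
y_exp = [10.0, 20.0, 30.0, 40.0]
y_pre = [10.5, 19.5, 30.5, 39.5]
a, b = fit_line(y_exp, y_pre)  # a is about 0.98, b is about 0.5
```

A slope near 1 and an intercept near 0, as in this toy case, indicate that the predictions track the experimental values closely.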

Figure 5a presents a visual representation of the cumulative frequency of data points in relation to ARD% to offer insights into the accuracy of the model predictions. In this graphical representation, a closer alignment of the curve to the vertical axis signifies a higher level of accuracy for the model under consideration. Additionally, Fig. 5b illustrates the distribution of ARD% values across different data points, showing the proportion of samples within each ARD% range. Notably, 72.68% of the predictions produced by the SGB model fall within an ARD% range of 0–1%, and 94.72% have an ARD% under 10%, indicating a high level of precision. In contrast, only 2.25% of the predictions exceed an ARD% of 20%, emphasizing the model’s consistent accuracy across most cases. When compared to a multilayer perceptron (MLP) model using molecular descriptors from the Conductor-like Screening Model for Real Solvents (COSMO-RS)58 on the same dataset, the differences are significant. As shown in Fig. 5b, only 32.39% of the MLP-COSMO-RS model’s predictions fall within the 0–1% ARD% range. This highlights the SGB model’s superior robustness and reliability, demonstrating better predictive performance across a wider range of data points compared to the MLP-COSMO-RS model.

The plots of (a) cumulative frequency versus ARD% of the SGB model and (b) distribution of the ARD% of the SGB and MLP based COSMO-RS outputs from the corresponding experimental values of mole fraction of CO2 in DESs.
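The shares quoted for each ARD% range follow from a simple count over the per-point deviations. A minimal sketch, with hypothetical function names and hypothetical data chosen so the result is easy to check by hand:

```python
def ard_percent(y_exp, y_pre):
    """Absolute relative deviation of each prediction, in percent."""
    return [100.0 * abs(e - p) / abs(e) for e, p in zip(y_exp, y_pre)]

def share_within(ard, limit):
    """Percentage of points whose ARD% does not exceed the given limit."""
    return 100.0 * sum(1 for a in ard if a <= limit) / len(ard)

# Hypothetical data: the deviations are 0.5%, 1%, 5% and 30%.
y_exp = [100.0, 100.0, 100.0, 100.0]
y_pre = [100.5, 99.0, 105.0, 130.0]
ard = ard_percent(y_exp, y_pre)
within_1 = share_within(ard, 1.0)           # 50.0
within_10 = share_within(ard, 10.0)         # 75.0
above_20 = 100.0 - share_within(ard, 20.0)  # 25.0
```

Evaluating `share_within` over an increasing sequence of limits gives the cumulative frequency curve of the kind shown in Fig. 5a.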

Another crucial aspect in the development of a precise predictive model is assessing whether it overestimates or underestimates experimental solubility data across a range of input parameter variations. In this regard, Fig. 6 shows trend plots of the values predicted by the SGB model against actual data points for different pressure and temperature conditions. This figure pertains to four DES systems: Acetylcholine Chloride: Guaiacol (1:3), Acetylcholine Chloride: 1,2,4-triazole (1:1), Choline Chloride: Triethylene glycol (1:3), and Tetrabutylammonium Chloride: Levulinic acid (1:3). The plots show that the developed model accurately captures the impact of varying input parameters on CO2 solubility. This indicates that the model possesses a strong capacity to forecast the behavior of experimental data across the relevant spectrum of input parameters, which enhances its reliability and utility in predicting solubility trends.

Trend prediction ability of proposed SGB model versus temperature and pressure for different DES systems: (a) Acetylcholine Chloride: Guaiacol (1:3), (b) Acetylcholine Chloride: 1,2,4-triazole (1:1), (c) Choline Chloride : Triethylene glycol (1:3), and (d) Tetrabutylammonium Chloride: Levulinic acid (1:3).

In the SGB algorithm, during the construction of each decision tree, predictor statistics—such as sums of squares for regression (as simple regression trees are utilized)—are calculated for each variable at every potential split. The variable that results in the optimal split at a given node is selected to execute the split. Additionally, the algorithm calculates the average of the predictor statistics across all splits and all trees within the boosting sequence. These averages are then normalized, assigning a value of 1 to the variable with the highest average. The importance of the remaining predictors is quantified relative to this maximum, reflecting their comparative contributions based on the predictor statistic averages.
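The averaging and normalization step can be sketched as follows. The per-split statistics below are hypothetical numbers, and the feature labels and function name are assumptions made for illustration; only the procedure (average over all splits and trees, then scale so the top variable scores 1) follows the description above.

```python
def relative_importance(split_stats):
    """Average each variable's split statistic, then scale so the maximum is 1."""
    avg = {name: sum(vals) / len(vals) for name, vals in split_stats.items() if vals}
    top = max(avg.values())
    return {name: value / top for name, value in avg.items()}

# Hypothetical split statistics collected over all splits of all trees.
split_stats = {"P": [8.0, 12.0], "T": [1.0, 3.0], "x_HBD": [5.0]}
importance = relative_importance(split_stats)
# For these inputs: P -> 1.0, x_HBD -> 0.5, T -> 0.2
```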

Figure 7 presents bar graphs of the importance scores associated with each attribute. The most impactful attribute is assigned a score of 1, and the scores of the other attributes are scaled proportionally. The findings in Fig. 7 show that the SGB model is most sensitive to pressure (P) when forecasting CO2 solubility in DESs. This agrees with the outcomes observed by Wang et al.35 and Soleimani & Saeedi Dehaghani105, who employed the Random Forest (RF) and SGB based COSMO-RS methods to predict CO2 solubility in DES systems. After pressure, the sensitivity ranking of the attributes is as follows: the mole percent of the HBD component is the second most influential factor, followed by the mole percent of the salt component in third place. The Mw of the HBD ranks fourth, while the melting point of the HBD comes fifth. The molecular weight of the salt’s cation is sixth, followed by T in seventh place, and finally, the molecular weight of the salt’s anion as the eighth most influential factor.

Plot of the importance for each input variable for prediction of mole fraction of CO2 in DESs.

Detection of outliers (anomalies) can play a critical role in the development of mathematical models, particularly ML algorithms106,107. Outlier detection involves identifying individual data points or groups of data that deviate from the main body of data within a dataset106,107. One statistically efficient and reliable method for outlier analysis is the Leverage approach106,108. This method uses the residuals (the differences between model results and experimental data) along with a matrix known as the Hat matrix. The Hat matrix combines the experimental data and the predicted values from the model106,107. Consequently, an appropriate mathematical model is also necessary for performing the algorithm’s calculations.

The Leverage or Hat indices are computed using the Hat matrix (H) with the following definition106,107,109:

\[
H = X\left(X^{t}X\right)^{-1}X^{t} \qquad (18)
\]

Here, X represents a two-dimensional matrix consisting of N data points (rows) and k model parameters (columns), while t signifies the transpose matrix. Notably, the hat values of data are represented by the diagonal components of the H matrix, which are obtained using Eq. (18).

The Williams plot is constructed to visually detect potential outliers or anomalous data points by utilizing the H values derived from Eq. (18). This graphical representation maps the relationship between the Hat indices and Standardized Residuals (SR), which quantify the deviations between the model’s predictions and the observed data values106,107,109. As a general guideline, a warning Leverage (H*) is typically set at a value equal to 3(k + 1)/N, where N represents the number of data points and k is the number of model input parameters106,107,109. For the standardized residuals, a cut-off value of 3 is commonly used, so that points within ± 3 standard deviations from the mean are accepted106,107,109.

As shown in Fig. 8, the presence of the majority of data points, 97.5% of whole dataset, within the ranges 0 ≤ H ≤ H* and -3 ≤ SR ≤ 3 indicates that the model and its predictions fall within the applicable domain, resulting in a statistically valid model. "Good High Leverage" points are situated within the range of H* ≤ H and -3 ≤ SR ≤ 3. These points are those that lie outside the domain of applicability of the employed model106,107,109. In essence, the model fails to reliably capture or predict the corresponding data patterns. It is crucial to emphasize that, when encountering "Good High Leverage" points, exploring alternative models grounded in diverse theoretical frameworks is recommended to avoid dependence on skewed model outputs107,110. Observations with SR outside the range of -3 ≤ SR ≤ 3, irrespective of their relationship to the H* threshold, are classified as outliers or "Bad High Leverage" points. Such inaccurate predictions often stem from concerns about the integrity or reliability of the data106,109,110.

The Williams plot of SGB model for predicting CO2 solubility in DESs.
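The leverage calculation and the classification of points on a Williams plot can be sketched in plain Python. This is an illustrative sketch, not the study's code: the function names are hypothetical, the small matrix inverse is a generic Gauss-Jordan routine, and the toy design matrix and standardized residuals are made-up values chosen so the leverages can be checked by hand.

```python
def mat_inverse(A):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def leverages(X):
    """Diagonal of H = X (X^t X)^-1 X^t, one hat value per data point."""
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    inv = mat_inverse(xtx)
    return [sum(row[a] * inv[a][b] * row[b] for a in range(k) for b in range(k))
            for row in X]

def williams_classify(X, std_residuals, sr_limit=3.0):
    """Label each point using the H* = 3(k + 1)/N and |SR| <= 3 limits."""
    n, k = len(X), len(X[0])
    h_star = 3.0 * (k + 1) / n
    labels = []
    for h, sr in zip(leverages(X), std_residuals):
        if abs(sr) > sr_limit:
            labels.append("outlier")
        elif h > h_star:
            labels.append("good high leverage")
        else:
            labels.append("valid")
    return h_star, labels

# Hypothetical inputs: N = 10 points, k = 2 parameters.
X = [[1.0, 0.0]] * 9 + [[0.0, 5.0]]
std_res = [0.5, -1.0, 0.2, 3.5, -0.3, 1.1, -2.0, 0.8, -0.6, 0.1]
h_star, labels = williams_classify(X, std_res)
```

In this toy case H* is 0.9, the fourth point is flagged as an outlier because its standardized residual exceeds 3, and the last point is a "Good High Leverage" point because its hat value (about 1) exceeds H* while its residual is small.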

It is important to note that no outliers were removed during dataset preparation to retain the full diversity of experimental data for model training. The Williams plot analysis was conducted post-modeling to evaluate the applicability domain and identify potential outliers based on model residuals. This ensures that our model captures the most comprehensive data distribution before determining whether any points lie outside the statistical threshold.

To evaluate the accuracy of the proposed SGB model, its performance was compared to that of other CO2 solubility models in DESs reported in the literature, as shown in Table 3. Various methodologies have been used for predicting CO2 solubility in DESs, including Equation of State (EoS) approaches such as PC-SAFT46, PR-EoS45,111, and Cubic-Plus-Association (CPA)111 models, as well as COSMO-RS35,57,112 and ML techniques like RF35 and ANN57. The literature review reveals that COSMO-RS-predicted CO2 solubilities show AARD% ranging from 10.8 to 78.2%. Remarkably, Liu et al.112 developed an MLR model grounded in COSMO-RS theory that achieved an AARD% of 10.8%, surpassing the accuracy of similar models based on COSMO-RS. In contrast, Mohan et al.57 presented a more comprehensive COSMO-RS-based MLR model, leveraging a substantial dataset of 1973 entries. Despite yielding a slightly higher AARD% of 12%, this model is distinguished by its reliability and is deemed satisfactory for predicting CO2 solubility in DESs. In comparison, EoS methods display higher accuracy, with AARD% ranging between 0.8 and 7.02%. It is important to note, though, that these models rely on smaller datasets, typically between 57 and 353 data points per study.

The dataset that Mohan et al.57 used for their ANN-COSMO-RS model includes a wide variety of DESs, such as Type V natural deep eutectic solvents (NADES) made from natural biomolecules. It combines comprehensive molecular descriptors with temperature and pressure for a detailed CO2 solubility analysis. In contrast, the dataset of the present SGB model is simpler: it focuses on common DESs formed by salts and HBDs, excludes NADES, and uses basic variables such as molecular weight (Mw), melting point (Mp), and molar ratios alongside temperature and pressure. Despite this lower complexity, the proposed SGB model achieves a lower AARD%, indicating higher predictive accuracy and efficiency. This simplicity makes the SGB model more practical and easier to apply, whereas the more detailed approach of Mohan et al.57 supports a broader range of DES types. Overall, the SGB model strikes an effective balance between simplicity and performance, making it suitable for real-world CO2 capture applications with diverse DESs across a wide range of temperature and pressure conditions.

In this study, an accurate technique was proposed for predicting the solubility of CO2 in DESs. A comprehensive set of 1951 data points was compiled, covering a wide array of DESs formed from 22 salts and 24 HBDs and measured under different temperature and pressure conditions. This dataset was used to optimize and validate a variant of the gradient boosting algorithm known as the SGB model. The key achievements of the present investigation are as follows:

The proposed SGB model exhibits excellent predictive capability, achieving an R2 of 0.9928 and an AARD of only 2.3107%.

Compared with recently introduced machine learning models such as RF and ANN, and with established approaches documented in the literature, including traditional thermodynamic models and EoS techniques, the SGB model proposed in this study predicts CO2 solubility in DESs more accurately. This makes it a valuable computational tool for screening and selecting DESs with optimal CO2 capture performance before experimental validation. However, direct industrial implementation would require further integration with process simulation tools and validation under real-world operating conditions (e.g., mass transfer limitations, solvent stability, and large-scale applicability).

Additionally, the bar graph of predictor importance shows that pressure is the variable with the greatest influence on the predicted CO2 solubility.

To identify outliers and evaluate the applicability domain of the SGB model, the leverage approach was employed. The analysis showed that 97.5% of the dataset lies within the valid applicability domain, affirming the statistical robustness of the model. However, a small subset of data points was flagged as questionable because it deviated from the expected criteria.

The extensive dataset made it possible to introduce a robust methodology for predicting CO2 solubility in DESs. However, a limitation exists: although the SGB technique is broadly applicable, its predictive capacity remains confined to DES mixtures resembling those used to construct the model. Applying the proposed model to DES systems that deviate significantly from the studied mixtures is not recommended, although it might offer a rough estimate of CO2 solubility in such mixtures.
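For readers unfamiliar with stochastic gradient boosting, the core idea behind the SGB model summarized above can be sketched as follows: at each iteration a small tree is fitted to the current residuals using a random subsample of the training data drawn without replacement, and its shrunken prediction is added to the ensemble. The sketch is a minimal illustration and not the authors' configuration. It assumes a squared-error loss, uses one-split trees (stumps) for brevity, and runs on synthetic two-feature data; the function names, hyperparameter values, and data are all hypothetical.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares regression stump (one split) fitted to residuals r."""
    best = (np.inf, 0, np.inf, r.mean(), r.mean())  # fallback: constant fit
    for j in range(x.shape[1]):
        for thr in np.unique(x[:, j])[:-1]:
            left = x[:, j] <= thr
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, thr, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    j, thr, lv, rv = stump
    return np.where(x[:, j] <= thr, lv, rv)


def fit_sgb(x, y, n_trees=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for squared-error loss."""
    rng = np.random.default_rng(seed)
    f0 = y.mean()                      # initial constant model
    pred = np.full(len(y), f0)
    stumps = []
    m = max(2, int(subsample * len(y)))
    for _ in range(n_trees):
        # The "stochastic" step: a fresh random subsample at every iteration.
        idx = rng.choice(len(y), size=m, replace=False)
        stump = fit_stump(x[idx], y[idx] - pred[idx])
        pred += learning_rate * predict_stump(stump, x)  # shrunken update
        stumps.append(stump)
    return f0, learning_rate, stumps


def predict_sgb(model, x):
    f0, learning_rate, stumps = model
    out = np.full(x.shape[0], f0)
    for stump in stumps:
        out += learning_rate * predict_stump(stump, x)
    return out


# Synthetic demo data (not the DES dataset).
rng = np.random.default_rng(1)
x_demo = rng.uniform(size=(200, 2))
y_demo = 3.0 * x_demo[:, 0] + np.sin(6.0 * x_demo[:, 1])
model = fit_sgb(x_demo, y_demo)
mse = np.mean((predict_sgb(model, x_demo) - y_demo) ** 2)
print(mse, y_demo.var())
```

In practice, deeper trees and tuned values of the learning rate, subsample fraction, and number of trees would be needed, and performance should be judged on held-out data rather than on the training set used here.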

All data generated or analyzed during this study are accessible from the corresponding author upon reasonable request.

Average absolute relative deviation

Adaptive neuro fuzzy inference system

Artificial neural network

Absolute relative deviation

Carbon capture and storage

Combination of hybrid and particle swarm optimization

Conductor-like screening model for real solvents

Conductor-like screening model segment activity coefficient

Cubic-plus-association

Coupled simulated annealing

Deep eutectic solvents

Decision tree

Gradient boosting

Gene expression programming

Genetic programming

Hydrogen bond acceptor

Hydrogen bond donor

Ionic liquid

Least squares support vector machine

Long short-term memory auto encoder

Monte Carlo

Molecular dynamics

Machine learning

Multi-layer perceptron

Multivariate polynomial regression

Mean square error

Natural deep eutectic solvents

National Oceanic and Atmospheric Administration

Non-random two-liquid

Perturbed chain statistical associating fluid theory

Peng-Robinson equation of state

Particle swarm optimization

Quantitative structure–property relationship

Random forest

Root mean square error

Stochastic gradient boosting

Soft statistical associating fluid theory

Standardized residuals

Support vector machine

Universal quasi-chemical

Hat value

Warning leverage

Number of input parameters

Coefficient of determination

Temperature

Pressure

Molecular weight

Transpose multiplier

Mole fraction of CO2

Learning rate

Total number of data points

Experimental output at the sampling point \(i\)

Output of the model

Yu, J. et al. CO2 capture and separations using MOFs: Computational and experimental studies. Chem. Rev. 117, 9674–9754 (2017).

Tans, P. & Keeling, R. Trends in atmospheric carbon dioxide, NOAA/ESRL. URL: http://www.esrl.noaa.gov/gmd/ccgg/trends (2022).

Rochelle, G. T. Amine scrubbing for CO2 capture. Science 325, 1652–1654 (2009).

Yang, X. et al. Computational modeling and simulation of CO2 capture by aqueous amines. Chem. Rev. 117, 9524–9593 (2017).

García, G., Aparicio, S., Ullah, R. & Atilhan, M. Deep eutectic solvents: Physicochemical properties and gas separation applications. Energy Fuels 29, 2616–2644 (2015).

Imteyaz, S., Suresh, C. M., Kausar, T. & Ingole, P. P. Carbon dioxide capture and its electrochemical reduction study in deep eutectic solvent (DES) via experimental and molecular simulation approaches. J. CO2 Utiliz. 68, 102349 (2023).

Zhu, S. et al. A mini-review on greenness of ionic liquids. Chem. Biochem. Eng. Q. 23, 207–211 (2009).

Bates, E. D., Mayton, R. D., Ntai, I. & Davis, J. H. CO2 capture by a task-specific ionic liquid. J. Am. Chem. Soc. 124, 926–927 (2002).

Shiflett, M. B., Drew, D. W., Cantini, R. A. & Yokozeki, A. Carbon dioxide capture using ionic liquid 1-butyl-3-methylimidazolium acetate. Energy Fuels 24, 5781–5789 (2010).

Biczak, R., Pawłowska, B., Bałczewski, P. & Rychter, P. The role of the anion in the toxicity of imidazolium ionic liquids. J. Hazard. Mater. 274, 181–190 (2014).

Zubeir, L. F., Lacroix, M. H. & Kroon, M. C. Low transition temperature mixtures as innovative and sustainable CO2 capture solvents. J. Phys. Chem. B 118, 14429–14441 (2014).

Francisco, M., van den Bruinhorst, A. & Kroon, M. C. Low-transition-temperature mixtures (LTTMs): A new generation of designer solvents. Angew. Chem. Int. Ed. 52, 3074–3085 (2013).

Abbott, A. P., Boothby, D., Capper, G., Davies, D. L. & Rasheed, R. K. Deep eutectic solvents formed between choline chloride and carboxylic acids: Versatile alternatives to ionic liquids. J. Am. Chem. Soc. 126, 9142–9147 (2004).

Zhang, Q., Vigier, K. D. O., Royer, S. & Jérôme, F. Deep eutectic solvents: Syntheses, properties and applications. Chem. Soc. Rev. 41, 7108–7146 (2012).

Carriazo, D., Serrano, M. C., Gutiérrez, M. C., Ferrer, M. L. & del Monte, F. Deep-eutectic solvents playing multiple roles in the synthesis of polymers and related materials. Chem. Soc. Rev. 41, 4996–5014 (2012).

Dai, Y., Van Spronsen, J., Witkamp, G.-J., Verpoorte, R. & Choi, Y. H. Ionic liquids and deep eutectic solvents in natural products research: mixtures of solids as extraction solvents. J. Nat. Prod. 76, 2162–2173 (2013).

Francisco, M., Van Den Bruinhorst, A. & Kroon, M. C. New natural and renewable low transition temperature mixtures (LTTMs): Screening as solvents for lignocellulosic biomass processing. Green Chem. 14, 2153–2157 (2012).

Hayyan, A. et al. Glucose-based deep eutectic solvents: Physical properties. J. Mol. Liq. 178, 137–141 (2013).

Kareem, M. A., Mjalli, F. S., Hashim, M. A. & AlNashef, I. M. Phosphonium-based ionic liquids analogues and their physical properties. J. Chem. Eng. Data 55, 4632–4637 (2010).

Liu, Y.-T., Chen, Y.-A. & Xing, Y.-J. Synthesis and characterization of novel ternary deep eutectic solvents. Chin. Chem. Lett. 25, 104–106 (2014).

Maugeri, Z. & de María, P. D. Novel choline-chloride-based deep-eutectic-solvents with renewable hydrogen bond donors: Levulinic acid and sugar-based polyols. RSC Adv. 2, 421–425 (2012).

Siongco, K. R., Leron, R. B. & Li, M.-H. Densities, refractive indices, and viscosities of N, N-diethylethanol ammonium chloride–glycerol or–ethylene glycol deep eutectic solvents and their aqueous solutions. J. Chem. Thermodyn. 65, 65–72 (2013).

Florindo, C., Branco, L. & Marrucho, I. Development of hydrophobic deep eutectic solvents for extraction of pesticides from aqueous environments. Fluid Phase Equilib. 448, 135–142 (2017).

Bergua, F., Castro, M., Muñoz-Embid, J., Lafuente, C. & Artal, M. L-menthol-based eutectic solvents: Characterization and application in the removal of drugs from water. J. Mol. Liq. 352, 118754 (2022).

Van Osch, D. J., Dietz, C. H., Warrag, S. E. & Kroon, M. C. The curious case of hydrophobic deep eutectic solvents: A story on the discovery, design, and applications. ACS Sustain. Chem. Eng. 8, 10591–10612 (2020).

Khandelwal, S., Tailor, Y. K. & Kumar, M. Deep eutectic solvents (DESs) as eco-friendly and sustainable solvent/catalyst systems in organic transformations. J. Mol. Liq. 215, 345–386 (2016).

Chang, X. X. et al. A review on the properties and applications of chitosan, cellulose and deep eutectic solvent in green chemistry. J. Ind. Eng. Chem. 104, 362–380 (2021).

Emami, S. & Shayanfar, A. Deep eutectic solvents for pharmaceutical formulation and drug delivery applications. Pharm. Dev. Technol. 25, 779–796 (2020).

Dietz, C. H., Kroon, M. C., Di Stefano, M., van Sint Annaland, M. & Gallucci, F. Selective separation of furfural and hydroxymethylfurfural from an aqueous solution using a supported hydrophobic deep eutectic solvent liquid membrane. Faraday Discuss. 206, 77–92 (2018).

Liu, Y. et al. Ionic liquids/deep eutectic solvents for CO2 capture: Reviewing and evaluating. Green Energy Environ. 6, 314–328 (2021).

Song, Z. et al. Systematic screening of deep eutectic solvents as sustainable separation media exemplified by the CO2 capture process. ACS Sustain. Chem. Eng. 8, 8741–8751 (2020).

Wang, J. et al. Carbon dioxide solubility in phosphonium-based deep eutectic solvents: An experimental and molecular dynamics study. Ind. Eng. Chem. Res. 58, 17514–17523 (2019).

Zhang, Y., Ji, X. & Lu, X. Choline-based deep eutectic solvents for CO2 separation: Review and thermodynamic analysis. Renew. Sustain. Energy Rev. 97, 436–455 (2018).

Haghbakhsh, R., Keshtkar, M., Shariati, A. & Raeissi, S. Experimental investigation of carbon dioxide solubility in the deep eutectic solvent (1 ChCl+ 3 triethylene glycol) and modeling by the CPA EoS. J. Mol. Liq. 330, 115647 (2021).

Wang, J. et al. Prediction of CO2 solubility in deep eutectic solvents using random forest model based on COSMO-RS-derived descriptors. Green Chem. Eng. 2, 431–440 (2021).

Li, X., Hou, M., Han, B., Wang, X. & Zou, L. Solubility of CO2 in a choline chloride+ urea eutectic mixture. J. Chem. Eng. Data 53, 548–550 (2008).

Chen, Y. et al. Solubilities of carbon dioxide in eutectic mixtures of choline chloride and dihydric alcohols. J. Chem. Eng. Data 59, 1247–1253 (2014).

Leron, R. B., Caparanga, A. & Li, M.-H. Carbon dioxide solubility in a deep eutectic solvent based on choline chloride and urea at T= 303.15–343.15 K and moderate pressures. J. Taiwan Instit. Chem. Eng. 44, 879–885 (2013).

Leron, R. B. & Li, M.-H. Solubility of carbon dioxide in a eutectic mixture of choline chloride and glycerol at moderate pressures. J. Chem. Thermodyn. 57, 131–136 (2013).

Wang, J. et al. Computer-aided design of ionic liquids as absorbent for gas separation exemplified by CO2 capture cases. ACS Sustain. Chem. Eng. 6, 12025–12035 (2018).

Jiang, C. et al. COSMO-RS prediction and experimental verification of 1, 5-pentanediamine extraction from aqueous solution by ionic liquids. Green Energy Environ. 6, 422–431 (2021).

Crespo, E. A. et al. A methodology to parameterize SAFT-type equations of state for solid precursors of deep eutectic solvents: The example of cholinium chloride. Phys. Chem. Chem. Phys. 21, 15046–15061 (2019).

Haghbakhsh, R. & Raeissi, S. Modeling vapor-liquid equilibria of mixtures of SO2 and deep eutectic solvents using the CPA-NRTL and CPA-UNIQUAC models. J. Mol. Liq. 250, 259–268 (2018).

Haider, M. B. & Kumar, R. Solubility of CO2 and CH4 in sterically hindered amine-based deep eutectic solvents. Sep. Purif. Technol. 248, 117055 (2020).

Mirza, N. R. et al. Experiments and thermodynamic modeling of the solubility of carbon dioxide in three different deep eutectic solvents (DESs). J. Chem. Eng. Data 60, 3246–3252 (2015).

Zubeir, L. F., Held, C., Sadowski, G. & Kroon, M. C. PC-SAFT modeling of CO2 solubilities in deep eutectic solvents. J. Phys. Chem. B 120, 2300–2310 (2016).

Han, J., Dai, C., Yu, G. & Lei, Z. Parameterization of COSMO-RS model for ionic liquids. Green Energy Environ. 3, 247–265 (2018).

Peng, D., Zhang, J., Cheng, H., Chen, L. & Qi, Z. Computer-aided ionic liquid design for separation processes based on group contribution method and COSMO-SAC model. Chem. Eng. Sci. 159, 58–68 (2017).

Yang, J. et al. A brief review of the prediction of liquid–liquid equilibrium of ternary systems containing ionic liquids by the COSMO-SAC model. J. Solut. Chem. 48, 1547–1563 (2019).

Zhang, J. et al. COSMO-descriptor based computer-aided ionic liquid design for separation processes: Part II: Task-specific design for extraction processes. Chem. Eng. Sci. 162, 364–374 (2017).

Zhang, Y., Ji, X., Xie, Y. & Lu, X. Screening of conventional ionic liquids for carbon dioxide capture and separation. Appl. Energy 162, 1160–1170 (2016).

Zhao, Y., Gani, R., Afzal, R. M., Zhang, X. & Zhang, S. Ionic liquids for absorption and separation of gases: An extensive database and a systematic screening method. AIChE J. 63, 1353–1367 (2017).

Kamgar, A., Mohsenpour, S. & Esmaeilzadeh, F. Solubility prediction of CO2, CH4, H2, CO and N2 in Choline Chloride/Urea as a eutectic solvent using NRTL and COSMO-RS models. J. Mol. Liq. 247, 70–74 (2017).

Soleimani, R. & Saeedi Dehaghani, A. H. A theoretical probe into the separation of CO2/CH4/N2 mixtures with polysulfone/polydimethylsiloxane—Nano zinc oxide MMM. Sci. Rep. 13, 9543 (2023).

Hens, R. & Vlugt, T. J. Molecular simulation of vapor–liquid equilibria using the Wolf method for electrostatic interactions. J. Chem. Eng. Data 63, 1096–1102 (2017).

Salehi, H. S., Hens, R., Moultos, O. A. & Vlugt, T. J. Computation of gas solubilities in choline chloride urea and choline chloride ethylene glycol deep eutectic solvents using Monte Carlo simulations. J. Mol. Liq. 316, 113729 (2020).

Mohan, M. et al. Accurate prediction of carbon dioxide capture by deep eutectic solvents using quantum chemistry and a neural network. Green Chem. 25, 3475–3492 (2023).

Lemaoui, T. et al. Predicting the CO2 capture capability of deep eutectic solvents and screening over 1000 of their combinations using machine learning. ACS Sustain. Chem. Eng. 11, 9564–9580 (2023).

Tatar, A., Barati-Harooni, A., Najafi-Marghmaleki, A. & Bahadori, A. Accurate prediction of CO2 solubility in eutectic mixture of levulinic acid (or furfuryl alcohol) and choline chloride. Int. J. Greenhouse Gas Control 58, 212–222 (2017).

Dashti, A., Raji, M., Amani, P., Baghban, A. & Mohammadi, A. H. Insight into the estimation of equilibrium CO2 absorption by Deep Eutectic Solvents using computational approaches. Sep. Sci. Technol. 56, 2351–2368 (2021).

Sagar, A. & Upadhyayula, S. Implementation of Artificial Neural Networks in the assessment of CO2 solubility in deep eutectic and ionic liquid solvents–Performance and cost comparison. Sustain. Chem. Climate Action 1, 100007 (2022).

Nagulapati, V. M. et al. Hybrid machine learning-based model for solubilities prediction of various gases in deep eutectic solvent for rigorous process design of hydrogen purification. Sep. Purif. Technol. 298, 121651 (2022).

Liu, X., Gao, B., Jiang, Y., Ai, N. & Deng, D. Solubilities and thermodynamic properties of carbon dioxide in guaiacol-based deep eutectic solvents. J. Chem. Eng. Data 62, 1448–1455 (2017).

Li, X., Liu, X. & Deng, D. Solubilities and thermodynamic properties of CO2 in four azole-based deep eutectic solvents. J. Chem. Eng. Data 63, 2091–2096 (2018).

Deng, D., Jiang, Y., Liu, X., Zhang, Z. & Ai, N. Investigation of solubilities of carbon dioxide in five levulinic acid-based deep eutectic solvents and their thermodynamic properties. J. Chem. Thermodyn. 103, 212–217 (2016).

Ghaedi, H. et al. CO2 capture with the help of Phosphonium-based deep eutectic solvents. J. Mol. Liq. 243, 564–571 (2017).

Sarmad, S., Xie, Y., Mikkola, J.-P. & Ji, X. Screening of deep eutectic solvents (DESs) as green CO2 sorbents: From solubility to viscosity. New J. Chem. 41, 290–301 (2017).

Haghbakhsh, R., Keshtkar, M., Shariati, A. & Raeissi, S. A comprehensive experimental and modeling study on CO2 solubilities in the deep eutectic solvent based on choline chloride and butane-1, 2-diol. Fluid Phase Equilib. 561, 113535 (2022).

Li, G., Deng, D., Chen, Y., Shan, H. & Ai, N. Solubilities and thermodynamic properties of CO2 in choline-chloride based deep eutectic solvents. J. Chem. Thermodyn. 75, 58–62 (2014).

Leron, R. B. & Li, M.-H. Solubility of carbon dioxide in a choline chloride—Ethylene glycol based deep eutectic solvent. Thermochim. Acta 551, 14–19 (2013).

Lu, M. et al. Solubilities of carbon dioxide in the eutectic mixture of levulinic acid (or furfuryl alcohol) and choline chloride. J. Chem. Thermodyn. 88, 72–77 (2015).

Ali, E., Hadj-Kali, M. K., Mulyono, S. & Alnashef, I. Analysis of operating conditions for CO2 capturing process using deep eutectic solvents. Int. J. Greenhouse Gas Control 47, 342–350 (2016).

Luo, F. et al. Comprehensive evaluation of a deep eutectic solvent based CO2 capture process through experiment and simulation. ACS Sustain. Chem. Eng. 9, 10250–10265 (2021).

Haider, M. B., Jha, D., Marriyappan Sivagnanam, B. & Kumar, R. Thermodynamic and kinetic studies of CO2 capture by glycol and amine-based deep eutectic solvents. J. Chem. Eng. Data 63, 2671–2680 (2018).

Zubeir, L. F., Van Osch, D. J., Rocha, M. A., Banat, F. & Kroon, M. C. Carbon dioxide solubilities in decanoic acid-based hydrophobic deep eutectic solvents. J. Chem. Eng. Data 63, 913–919 (2018).

Ji, Y., Hou, Y., Ren, S., Yao, C. & Wu, W. Phase equilibria of high pressure CO2 and deep eutectic solvents formed by quaternary ammonium salts and phenol. Fluid Phase Equilib. 429, 14–20 (2016).

Friedman, J. H. Stochastic gradient boosting. Comput. Stat. Data Anal. 38, 367–378 (2002).

Friedman, J. H. Greedy function approximation: A gradient boosting machine. Ann. Statist. 29, 1189–1232 (2001).

Kriegler, B. & Berk, R. Small area estimation of the homeless in Los Angeles: An application of cost-sensitive stochastic gradient boosting. Ann. Appl. Statist. 4, 1234–1255 (2010).

Kuhn, M. & Johnson, K. Applied predictive modeling Vol. 810 (Springer, 2013).

Breiman, L. Arcing the edge. (Technical Report 486, Statistics Department, University of California at Berkeley, 1997).

Brillante, L. et al. Investigating the use of gradient boosting machine, random forest and their ensemble to predict skin flavonoid content from berry physical–mechanical characteristics in wine grapes. Comput. Electron. Agric. 117, 186–193 (2015).

Godinho, S., Guiomar, N. & Gil, A. Using a stochastic gradient boosting algorithm to analyse the effectiveness of Landsat 8 data for montado land cover mapping: Application in southern Portugal. Int. J. Appl. Earth Obs. Geoinf. 49, 151–162 (2016).

Zhou, J., Li, X. & Mitri, H. S. Comparative performance of six supervised learning methods for the development of models of hard rock pillar stability prediction. Nat. Hazards 79, 291–316 (2015).

Abooali, D., Soleimani, R. & Rezaei-Yazdi, A. Modeling CO2 absorption in aqueous solutions of DEA, MDEA, and DEA+ MDEA based on intelligent methods. Sep. Sci. Technol. 55, 697–707 (2020).

Dehaghani, A. H. S. & Soleimani, R. Estimation of interfacial tension for geological CO2 storage. Chem. Eng. Technol. 42, 680–689 (2019).

Hashemkhani, M. et al. Prediction of the binary surface tension of mixtures containing ionic liquids using Support Vector Machine algorithms. J. Mol. Liq. 211, 534–552 (2015).

Soleimani, R., Abooali, D. & Shoushtari, N. A. Characterizing CO2 capture with aqueous solutions of LysK and the mixture of MAPA+ DEEA using soft computing methods. Energy 164, 664–675 (2018).

Soleimani, R., Dehaghani, A. H. S. & Bahadori, A. A new decision tree based algorithm for prediction of hydrogen sulfide solubility in various ionic liquids. J. Mol. Liq. 242, 701–713 (2017).

Soleimani, R. et al. Evolving an accurate decision tree-based model for predicting carbon dioxide solubility in polymers. Chem. Eng. Technol. 43, 514–522 (2020).

Saeedi Dehaghani, A. H. & Soleimani, R. Prediction of CO2-oil minimum miscibility pressure using soft computing methods. Chem. Eng. Technol. 43, 1361–1371 (2020).

Abooali, D., Soleimani, R. & Gholamreza-Ravi, S. Characterization of physico-chemical properties of biodiesel components using smart data mining approaches. Fuel 266, 117075 (2020).

Subasi, A., El-Amin, M. F., Darwich, T. & Dossary, M. Permeability prediction of petroleum reservoirs using stochastic gradient boosting regression. J. Ambient Intell. Human. Comput. 13, 1–10 (2020).

Abooali, D. & Soleimani, R. Structure-based modeling of critical micelle concentration (CMC) of anionic surfactants in brine using intelligent methods. Sci. Rep. 13, 13361 (2023).

Gu, Y.-Q. et al. Using an SGB decision tree approach to estimate the properties of CRM made by biomass pretreated with ionic liquids. Int. J. Chem. Eng. 2021, 1–9 (2021).

Dong, L., Wang, R., Liu, P. & Sarvazizi, S. Prediction of pyrolysis kinetics of biomass: New insights from artificial intelligence-based modeling. Int. J. Chem. Eng. 2022, 1–8 (2022).

Daneshfar, R. et al. Estimating the heat capacity of non-Newtonian ionanofluid systems using ANN, ANFIS, and SGB tree algorithms. Appl. Sci. 10, 6432 (2020).

Golbraikh, A. & Tropsha, A. Beware of q2!. J. Mol. Graph. Modell. (2002).

Golzar, K., Amjad-Iranagh, S. & Modarress, H. Prediction of thermophysical properties for binary mixtures of common ionic liquids with water or alcohol at several temperatures and atmospheric pressure by means of artificial neural network. Ind. Eng. Chem. Res. 53, 7247–7262 (2014).

Pratim Roy, P., Paul, S., Mitra, I. & Roy, K. On two novel parameters for validation of predictive QSAR models. Molecules 14, 1660–1701 (2009).

Witten, I. H., Frank, E., Hall, M. A., Pal, C. J. & Data, M. Data mining 403–413 (Elsevier Amsterdam, 2025).

Makridakis, S., Wheelwright, S. C. & Hyndman, R. J. Forecasting methods and applications (John Wiley & Sons, 2008).

Makridakis, S., Wheelwright, S. C. & Hyndman, R. Forecasting methods for managers (Wiley, 1989).

Ross, T. Indices for performance evaluation of predictive models in food microbiology. J. Appl. Bacteriol. 81, 501–508 (1996).

Soleimani, R. & Dehaghani, A. H. S. Unveiling CO2 capture in tailorable green neoteric solvents: An ensemble learning approach informed by quantum chemistry. J. Environ. Manage. 354, 120298 (2024).

Rousseeuw, P. J. & Leroy, A. M. Robust regression and outlier detection (John Wiley & Sons, 2005).

Mohammadi, A. H., Eslamimanesh, A., Gharagheizi, F. & Richon, D. A novel method for evaluation of asphaltene precipitation titration data. Chem. Eng. Sci. 78, 181–185 (2012).

Gramatica, P. Principles of QSAR models validation: Internal and external. QSAR Comb. Sci. 26, 694–701 (2007).

Gharagheizi, F. et al. Evaluation of thermal conductivity of gases at atmospheric pressure through a corresponding states method. Ind. Eng. Chem. Res. 51, 3844–3849 (2012).

Eslamimanesh, A., Gharagheizi, F., Mohammadi, A. H. & Richon, D. A statistical method for evaluation of the experimental phase equilibrium data of simple clathrate hydrates. Chem. Eng. Sci. 80, 402–408 (2012).

Pelaquim, F. P., Bitencourt, R. G., Neto, A. M. B., Dalmolin, I. A. L. & da Costa, M. C. Carbon dioxide solubility in deep eutectic solvents: Modelling using cubic plus association and Peng-Robinson equations of state. Process Saf. Environ. Prot. 163, 14–26 (2022).

Liu, Y. et al. Screening deep eutectic solvents for CO2 capture with COSMO-RS. Front. Chem. 8, 82 (2020).

Department of Chemical Engineering, Faculty of Chemical Engineering, Tarbiat Modares University, P.O. Box 14115-143, Tehran, Iran

Reza Soleimani

Department of Petroleum Engineering, Faculty of Chemical Engineering, Tarbiat Modares University, P.O. Box 14115-143, Tehran, Iran

Amir Hossein Saeedi Dehaghani

School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, 70504, USA

Ziba Behtouei

Department of Computer Engineering, Damavand Branch, Islamic Azad University, Damavand, Iran

Hamidreza Farahani

Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

Seyyed Mohsen Hashemi

Reza Soleimani: Conceptualization, Methodology, Software, Validation, Writing—original draft, Writing—Review & Editing, Resources, Visualization, Data Curation, Investigation, Formal analysis, Supervision. Amir Hossein Saeedi Dehaghani: Project administration, Conceptualization, Validation, Data Curation, Software, Writing—Review & Editing, Methodology, Formal analysis, Supervision. Ziba Behtouei: Conceptualization, Methodology, Software, Writing—original draft, Resources, Visualization, Data Curation, Investigation, Formal analysis. Hamidreza Farahani: Methodology, Software, Writing—original draft, Resources, Data Curation, Investigation, Formal analysis. Seyyed Mohsen Hashemi: Software, Writing—original draft, Data Curation, Investigation, Formal analysis.

Correspondence to Reza Soleimani or Amir Hossein Saeedi Dehaghani.

The authors declare no competing interests.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Soleimani, R., Saeedi Dehaghani, A.H., Behtouei, Z. et al. Prognostication of advanced CO2 capture using tunable solvents with an ensemble learning-based decision tree model. Sci Rep 15, 19694 (2025). https://doi.org/10.1038/s41598-025-04318-4

Received: 06 December 2024

Accepted: 26 May 2025

Published: 04 June 2025

DOI: https://doi.org/10.1038/s41598-025-04318-4
