A Nano-QSTR model to predict nano-cytotoxicity: an approach using human lung cells data
Particle and Fibre Toxicology volume 20, Article number: 21 (2023)
The widespread use of new engineered nanomaterials (ENMs) in industries such as cosmetics, electronics, and diagnostic nanodevices has been revolutionizing our society. However, emerging studies suggest that ENMs present potentially toxic effects on the human lung. In this regard, we developed a machine learning (ML) nano-quantitative-structure-toxicity relationship (QSTR) model to predict the potential human lung nano-cytotoxicity induced by exposure to ENMs based on metal oxide nanoparticles.
Tree-based learning algorithms (e.g., decision tree (DT), random forest (RF), and extra-trees (ET)) were able to predict ENMs’ cytotoxic risk in an efficient, robust, and interpretable way. The best-ranked ET nano-QSTR model showed excellent statistical performance with R2 and Q2-based metrics of 0.95, 0.80, and 0.79 for training, internal validation, and external validation subsets, respectively. Several nano-descriptors linked to the core-type and surface coating reactivity properties were identified as the most relevant characteristics to predict human lung nano-cytotoxicity.
The proposed model suggests that a decrease in the ENMs diameter could significantly increase their potential ability to access lung subcellular compartments (e.g., mitochondria and nuclei), promoting strong nano-cytotoxicity and epithelial barrier dysfunction. Additionally, the presence of polyethylene glycol (PEG) as a surface coating could prevent the potential release of cytotoxic metal ions, promoting lung cytoprotection. Overall, the current work could pave the way for efficient decision-making, prediction, and mitigation of the potential occupational and environmental ENMs risks.
Engineered nanomaterials (ENMs) based on metal oxide nanoparticles offer a wide range of promising applications, including cosmetics, electronics, sunscreens, textiles, biomedical products, and diagnostic nanodevices, among others [1,2,3,4,5]. Although inorganic ENMs offer multiple technological advantages and reveal exciting physicochemical properties, understanding their interaction with a biological environment is still challenging. Growing evidence demonstrates that some inorganic ENMs (e.g., CuO, ZnO, Fe2O3, CeO2, Ag, Au, and TiO2) could be potentially more toxic than their organic counterparts, such as carbon-based ENMs. It is well-recognized that, from the occupational and molecular epidemiology point of view, engineered inorganic ENMs present a higher potential to induce several human lung epithelial perturbations mainly based on the intracellular increase of reactive oxygen species (ROS) [6, 7], which usually play an important role in the high prevalence of human lung nano-cytotoxicity of ENMs at the molecular, cellular, and subcellular levels (e.g., mitochondria, lysosomes, and nuclei) [7, 8].
Over the last few decades, various sampling strategies have been used to determine the occupational exposure to ENMs. However, there is still no international consensus on measurement strategies, metrics, or exposure limits, as toxicity studies of ENMs have generally been conducted in non-human in vitro cell-based models. The assessment of individual human exposure to ENMs remains a critical issue despite recent innovative developments in personal measurement nanodevices. In this regard, most of the research institutes that synthesize and manufacture ENMs manage detailed action plans to mitigate the personal nano-exposure of workers, mainly through the respiratory tract [9,10,11,12,13]. Herein, the current nanorisk assessment paradigm was developed by the U.S. National Academy of Sciences and the Federal Government by considering four critical steps: (i) hazard identification, (ii) dose-response assessment, (iii) exposure assessment, and (iv) nano-risk characterization [14, 15]. Despite the numerous in vitro and in vivo studies tackling the relationship between the physicochemical properties of ENMs and their nanotoxicological responses, the obtained evidence is often contradictory or inconclusive.
Recently, there has been unprecedented global interest in improving human nanosafety relevance, which is possible through in silico models. This interest has been significantly supported by increased top-down investment from central sources such as the EPA-Tox21 Consortium, the National Institutes of Health, the International Organization for Standardization (ISO), the European Commission through the Horizon 2020 Initiative, and the Organisation for Economic Co-operation and Development. The efforts include the development of several computational models for nanotoxicology predictions. Overall, in silico approaches are greatly beneficial to address the current concerns on ENMs nanotoxicity, as they introduce predictive animal-free technologies based on state-of-the-art machine learning (ML) methods. In a recent article in Nano Today, Burden et al. strongly suggest that by using sophisticated ML-based methodologies, which rigorously follow the 3Rs ethical principles adapted to nanosafety, it is possible to extrapolate in vitro exposure effects to explain human exposure [22, 23].
Following this idea, predictive in silico approaches could efficiently address Nano-Quantitative Structure-Activity/Property/Toxicity Relationships (nano-QSAR/QSPR/QSTR) of ENMs based on metal oxide nanoparticles to prevent potential human lung nano-cytotoxicity. Although the classical term to address this type of problem is QSAR, which gives a broad framework capable of integrating the most up-to-date models, the fundamental concept in this work is nano-QSTR. This term derives from addressing ENMs from a toxicological point of view [24,25,26]. Given the complexity of the ENMs, the nano-QSTR model application is restricted by the use of physicochemical properties as nano-descriptors. Nevertheless, with the advancement of artificial intelligence (AI) and data science in recent years, it is reasonable to establish relationships between the physicochemical properties (e.g., electronegativity, ionization potential, van der Waals radius, among others) of a complex system such as ENMs and their nanotoxicity [27,28,29].
The basic concept underlying nano-QSTR predictive models is the assumption that a given set of similar structures (e.g., ENMs) has an equivalent toxicological behaviour. Thus, a subtle structural change in the ENMs composition, such as different crystallographic cores, the presence or absence of doping agents, or surface coatings, among others, should lead to a slight divergence in the toxicological paradigm. Meanwhile, the advances in AI and ML continue to provide ample opportunities to move forward in the mechanistic understanding of nanotoxicology responses. In this regard, there are several examples of transparent and understandable techniques, including multiple linear regression (MLR), partial least squares (PLS) regression, decision tree (DT), and random forest (RF), among others. Besides, such algorithms should be supported by strategies that give a visual overview of the diversity and homogeneity of the data distributions and that, together with the linear and non-linear correlations among the ENMs nano-descriptors, allow the exclusion of redundant information.
Therefore, this work aims to develop a novel and robust ML nano-QSTR model to predict the potential human lung nano-cytotoxicity induced by ENMs based on metal oxide nanoparticles. Moreover, the present work is an effort to pave the way for using in silico tools to efficiently predict ENMs potential occupational risks and make regulatory decisions in nanotoxicology and environmental health.
Materials and methods
The data of 16 ENMs for the human lung carcinoma cell line A549 were taken from the literature, and we recently used the same dataset with a quasi-SMILES approach. The dataset contains 377 observations (N = 377) on cell viability (%) and covers several relevant assay conditions, such as the composition of the core, doping, surface coating, diameter (nm), and concentration (µg/ml) of the ENMs. An overview of the physicochemical composition of the ENMs is presented in Table 1 and Fig. 1. Complete data of this subsection is available in the Supplementary Information (see Additional File 1, Table S1).
A crucial step in developing ML nano-QSTR models is calculating adequate nano-descriptors. Nano-descriptors are numerical forms of nanoparticle properties (e.g., electronic, physicochemical, structural, and topological) that could be used to represent ENMs and might be quantitatively associated with cytotoxicity. Moreover, nano-descriptors contain relevant information on metals, non-metals, and semimetals obtained from the periodic table and other literature sources. A set of 31 nano-descriptors (e.g., Van der Waals radius of the active metal, the number of metallic elements, and the number of electrons of the active metal) was calculated using the Elemental-Descriptor (software version 1.0, Gdansk, Poland). Complete data of this subsection is available in the Supplementary Information (see Additional file 1, Table S2 and Table S3).
In the context of ML nano-QSTR models, data pre-processing is a step that efficiently helps improve data quality by extracting relevant features from the raw data. Overall, the data pre-processing includes several procedures, such as cleaning, organizing, and structuring the data into an understandable and readable input format for building ML nano-QSTR models to predict a given biological response (i.e., cytotoxicity induced by ENMs).
In the present study, we carried out the following pre-processing steps: (i) filtering the ENMs with a diameter of less than 200 nm, which reduced the number of available ENMs from 16 to 11 and the number of observations from N = 377 to N = 333; (ii) encoding the categorical nano-descriptors (core, doping, and surface coating) into numerical inputs by using a One-Hot Encoding procedure; and (iii) applying a standardization procedure based on the Z-score normalization method, where the values are centered on the mean with a unit standard deviation. The standardization procedure is represented by Eq. (1), where x is a nano-descriptor value, µ is the mean of the ENMs nano-descriptor values, and σ is their standard deviation:
$$z = \frac{x - \mu}{\sigma} \qquad (1)$$
Lastly, pre-processing step (iv) was to approximate the shape of the distribution of each numerical nano-descriptor to a Gaussian distribution by applying the Yeo-Johnson transformation. Complete data of this subsection is available in the Supplementary Information (see Additional file 2, Figure S1, Figure S2 and Figure S3).
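The encoding, standardization, and Yeo-Johnson steps described above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the pipeline used in the study: the coating labels, the diameters, and the fixed λ value are hypothetical, and in practice λ is usually estimated from the data (e.g., by maximum likelihood) rather than set by hand.

```python
import math
import statistics


def one_hot(values):
    """Encode a categorical column as 0/1 indicator columns (one per category)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def z_score(values):
    """Centre on the mean and scale to unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def yeo_johnson(x, lam):
    """Yeo-Johnson power transform of a single value for a fixed lambda."""
    if x >= 0:
        return math.log1p(x) if lam == 0 else ((x + 1) ** lam - 1) / lam
    if lam == 2:
        return -math.log1p(-x)
    return -(((-x + 1) ** (2 - lam)) - 1) / (2 - lam)


# Hypothetical toy records (not taken from the study's dataset).
coatings = ["none", "PEG", "PMAA", "PEG"]
diameters_nm = [20.0, 35.0, 50.0, 95.0]

categories, encoded = one_hot(coatings)
scaled = z_score(diameters_nm)
transformed = [yeo_johnson(d, lam=0.5) for d in diameters_nm]
```

With λ = 1 the Yeo-Johnson transform leaves values unchanged, which is a convenient sanity check for an implementation.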
The first step was to withhold a random sample of N = 33 from a total of N = 333 to simulate an unseen dataset. Another way to think about this step is that 33 observations were unavailable to train and evaluate the ML nano-QSTR models. Afterward, the remaining 300 observations were randomly divided into a training subset of N = 210 (70%) and a test subset of N = 90 (30%). The training subset was used to train the ML nano-QSTR models, whereas the test subset was employed to evaluate their predictive performance. Complete data of this subsection is available in the Supplementary Information (see Additional file 1, Table S4 and Table S5).
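The partitioning described above can be reproduced on row indices alone. The sketch below is a minimal illustration; the seed is an arbitrary assumption, and the actual assignment of observations in the study is given in the Supplementary Information.

```python
import random

rng = random.Random(42)  # arbitrary seed, assumed here only for reproducibility

indices = list(range(333))   # one index per observation after filtering
rng.shuffle(indices)

unseen = indices[:33]        # withheld to simulate unseen data
modelling = indices[33:]     # 300 observations left for model building

n_train = len(modelling) * 70 // 100   # 70% of 300
train = modelling[:n_train]
test = modelling[n_train:]
```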
ML nano-QSTR model development
Considering the trade-off between ML nano-QSTR model performance and its interpretability, the ENMs nano-descriptors were combined by simple arithmetical operations and converted into new statistically significant nano-descriptors. For instance, two ENMs nano-descriptors were multiplied so that their product would explain more of the data variance than the same two ENMs nano-descriptors separately. The next step was to reduce the dimensionality of the data, reduce the computational cost, and minimize the redundancy between the ENMs nano-descriptors. Three key operations were performed, namely (i) remove the ENMs nano-descriptors with a low variance; (ii) remove highly inter-correlated ENMs nano-descriptors; (iii) implement a combination of various permutation importance techniques, such as Shapley additive explanations (SHAP), which explains the contribution of each ENMs nano-descriptor to the ML nano-QSTR model, to achieve the final subset of ENMs nano-descriptors. Afterward, several tree-based algorithms were used to make the ML pipeline more transparent and interpretable, such as decision tree (DT), random forest (RF), and extra-trees regressor (ET). Complete data of this subsection is available in the Supplementary Information (see Additional file 1, Table S6).
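Operations (i) and (ii), together with the multiplication of two nano-descriptors, can be sketched as follows. This is a simplified illustration: the column names, toy values, and both thresholds are assumptions rather than the settings used in the study, and the importance-based selection of step (iii) is not shown.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length columns."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def reduce_descriptors(columns, min_variance=1e-3, max_abs_corr=0.95):
    """Drop near-constant columns, then one column of each highly correlated pair."""
    kept = {name: col for name, col in columns.items()
            if statistics.pvariance(col) > min_variance}
    selected = []
    for name in kept:
        if all(abs(pearson(kept[name], kept[other])) < max_abs_corr
               for other in selected):
            selected.append(name)
    return selected


# Hypothetical toy descriptors (values are illustrative only).
columns = {
    "concentration": [1.0, 5.0, 10.0, 50.0, 100.0],
    "diameter": [20.0, 35.0, 20.0, 95.0, 35.0],
    "n_metals": [1.0, 1.0, 1.0, 1.0, 1.0],               # constant column
    "concentration_x2": [2.0, 10.0, 20.0, 100.0, 200.0],  # redundant column
}
# Combined descriptor: product of two original descriptors.
columns["concentration_x_diameter"] = [
    c * d for c, d in zip(columns["concentration"], columns["diameter"])
]

selected = reduce_descriptors(columns)
```

Here the constant column is removed by the variance filter and the rescaled copy of the concentration by the correlation filter, while the product descriptor survives because it is not collinear with either of its factors.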
ML nano-QSTR model validation
In the present study, we followed the principles of the Organisation for Economic Co-operation and Development (OECD) concerning model validation, which establish that a reliable model should present appropriate goodness-of-fit, robustness, and predictivity. In this regard, there are two methods to evaluate the goodness of a model: (i) internal and (ii) external validation. The (i) internal validation evaluates the fitting of the model on existing data (training set); the (ii) external validation evaluates future data, i.e., how reliably the model can predict new data (test set and unseen set). Here, the (i) internal validation of the regression-based models was determined based on several statistical metrics such as determination coefficient (R2), determination coefficient based-metrics (Q2LOO), root-mean-square error (RMSE), mean absolute error (MAE), and concordance correlation coefficient (CCC). Moreover, the robustness of the model was assessed by a 5-fold cross-validation process. The (ii) external validation of the models was determined using similar statistical parameters, such as R2ext, Q2F1, Q2F2, RMSE, MAE, and CCC. All statistical metrics were computed via DTC Lab Xternal Validation Plus (software version 1.2, India).
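For readers who want to see how the external metrics relate to each other, the sketch below implements their commonly used definitions in plain Python. It is an illustration only: the study computed them with the DTC Lab tool, and the observed values, predictions, and training-set mean used here are hypothetical.

```python
import math
import statistics


def q2(y_obs, y_pred, reference_mean):
    """1 - SS_res / SS_tot, with SS_tot taken around the given reference mean."""
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - reference_mean) ** 2 for o in y_obs)
    return 1 - ss_res / ss_tot


def rmse(y_obs, y_pred):
    return math.sqrt(statistics.fmean([(o - p) ** 2 for o, p in zip(y_obs, y_pred)]))


def mae(y_obs, y_pred):
    return statistics.fmean([abs(o - p) for o, p in zip(y_obs, y_pred)])


def ccc(y_obs, y_pred):
    """Concordance correlation coefficient between observed and predicted values."""
    mo, mp = statistics.fmean(y_obs), statistics.fmean(y_pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    ss_o = sum((o - mo) ** 2 for o in y_obs)
    ss_p = sum((p - mp) ** 2 for p in y_pred)
    return 2 * cov / (ss_o + ss_p + len(y_obs) * (mo - mp) ** 2)


# Hypothetical external-set cell viabilities (%) and predictions.
y_ext = [100.0, 80.0, 60.0, 110.0, 90.0]
y_hat = [98.0, 84.0, 65.0, 105.0, 92.0]
train_mean = 96.0  # assumed mean of the training-set response

q2_f1 = q2(y_ext, y_hat, train_mean)                # reference: training mean
q2_f2 = q2(y_ext, y_hat, statistics.fmean(y_ext))   # reference: external mean
```

The only difference between Q2F1 and Q2F2 is the reference mean in the denominator, so the two diverge when the external set is centred far from the training set.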
According to the OECD third principle, a QSAR model to predict a given biological response (i.e., ENMs cytotoxicity) should be associated with a defined applicability domain (AD). The AD is a theoretical region of the chemical space that contains both model nano-descriptors and modelled responses, in which the model makes predictions with a given reliability. Herein, the AD was calculated using a standardization approach and retrieved via DTC Lab Applicability Domain Calculator (software version 1.0, India). For calculating the AD, Eq. (2) was used:
$$S_{new(k)} = \bar{S}_k + 1.28 \times \sigma_{S_k} \qquad (2)$$
where $S_{new(k)}$ is the $S_{new}$ value for ENM $k$, $\bar{S}_k$ is the mean of the standardized nano-descriptors for ENM $k$ (from the training, test, or unseen set), and $\sigma_{S_k}$ is the standard deviation of the standardized nano-descriptors for ENM $k$ (from the training, test, or unseen set). Overall, if $S_{new(k)}$ is lower than or equal to 3, then ENM $k$ is not an outlier (if in the training set) or is within the AD (if in the test or unseen set).
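The AD check can be sketched in a few lines of Python. This is a hedged illustration of the standardization approach as it is commonly described, not the DTC Lab implementation: the 1.28 factor, the preliminary max/min checks, and the use of the sample standard deviation are assumptions of this sketch, and the descriptor values are hypothetical.

```python
import statistics


def standardized_values(train_rows, row):
    """|x - mean| / sd per descriptor, using training-set mean and sample sd."""
    train_columns = list(zip(*train_rows))
    return [abs(x - statistics.fmean(col)) / statistics.stdev(col)
            for x, col in zip(row, train_columns)]


def within_domain(train_rows, row, threshold=3.0):
    """Return True if the ENM described by `row` lies inside the AD."""
    s = standardized_values(train_rows, row)
    if max(s) <= threshold:      # every descriptor is close to the training data
        return True
    if min(s) > threshold:       # every descriptor is far from the training data
        return False
    # Mixed case: summarise the standardized values as S_new.
    s_new = statistics.fmean(s) + 1.28 * statistics.stdev(s)
    return s_new <= threshold


# Hypothetical training descriptors (four descriptors per ENM).
train_rows = [
    [10.0, 1.0, 0.2, 100.0],
    [20.0, 2.0, 0.4, 200.0],
    [30.0, 3.0, 0.6, 300.0],
    [40.0, 4.0, 0.8, 400.0],
    [50.0, 5.0, 1.0, 500.0],
]

inside = within_domain(train_rows, [35.0, 2.5, 0.7, 350.0])
outside = within_domain(train_rows, [500.0, 50.0, 10.0, 5000.0])
borderline = within_domain(train_rows, [30.0, 3.0, 1.58, 300.0])
```

In the borderline case a single descriptor lies slightly more than three standard deviations from the training mean, but the summary value stays below the threshold, so the ENM is still considered inside the AD.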
An overview of the implemented data-driven approach is presented in Fig. 2. All algorithms were implemented in Python (software version 3.9), using libraries such as PyCaret (software version 2.3.8) and scikit-learn (software version 0.23.2). Altogether, the experiments allowed the extraction of valuable insights from the dataset and provided the baseline to construct the ML nano-QSTR models. The definition of the AD made it possible to understand the limitations and boundaries within which the predicted values can be trusted with confidence.
Nano-descriptors distribution and diversity
Based on the data of 16 engineered nanomaterials (ENMs) under different experimental conditions, an ML nano-QSTR model with the cell viability endpoint as the dependent variable was established. As previously mentioned, the ultimate goal of the developed model was to predict the potential human lung nano-cytotoxicity.
As the application of any data-driven algorithm requires a comprehensive understanding of the data, one of the first concerns was to consider the differences in the structure and chemical composition of the ENMs, extract valuable insights from the dataset and provide a baseline to construct the ML nano-QSTR model. In this regard, Fig. 3 presents an overview of the dataset characteristics, such as the endpoint frequency distribution, and the diversity of core, doping, and surface coating nanomaterials composition.
It is not uncommon to find a skewed frequency distribution for ENMs nano-descriptors from nanotoxicological experimental data, i.e., some types of structural attributes appear much more frequently. In this regard, the evaluated endpoint (cell viability) follows a normal distribution, predominantly ranging from 75 to 125%, with a mean value of 96% and a standard deviation of 23% (Fig. 3(a)). Regarding the composition of the ENMs, based on their frequency distribution, the core and the surface coating features present a qualitative and quantitative diversity, with the most prevalent core types being Zn, Ag, and SiO2, and the most prevalent surface coating compositions being ENMs without surface coating and ENMs coated with PMAA or sodium citrate (Fig. 3(b) and Fig. 3(d)). The frequency distribution of the different doping types is predominantly represented by the absence of doping in the ENMs and may seem skewed in that direction (Fig. 3(c1)). However, it is important to note that when ENMs are grouped simply as with or without doping, the frequency distribution of both types is much more balanced (Fig. 3(c2)). To complement this analysis, Fig. 3(e) to Fig. 3(g) present the trend of the cell viability variation with representative examples of each nano-descriptor. Despite the existence of specific ENMs compositions that significantly vary the cell viability, the tendency of variation agrees with the mean and standard deviation previously mentioned. A similar frequency distribution analysis was performed on the set of 31 ENMs nano-descriptors, such as the number of metallic elements, the Van der Waals radius of the active metal, and the number of electrons of the active metal, among other examples (see Additional file 2, Figure S4). Overall, each nano-descriptor depicts a high degree of diversity by presenting several attributes that can span the ENMs structural space.
ML nano-QSTR model performance and validation
Through the dataset exploration and characterization conducted in the previous section, the richness of the dataset was assured, providing the basis for the development of the ML nano-QSTR model. In this particular problem, three interpretable learning algorithms, including decision tree (DT), random forest (RF), and extra-trees regressor (ET) are presented. Although DT, RF, and ET belong to the same family of learning algorithms, i.e., tree-based models that use conditional statements to make predictions, DT is the simplest. Therefore, it is expected that DT presents slightly lower statistical metrics than RF and ET. The use of such an algorithmic family is in agreement with recent studies that point out that an interpretable learning algorithm is more valuable to experimentalists than a highly predictive but black-box model since its interpretation is complex and non-trivial [31, 45]. To verify the reliability and robustness of the developed models, Table 2 presents an overview of the statistical metrics for training, validation, and test sets.
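A comparison of the three tree-based regressors can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data: the generated features stand in for the nano-descriptors, the hyperparameters are library defaults plus an arbitrary seed, and none of the resulting numbers correspond to the values reported in Table 2.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: 300 rows and 8 descriptors (NOT the study's dataset).
X, y = make_regression(n_samples=300, n_features=8, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=90, random_state=0)

models = {
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "ET": ExtraTreesRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validated R2 on the training subset (internal validation).
    cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
    model.fit(X_train, y_train)
    results[name] = {
        "train_r2": model.score(X_train, y_train),
        "cv_r2": cv_r2,
        "test_r2": model.score(X_test, y_test),   # external validation
    }

for name, scores in results.items():
    print(name, {metric: round(value, 2) for metric, value in scores.items()})
```

Comparing the training, cross-validated, and test R2 side by side is what reveals overfitting: an unpruned decision tree typically fits the training subset almost perfectly while scoring lower under cross-validation.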
Regarding the evaluation of the developed models on existing data, i.e., training subset, DT, RF, and ET present an R2 between 0.95 and 0.96, highlighting their high statistical performance in learning the behavior of the training ENMs. Concerning DT, RF, and ET internal validation, the 5-fold cross-validation process supports their robustness, as R2 for each model is between 0.7 and 0.8. As previously mentioned, with the slight increase in model complexity, there is an increment in the statistical performance, as DT presents an R2 of 0.73, RF of 0.76, and ET of 0.79. Overall, the internal validation of each model is supported by R2 and Q2LOO being of the same order of magnitude. As for the evaluation of the developed models on new data, i.e., test subset, DT presents an R2ext of 0.76, while RF and ET present an R2ext of 0.79. This statistical parameter is complemented by the determination coefficient based-metrics (Q2F1 and Q2F2), which depict similar values. Altogether, both internal and external validation confirm that the developed models have the potential to reliably predict A549 cell line viability.
From a general point of view, the key performance indicator in selecting the model to be used in the final stage of prediction was the internal validation subset. Such a decision relies on ET presenting R2 and Q2-based metrics higher than those of DT by 8% and those of RF by 4%, while the training and test set statistical parameters are in the same order of magnitude. To corroborate this analysis, Additional file 2, Figure S5 presents the learning curves for the ET nano-QSTR model for training and validation sets, highlighting a slight trade-off between bias and variance, which could be adjusted by having more training instances. Thus, Fig. 4 presents an overview of the agreement between experimental and predicted cell viability values by the ET model for training and test sets.
Through the exploration of Fig. 4(a), it is possible to quantitatively and qualitatively highlight the strong correlation between the observed and predicted cell viability values, as most ENMs are within the 95% prediction level range. Such analysis is complemented by Fig. 4(b), which shows that training and test data points approximately follow a symmetrical distribution, tending to cluster towards the middle of the plot, around lower values of the y-axis (e.g., most of the residuals are between −10 and 10).
Bearing in mind that the final aim of the present work is to predict the potential nano-cytotoxicity induced by ENMs on human lung carcinoma cells, it is fundamental to understand the correlation between the nano-descriptors and the cell viability. Moreover, it is noteworthy to highlight the most significant nano-descriptors in the ET nano-QSTR model performance. Thus, Fig. 5 presents an overview of some representative nano-descriptors and their correlation with cell viability.
Through the exploration of Fig. 5(a), it is possible to identify two distinct groups of nano-descriptors that present a negative correlation with cell viability. One group depicts nano-descriptors retrieved from the original dataset, such as PMAA surface coating, Cl (3%) doping, and ENMs concentration. The other group describes a set of nano-descriptors that were calculated to become more preponderant in explaining variances in the cell viability data, such as the mathematical combination of ENMs concentration and PMAA surface coating, and that of ENMs concentration and ENMs diameter. Overall, Fig. 5(a) highlights nano-descriptors that tend to decrease cell viability and might promote cytotoxicity at the cellular level. As for nano-descriptors that present a positive correlation with the endpoint, Fig. 5(c) highlights Fe3O4 core, Fe (4%) doping, and polyethylene glycol (PEG) surface coating as the group of nano-descriptors retrieved from the original dataset. Similarly, the other group describes a set of nano-descriptors obtained from the Elemental descriptor calculator, such as the Van der Waals radius of the active metal, the number of metallic elements, and the number of electrons of the active metal. In summary, Fig. 5(c) highlights nano-descriptors that tend to increase cell viability and might avoid cytotoxicity at the cellular level.
From a modelling point of view, Fig. 5(b) and Fig. 5(d) represent a fundamental aspect as both figures highlight the set of nano-descriptors that contributed the most to the model performance. Interestingly, the nano-descriptors that present a negative correlation with the endpoint are the most relevant from the entire set. An in-depth analysis of Fig. 5(b) shows that the nano-descriptor with the highest contribution on the model performance results from the mathematical combination of the ENMs concentration and PMAA surface coating.
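One simple way to quantify how much each nano-descriptor contributes to model performance is permutation importance, sketched below in plain Python. This is an illustration under stated assumptions: the toy model, its coefficients, and the data are hypothetical, and the study additionally relied on SHAP, which is not reproduced here.

```python
import random
import statistics


def mse(model, rows, targets):
    """Mean squared error of a prediction function over a dataset."""
    return statistics.fmean([(model(r) - t) ** 2 for r, t in zip(rows, targets)])


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MSE when a single descriptor column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, shuffled, targets) - baseline)
        importances.append(statistics.fmean(increases))
    return importances


def toy_model(row):
    """Hypothetical viability model: depends on concentration, ignores diameter."""
    concentration, _diameter = row
    return 100.0 - 0.5 * concentration


# Hypothetical rows of [concentration, diameter].
rows = [[1.0, 20.0], [5.0, 35.0], [10.0, 20.0],
        [25.0, 95.0], [50.0, 35.0], [100.0, 50.0]]
targets = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, targets)
```

Shuffling a descriptor the model does not use leaves the error unchanged, so its importance is zero, whereas shuffling an influential descriptor degrades the predictions.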
Considering that ENMs surface area is described as a central factor related to the toxic potential of particulate matter, we explored whether this descriptor could play a relevant role in our model. As detailed above, several nano-descriptors linked to the surface coating reactivity properties and diameter of the ENMs were identified as some of the most relevant characteristics to predict human lung nano-cytotoxicity. To mathematically calculate the surface area of the ENMs, it is necessary to make some assumptions, namely (i) the ENM is a perfect sphere, and (ii) the ENM size is the diameter of the sphere. Therefore, even though indirectly, the model does consider the surface area of the ENMs as a driving factor for nano-cytotoxicity. All the assumptions and mathematical equations to calculate the surface area are explained in detail by Shin et al. Complete data on the direct influence of ENMs surface area on model performance is available in the Supplementary Information (see Additional file 1, Table S7, and Additional file 2, Table S1).
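Under the two assumptions stated above (perfect sphere, size equal to the diameter), the surface area follows from elementary geometry. The sketch below uses the standard relations for a sphere, which may differ in detail from the equations of Shin et al.; the density value is a hypothetical input used only for illustration.

```python
import math


def sphere_surface_area_nm2(diameter_nm):
    """Surface area of a single spherical particle: pi * d^2 (in nm^2)."""
    return math.pi * diameter_nm ** 2


def specific_surface_area_m2_per_g(diameter_nm, density_g_cm3):
    """Specific surface area of monodisperse spheres: 6 / (rho * d), in m^2/g.

    With rho in g/cm^3 and d in nm, the unit conversion gives a factor of 6000.
    """
    return 6000.0 / (density_g_cm3 * diameter_nm)


# Hypothetical particle: 20 nm diameter and an assumed density of 5.6 g/cm^3.
area = sphere_surface_area_nm2(20.0)
ssa_20 = specific_surface_area_m2_per_g(20.0, 5.6)
ssa_10 = specific_surface_area_m2_per_g(10.0, 5.6)
```

Because the specific surface area scales with 1/d, halving the diameter doubles the surface available per unit mass, which is why diameter can act as an indirect proxy for surface area in the model.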
Taking advantage of such knowledge, the ET nano-QSTR model was retrained with the ENMs nano-descriptors identified in Fig. 5 and applied to new data, which was not used to develop and validate the model. Table 3 presents an overview of the statistical metrics of ET nano-QSTR model when applied to the unseen subset.
Table 3 shows a significant increase in ET nano-QSTR model performance, as the R2ext increased from 0.79 to 0.93, representing an increase of 18%. Another interesting observation is the similar order of magnitude between the R2ext and Q2-based metrics. Additionally, the error metrics improved, with the RMSE decreasing from 12.58 to 4.37 (a reduction of 65%) and the MAE from 6.60 to 3.52 (a reduction of 47%), while the CCC increased from 0.883 to 0.96 (an increase of 8%). Overall, the developed model presents a strong, reliable, and robust performance in predicting cell viability.
Applicability domain of the proposed model
It is also paramount to identify the border of the optimum prediction space, i.e., the applicability domain (AD). From a general point of view, the developed ET nano-QSTR model is valid in a chemical space where the ENMs possess structural and physicochemical properties similar to those of the ENMs used to train and validate the model. Otherwise, the ENMs might be considered outliers or even out of the AD. It is important to mention that all the studied ENMs are within the AD. Detailed values of the AD of the ENMs are given in the Supplementary Information (see Additional file 1, Table S8).
Although human lung nano-cytotoxicity induced by ENMs based on metal oxide nanoparticles is among the current occupational and environmental concerns, it remains unexplored and under-researched. The current standard in vitro and in vivo models used to evaluate such a type of nano-cytotoxicity are time-consuming, costly, and could involve many ethical concerns in animal experimentation. In this sense, this work presents an ET nano-QSTR model to assist experimental scientists by providing a mechanistic interpretation learned from data of 16 ENMs for the A549 cell line [40, 46].
The mechanistic interpretation regards the set of optimal nano-descriptors, i.e., the most significant nano-descriptors in the model performance, and considers whether the nano-descriptors are (i) negatively or (ii) positively correlated with the cell viability [47, 48]. Then, three well-recognized nanotoxicological mechanisms are used to complement such interpretation, including (i) the ENMs core type- and diameter-dependent release of cytotoxic metal ions (e.g., Fe2+, Fe3+, Ag+, Au+, Ti2+, Cd2+) from the ENMs core reactive surface, which could promote redox-homeostasis perturbations, (ii) the physio-pathological increase of intracellular reactive oxygen species (ROS) levels by mitochondrial dysfunction, and (iii) the nano-bio interaction with binding sites of key molecular targets, such as the human lung epithelial proteins and multiprotein junctional complexes that form the selective permeability barrier of the human lung epithelium, which may induce barrier dysfunction [7, 32, 49, 50].
An in-depth analysis of Fig. 5, which identifies six negatively and positively correlated nano-descriptors with cell viability, suggests that the presence of PEG as a surface coating of a Fe3O4 core significantly enhances cell viability and inversely attenuates human lung nano-cytotoxicity. As PEG presents excellent pharmacokinetic properties based on absorption, distribution, metabolism, and excretion (ADME), its presence as a surface coating could significantly reduce the potential release of cytotoxic ions (Fe2+) from the Fe3O4 core. Overall, such behavior is congruent with the literature, as the surface-dependent release of cytotoxic metal ions has been well documented in previous experimental works [51,52,53,54].
In this regard, the divalent Fe2+ ions generated from the ENMs based on metal oxide nanoparticles tend to increase the intracellular concentration of oxygen-free-radical groups (e.g., hydroxyl radical). Such a mechanism is explained by the occurrence of the Fenton-Haber-Weiss reaction at the subcellular level, which is directly associated with molecular lung cytotoxic mechanisms (e.g., mitochondria dysfunction promoted by Fe2+ ions) [55,56,57,58,59]. Therefore, the presence of PEG as a surface coating significantly contributes to the inhibition of these cytotoxic signaling pathways.
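For reference, the iron-catalysed Fenton and Haber-Weiss steps mentioned above are usually written in the following textbook form (the source does not spell them out, so this is given only as general background): the first line is the Fenton reaction, the second regenerates Fe2+ from superoxide, and the third is their sum, the net Haber-Weiss reaction.

```latex
\begin{align}
\mathrm{Fe^{2+} + H_2O_2} &\rightarrow \mathrm{Fe^{3+} + OH^{-} + {}^{\bullet}OH} \\
\mathrm{Fe^{3+} + O_2^{\bullet -}} &\rightarrow \mathrm{Fe^{2+} + O_2} \\
\mathrm{O_2^{\bullet -} + H_2O_2} &\rightarrow \mathrm{O_2 + OH^{-} + {}^{\bullet}OH}
\end{align}
```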
In contrast to the previously described behavior of PEG, the presence of PMAA as a surface coating or of Cl− (3%) as a doping condition tends to decrease cell viability, as they intensify the potential release of cytotoxic metal ions (e.g., Fe2+, Fe3+, Ag+, among others) from the inorganic ENMs core or doping composition [51,52,53,54, 60]. Such evidence is corroborated by the ET nano-QSTR model, as the most influential nano-descriptor to the model performance derives from the arithmetic combination of the ENMs concentration and PMAA surface coating. Furthermore, the nano-descriptor also presents a strong negative correlation (R = −0.83) with cell viability and may therefore be directly linked with human lung nano-cytotoxicity.
Besides, all the core-based ENMs nano-descriptors, such as the number of metallic elements (R2 = 0.81), the Van der Waals radius of the active metal (R2 = 0.71), and the number of electrons of the active metal (R2 = 0.73), are positively correlated with the cell viability. Such structural attributes are directly associated with the metal core-based reactive properties when the ENMs are found in their pristine form and have an intrinsic cytotoxic potential according to the diameter and charge of the inorganic core. Nonetheless, such nano-descriptors do not contribute to intensifying the cytotoxicity from the in silico point of view. A reasonable explanation for this behavior is the presence of a PEG surface coating that stabilizes the ENMs’ chemical surface reactivity and avoids the occurrence of direct nano-bio interactions of the metal oxide core with binding sites of the target proteins forming the human lung epithelium (i.e., A549 cells target proteins).
It is well-established that a decrease in the diameter of the metal oxide nanoparticles contributes to a significant increase in their ability to access lung cell compartments, including mitochondria, lysosomes, and nuclei. Therefore, the concentration of metal oxide nanoparticles in such organelles will increase and play a fundamental role in lung epithelial barrier dysfunction. More importantly, the diameter decrease of the evaluated metal oxide nanoparticles could contribute to an exponential increase in the number of reactive atoms expressed on the face-based crystallographic planes of the ENMs core. Such a decrease simultaneously promotes the core chemical reactivity and its potential capacity to interact with relevant lung tissues and cells [7, 61, 62]. The proposed ML nano-QSTR model corroborates this information through the negatively correlated nano-descriptors that result from the mathematical combination of ENMs concentration and ENMs diameter.
Overall, the current mechanistic interpretation brings a novel contribution to assist experimental scientists in understanding and analyzing nanotoxicological data. Nonetheless, achieving a broader applicability domain and greater trustworthiness in nano-QSTR models is still a challenge. Therefore, it is necessary to start or continue collecting human lung experimental data in a standardized way, i.e., to implement the findability, accessibility, interoperability, and reusability (FAIR) data principles. In this regard, it is crucial to highlight that data concerning (i) the potential release of cytotoxic metal ions through time and, consequently, (ii) the actual concentration of ENMs that reach the cells were not included in the modeling procedure due to the unavailability of such data for human lung cells.
Indeed, using advanced computational models to predict the effective cellular dose is fundamental to understanding the interaction of submerged materials with biological systems. Recently, DeLoid et al. explored both three-dimensional computational fluid dynamics (CFD) and a newly developed one-dimensional Distorted Grid (DG) model to predict the delivered dose metrics for submerged ENMs in culture media. The latter model was later used and validated in a study by Kowoll et al. to predict the deposition of particles on cellular and intercellular human lung surfaces. Interestingly, the authors highlight both the capabilities and the limitations of the model, specifically regarding the spatial distribution of particles on heterogeneous surfaces, which is the case in our study. Such considerations are even more relevant since Kowoll et al. performed the experiments in a human lung cell line (i.e., A549 cells), the same biological model considered in our in silico study.
Therefore, in future work, we plan to address these limitations by incorporating computational particokinetics models to estimate the relationship between the release of cytotoxic metal ions through time, the relevant in vitro dose criteria for the dosimetry of ENMs, and their influence on the shape of the dose-response curve. Even with all these challenges, a prospective nano-QSTR model should be developed in a way that can guide experimental scientists in decision-making processes about nanotoxicological data.
An ML nano-QSTR model was successfully developed to predict the potential human lung nano-cytotoxicity induced by ENMs based on metal oxide nanoparticles. The results demonstrated that tree-based learning algorithms (e.g., the extra-trees regressor) allowed the development of a simple, interpretable, and robust nano-QSTR model, as ET presented R2 and Q2-based metrics of 0.95, 0.80, and 0.79 for the training, internal validation, and external validation subsets, respectively. By taking advantage of the advances in AI and ML, which continue to provide opportunities to move forward in the mechanistic understanding of nanotoxicological responses, we identified the six nano-descriptors most significant to model performance and determined whether each is (i) negatively or (ii) positively correlated with cell viability. Regarding (i), a decrease in the diameter of the metal oxide nanoparticles contributes to a significant increase in their ability to access lung cell compartments, thus promoting lung epithelial barrier dysfunction. Regarding (ii), the presence of PEG as a surface coating significantly stabilizes the ENMs’ chemical surface reactivity and avoids the potential release of cytotoxic ions, promoting cell viability and thereby attenuating human lung nano-cytotoxicity. By exploiting such knowledge, the ML nano-QSTR model was retrained with the most significant nano-descriptors and applied to new data (e.g., the unseen subset), which increased the R2 and Q2-based metrics from 0.79 to 0.92. Based on these findings, the present work may pave the way to predicting ENMs’ potential occupational risks and supporting regulatory decisions in nanotoxicology and environmental health.
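For readers less familiar with the R2 and Q2-based metrics quoted above, the following minimal sketch shows one common way to compute them. The exact Q2 variant used in this work is not specified in this section, so the external Q2 below (residuals on held-out data referenced to the training-set mean) is an assumption; the function names and the toy values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def q2_external(y_test, y_pred, y_train_mean):
    """External Q2 (assumed variant): held-out residuals referenced to the training mean."""
    ss_res = sum((t - p) ** 2 for t, p in zip(y_test, y_pred))
    ss_ref = sum((t - y_train_mean) ** 2 for t in y_test)
    return 1.0 - ss_res / ss_ref


# Hypothetical cell viability values (%) and predictions; not results from the study.
y_train = [90.0, 75.0, 60.0, 40.0, 85.0]
y_train_pred = [88.0, 77.0, 58.0, 43.0, 84.0]
y_test = [80.0, 55.0, 95.0]
y_test_pred = [76.0, 60.0, 90.0]

train_mean = sum(y_train) / len(y_train)
print(f"R2 (training): {r_squared(y_train, y_train_pred):.3f}")
print(f"Q2 (external): {q2_external(y_test, y_test_pred, train_mean):.3f}")
```

Both quantities equal 1 for perfect predictions and fall toward 0 (or below) as the model does no better than predicting the reference mean.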
The datasets supporting the conclusions of this article are included in the Supplementary Information. The Python script can be retrieved from the corresponding author upon reasonable request.
Absorption, Distribution, Metabolism, and Excretion
Coefficient of Concordance
Findability, Accessibility, Interoperability, and Reusability
International Organization for Standardization
Multiple Linear Regression
Organization for Economic Co-operation and Development
Partial Least Squares
- Q2: Determination Coefficient Based-Metrics
- R2: Determination Coefficient
Reactive Oxygen Species
Shapley Additive Explanations
Mu Y, Wu F, Zhao Q, Ji R, Qie Y, Zhou Y, et al. Predicting toxic potencies of metal oxide nanoparticles by means of nano-QSARs. Nanotoxicology. 2016;10:1207–14.
Karlsson HL, Cronholm P, Gustafsson J, Möller L. Copper oxide nanoparticles are highly toxic: a comparison between Metal Oxide Nanoparticles and Carbon Nanotubes. Chem Res Toxicol. 2008;21:1726–32.
De M, Ghosh PS, Rotello VM. Applications of Nanoparticles in Biology. Adv Mater. 2008;20:4225–41.
Lebre F, Chatterjee N, Costa S, Fernández-De-gortari E, Lopes C, Meneses J et al. Nanosafety: An Evolving Concept to Bring the Safest Possible Nanomaterials to Society and Environment. J Nanomater. 2022;12.
Yin N, Liu Q, Liu J, He B, Cui L, Li Z, et al. Silver nanoparticle exposure attenuates the viability of Rat Cerebellum Granule cells through apoptosis coupled to oxidative stress. Small. 2013;9:1831–41.
Wilson MR, Lightbody JH, Donaldson K, Sales J, Stone V. Interactions between ultrafine particles and transition metals in vivo and in vitro. Toxicol Appl Pharmacol. 2002;184:172–9.
Garcés M, Magnani ND, Pecorelli A, Calabró V, Marchini T, Cáceres L, et al. Alterations in oxygen metabolism are associated to lung toxicity triggered by silver nanoparticles exposure. Free Radic Biol Med. 2021;166:324–36.
Brown DM, Wilson MR, MacNee W, Stone V, Donaldson K. Size-dependent Proinflammatory Effects of Ultrafine polystyrene particles: a role for Surface Area and oxidative stress in the enhanced activity of Ultrafines. Toxicol Appl Pharmacol. 2001;175:191–9.
Iavicoli I, Fontana L, Pingue P, Todea AM, Asbach C. Assessment of occupational exposure to engineered nanomaterials in research laboratories using personal monitors. Sci Total Environ. 2018;627:689–702.
Hischier R, Walser T. Life cycle assessment of engineered nanomaterials: state of the art and strategies to overcome existing gaps. Sci Total Environ. 2012;425:271–82.
Kuhlbusch TA, Asbach C, Fissan H, Göhler D, Stintz M. Nanoparticle exposure at nanotechnology workplaces: a review. Part Fibre Toxicol. 2011;8:22.
Hodson L, Methner M, Zumwalde RD. Approaches to safe nanotechnology; managing the health and safety concerns associated with engineered nanomaterials. 2009.
Sahu M, Biswas P. Size distributions of aerosols in an indoor environment with engineered nanoparticle synthesis reactors operating under different scenarios. J Nanopart Res. 2010;12:1055–64.
National Research Council. Risk assessment in the Federal Government: managing the process. 1st ed. Washington, DC: The National Academies Press; 1983.
Savolainen K, Alenius H, Norppa H, Pylkkänen L, Tuomi T, Kasper G. Risk assessment of engineered nanomaterials and nanotechnologies-A review. Toxicology. 2010;269:92–104.
von Ranke NL, Geraldo RB, Lima dos Santos A, Evangelho VGO, Flammini F, Cabral LM et al. Applying in silico approaches to nanotoxicology: Current status and future potential. Comput Toxicol. 2022;22.
Environmental Protection Agency (EPA). ToxCast Owner’s Manual-Guidance for Exploring Data. 2018. https://www.epa.gov/sites/default/files/2018-04/documents/toxcastownermanual4252018.pdf. Accessed 6 Nov 2022.
EU-ToxRisk-About. EU-ToxRisk. https://www.eu-toxrisk.eu/page/en/about-eu-toxrisk.php. Accessed 6 Nov 2022.
International Organization for Standardization (ISO). ISO - ISO/TC 229 - Nanotechnologies. 2011. https://www.iso.org/committee/381983.html. Accessed 6 Nov 2022.
Organisation for Economic Co-operation and Development (OECD). AOP knowledge base. 2021. https://aopkb.oecd.org/. Accessed 6 Nov 2022.
Burden N, Aschberger K, Chaudhry Q, Clift MJD, Doak SH, Fowler P, et al. The 3Rs as a framework to support a 21st century approach for nanosafety assessment. Nano Today. 2017;12:10–3.
Ram RN, Gadaleta D, Allen TEH. The role of ‘big data’ and ‘in silico’ New Approach Methodologies (NAMs) in ending animal use – A commentary on progress. Comput Toxicol. 2022;23.
Schwarz-Plaschg C, Kallhoff A, Eisenberger I. Making Nanomaterials Safer by Design? NanoEthics. 2017;11:277–81.
Puzyn T, Rasulev B, Gajewicz A, Hu X, Dasari TP, Michalkova A, et al. Using nano-QSAR to predict the cytotoxicity of metal oxide nanoparticles. Nat Nanotechnol. 2011;6:175–8.
González-Durruthy M, Werhli AV, Seus V, Machado KS, Pazos A, Munteanu CR et al. Decrypting Strong and Weak Single-Walled Carbon Nanotubes Interactions with Mitochondrial Voltage-Dependent Anion Channels Using Molecular Docking and Perturbation Theory. Sci Rep. 2017;7.
Toropova AP, Toropov AA. Nanomaterials: Quasi-SMILES as a flexible basis for regulation and environmental risk assessment. Sci Total Environ. 2022;823.
Gajewicz A, Puzyn T. Computational Nanotoxicology: Challenges and Perspectives. CRC Press; 2019.
Kar S, Gajewicz A, Puzyn T, Roy K, Leszczynski J. Periodic table-based descriptors to encode cytotoxicity profile of metal oxide nanoparticles: a mechanistic QSTR approach. Ecotoxicol Environ Saf. 2014;107:162–9.
Furxhi I, Murphy F, Mullins M, Arvanitis A, Poland CA. Practices and trends of machine learning application in nanotoxicology. J Nanomater. 2020;10.
Singh AV, Ansari MHD, Rosenkranz D, Maharjan RS, Kriegel FL, Gandhi K, et al. Artificial Intelligence and Machine Learning in Computational Nanotoxicology: Unlocking and Empowering Nanomedicine. Adv Healthc Mater. 2020;9:1901862.
Roy K. Advances in QSAR modeling. Applications in Pharmaceutical, Chemical, Food, Agricultural and Environmental Sciences. Volume 555, 1st ed. Cham, Switzerland: Springer; 2017. p. 39.
Shin HK, Kim S, Yoon S. Use of size-dependent electron configuration fingerprint to develop general prediction models for nanomaterials. NanoImpact. 2021;21.
Toropova AP, Meneses J, Alfaro-Moreno E, Toropov AA. The system of self-consistent models based on quasi-SMILES as a tool to predict the potential of Nano-inhibitors of human lung carcinoma cell line A549 for different experimental conditions. Drug Chem Toxicol. 2023;1–8.
Ambure P, Balasaheb Aher R, Puzyn T, Roy K. “NanoBRIDGES” software: open access tools to perform QSAR and nano-QSAR modeling. Chemometr Intell Lab Syst. 2015;147:1–13.
Brownlee J. Data Preparation for Machine learning: data cleaning, feature selection, and Data Transforms in Python. 1st ed. Machine Learning Mastery; 2020.
Yeo I-k, Johnson RA. A new family of power transformations to improve normality or symmetry. Biometrika. 2000;87:954–9.
Gramatica P. Principles of QSAR models validation: internal and external. QSAR Comb Sci. 2007.
Cai J, Luo J, Wang S, Yang S. Feature selection in machine learning: a new perspective. Neurocomputing. 2018;300:70–9.
Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30.
OECD Environment Directorate. Guidance document on the validation of (quantitative) structure-activity relationship [(Q)SAR] models. OECD Environment Health and Safety Publications, Series on Testing and Assessment No. 69. ENV/JM/MONO(2007)2. 2007.
Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer; 2009.
Roy K, Das RN, Ambure P, Aher RB. Be aware of error measures. Further studies on validation of predictive QSAR models. Chemometr Intell Lab Syst. 2016;152:18–33.
Roy K, Kar S, Ambure P. On a simple approach for determining applicability domain of QSAR models. Chemometr Intell Lab Syst. 2015;145:22–9.
Krawczyk B. Learning from imbalanced data: open challenges and future directions. Prog Artif Intell. 2016;5:221–32.
Linardatos P, Papastefanopoulos V, Kotsiantis S. Explainable AI: a review of machine learning interpretability methods. Entropy. 2020;23:18.
Fjodorova N, Novič M. Integration of QSAR and SAR methods for the mechanistic interpretation of predictive models for carcinogenicity. Comput Struct Biotechnol J. 2012;1:e201207003.
De P, Kar S, Ambure P, Roy K. Prediction reliability of QSAR models: an overview of various validation tools. Arch Toxicol. 2022;96:1279–95.
Roy J, Roy K. Risk assessment and data gap filling of toxicity of metal oxide nanoparticles (MeOx NPs) used in nanomedicines: a mechanistic QSAR approach. Environ Sci Nano. 2022;9:3456–70.
Subramanian NA, Palaniappan A. NanoTox: Development of a parsimonious in Silico Model for Toxicity Assessment of Metal-Oxide Nanoparticles using Physicochemical features. ACS Omega. 2021;6:11729–39.
Thwala MM, Afantitis A, Papadiamantis AG, Tsoumanis A, Melagraki G, Dlamini LN, et al. Using the Isalos platform to develop a (Q)SAR model that predicts metal oxide toxicity utilizing facet-based electronic, image analysis-based, and periodic table derived properties as descriptors. Struct Chem. 2022;33:527–38.
Hahn A, Fuhlrott J, Loos A, Barcikowski S. Cytotoxicity and ion release of alloy nanoparticles. J Nanopart Res. 2012;14.
Wang Y, Cai R, Chen C. The Nano-Bio interactions of nanomedicines: understanding the biochemical driving forces and redox reactions. Acc Chem Res. 2019;52:1507–18.
Wang X, Cui X, Zhao Y, Chen C. Nano-bio interactions: the implication of size-dependent biological effects of nanomaterials. Sci China Life Sci. 2020;63:1168–82.
Batool F, Iqbal MS, Khan SUD, Khan J, Ahmed B, Qadir MI. Biologically synthesized iron nanoparticles (FeNPs) from Phoenix dactylifera have anti-bacterial activities. Sci Rep. 2021;11.
González-Durruthy M, Castro M, Nunes SM, Ventura-Lima J, Alberici LC, Naal Z, et al. QSPR/QSAR-based Perturbation Theory approach and mechanistic electrochemical assays on carbon nanotubes with optimal properties against mitochondrial Fenton reaction experimentally induced by Fe2+-overload. Carbon. 2017;115:312–30.
Toyokuni S. Iron-induced carcinogenesis: the role of redox regulation. Free Radic Biol Med. 1996;20:553–66.
Toyokuni S. Iron as a target of chemoprevention for longevity in humans. Free Radic Res. 2011;45:906–17.
Kuban-Jankowska A, Gorska M, Jaremko L, Jaremko M, Tuszynski JA, Wozniak M. The physiological concentration of ferrous iron (II) alters the inhibitory effect of hydrogen peroxide on CD45, LAR and PTP1B phosphatases. Biometals. 2015;28:975–86.
Uchiyama A, Kim JS, Kon K, Jaeschke H, Ikejima K, Watanabe S, et al. Translocation of iron from lysosomes into mitochondria is a key event during oxidative stress-induced hepatocellular injury. J Hepatol. 2008;48:1644–54.
Srivastava S, Kumar A. Comparative cytotoxicity of nanoparticles and ions to Escherichia coli in binary mixtures. Res J Environ Sci. 2017;55:11–9.
Sengul AB, Asmatulu E. Toxicity of metal and metal oxide nanoparticles: a review. Environ Chem Lett. 2020;18:1659–83.
Gliga AR, Skoglund S, Odnevall Wallinder I, Fadeel B, Karlsson HL. Size-dependent cytotoxicity of silver nanoparticles in human lung cells: The role of cellular uptake, agglomeration and Ag release. Part Fibre Toxicol. 2014;11.
Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3(1):160018.
DeLoid GM, Cohen JM, Pyrgiotakis G, Pirela SV, Pal A, Liu J, et al. Advanced computational modeling for in vitro nanomaterial dosimetry. Part Fibre Toxicol. 2015;12(1):1–20.
Kowoll T, Fritsch-Decker S, Diabaté S, Nienhaus GU, Gerthsen D, Weiss C. Assessment of in vitro particle dosimetry models at the single cell and particle level by scanning electron microscopy. J Nanobiotechnol. 2018;16:1–5.
Cohen JM, Teeguarden JG, Demokritou P. An integrated approach for the in vitro dosimetry of engineered nanomaterials. Part Fibre Toxicol. 2014;11:1–12.
This research was funded by the European Union’s H2020 project Sinfonia (N.857253) and by SbDToolBox, with reference NORTE-01-0145-FEDER-000047, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund.
The authors declare no conflict of interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Additional file 1. Table S1 Original dataset. Table S2 List and description of calculated nano-descriptors. Table S3 Dataset with calculated nano-descriptors. Table S4 Dataset for modelling after the filtering step. Table S5 Unseen subset after the filtering step. Table S6 Training and test data after nano-descriptors selection. Table S7 Dataset assuming data normalization by ENMs surface area. Table S8 Applicability domain of the proposed ML nano-QSTR model.
Additional file 2. Figure S1 ENMs diameter (nm) frequency distribution (a) before and (b) after the filtering step. Figure S2 Cell viability (%) frequency distribution (a) before and (b) after the filtering step. Figure S3 General representation of data normalization and transformation pre-processing steps for ENMs diameter and concentration. Figure S4 Dataset overview. (a), (b), and (c) Diversity of a set of nano-descriptors obtained from the Elemental descriptor calculator; (e), (f), and (g) Cell Viability (%) variation trend with representative examples of each nano-descriptor. Figure S5 Learning curves for ET nano-QSTR model for training and validation sets according to (a) determination coefficient (R2) and (b) root-mean-square error (RMSE). Table S1 ET nano-QSTR model performance for training, validation, test, and unseen sets assuming data normalization by ENMs surface area.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Meneses, J., González-Durruthy, M., Fernandez-de-Gortari, E. et al. A Nano-QSTR model to predict nano-cytotoxicity: an approach using human lung cells data. Part Fibre Toxicol 20, 21 (2023). https://doi.org/10.1186/s12989-023-00530-0
- Engineered nanomaterials
- Computational nanotoxicology
- Machine learning
- Lung nano-cytotoxicity
- A549 cell line