Identification Strategy of Cold and Hot Properties of Chinese Herbal Medicines Based on Artificial Intelligence and Biological Experiments

Chinese herbal medicines (CHMs) are a cornerstone of traditional Chinese medicine (TCM), with their cold and hot properties playing a pivotal role in guiding clinical applications and ensuring therapeutic efficacy. These properties reflect the inherent therapeutic tendencies of medicinal substances, influencing the balance between yin and yang and the thermal states within the human body. However, traditional methods for identifying these properties rely heavily on the subjective experience of TCM practitioners, leading to uncertainty and inconsistency. This underscores the need for more precise, rapid, and objective strategies to accurately identify the cold and hot properties of CHMs and their active ingredients.

To address these challenges, an identification strategy was proposed that integrates artificial intelligence (AI) models with biological experiments. This combined approach aims to make herbal property identification more accurate and objective by providing a data-driven, systematic framework for evaluating the cold and hot properties of CHMs. It also offers insights into how these properties influence energy metabolism, thereby supporting the modernization of CHM clinical applications and foundational research.

The study developed and validated an AI-based model to identify the cold and hot properties of CHMs, leveraging biological mechanisms associated with energy metabolism. The model was constructed using molecular fingerprinting techniques and validated through in vitro cell assays and gene expression analyses, providing both computational and experimental evidence to support its accuracy.

Initially, all CHMs labeled as either cold or hot in the 2020 edition of the Chinese Pharmacopoeia were selected, yielding a dataset of 266 CHMs: 148 with cold properties and 118 with hot properties. The ingredient molecules of these CHMs, 5550 in total, were retrieved from the Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP). Each ingredient was characterized by a 167-dimensional vector generated with the MACCS keys molecular fingerprinting algorithm. The MACCS keys fingerprint, developed by Molecular Design Limited (MDL), encodes 166 features in a total length of 167 bits: bit 0 serves as a placeholder, and bits 1–166 correspond to molecular substructure features. The ingredient vectors of each CHM were fused and normalized, giving one 167-dimensional feature vector per cold- or hot-propertied CHM. The CHMs were then randomly divided into training, validation, and test sets. The training set was used to build the model for identifying the cold and hot properties of CHMs, the test set to assess the model’s performance, and the validation set to check the model’s accuracy and robustness.
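The fusion and splitting steps can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes that "fused and normalized" means averaging the 167-bit ingredient vectors of one herb, and it assumes a 70/15/15 split, neither of which is stated in the source. The names `fuse_ingredients` and `random_split` are hypothetical, and the toy bit vectors stand in for fingerprints that would normally come from a cheminformatics toolkit.

```python
import random

N_BITS = 167  # bit 0 is a placeholder, bits 1-166 are substructure keys


def fuse_ingredients(ingredient_bits):
    """Average the ingredient bit vectors of one herb into one vector in [0, 1].

    Mean pooling is an assumption; the source only says the vectors were
    fused and normalized.
    """
    if not ingredient_bits:
        raise ValueError("a herb needs at least one ingredient")
    n = len(ingredient_bits)
    return [sum(bits[i] for bits in ingredient_bits) / n for i in range(N_BITS)]


def random_split(items, train=0.70, val=0.15, seed=0):
    """Shuffle the items and cut them into training, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Toy herb with two made-up ingredients: bits 160 and 164 are set in the
# first ingredient, only bit 164 in the second.
ingredient_a = [0] * N_BITS
ingredient_a[160] = ingredient_a[164] = 1
ingredient_b = [0] * N_BITS
ingredient_b[164] = 1
herb_vector = fuse_ingredients([ingredient_a, ingredient_b])

# 266 herbs, matching the dataset size above (represented here by index only).
train_set, val_set, test_set = random_split(range(266))
```

With mean pooling, each entry of `herb_vector` is the fraction of the herb's ingredients that contain the corresponding substructure, so herbs with different numbers of ingredients remain comparable.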

Machine learning algorithms, including support vector machines (SVMs), extreme gradient boosting (XGBoost), deep neural networks (DNNs), and random forest (RF), were used to construct identification models of the cold and hot properties of CHMs. Accuracy was used to evaluate the performance of each model, and the model that performed best across the indices was selected as the core algorithm. To avoid the bias of validating only on the internal dataset, an external validation dataset was constructed through data mining, and 10 representative hot- and cold-propertied CHMs and their active ingredients were selected to validate the stability and accuracy of the model.
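Accuracy-based model selection can be illustrated with a short sketch. The labels and predictions below are made up for illustration, the model names are placeholders, and `accuracy` and `select_best` are hypothetical helpers rather than the study's code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def select_best(predictions_by_model, y_true):
    """Score every model on the same labels and return the top scorer."""
    scores = {name: accuracy(y_true, preds)
              for name, preds in predictions_by_model.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy test labels and the predictions of two placeholder models.
y_true = ["hot", "cold", "cold", "hot", "cold"]
predictions = {
    "model_a": ["hot", "cold", "hot", "hot", "cold"],   # 4 of 5 correct
    "model_b": ["hot", "cold", "cold", "hot", "cold"],  # 5 of 5 correct
}
best_name, scores = select_best(predictions, y_true)
```

Scoring all candidates on the same held-out labels keeps the comparison fair; the text notes that other indices were also considered, which this sketch does not cover.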

The results showed that the SVM model achieved an accuracy of 89.5%, while the RF, DNN, and XGBoost models achieved accuracies of 93.3%, 90.2%, and 86.7%, respectively. The RF model, which performed best, was selected and validated on the external dataset. For the 10 representative cold- and hot-propertied CHMs, the identification results all fell well within the corresponding intervals. For the seven hot CHMs, the proportion correctly predicted as positive among all samples that were actually positive (that is, the recall) was 92.8%; for the three cold CHMs, the corresponding proportion was 95.4%, verifying the high accuracy of the model.
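The proportion described above, correctly predicted positives among all samples that are actually positive, is commonly called recall. A minimal sketch of the calculation follows; the labels are made up and do not reproduce the reported 92.8% or 95.4%, and `recall` is a hypothetical helper.

```python
def recall(y_true, y_pred, positive):
    """Share of the truly positive samples that the model also predicts as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        raise ValueError("no samples of the positive class")
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Toy external set: four hot herbs and two cold herbs.
y_true = ["hot", "hot", "hot", "hot", "cold", "cold"]
y_pred = ["hot", "hot", "hot", "cold", "cold", "cold"]

hot_recall = recall(y_true, y_pred, positive="hot")    # 3 of 4 hot herbs found
cold_recall = recall(y_true, y_pred, positive="cold")  # 2 of 2 cold herbs found
```

Each class is treated as the positive category in turn, which is why the text reports one figure for hot CHMs and another for cold CHMs.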

Significant differences in molecular substructures between cold- and hot-propertied CHMs were found in the validated dataset. OCO, CH3AACH2A, CH3ACH2A, ACH2AACH2A, CH3 > 2, CH2=A, and CH3AAACH2A were the characteristic molecular substructures of the CHMs. OCO, CH3AACH2A, and CH3ACH2A accounted for a high proportion of the characteristic substructures of cold-propertied CHMs, whereas CH3AACH2A, CH3ACH2A, and CH3AAACH2A were prominent in hot-propertied CHMs. These differences in molecular substructures characterize the differences between the cold and hot properties of CHMs at the level of their material basis.
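One simple way to surface such substructure differences is to compare, key by key, how often each fingerprint bit is set in cold versus hot herbs. The source does not say how the differences were computed, so the sketch below is only an assumed approach; `key_frequency` and `top_differences` are hypothetical names, and the four-bit vectors are toy data standing in for 167-bit fingerprints.

```python
def key_frequency(vectors):
    """Mean value of each fingerprint key across a group of herb vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def top_differences(cold_vectors, hot_vectors, k=3):
    """Return the k keys whose frequency differs most between the two groups.

    Each result is (key_index, cold_frequency - hot_frequency); a positive
    value means the key is more common in cold-propertied herbs.
    """
    cold_freq = key_frequency(cold_vectors)
    hot_freq = key_frequency(hot_vectors)
    diffs = [(i, c - h) for i, (c, h) in enumerate(zip(cold_freq, hot_freq))]
    return sorted(diffs, key=lambda d: abs(d[1]), reverse=True)[:k]


# Toy four-bit vectors for two cold and two hot herbs.
cold = [[1, 0, 1, 0], [1, 0, 0, 0]]
hot = [[0, 0, 1, 0], [0, 1, 1, 0]]
ranked = top_differences(cold, hot, k=2)
```

Mapping the top-ranked key indices back to their substructure definitions would give a list comparable in spirit to the one above, although a statistical test would be needed before calling any difference significant.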

After the model was constructed and validated, the optimal model's identification results for the validation set of hot- and cold-propertied CHMs were used as references, and the active ingredients of typical hot- and cold-propertied CHMs were selected for further biological validation. Cell counting kit-8 (CCK-8) cell proliferation assays were conducted to assess the effects of these active ingredients on cell growth. Additionally, polymerase chain reaction (PCR) gene detection assays were performed to explore the regulatory effects of these ingredients on key energy metabolism targets, including monoamine oxidase A (MAOA), hydroxyacyl-coenzyme A (CoA) dehydrogenase trifunctional multienzyme complex subunit beta (HADHB), enoyl-CoA hydratase and 3-hydroxyacyl CoA dehydrogenase (EHHADH), uncoupling protein 1 (Ucp1), adenosine 5′-monophosphate (AMP)-activated protein kinase (AMPK), and cytochrome c oxidase subunit 5A (Cox-5a).

These experimental methods further verified the accuracy of the AI-based medicinal property identification model, highlighting the scientific rigor and practical applicability of the identification strategy. The optimal identification model for the hot and cold properties of CHMs was successfully constructed using the AI algorithm, and biological experiments were used to validate the predictions of the model. This strategy for identifying the hot and cold properties of CHMs by combining AI techniques with experiments on the biology of energy metabolism is scientifically significant and highly applicable.

Future development trends primarily include two aspects: the application of deep learning algorithms and multi-source data fusion. Deep learning has achieved remarkable results in natural language processing and image recognition and shows great potential for identifying the cold and hot properties of CHMs. Models trained on large amounts of CHM data are expected to identify these properties more accurately. Meanwhile, combining multiple data sources, such as the chemical composition of CHMs, results of energy metabolism experiments, and clinical efficacy data, will improve identification accuracy. The fusion of data from multiple sources will help uncover the correlations between the properties of CHMs, thereby advancing our understanding of their cold and hot attributes.

Although progress has been made in identifying the cold and hot properties of CHMs based on AI and energy metabolism experiments, there are still some problems with data quality and model generalization capability. High-quality data form the basis for training accurate models. However, the quality of current CHM data is inconsistent, with many datasets containing incomplete or erroneous information. Therefore, strict data standards and quality control mechanisms must be established to improve the accuracy of identifying the cold and hot properties of CHMs. Furthermore, while current AI models perform well on training datasets, their generalizability to new CHMs and unknown contexts needs further optimization. This necessitates ongoing improvements to enhance the models’ robustness in real-world applications.

In summary, the combination of AI algorithms and biological mechanism experiments provides an effective strategy for identifying the cold and hot properties of CHMs. With the continuous development of AI and experimental research on energy metabolism biology, TCM practitioners will be able to reference not only textual descriptions of the hot and cold properties of CHMs but also scientifically grounded, dynamic explanations. Further refinement of this approach and deeper exploration of the relationship between medicinal properties and energy metabolism will offer valuable scientific insights for the future research and clinical application of CHMs.

doi.org/10.1097/CM9.0000000000003509
