Deciphering Genetic Overlaps: A Comprehensive Study On Viral Host Determination Using Machine Learning And Deep Learning Models

Pankaj Agarwal
Sapna Yadav

Abstract

This study applies machine learning and deep learning models to examine the relationship between viral DNA sequences and host organisms. It draws on a dataset compiled from databases such as ExPASy and NCBI; the sequences encode the genetic information that viruses need to replicate.


The study aimed to create a viral DNA dataset and to develop robust machine learning and deep learning models that classify viruses into eight host categories. Despite extensive experimentation with various models, performance gains remained elusive because of genetic overlaps: viral genomes from different classes shared substantial stretches of sequence, which made it difficult for the models to identify unique, class-specific features.
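One way to make the notion of genetic overlap concrete is to compare the k-mer content of two sequences. The sketch below is illustrative only: the abstract does not say how overlap was measured, so the k-mer representation, the Jaccard measure, the function names, and the toy sequences are all assumptions rather than the authors' method.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings (k-mers) of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=4):
    """Jaccard similarity between the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 0.0  # both sequences shorter than k: nothing to compare
    return len(ka & kb) / len(ka | kb)


# Toy sequences invented for illustration; they are not from the study's dataset.
plant_virus = "ATGCGTACGTTAGC"
animal_virus = "ATGCGTACGTTGGC"

print(f"k-mer overlap: {jaccard(plant_virus, animal_virus):.2f}")
```

When sequences from different host classes score high on a measure like this, features built from the same k-mers carry little class-specific signal, which is consistent with the difficulty the abstract reports.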


To address this, the study reduced the number of classes from eight to three: plants, animals, and microorganisms. Evaluation metrics improved, with the Random Forest machine learning model reaching a maximum accuracy of 70% and the LSTM deep learning model surpassing 85%.
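The effect of merging classes on accuracy can be sketched with a small label-mapping example. The eight fine-grained label names below are hypothetical, since the abstract does not list the original categories, and the predictions are invented. The sketch also only rescores fixed predictions, whereas the study presumably trained its models on the three-class labels.

```python
# Hypothetical fine-grained host labels mapped to the three coarse classes.
# The abstract does not name the eight categories; these are placeholders.
COARSE = {
    "crop": "plant", "tree": "plant",
    "mammal": "animal", "bird": "animal", "insect": "animal",
    "bacteria": "microorganism", "fungus": "microorganism",
    "archaea": "microorganism",
}


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented labels and predictions, for illustration only.
y_true = ["mammal", "bird", "crop", "bacteria", "insect", "tree"]
y_pred = ["bird", "bird", "tree", "bacteria", "mammal", "fungus"]

fine_acc = accuracy(y_true, y_pred)
coarse_acc = accuracy([COARSE[t] for t in y_true], [COARSE[p] for p in y_pred])

print(f"eight-class accuracy: {fine_acc:.2f}")
print(f"three-class accuracy: {coarse_acc:.2f}")
```

Confusions between closely related fine-grained classes stop counting as errors once those classes are merged, so a coarser label scheme can score higher on the same data.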


The finding that viral genomes from different host classes share substantial genetic sequence challenges conventional molecular distinctions and underscores how complex molecular differentiation of viral genomes is. The pragmatic approach of merging classes brings the classification scheme in line with the genetic data in viral host determination.

Article Details

How to Cite
Pankaj Agarwal, & Sapna Yadav. (2024). Deciphering Genetic Overlaps: A Comprehensive Study On Viral Host Determination Using Machine Learning And Deep Learning Models. Journal of Advanced Zoology, 45(3), 776–788. https://doi.org/10.53555/jaz.v45i3.4451
Section
Articles
Author Biographies

Pankaj Agarwal

K.R. Mangalam University, Gurgaon

Sapna Yadav

Jamia Millia Islamia, Delhi
