Applications of Deep Learning and Machine Learning in Computational Medicine

Computational medicine has emerged from advances in medical technology in parallel with big data and artificial intelligence. A new way of treating complex diseases, called 'precision medicine', is evolving, fueled by big data that extracts meaningful information from individual variability. Biomedical research is at the forefront of promoting precision medicine. Though traditional machine learning methods have built successful models for tasks ranging from cancer diagnosis to SARS-CoV-2 pulmonary infection, modern deep learning methods have seen phenomenal growth in genomics, electronic health records, and drug development. The challenges of deep learning applications in medicine include lack of data, privacy, heterogeneity of data, and interpretability. Analysis and discussion of these problems provide a reference for improving the application of deep learning in medical health.


Introduction
Computational medicine comprises many interdisciplinary subjects such as medicine, biology, mathematics, and computer science. Artificial intelligence methods are applied to understand disease mechanisms in humans, using big data analytics to help predict disease and guide clinical decision-making.
The pharmaceutical industry has particularly benefited from its use in drug discovery and clinical research, including strategies for the early and successful development of new drugs. Computational medicine can cut the time of drug development to an average of one to two years. The field of intelligent computing is transforming health care delivery and medical practice.

Figure 1. Deep learning Applications in Computational Medicine
The key challenges in effective data mining of biomedical information are that the data are often high-dimensional, so pre-processing primarily involves dimensionality reduction. The datasets are also small and interspersed with noise, making them difficult to interpret and process. These criteria should be considered when developing protocols and methods for handling biomedical data. Deep learning, as a branch of machine learning, has shown superior ability in training on large datasets for artificial intelligence (Han et al., 2018).
Several authors have developed effective deep learning methods in areas such as computer vision (He et al., 2016), speech recognition (Abdel-Hamid et al., 2014), and natural language processing (NLP). In traditional machine learning, domain experts extract features from datasets, a step referred to as 'feature engineering', and use those features to construct machine learning models for analysis. Common methods such as random forest and support vector machine are applied for the analysis. Manual feature extraction may introduce bias, is inefficient, and can fail to achieve a high-performance model.
Deep learning, or deep neural networks (Goodfellow et al., 2014; LeCun et al., 2015; Goodfellow et al., 2016), has produced several of artificial intelligence's success stories. Inspired by neuroscience, the architectural principles of deep neural networks (DNNs) and artificial neural networks (ANNs) primarily involve non-linear layers of nodes, similar to processing in the brain (Fukushima et al., 1988; Riesenhuber and Poggio, 1999; Schmidhuber et al., 2015; He et al., 2016; Silver et al., 2016).
Neural networks comprise several layers of neurons, with connections between layers that convert the input to the output. Such deep networks enable faster and more advanced learning that traditional machine learning cannot handle efficiently. A recent trend is to apply deep learning to biomedical data analysis. Figure 1 represents the processes and outcomes of using biomedical data. However, data analysts face the challenge that the data are often neither easily accessible nor easily processed, since they are unclean. If these constraints were met, what are the applications of deep learning in the emerging fields of genomics, drug development, and electronic health records?

Deep Learning Applications in Genomics
Genomics is the study of the structural and functional aspects of the genome. Its applications include genome editing, which involves targeting and modifying genomic sequences. Genome editing for genetic deficiency disorders to improve public health has been carried out under stringent guidelines set by scientific bodies such as the WHO (WHO guidelines, 2022). Gene expression studies are also popular for identifying organisms that live symbiotically in the human body as well as disease-causing organisms. The genomic code hides patterns that require advanced deep learning methods to extract meaningful information. Biological databases such as NCBI and EMBL are at the forefront, providing vital information free to the public. The following section presents a meta-analysis using publicly available data from the NCBI GEO dataset.

Meta-Analysis of the SARS-CoV2 Dataset
An unsupervised learning approach (kNN and self-organizing maps, SOM) was applied to a dataset from GEO (NCBI) to classify samples without labels. A dataset of pulmonary tissue infected with SARS-CoV-2 was available. Remarkable heterogeneity in RNA expression levels as well as in spatial location has been reported (Desai et al., 2020). Neural network models have become a powerful area of development in machine learning; SOMs are popular neural network algorithms for dimensionality reduction whose outputs can be directly compared.
Method: Gene expression data were downloaded from the GEO database of NCBI using the following search: ("SARS CoV-2" AND "expression profiling" AND "spatial heterogeneity"). The datasets obtained have accessions GSE159787 and GSE159785. 726 sample datasets were available across two experiments with autopsy tissue from patients who had succumbed to SARS-CoV-2. Gene expression of immune-response genes was studied across high and low viral-load regions of the tissue. About 48 segments were available within the same patient, from a total of six infected patients, which was useful for studying heterogeneity within the tissue (Desai et al., 2020).
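To illustrate the SOM step in this kind of analysis, the following is a minimal self-organizing map sketch in NumPy applied to synthetic expression-like data. The grid size, training schedule, and two-cluster toy data (standing in for high and low viral-load segments) are assumptions for illustration, not the study's actual settings.

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=100, lr0=0.5, seed=0):
    """Train a small self-organizing map; returns the node weight grid."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    w = rng.normal(size=(grid[0], grid[1], n_features))
    coords = np.array([[i, j] for i in range(grid[0]) for j in range(grid[1])])
    sigma0 = max(grid) / 2.0
    for t in range(epochs):
        lr = lr0 * np.exp(-t / epochs)        # decaying learning rate
        sigma = sigma0 * np.exp(-t / epochs)  # shrinking neighborhood
        for x in data:
            # Best-matching unit: node whose weights are closest to the sample.
            dists = np.linalg.norm(w - x, axis=2)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighborhood pulls nearby nodes toward the sample.
            grid_d = np.linalg.norm(coords - np.array(bmu), axis=1).reshape(grid)
            h = np.exp(-(grid_d ** 2) / (2 * sigma ** 2))
            w += lr * h[..., None] * (x - w)
    return w

def map_sample(w, x):
    """Assign a sample to its best-matching map node."""
    dists = np.linalg.norm(w - x, axis=2)
    return np.unravel_index(np.argmin(dists), dists.shape)

# Synthetic stand-in for expression profiles: two well-separated clusters.
rng = np.random.default_rng(1)
high = rng.normal(3.0, 0.3, size=(20, 10))   # e.g. high viral-load segments
low = rng.normal(-3.0, 0.3, size=(20, 10))   # e.g. low viral-load segments
som = train_som(np.vstack([high, low]))
print(map_sample(som, high[0]), map_sample(som, low[0]))
```

After training, samples from the two groups land on different map nodes, which is how a SOM classifies unlabeled data and lets the resulting clusters be compared directly.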
The dataset contained expression data from six infected patients, collected at autopsy and stored freeze-dried. Powerful machine learning methods hold great promise for providing biological insights from large and often heterogeneous data. Two popular deep learning methods, the convolutional neural network (CNN) and the recurrent neural network (RNN), are used in genomics. Each complex question in biology needs a specific machine learning approach, e.g., support vector machine (SVM), random forest, or one of several combinations.
Deep learning in genomics helps reduce high-dimensional features into easily interpretable information and unravels hidden features in human disease. Other genomic analyses that have benefited from deep learning include binding-site prediction for RNA-binding proteins (Zeng et al., 2016) and gene expression abundance studies (Washburn et al., 2019; Agarwal & Shendure, 2020), along with the use of CNNs to predict functional activity from genomic data. To understand yeast gene expression from microarrays, Chen et al. (2016) used autoencoders. Others have used deep CNN models with success in genomics (Zhou et al., 2018; Yuan & Joseph, 2019; Gao et al., 2020). A combination of one-dimensional CNNs and RNNs has been very popular for extracting meaningful information: the former extracts features and reduces dimensionality, while the latter discovers patterns in genomic sequence data. NLP has also been applied to gather data in innovative ways. Multimodal learning, which combines data from varied sources, is currently popular and offers unique possibilities in genomic data mining, for example combining gene sequence data with electronic health records and imaging records.
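The first step these CNN models share is one-hot encoding the sequence, after which a one-dimensional filter slides along it scoring matches. Below is a minimal NumPy sketch of that idea; the "TATA" filter is an illustrative motif, not a trained weight from any of the cited models.

```python
import numpy as np

def one_hot_dna(seq):
    """One-hot encode a DNA string into a (length, 4) matrix over A, C, G, T."""
    lookup = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        out[i, lookup[base]] = 1.0
    return out

def conv1d(x, kernel):
    """Valid 1-D convolution of a (length, 4) input with a (k, 4) motif filter."""
    k = kernel.shape[0]
    return np.array([np.sum(x[i:i + k] * kernel) for i in range(len(x) - k + 1)])

# A filter that scores occurrences of the illustrative motif "TATA".
motif = one_hot_dna("TATA")
scores = conv1d(one_hot_dna("GGTATACGTATAGG"), motif)
print(np.where(scores == scores.max())[0])  # positions where the motif matches
```

A trained CNN learns many such filters from data instead of hard-coding them; stacking them with pooling and non-linearities gives the feature reduction described above.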
A major pitfall in genomic data mining is insufficient data for efficient learning. With too little data, it is often difficult to find a suitable deep learning model for the task, which may lead to poor outcomes. For effective training, the dataset must be large.
Guidelines to address the lack of data in genomics:
1. Use data augmentation to expand the data and sample size.
2. Use dropout methods to improve model ability and enhance model performance.
3. Use transfer learning: train the model on a larger, unrelated dataset and reuse it on the desired set with fine-tuned parameters.
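The dropout method in the second guideline can be sketched in a few lines. This is a minimal, framework-free NumPy illustration of inverted dropout, where surviving activations are rescaled so the expected output is unchanged at inference time.

```python
import numpy as np

def dropout(activations, p_drop, rng, train=True):
    """Inverted dropout: zero units with probability p_drop, rescale the rest."""
    if not train or p_drop == 0.0:
        return activations  # at inference time the full layer is used unchanged
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones((4, 8))          # a layer of activations, all 1.0 for clarity
out = dropout(h, 0.5, rng)   # roughly half become 0, survivors become 2.0
print(out)
```

Because each unit is randomly silenced during training, the network cannot rely on any single feature, which improves generalization on small genomic datasets.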

Electronic Health Records (EHR)
As with genomic prediction, computational medicine processes electronic health records with one-dimensional CNNs, RNNs, GRUs, and LSTMs, along with NLP.
Electronic health records are generally stored in both structured and unstructured formats. The former includes patient demographics, diagnosis information, treatment information, and laboratory test results (Jensen et al., 2012).
One advantage of mining electronic health records is the delivery of timely treatment through early prediction of disease onset. Mining can also reveal disease-to-drug relationships, which can be vital to clinicians in decision-making.
The records are generally coded using medical terminologies called ontologies. Such terms are used extensively to describe conditions, drugs, and even diagnostic procedures. Designing an umbrella code for an entire medical concept is a highly specialized task, beyond the realm of data scientists, and simplifying the relationships needed to code a patient's condition is difficult. Another difficulty is extracting meaningful relationships that reflect the concept behind the code.
Research in the area is geared towards mapping clinical terminology to the relevant codes, converting the high-dimensional representation into a low-dimensional one and transforming it into embeddings. A complex relationship is encountered when determining patient mortality and readmission, which are indicators of quality care in hospitals. Deep learning methods can correctly represent such non-linear relationships through the hidden layers of neural networks (Miotto et al., 2016).
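The code-to-embedding step can be sketched as a lookup table that turns a variable-length set of ontology codes into one fixed-length dense vector per visit. The codes below are hypothetical ICD-style examples, and the embedding matrix is random here, whereas in a real model it would be learned end to end.

```python
import numpy as np

# Hypothetical vocabulary of ontology codes; illustrative only.
vocab = {"E11.9": 0, "I10": 1, "N18.3": 2, "J45.909": 3}
dim = 5
rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(vocab), dim))  # learned in practice, random here

def embed_visit(codes):
    """Represent one patient visit as the mean of its code embeddings."""
    idx = [vocab[c] for c in codes]
    return embedding[idx].mean(axis=0)

visit = embed_visit(["E11.9", "I10"])  # two codes recorded in one visit
print(visit.shape)  # a fixed-length vector regardless of how many codes
```

This fixed-length representation is what downstream models (e.g., mortality or readmission predictors) consume, since neural networks require inputs of a consistent dimension.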

Drug Development
With drug development reaching new heights, deep learning has immense application in drug research. Typical drug development takes approximately 10 years and is a resource- and time-intensive process. Deep learning can help in studying the interaction between a drug and its target, a step that can cut the time needed to bring a drug to market. Prediction of binding affinity between drug and target (Wen et al., 2017; Wang et al., 2018; Hu et al., 2019) has been possible using autoencoders to study drug-target interaction. Others have used LSTMs on the feature space to predict drug-target interactions. Zhang et al. (2020) developed a deep learning model for target recognition and drug repurposing by learning low-dimensional vector representations of drugs and targets.
A unique method combining sequence data with ligand fingerprints was proposed using a CNN and a graph neural network (Tsubaki et al., 2019). It excelled when compared to techniques like kNN, random forest, and SVM. Databases for extracting information on drug compounds include DrugBank, KEGG, and PubChem. The LSTM is a special recurrent neural network (RNN) that gives superior results compared to previous methods, since it uses memory blocks to store temporal states.
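Ligand fingerprints like those above are binary vectors, and a standard way to compare two compounds is the Tanimoto (Jaccard) similarity on their "on" bits. The fingerprints below are tiny made-up bit sets for illustration; real ones would come from a cheminformatics toolkit or a database such as PubChem.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two binary fingerprints given as bit sets."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

# Toy fingerprints: sets of "on" bit positions (hypothetical values).
aspirin_like = {1, 4, 7, 9, 12}
ibuprofen_like = {1, 4, 9, 15, 18}
caffeine_like = {2, 5, 11, 20}
print(tanimoto(aspirin_like, ibuprofen_like))  # shared substructure bits
print(tanimoto(aspirin_like, caffeine_like))   # no bits in common -> 0.0
```

Similarity scores like these are a common baseline feature for the kNN, random forest, and SVM comparisons mentioned above, which is one reason learned representations are benchmarked against them.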
Table 1. Models developed and the authors' contributions.

Model | Contribution
Long Short-Term Memory (LSTM) | Learning based on a single-layer decision tree (Rajkomar et al., 2018)
6. Time-aware neural network models | The deep dynamic memory model (Pham et al., 2016)
7. Gated Recurrent Unit (GRU) | Accurate segmentation of 3D data by applying masking and time intervals (Che et al., 2018)

Challenges
Some of the challenges to deep learning in computational medicine, and certain solutions, are discussed here. Learning is not simple, and sufficient data is essential for efficient deep learning.
In a healthcare setting, medical datasets need special handling because the numbers of patients across disease categories are unevenly matched (class imbalance).
To address a model's poor generalization ability, the dropout method is sometimes used. Other methods include data augmentation, in which translation, cropping, and scaling generate new images to create a robust model.
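The three augmentations named above can each be written in a few lines of NumPy. The sketch below uses a small numeric array as a stand-in for a medical image; real pipelines would apply the same operations to scan or slide data.

```python
import numpy as np

def random_crop(img, size, rng):
    """Crop a random size x size patch (the cropping augmentation)."""
    h, w = img.shape
    y = int(rng.integers(0, h - size + 1))
    x = int(rng.integers(0, w - size + 1))
    return img[y:y + size, x:x + size]

def scale_up(img, factor):
    """Nearest-neighbour upscaling (the scaling augmentation)."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def translate(img, dy, dx):
    """Shift the image by (dy, dx), zero-padding the exposed border."""
    out = np.zeros_like(img)
    h, w = img.shape
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

rng = np.random.default_rng(0)
img = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for a medical image
print(random_crop(img, 2, rng).shape, scale_up(img, 2).shape, translate(img, 1, 0).shape)
```

Applying several random augmentations per training image multiplies the effective dataset size, which directly targets the data-scarcity problem described above.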
Transfer learning is also useful: the model first learns on a related task with sufficient data, and its parameters are then fine-tuned before final use (Shin et al., 2016). Some researchers use data integration to improve model ability (Dai et al., 2018).
Another logical solution is combining domain expertise with image information for efficient training of deep learning models.
Model interpretability is an important issue in translating this work to clinicians, enabling sound clinical decisions based on predictions and supporting patient satisfaction.
The interpretability of models is very important: if a model can provide sufficient and reliable information, physicians will trust its results, leading to correct and appropriate decisions; at the same time, an interpretable model can also provide a comprehensive understanding of a patient's progress.
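One widely used model-agnostic interpretability technique (an illustration here, not a method claimed by the sources above) is permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. The toy model below depends only on its first feature, so only that feature shows a positive importance.

```python
import numpy as np

def permutation_importance(predict, X, y, rng):
    """Accuracy drop when each feature is shuffled; larger = more important."""
    base = np.mean(predict(X) == y)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j's link to y
        scores.append(base - np.mean(predict(Xp) == y))
    return np.array(scores)

# Toy classifier whose decision uses only feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
predict = lambda data: (data[:, 0] > 0).astype(int)
imp = permutation_importance(predict, X, y, rng)
print(imp)  # feature 0 dominates; features 1 and 2 contribute nothing
```

Reporting which clinical variables drive a prediction in this way is one concrete route to the physician trust discussed above.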

Conclusion
We have discussed various deep learning methods that may be used in computational medicine; among its prominent fields are genomics, electronic health records, and drug development.
Biomedical data mining can be of immense use in clinical decision-making if the methods are reliable, interpretable, and standardized. Ideas for improvement were provided for computational researchers developing models in the health field.

Figure 2. Proportion of type of infected tissue used in the analysis

Figure 3. Number of cells of each type segregated by segment type (weighted Venn)

Figure 4.

Table 2. Drug development models developed using deep learning and their contributions.