Gene Expression Prediction
USING MACHINE LEARNING
Logistic Regression
Logistic regression is a data analysis technique that uses a statistical model to find the
relationship between input features and an outcome. It then uses this relationship to
predict the outcome from new feature values. The prediction has a finite number of
possible outcomes, such as yes or no.
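As a minimal sketch of how this might look in code (assuming scikit-learn, with
make_classification standing in as a placeholder for a real gene-expression matrix):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an expression matrix: 200 samples x 50 features.
X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the model and predict a finite set of outcomes (class 0 or 1).
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.predict(X_test[:5]))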
K-Nearest Neighbor (KNN)
The K-Nearest Neighbor (KNN) algorithm is a popular machine learning
technique used for classification and regression tasks. It relies on the idea that
similar data points tend to have similar labels or values. During the training phase,
the KNN algorithm simply stores the entire training dataset as a reference; at
prediction time, it finds the k stored samples closest to a new point and assigns the
majority label (for classification) or the average value (for regression) among them.
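A comparable hedged sketch with scikit-learn, again on synthetic placeholder data,
where k = 5 is an arbitrary choice:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Training" only stores the data; prediction votes among the 5 nearest samples.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.predict(X_test[:5]))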
Support Vector Machines (SVM)
SVM works by mapping data to a high-dimensional feature space so that data
points can be categorized even when they are not linearly separable in the original
space. A separator between the categories is found, and the data are transformed
in such a way that the separator can be drawn as a hyperplane. The features of a
new record can then be used to predict the group to which it belongs.
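One possible scikit-learn sketch, using an RBF kernel for the implicit
high-dimensional mapping (synthetic placeholder data again):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel maps samples to a high-dimensional space where a separating
# hyperplane is sought; C controls the softness of the margin.
svm = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print(svm.predict(X_test[:5]))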
Random Forest
Random Forest is a supervised learning algorithm that works on the concept of
bagging. In bagging, a group of models is trained on different subsets of the
dataset, and the final output is generated by collating the outputs of all the
different models. In the case of random forest, the base model is a decision tree.
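A minimal scikit-learn sketch on the same kind of synthetic placeholder data;
100 trees is simply a common default, not a tuned choice:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 decision trees is trained on a bootstrap sample (bagging),
# and their votes are combined into the final prediction.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(rf.predict(X_test[:5]))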
Naive Bayes
Naive Bayes is a classification technique based on Bayes' Theorem, with the
assumption that all the features that predict the target value are independent of
each other. It calculates the probability of each class and then picks the one with
the highest probability.
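A hedged Gaussian variant in scikit-learn (GaussianNB additionally assumes each
feature is normally distributed within a class; placeholder data as before):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each feature is treated as independent given the class; the most probable
# class is returned.
nb = GaussianNB().fit(X_train, y_train)
print(nb.predict(X_test[:5]))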
Precision
Precision is one indicator of a machine learning model's performance: the quality
of its positive predictions. Precision is the number of true positives divided by the
total number of positive predictions.
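In symbols, with TP denoting true positives and FP false positives:

Precision = TP / (TP + FP)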
Recall
Recall, also known as the true positive rate (TPR), is the percentage of data
samples that a machine learning model correctly identifies as belonging to a class
of interest—the “positive class”—out of the total samples for that class.
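In symbols, with FN denoting false negatives:

Recall = TP / (TP + FN)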
F1 Score
The F1 score is a machine learning evaluation metric that combines a model's
precision and recall scores into a single number by taking their harmonic mean,
giving a more balanced view of performance than either metric alone.
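As the harmonic mean of the two scores:

F1 = 2 x (Precision x Recall) / (Precision + Recall)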
Support
Support is the number of actual occurrences of the class in the specified dataset.
Imbalanced support in the training data may indicate structural weaknesses in the
reported scores of the classifier and could indicate the need for stratified sampling
or rebalancing.
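As a small illustration (y_true is a hypothetical label vector), support is just the
per-class count of true samples:

from collections import Counter

y_true = [0, 0, 0, 1, 1, 2]   # hypothetical true labels
print(Counter(y_true))        # support: class 0 -> 3, class 1 -> 2, class 2 -> 1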
Accuracy
Accuracy score in machine learning is an evaluation metric that measures the
number of correct predictions made by a model in relation to the total number of
predictions made. We calculate it by dividing the number of correct predictions by
the total number of predictions.
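In symbols, for binary classification with TN denoting true negatives:

Accuracy = (TP + TN) / (TP + TN + FP + FN)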
Macro Average
The macro-averaged F1 score (or macro F1 score) is computed using the
arithmetic mean (aka unweighted mean) of all the per-class F1 scores. This
method treats all classes equally regardless of their support values.
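For N classes with per-class scores F1_1, ..., F1_N:

Macro F1 = (F1_1 + F1_2 + ... + F1_N) / N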
Weighted Average
The weighted-averaged F1 score (or weighted F1 score) is computed as the mean
of all the per-class F1 scores, with each class weighted by its support, i.e. the
number of true instances of that class. Unlike the macro average, classes with
more samples therefore contribute more to the overall score.
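With support_i denoting the number of true samples of class i and n the total
number of samples:

Weighted F1 = sum over i of (support_i / n) x F1_i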
THANK YOU