Gene Expression Prediction Guide

The document discusses several machine learning techniques for gene expression prediction including logistic regression, K-nearest neighbors, support vector machines, random forest, and naive Bayes. It then defines key performance metrics used to evaluate machine learning models such as precision, recall, F1 score, support, accuracy, macro average, and weighted average.


Gene Expression Prediction

USING MACHINE LEARNING


Logistic regression

 Logistic regression is a statistical technique that models the relationship
between input features and a categorical outcome. It then uses this relationship to
predict the value of the outcome from the features. The prediction usually has a
finite number of outcomes, such as yes or no.
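As a sketch, this idea can be tried with scikit-learn's LogisticRegression; the one-feature dataset below is made up purely for illustration:

```python
# Minimal sketch: logistic regression on an assumed toy dataset.
from sklearn.linear_model import LogisticRegression

# One feature, two clearly separable classes (values are assumptions).
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression()
clf.fit(X, y)

# Predict a finite outcome (0 or 1) and the class probabilities for a new sample.
pred = clf.predict([[2.5]])[0]
proba = clf.predict_proba([[2.5]])[0]  # probabilities sum to 1
```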
K-Nearest Neighbor (KNN)

 The K-Nearest Neighbor (KNN) algorithm is a popular machine learning
technique used for classification and regression tasks. It relies on the idea that
similar data points tend to have similar labels or values. During the training phase,
the KNN algorithm stores the entire training dataset as a reference.
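A minimal sketch with scikit-learn's KNeighborsClassifier, using an assumed toy dataset; note that "training" only stores the data:

```python
# Minimal sketch: KNN classification on an assumed toy dataset.
from sklearn.neighbors import KNeighborsClassifier

X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]

# k = 3: each query is labeled by a majority vote of its 3 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)  # KNN "training" simply stores the dataset as a reference

# 2.5 sits next to the class-0 points, so its neighbors vote for class 0.
pred = knn.predict([[2.5]])[0]
```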
Support Vector Machines (SVM)

 SVM works by mapping data to a high-dimensional feature space so that data
points can be categorized, even when the data are not otherwise linearly
separable. A separator between the categories is found, then the data are
transformed in such a way that the separator can be drawn as a hyperplane.
The characteristics of new data can then be used to predict the group to which
a new record should belong.
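A minimal sketch with scikit-learn's SVC; the RBF kernel performs the implicit high-dimensional mapping described above, and the toy data are assumptions for illustration:

```python
# Minimal sketch: SVM classification on an assumed toy dataset.
from sklearn.svm import SVC

X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]

# The RBF kernel implicitly maps the data to a high-dimensional feature
# space, where a separating hyperplane is found.
svm = SVC(kernel="rbf")
svm.fit(X, y)

# A new record near the class-1 cluster is assigned to class 1.
pred = svm.predict([[11.5]])[0]
```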
Random Forest

 Random Forest is a supervised learning algorithm that works on the concept of
bagging. In bagging, a group of models is trained on different subsets of the
dataset, and the final output is generated by collating the outputs of all the
different models. In the case of random forest, the base model is a decision tree.
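A minimal sketch with scikit-learn's RandomForestClassifier on an assumed toy dataset; each of the trees is trained on a bootstrap subset and their votes are collated:

```python
# Minimal sketch: random forest (bagged decision trees) on an assumed dataset.
from sklearn.ensemble import RandomForestClassifier

X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]

# 100 decision trees, each trained on a bootstrap sample of the data;
# the final prediction collates (majority-votes) their outputs.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

pred = rf.predict([[2.0]])[0]
```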
Naive Bayes

 Naive Bayes is a classification technique based on Bayes' Theorem, with the
assumption that all the features that predict the target value are independent of
each other. It calculates the probability of each class and then picks the one with
the highest probability.
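A minimal sketch with scikit-learn's GaussianNB on an assumed toy dataset; the model estimates a per-class probability and picks the largest:

```python
# Minimal sketch: Gaussian Naive Bayes on an assumed toy dataset.
from sklearn.naive_bayes import GaussianNB

X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]

# Fits one Gaussian per class per feature (features assumed independent),
# then predicts the class with the highest posterior probability.
nb = GaussianNB()
nb.fit(X, y)

pred = nb.predict([[11.0]])[0]
```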
Precision

 Precision is one indicator of a machine learning model's performance: the quality
of a positive prediction made by the model. Precision is the number of true
positives divided by the total number of positive predictions (true positives plus
false positives).
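The formula precision = TP / (TP + FP) can be sketched directly; the label vectors below are assumptions for illustration:

```python
# Precision = true positives / (true positives + false positives),
# computed on assumed toy labels.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 3
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # 1

precision = tp / (tp + fp)  # 3 / 4 = 0.75
```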
Recall

 Recall, also known as the true positive rate (TPR), is the percentage of data
samples that a machine learning model correctly identifies as belonging to a class
of interest—the “positive class”—out of the total samples for that class.
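The companion formula recall = TP / (TP + FN) can be sketched the same way, again on assumed toy labels:

```python
# Recall (true positive rate) = true positives / (true positives + false
# negatives), computed on assumed toy labels.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 3
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # 1

recall = tp / (tp + fn)  # 3 / 4 = 0.75
```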
F1 SCORE

 F1 score is a machine learning evaluation metric that combines the precision and
recall scores of a model into a single number: their harmonic mean. Whereas the
accuracy metric counts how many times a model made a correct prediction across
the entire dataset, the F1 score balances false positives against false negatives.
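As a sketch, the harmonic mean F1 = 2 · P · R / (P + R) applied to assumed precision and recall values:

```python
# F1 score = harmonic mean of precision and recall (values are assumptions).
precision, recall = 0.5, 1.0

f1 = 2 * precision * recall / (precision + recall)  # 2/3, pulled toward the
                                                    # lower of the two scores
```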
Support

 Support is the number of actual occurrences of the class in the specified dataset.
Imbalanced support in the training data may indicate structural weaknesses in the
reported scores of the classifier and could indicate the need for stratified sampling
or rebalancing.
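Support is just a per-class count of the true labels; a minimal sketch on assumed labels:

```python
# Support = number of actual occurrences of each class in the dataset
# (labels below are assumptions).
from collections import Counter

y_true = ["high", "low", "high", "high", "low"]
support = Counter(y_true)  # {"high": 3, "low": 2} — imbalanced support
```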
Accuracy

 Accuracy score in machine learning is an evaluation metric that measures the
number of correct predictions made by a model in relation to the total number of
predictions made. We calculate it by dividing the number of correct predictions by
the total number of predictions.
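The calculation described above, sketched on assumed toy labels:

```python
# Accuracy = correct predictions / total predictions (assumed toy labels).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 4/6
```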
Macro Average

 The macro-averaged F1 score (or macro F1 score) is computed using the
arithmetic mean (aka unweighted mean) of all the per-class F1 scores. This
method treats all classes equally regardless of their support values.
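A sketch of the unweighted mean over per-class F1 scores (the scores below are assumptions):

```python
# Macro-averaged F1 = arithmetic (unweighted) mean of per-class F1 scores;
# every class counts equally, whatever its support (scores are assumptions).
per_class_f1 = {"class_0": 0.9, "class_1": 0.5}

macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)  # (0.9 + 0.5) / 2
```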
Weighted Average

 In a classification report, the weighted-average F1 score is the mean of all the
per-class F1 scores with each class's score weighted by its support (the number of
true instances of that class). Unlike the macro average, it therefore gives more
influence to the majority classes in an imbalanced dataset.
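A sketch of the support-weighted mean of per-class F1 scores, using assumed scores and supports:

```python
# Weighted-average F1 = per-class F1 scores weighted by class support
# (scores and supports below are assumptions).
f1_scores = {"class_0": 0.9, "class_1": 0.5}
supports  = {"class_0": 90,  "class_1": 10}

total = sum(supports.values())
weighted_f1 = sum(f1_scores[c] * supports[c] for c in f1_scores) / total
# (0.9 * 90 + 0.5 * 10) / 100 = 0.86 — dominated by the majority class
```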
THANK YOU
