
Scikit Learn

The document lists over 100 important operations in Scikit-learn organized into categories including general operations, preprocessing, supervised and unsupervised learning algorithms, model selection and evaluation, pipelines, feature extraction/selection, and more. It provides the sklearn functions for loading datasets, splitting data, scaling features, encoding labels, fitting and evaluating various classifiers and regressors, dimensionality reduction, feature selection, pipelines, and other common machine learning tasks.

Uploaded by sairamesht
Copyright © All Rights Reserved

#_ Important Scikit-learn Operations [100+]

General Operations:

● sklearn.datasets.load_iris(): Load the iris dataset.
● sklearn.datasets.load_digits(): Load the hand-written digits dataset.
● sklearn.model_selection.train_test_split(): Split datasets into
training and testing subsets.
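A minimal sketch of the loading and splitting calls above, assuming a recent scikit-learn release; the 25% test size, `random_state=0` and `stratify=y` are illustrative choices, not requirements:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# return_X_y=True returns the (data, target) arrays instead of a Bunch object
X, y = load_iris(return_X_y=True)

# Hold out 25% of the 150 samples; stratify=y keeps the three classes in the
# same proportions in both subsets, and random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)
```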

Preprocessing:

● sklearn.preprocessing.StandardScaler(): Standardize features by removing the mean and scaling to unit variance.
● sklearn.preprocessing.MinMaxScaler(): Transform features by scaling
them to a given range.
● sklearn.preprocessing.LabelEncoder(): Encode labels with value
between 0 and n_classes-1.
● sklearn.preprocessing.OneHotEncoder(): Encode categorical features as a one-hot numeric array.
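The four preprocessing classes share the `fit`/`transform` interface, and `fit_transform` does both in one call. A short sketch on made-up toy arrays (the values are illustrative only):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, OneHotEncoder, StandardScaler

# Hypothetical numeric data: two columns on very different scales
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

X_std = StandardScaler().fit_transform(X)  # each column: mean 0, unit variance
X_mm = MinMaxScaler().fit_transform(X)     # each column mapped to [0, 1] by default

# LabelEncoder sorts the classes, so bird=0, cat=1, dog=2
le = LabelEncoder()
codes = le.fit_transform(["cat", "dog", "cat", "bird"])

# OneHotEncoder expects 2D input and returns a sparse matrix by default
colors = np.array([["red"], ["green"], ["red"]])
onehot = OneHotEncoder().fit_transform(colors).toarray()

print(codes)   # [1 2 1 0]
print(onehot)  # columns: green, red (categories are sorted)
```

`LabelEncoder` is intended for target labels; for input features, `OneHotEncoder` (or `OrdinalEncoder`, not listed above) is the usual choice.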

Supervised Learning Algorithms:

Linear Models:

● sklearn.linear_model.LinearRegression(): Ordinary least squares linear regression.
● sklearn.linear_model.LogisticRegression(): Logistic regression
(classification).
● sklearn.linear_model.Ridge(): Linear least squares with l2
regularization.
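A sketch on hypothetical noise-free data generated from y = 3x + 2, which makes the effect of the L2 penalty in `Ridge` easy to see (the slope is pulled below the least-squares value of 3), plus a tiny two-class toy set for `LogisticRegression`. The `alpha=10.0` value is an illustrative assumption:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression, Ridge

# Hypothetical regression data: y = 3*x + 2 exactly, x = 0..9
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + 2

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha sets the strength of the L2 penalty

print(ols.coef_, ols.intercept_)  # approximately [3.] and 2.0
print(ridge.coef_)                # approximately [2.676], shrunk toward 0

# Hypothetical binary data: small x -> class 0, large x -> class 1
Xc = np.array([[0.0], [1.0], [2.0], [3.0]])
yc = np.array([0, 0, 1, 1])
logreg = LogisticRegression().fit(Xc, yc)
print(logreg.predict([[0.0], [3.0]]))  # [0 1]
print(logreg.predict_proba([[1.5]]))   # one row of class probabilities summing to 1
```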

Support Vector Machines (SVM):

● sklearn.svm.SVC(): C-Support Vector Classification.
● sklearn.svm.SVR(): Epsilon-Support Vector Regression.
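A minimal sketch of both support vector estimators, assuming iris for classification and a hypothetical noise-free sine curve for regression. The `C` and `epsilon` values are illustrative, not tuned, and both scores are measured on the training data only:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC, SVR

# Classification: RBF-kernel SVC on iris; C trades margin width against training errors
X, y = load_iris(return_X_y=True)
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
train_acc = clf.score(X, y)  # mean accuracy on the training data

# Regression: hypothetical toy data, y = sin(x) on 50 points in [0, 5]
X_reg = np.linspace(0, 5, 50).reshape(-1, 1)
y_reg = np.sin(X_reg).ravel()
# Errors smaller than epsilon are not penalized
reg = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X_reg, y_reg)
train_r2 = reg.score(X_reg, y_reg)  # R^2 on the training data

print(f"SVC accuracy: {train_acc:.3f}, SVR R^2: {train_r2:.3f}")
```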

By: Waleed Mousa


Nearest Neighbors:

● sklearn.neighbors.KNeighborsClassifier(): Classifier implementing the k-nearest neighbors vote.
● sklearn.neighbors.KNeighborsRegressor(): Regression based on
k-nearest neighbors.
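A sketch on a hypothetical one-dimensional toy set with two well-separated groups. It shows that the classifier takes a majority vote among the `n_neighbors` closest points, while the regressor averages their targets:

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Hypothetical 1-D data: two groups around x=1 and x=11
X = [[0], [1], [2], [10], [11], [12]]
y_class = [0, 0, 0, 1, 1, 1]
y_value = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]

# Majority vote among the 3 nearest training points
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
print(knn.predict([[1.5], [10.5]]))  # [0 1]

# Mean target of the 2 nearest training points (x=1 and x=2 for a query at 1.5)
knr = KNeighborsRegressor(n_neighbors=2).fit(X, y_value)
print(knr.predict([[1.5]]))  # [1.5]
```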

Gaussian Processes:

● sklearn.gaussian_process.GaussianProcessRegressor(): Gaussian
process regression (GPR).
● sklearn.gaussian_process.GaussianProcessClassifier(): Gaussian
process classification (GPC).

Decision Trees:

● sklearn.tree.DecisionTreeClassifier(): Decision tree classifier.
● sklearn.tree.DecisionTreeRegressor(): Decision tree regressor.

Ensemble Methods:

● sklearn.ensemble.RandomForestClassifier(): Random forest classifier.
● sklearn.ensemble.RandomForestRegressor(): Random forest regressor.
● sklearn.ensemble.GradientBoostingClassifier(): Gradient boosting
classifier.
● sklearn.ensemble.GradientBoostingRegressor(): Gradient boosting
regressor.
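The tree and ensemble estimators share the same `fit`/`predict`/`score` interface, so they can be compared directly. A sketch that assumes iris as the dataset and 5-fold `cross_val_score` (from the model-selection group below) as the yardstick; `max_depth=3` and `n_estimators=100` are illustrative settings:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Mean accuracy over 5 cross-validation folds for each model
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The regressor variants (`DecisionTreeRegressor`, `RandomForestRegressor`, `GradientBoostingRegressor`) are used the same way with a continuous target.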

Neural Network Models:

● sklearn.neural_network.MLPClassifier(): Multi-layer perceptron classifier.
● sklearn.neural_network.MLPRegressor(): Multi-layer perceptron
regressor.

Unsupervised Learning Algorithms:

Clustering:

● sklearn.cluster.KMeans(): K-Means clustering.
● sklearn.cluster.DBSCAN(): Density-based spatial clustering of
applications with noise.
● sklearn.cluster.AgglomerativeClustering(): Agglomerative
clustering.
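A sketch on synthetic blobs. `make_blobs` is an extra helper from `sklearn.datasets` that is not in the list above, and the `eps`/`min_samples` values are illustrative assumptions that would need tuning on real data. Unlike the other two, DBSCAN chooses the number of clusters itself and labels outliers as -1:

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Toy data: 150 points around 3 centres (make_blobs is used only to generate data)
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# KMeans and AgglomerativeClustering need the number of clusters up front
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
agg = AgglomerativeClustering(n_clusters=3).fit(X)

# DBSCAN infers clusters from point density; noise points get the label -1
db = DBSCAN(eps=0.5, min_samples=5).fit(X)
n_db_clusters = len(set(db.labels_) - {-1})

print(km.cluster_centers_.shape)  # (3, 2)
print(np.bincount(agg.labels_))   # points per agglomerative cluster
print(n_db_clusters, "DBSCAN clusters,", np.sum(db.labels_ == -1), "noise points")
```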

Dimensionality Reduction:

● sklearn.decomposition.PCA(): Principal component analysis.
● sklearn.decomposition.NMF(): Non-negative matrix factorization.
● sklearn.manifold.TSNE(): t-distributed Stochastic Neighbor
Embedding.
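A sketch of `PCA` projecting the four iris measurements onto two components (the choice of two is illustrative). `explained_variance_ratio_` reports the share of total variance each component keeps. `TSNE` and `NMF` follow the same `fit_transform` pattern, with `NMF` requiring non-negative input:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)  # rows projected onto the top 2 principal components

print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # fraction of variance kept by each component
```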

Model Selection and Evaluation:

● sklearn.model_selection.cross_val_score(): Evaluate a score by cross-validation.
● sklearn.model_selection.GridSearchCV(): Exhaustive search over
specified parameter values for an estimator.
● sklearn.model_selection.RandomizedSearchCV(): Randomized search on
hyperparameters.
● sklearn.metrics.accuracy_score(): Accuracy classification score.
● sklearn.metrics.mean_squared_error(): Mean squared error regression
loss.
● sklearn.metrics.confusion_matrix(): Compute confusion matrix to
evaluate the accuracy of a classification.
● sklearn.metrics.roc_curve(): Compute Receiver operating
characteristic (ROC).
● sklearn.metrics.auc(): Compute the Area Under the Curve (AUC) from curve points using the trapezoidal rule.
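A sketch that chains several of these calls on the breast cancer dataset. The k-nearest-neighbors model and the `n_neighbors` grid are illustrative assumptions; any classifier with `predict_proba` would do. With the default `refit=True`, `GridSearchCV` refits the best configuration on the full training set, so the search object can predict directly. `roc_curve` needs a continuous score for the positive class, and `auc` then integrates the resulting (fpr, tpr) points:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, auc, confusion_matrix, roc_curve
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

# Try each n_neighbors value with 5-fold cross-validation on the training set
# (the grid values are illustrative)
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 11, 21]}, cv=5)
search.fit(X_train, y_train)

# The search object predicts with the refitted best model
y_pred = search.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class

# Continuous score for the positive class (label 1)
scores = search.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
roc_auc = auc(fpr, tpr)  # trapezoidal area under the (fpr, tpr) points

print(search.best_params_, f"accuracy={acc:.3f}", f"AUC={roc_auc:.3f}")
```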

Pipeline:

● sklearn.pipeline.Pipeline(): Pipeline of transforms and a final estimator.
● sklearn.pipeline.make_pipeline(): Construct a Pipeline from the
given estimators.
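A sketch of the two constructors on the breast cancer data; the dataset and the scaler plus logistic-regression steps are illustrative choices. Because the scaler sits inside the pipeline, cross-validation refits it on each training fold, so statistics from the held-out fold never leak into preprocessing:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Pipeline: step names are given explicitly as (name, estimator) pairs
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# make_pipeline: step names are derived from the lowercased class names
pipe2 = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print(list(pipe2.named_steps))  # ['standardscaler', 'logisticregression']

# The scaler is refit on the training part of every fold
scores = cross_val_score(pipe, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Step names matter when tuning: a `GridSearchCV` grid would address the classifier's `C` as `clf__C` in `pipe`, or as `logisticregression__C` in `pipe2`.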

Feature Extraction:

● sklearn.feature_extraction.text.CountVectorizer(): Convert a
collection of text documents to a matrix of token counts.
● sklearn.feature_extraction.text.TfidfVectorizer(): Convert a
collection of raw documents to a matrix of TF-IDF features.

Feature Selection:

● sklearn.feature_selection.SelectKBest(): Select features according to the k highest scores.
● sklearn.feature_selection.RFE(): Feature ranking with recursive
feature elimination.
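A sketch on iris, assuming the ANOVA F-test `f_classif` (not listed above) as the scoring function for `SelectKBest` and a logistic regression as the estimator that `RFE` refits while pruning. Keeping two features is an illustrative choice:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest ANOVA F-score against the class label
kbest = SelectKBest(score_func=f_classif, k=2)
X_k = kbest.fit_transform(X, y)
print(kbest.get_support())  # petal length and petal width are kept

# RFE refits the estimator and drops the weakest feature until 2 remain
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
print(rfe.support_, rfe.ranking_)  # ranking_ is 1 for every selected feature
```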

Imbalanced Datasets:

● sklearn.utils.class_weight.compute_class_weight(): Estimate class weights for unbalanced datasets.
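With `class_weight="balanced"`, each class gets the weight n_samples / (n_classes * count_of_class), so rarer classes weigh more. A sketch on a hypothetical 6-versus-2 label vector:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: six of class 0, two of class 1
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
classes = np.unique(y)

# "balanced": n_samples / (n_classes * count of each class) -> 8/12 and 8/4
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
print(dict(zip(classes.tolist(), weights.round(3).tolist())))  # {0: 0.667, 1: 2.0}
```

Many classifiers (for example `LogisticRegression`, `SVC` and `RandomForestClassifier`) also accept `class_weight="balanced"` directly.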

Decomposition:

● sklearn.decomposition.TruncatedSVD(): Dimensionality reduction using truncated SVD (aka LSA).
● sklearn.decomposition.FastICA(): Fast algorithm for Independent
Component Analysis.

Manifold Learning:

● sklearn.manifold.Isomap(): Isomap embedding.
● sklearn.manifold.MDS(): Multi-dimensional scaling.

Dataset Transformations:

● sklearn.preprocessing.PolynomialFeatures(): Generate polynomial and interaction features.
● sklearn.preprocessing.Binarizer(): Binarize data (set feature
values to 0 or 1) according to a threshold.

Validation:

● sklearn.model_selection.StratifiedKFold(): Stratified K-Folds cross-validator.
● sklearn.model_selection.LeaveOneOut(): Leave-One-Out
cross-validator.
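A sketch on a hypothetical 10-sample set with a 6:4 class split. `StratifiedKFold` keeps that ratio in every test fold, while `LeaveOneOut` produces one split per sample, which makes it expensive on large data:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, StratifiedKFold

# Hypothetical data: 10 samples, 6 of class 0 and 4 of class 1
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 6 + [1] * 4)

skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # Each test fold keeps the 60/40 ratio: 3 samples of class 0, 2 of class 1
    print(np.bincount(y[test_idx]))  # [3 2]

loo = LeaveOneOut()
print(loo.get_n_splits(X))  # 10: one split per sample
```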

Calibration:

● sklearn.calibration.CalibratedClassifierCV(): Probability
calibration with isotonic regression or logistic regression.

Semi-Supervised Learning:

● sklearn.semi_supervised.LabelPropagation(): Label Propagation classifier.
● sklearn.semi_supervised.LabelSpreading(): Label Spreading
classifier.

Kernel Ridge Regression:

● sklearn.kernel_ridge.KernelRidge(): Kernel ridge regression.

Pairwise Metrics:

● sklearn.metrics.pairwise.cosine_similarity(): Compute cosine similarity between samples in X and Y.
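A sketch with hypothetical 2-D vectors. Cosine similarity is the dot product divided by the product of the norms, so orthogonal vectors score 0 and vectors 45 degrees apart score about 0.707. The result has one row per sample in X and one column per sample in Y:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

X = np.array([[1.0, 0.0], [1.0, 1.0]])  # two hypothetical samples
Y = np.array([[0.0, 1.0]])              # one hypothetical sample

sim = cosine_similarity(X, Y)  # shape (2, 1)
print(sim)  # approximately [[0.0], [0.7071]]
```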

Discriminant Analysis:

● sklearn.discriminant_analysis.LinearDiscriminantAnalysis(): Linear
Discriminant Analysis.
● sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis():
Quadratic Discriminant Analysis.

Isolation Forest:

● sklearn.ensemble.IsolationForest(): Isolation Forest Algorithm.

Naive Bayes:

● sklearn.naive_bayes.GaussianNB(): Gaussian Naive Bayes.
● sklearn.naive_bayes.MultinomialNB(): Multinomial Naive Bayes.

Cross Decomposition:

● sklearn.cross_decomposition.PLSRegression(): PLS regression.

Nearest Centroid Classifier:

● sklearn.neighbors.NearestCentroid(): Nearest centroid classifier.

Neural network utilities:

● sklearn.neural_network.BernoulliRBM(): Bernoulli Restricted Boltzmann Machine.

Stochastic Gradient Descent:

● sklearn.linear_model.SGDClassifier(): Linear classifiers with SGD training.
● sklearn.linear_model.SGDRegressor(): Linear model fitted by
minimizing a regularized empirical loss with SGD.

Multi-class and multi-label algorithms:

● sklearn.multiclass.OneVsRestClassifier(): One-vs-the-rest (OvR) multiclass/multilabel strategy.

Multioutput regression:

● sklearn.multioutput.MultiOutputRegressor(): Multioutput regression.

Multilabel algorithms:

● sklearn.multioutput.ClassifierChain(): Classifier chain, a multilabel model that arranges binary classifiers into a chain.

Sparse coding:

● sklearn.decomposition.SparseCoder(): Sparse coding.

Covariance estimators:

● sklearn.covariance.EmpiricalCovariance(): Maximum likelihood covariance estimator.

Gaussian Mixture Models:

● sklearn.mixture.GaussianMixture(): Gaussian Mixture.

Model Evaluation & Selection:

● sklearn.model_selection.permutation_test_score(): Evaluate the significance of a cross-validated score with permutations.

Biclustering:

● sklearn.cluster.SpectralBiclustering(): Spectral biclustering (the former sklearn.cluster.bicluster import path has been removed).

Sparse PCA:

● sklearn.decomposition.SparsePCA(): Sparse Principal Components Analysis (SparsePCA).

Voting regressor:

● sklearn.ensemble.VotingRegressor(): Voting regressor.

Bagging regressor:

● sklearn.ensemble.BaggingRegressor(): Bagging regressor.

Impute:

● sklearn.impute.SimpleImputer(): Basic imputation transformer.
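A sketch with a hypothetical array containing NaNs. With `strategy="mean"`, each missing value is replaced by the mean of the observed values in its column, and the learned means are stored in `statistics_`:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value per column
X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])

imp = SimpleImputer(strategy="mean")
X_filled = imp.fit_transform(X)

print(imp.statistics_)  # [2. 6.]: column means computed from the observed values
print(X_filled)         # NaNs replaced by those means
```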

Checking:

● sklearn.utils.check_X_y(): Validate X and y for standard estimators (consistent length, 2D X, 1D y).

Checking Estimators:

● sklearn.utils.estimator_checks.check_estimator(): Check if
estimator adheres to scikit-learn conventions.

Multilabel Binarizer:

● sklearn.preprocessing.MultiLabelBinarizer(): Transform between iterable of iterables and a multilabel format.

Cross Decomposition:

● sklearn.cross_decomposition.CCA(): Canonical Correlation Analysis.

Loading datasets:

● sklearn.datasets.load_breast_cancer(): Load breast cancer dataset.
● sklearn.datasets.load_diabetes(): Load diabetes dataset.
● sklearn.datasets.load_linnerud(): Load Linnerud dataset.

Binarize labels:

● sklearn.preprocessing.label_binarize(): Binarize labels in a one-vs-all fashion.

Metrics:

● sklearn.metrics.log_loss(): Logarithmic loss.
● sklearn.metrics.mean_absolute_error(): Mean absolute error regression loss.
● sklearn.metrics.mean_squared_log_error(): Mean squared logarithmic
error regression loss.

Partial dependence plots:

● sklearn.inspection.PartialDependenceDisplay.from_estimator(): Partial dependence plots (replaces plot_partial_dependence(), which has been removed).

Load sample images:

● sklearn.datasets.load_sample_images(): Load sample images for image manipulation.

Metrics:

● sklearn.metrics.precision_recall_curve(): Compute precision-recall pairs for different probability thresholds.
● sklearn.metrics.average_precision_score(): Compute average
precision (AP) from prediction scores.

Checking:

● sklearn.utils.check_random_state(): Turn a seed (None, int, or RandomState) into a numpy.random.RandomState instance.

Hashing:

● sklearn.utils.murmurhash3_32(): Compute the 32-bit MurmurHash3 hash of a key (int32, bytes, str, or int32 array) at a given seed.

Metrics:

● sklearn.metrics.classification_report(): Build a text report showing the main classification metrics.
● sklearn.metrics.cohen_kappa_score(): Cohen's kappa: a statistic
that measures inter-annotator agreement.
● sklearn.metrics.hinge_loss(): Compute (average) hinge loss.
● sklearn.metrics.matthews_corrcoef(): Compute the Matthews correlation coefficient (MCC) for binary and multiclass classification.
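A sketch on hypothetical binary labels with 2 true positives, 2 true negatives, 1 false positive and 1 false negative. For this particular tally, both Cohen's kappa (observed agreement 4/6, chance agreement 0.5) and MCC come out to exactly 1/3, which is easy to verify by hand:

```python
from sklearn.metrics import (
    classification_report,
    cohen_kappa_score,
    confusion_matrix,
    matthews_corrcoef,
)

# Hypothetical ground truth and predictions
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)      # [[TN, FP], [FN, TP]] = [[2, 1], [1, 2]]
kappa = cohen_kappa_score(y_true, y_pred)  # (4/6 - 0.5) / (1 - 0.5) = 1/3
mcc = matthews_corrcoef(y_true, y_pred)    # (2*2 - 1*1) / sqrt(3*3*3*3) = 1/3

print(classification_report(y_true, y_pred))  # per-class precision, recall, F1, support
report = classification_report(y_true, y_pred, output_dict=True)  # same numbers as a dict
```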
