Machine Learning
In reinforcement learning, a computer program interacts with a dynamic environment in which it must achieve a certain goal (such as driving a vehicle), without a teacher explicitly telling it whether it has come close to its goal. Another example is learning to play a game by playing against an opponent.[2]:3
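The trial-and-error setting described above can be sketched with tabular Q-learning on a toy, invented corridor environment (the states, actions, rewards, and function name here are illustrative assumptions, not from the article's sources): the agent only ever observes the reward it receives, never how close it is to the goal.

```python
import random

def q_learning_corridor(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor: states 0..3, goal at state 3.
    Actions: 0 = left, 1 = right. Reward 1 only on reaching the goal."""
    rng = random.Random(seed)
    n_states, n_actions, goal = 4, 2, 3
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy: mostly exploit the current estimate, sometimes explore
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])
            s2 = max(0, s - 1) if a == 0 else min(goal, s + 1)
            r = 1.0 if s2 == goal else 0.0
            # Q-learning update: move the estimate toward reward + discounted best next value
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning_corridor()
# greedy policy after training: the agent learns to move right in every non-goal state
policy = [max(range(2), key=lambda i: q[s][i]) for s in range(3)]
print(policy)
```

No teacher ever labels an action as correct; the reward signal alone shapes the value estimates, which is the distinguishing feature of the reinforcement setting.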
Between supervised and unsupervised learning is semi-supervised learning, where the teacher gives an incomplete training signal: a training set with some (often many) of the target outputs missing. Transduction is a special case of this principle in which the entire set of problem instances is known at learning time, except that some of the targets are missing.
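A minimal self-training sketch of the semi-supervised idea: fit a model on the small labeled portion, then fill in the missing targets with the model's own predictions (pseudo-labels) and refit on the union. The 1-D nearest-centroid "model" and the data are invented for illustration.

```python
def nearest_centroid_fit(xs, ys):
    """Fit one centroid (mean) per class from labeled 1-D points."""
    cents = {}
    for c in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == c]
        cents[c] = sum(pts) / len(pts)
    return cents

def predict(cents, x):
    """Assign x to the class with the nearest centroid."""
    return min(cents, key=lambda c: abs(x - cents[c]))

# A few labeled points and a larger unlabeled pool (invented data).
labeled_x, labeled_y = [0.0, 1.0, 9.0, 10.0], [0, 0, 1, 1]
unlabeled = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5]

# Step 1: train on the labeled set only.
cents = nearest_centroid_fit(labeled_x, labeled_y)

# Step 2: pseudo-label the unlabeled pool and retrain on the union --
# the incomplete training signal is completed by the model itself.
pseudo = [predict(cents, x) for x in unlabeled]
cents = nearest_centroid_fit(labeled_x + unlabeled, labeled_y + pseudo)
print(predict(cents, 3.0))  # -> 0
```

In the transductive variant, the points to be classified are exactly the unlabeled instances already available at training time.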
3 Theory

The computational analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory. Because training sets are finite and the future is uncertain, learning theory usually does not yield guarantees of the performance of algorithms. Instead, probabilistic bounds on the performance are quite common. The bias–variance decomposition is one way to quantify generalization error.

In addition to performance bounds, computational learning theorists study the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time. There are two kinds of time complexity results. Positive results show that a certain class of functions can be learned in polynomial time. Negative results show that certain classes cannot be learned in polynomial time.

There are many similarities between machine learning theory and statistical inference, although they use different terms.

4.2 Association rule learning

Association rule learning is a method for discovering interesting relations between variables in large databases.

4.4 Inductive logic programming

Main article: Inductive logic programming

Inductive logic programming (ILP) is an approach to rule learning using logic programming as a uniform representation for input examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program that entails all positive and no negative examples. Inductive programming is a related field that considers any kind of programming language for representing hypotheses (not only logic programming), such as functional programs.
4 Approaches
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.

4.6 Clustering

Main article: Cluster analysis

Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that observations within the same cluster are similar according to some predesignated criterion or criteria, while observations drawn from different clusters are dissimilar. Different clustering techniques make different assumptions on the structure of the data, often defined by some similarity metric and evaluated, for example, by internal compactness (similarity between members of the same cluster) and separation between different clusters. Other methods are based on estimated density and graph connectivity. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis.

4.7 Bayesian networks

Main article: Bayesian network

A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning.

4.8 Reinforcement learning

4.9 Representation learning

Main article: Representation learning

Several learning algorithms, mostly unsupervised learning algorithms, aim at discovering better representations of the inputs provided during training. Classical examples include principal components analysis and cluster analysis. Representation learning algorithms often attempt to preserve the information in their input but transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions, allowing reconstruction of the inputs coming from the unknown data-generating distribution, while not being necessarily faithful for configurations that are implausible under that distribution.

Manifold learning algorithms attempt to do so under the constraint that the learned representation is low-dimensional. Sparse coding algorithms attempt to do so under the constraint that the learned representation is sparse (has many zeros). Multilinear subspace learning algorithms aim to learn low-dimensional representations directly from tensor representations for multidimensional data, without reshaping them into (high-dimensional) vectors.[15] Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features. It has been argued that an intelligent machine is one that learns a representation that disentangles the underlying factors of variation that explain the observed data.[16]

4.11 Sparse dictionary learning
...tions is strongly NP-hard and also difficult to solve approximately.[17] A popular heuristic method for sparse dictionary learning is K-SVD.
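K-SVD alternates between a sparse-coding step and a dictionary-update step. The sketch below illustrates only the sparse-coding half, using a simple greedy matching pursuit against a small fixed dictionary; the dictionary, signal, and function names are invented for illustration, and real implementations are considerably more careful.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, dictionary, n_atoms=2):
    """Greedy sparse coding: repeatedly pick the dictionary atom most
    correlated with the residual and subtract its contribution.
    Atoms are assumed to be unit-norm."""
    residual = list(signal)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_atoms):
        # atom with the largest |<residual, atom>|
        k = max(range(len(dictionary)),
                key=lambda j: abs(dot(residual, dictionary[j])))
        c = dot(residual, dictionary[k])
        coeffs[k] += c
        residual = [r - c * a for r, a in zip(residual, dictionary[k])]
    return coeffs, residual

# Invented 2-D example: two axis-aligned atoms plus a redundant diagonal atom.
dictionary = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]
signal = (3.0, 4.0)
coeffs, residual = matching_pursuit(signal, dictionary, n_atoms=1)
print(coeffs)  # only the third (diagonal) atom is selected
```

The redundant (overcomplete) atom lets the signal be represented with a single nonzero coefficient instead of two, which is exactly the sparsity that dictionary learning exploits; K-SVD then updates the atoms themselves to improve such fits across many signals.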
5 Applications

Adaptive websites
Affective computing
Bioinformatics
Brain-machine interfaces
Cheminformatics
Computational advertising
Computational finance
Game playing[22]
Information retrieval
Recommender systems
Robot locomotion
Sentiment analysis (or opinion mining)
Sequence mining
Software engineering
Structural health monitoring

6 Software

Software suites containing a variety of machine learning algorithms include the following:

6.1 Open-source software

Apache Mahout
dlib
ELKI
Encog
H2O
KNIME
MLPACK
mlpy
MOA (Massive Online Analysis)
Monte Carlo Machine Learning Library
OpenCV
OpenNN
Orange
R
RapidMiner
scikit-learn
Shogun
Weka
Yooreeka

6.2 Commercial software

Angoss KnowledgeSTUDIO
IBM SPSS Modeler
KNIME
KXEN Modeler
LIONsolver
Mathematica
MATLAB
Microsoft Azure Machine Learning
NeuroSolutions
Oracle Data Mining
RapidMiner
RCASE
SAS Enterprise Miner
STATISTICA Data Miner

8 See also

Adaptive control
Adversarial machine learning
Automatic reasoning
Cache language model
Cognitive modeling
Cognitive science
Computational intelligence
Computational neuroscience
Data mining
Explanation-based learning
9 References
[1] Ron Kohavi; Foster Provost (1998). "Glossary of terms". Machine Learning 30: 271–274.

[2] C. M. Bishop (2006). Pattern Recognition and Machine Learning. Springer. ISBN 0-387-31073-8.

[3] Wernick, Yang, Brankov, Yourganov and Strother, "Machine Learning in Medical Imaging", IEEE Signal Processing Magazine, vol. 27, no. 4, July 2010, pp. 25–38.

[4] Mannila, Heikki (1996). "Data mining: machine learning, statistics, and databases". Int'l Conf. Scientific and Statistical Database Management. IEEE Computer Society.

[5] Friedman, Jerome H. (1998). "Data Mining and Statistics: What's the connection?". Computing Science and Statistics 29 (1): 3–9.

[6] Phil Simon (March 18, 2013). Too Big to Ignore: The Business Case for Big Data. Wiley. p. 89. ISBN 978-1118638170.

[7]

[8] Harnad, Stevan (2008), "The Annotation Game: On Turing (1950) on Computing, Machinery, and Intelligence", in Epstein, Robert; Peters, Grace, The Turing Test Sourcebook: Philosophical and Methodological Issues in the Quest for the Thinking Computer, Kluwer.

[9] Russell, Stuart; Norvig, Peter (2003) [1995]. Artificial Intelligence: A Modern Approach (2nd ed.). Prentice Hall. ISBN 978-0137903955.

[10] Langley, Pat (2011). "The changing science of machine learning". Machine Learning 82 (3): 275–279. doi:10.1007/s10994-011-5242-y.

[11] MI Jordan (2014-09-10). "statistics and machine learning". reddit. Retrieved 2014-10-01.

[12] http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

[13] Gareth James; Daniela Witten; Trevor Hastie; Robert Tibshirani (2013). An Introduction to Statistical Learning. Springer. p. vii.

[14] Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar (2012). Foundations of Machine Learning. MIT Press. ISBN 9780262018258.

[15] Lu, Haiping; Plataniotis, K.N.; Venetsanopoulos, A.N. (2011). "A Survey of Multilinear Subspace Learning for Tensor Data". Pattern Recognition 44 (7): 1540–1551. doi:10.1016/j.patcog.2011.01.004.

[16] Yoshua Bengio (2009). Learning Deep Architectures for AI. Now Publishers Inc. pp. 1–3. ISBN 978-1-60198-294-0.

[17] A. M. Tillmann, "On the Computational Intractability of Exact and Approximate Dictionary Learning". IEEE Signal Processing Letters 22 (1), 2015: 45–49.

[18] Aharon, M, M Elad, and A Bruckstein. 2006. "K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation". Signal Processing, IEEE Transactions on 54 (11): 4311–4322.

[19] Goldberg, David E.; Holland, John H. (1988). "Genetic algorithms and machine learning". Machine Learning 3 (2): 95–99.

[20] Michie, D.; Spiegelhalter, D. J.; Taylor, C. C. (1994). Machine Learning, Neural and Statistical Classification. Ellis Horwood.

[21] Daniel Jurafsky and James H. Martin (2009). Speech and Language Processing. Pearson Education. pp. 207 ff.

[22] Tesauro, Gerald (March 1995). "Temporal Difference Learning and TD-Gammon". Communications of the ACM 38 (3).
10 Further reading
Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar (2012). Foundations of Machine Learning, The MIT Press. ISBN 9780262018258.

Ian H. Witten and Eibe Frank (2011). Data Mining: Practical machine learning tools and techniques, Morgan Kaufmann, 664 pp. ISBN 9780123748560.

Sergios Theodoridis, Konstantinos Koutroumbas (2009). Pattern Recognition, 4th Edition, Academic Press. ISBN 978-1-59749-272-0.

Mierswa, Ingo and Wurst, Michael and Klinkenberg, Ralf and Scholz, Martin and Euler, Timm: YALE: Rapid Prototyping for Complex Data Mining Tasks, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06), 2006.

Bing Liu (2007). Web Data Mining: Exploring Hyperlinks, Contents and Usage Data. Springer. ISBN 3-540-37881-2.

Toby Segaran (2007). Programming Collective Intelligence, O'Reilly. ISBN 0-596-52932-5.

Huang T.-M., Kecman V., Kopriva I. (2006). Kernel Based Algorithms for Mining Huge Data Sets: Supervised, Semi-supervised, and Unsupervised Learning, Springer-Verlag, Berlin, Heidelberg, 260 pp., 96 illus. ISBN 3-540-31681-7.

Ethem Alpaydın (2004). Introduction to Machine Learning (Adaptive Computation and Machine Learning), MIT Press. ISBN 0-262-01211-1.

MacKay, D.J.C. (2003). Information Theory, Inference, and Learning Algorithms, Cambridge University Press. ISBN 0-521-64298-1.

Kecman, Vojislav (2001). Learning and Soft Computing: Support Vector Machines, Neural Networks and Fuzzy Logic Models, The MIT Press, Cambridge, MA, 608 pp., 268 illus. ISBN 0-262-11255-8.

Trevor Hastie, Robert Tibshirani and Jerome Friedman (2001). The Elements of Statistical Learning, Springer. ISBN 0-387-95284-5.

Ryszard S. Michalski, George Tecuci (1994). Machine Learning: A Multistrategy Approach, Volume IV, Morgan Kaufmann. ISBN 1-55860-251-8.

Sholom Weiss and Casimir Kulikowski (1991). Computer Systems That Learn, Morgan Kaufmann. ISBN 1-55860-065-5.

Yves Kodratoff, Ryszard S. Michalski (1990). Machine Learning: An Artificial Intelligence Approach, Volume III, Morgan Kaufmann. ISBN 1-55860-119-8.

Ryszard S. Michalski, Jaime G. Carbonell, Tom M. Mitchell (1986). Machine Learning: An Artificial Intelligence Approach, Volume II, Morgan Kaufmann. ISBN 0-934613-00-1.

Ryszard S. Michalski, Jaime G. Carbonell, Tom M. Mitchell (1983). Machine Learning: An Artificial Intelligence Approach, Tioga Publishing Company. ISBN 0-935382-05-4.

Vladimir Vapnik (1998). Statistical Learning Theory. Wiley-Interscience. ISBN 0-471-03003-1.

Ray Solomonoff, "An Inductive Inference Machine", IRE Convention Record, Section on Information Theory, Part 2, pp. 56–62, 1957.

Ray Solomonoff, "An Inductive Inference Machine", a privately circulated report from the 1956 Dartmouth Summer Research Conference on AI.
11 External links