MACHINE LEARNING IN HIGH ENERGY PHYSICS
LECTURE #1

Alex Rogozhnikov, 2015

INTRO NOTES
4 days
two lectures and two practice seminars every day
this is an introductory track to machine learning
Kaggle competition!
WHAT IS ML ABOUT?
Inference of statistical dependencies that give us the ability to predict

Data is cheap, knowledge is precious

WHERE IS ML CURRENTLY USED?
Search engines, spam detection
Security: virus detection, DDoS defense
Computer vision and speech recognition
Market basket analysis, customer relationship management (CRM)
Credit scoring, fraud detection
Health monitoring
Churn prediction
... and hundreds more
ML IN HIGH ENERGY PHYSICS
High-level triggers (LHCb trigger system: 40 MHz → 5 kHz)
Particle identification
Tagging
Stripping line
Analysis
Different data is used at different stages
GENERAL NOTION
In supervised learning, the training data is represented as a set of pairs $(x_i, y_i)$, where

$i$ is the index of an event

$x_i$ is the vector of features available for the event

$y_i$ is the target: the value we need to predict

CLASSIFICATION EXAMPLE
$y_i \in Y$, where $Y$ is a finite set
on the plot: $x_i \in \mathbb{R}^2$, $y_i \in \{0, 1, 2\}$

Examples:
defining the type of a particle (or decay channel)
$Y = \{0, 1\}$: binary classification, 1 is signal, 0 is background
REGRESSION
$y \in \mathbb{R}$
Examples:
predicting the price of a house from its location
predicting the number of customers / income
reconstructing the real momentum of a particle

Why do we need automatic classification/regression?

applications can have up to thousands of features
higher quality
much faster adaptation to new problems
CLASSIFICATION BASED ON NEAREST NEIGHBOURS
Given a training set of objects and their labels $\{x_i, y_i\}$, we predict the label of a new observation $x$ by copying the label of its closest training object:
$\hat{y} = y_j, \quad j = \arg\min_i \rho(x, x_i)$
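A minimal pure-Python sketch of the 1-NN rule, assuming Euclidean distance for $\rho$ (`math.dist`, Python 3.8+). The function name `nearest_neighbour_predict` and the toy data are hypothetical, chosen only for illustration.

```python
import math

def nearest_neighbour_predict(x, train_x, train_y):
    """1-NN rule: return y_j where j = argmin_i rho(x, x_i), rho = Euclidean distance."""
    j = min(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    return train_y[j]

# Hypothetical toy training set: two well-separated groups in R^2.
train_x = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
train_y = [0, 0, 1, 1]

print(nearest_neighbour_predict((0.2, 0.1), train_x, train_y))  # 0
print(nearest_neighbour_predict((5.8, 4.9), train_x, train_y))  # 1
```

Note that "training" here amounts to keeping the data: every prediction scans all stored events.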
VISUALIZATION OF DECISION RULE
k NEAREST NEIGHBOURS
A better way is to use k neighbours:
$p_i(x) = \dfrac{\#\text{ of kNN events in class } i}{k}$
(figure: decision rules for k = 1, 2, 5, 30)
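The class-probability estimate $p_i(x)$ can be sketched in a few lines, again assuming Euclidean distance. The function `knn_class_probabilities` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_class_probabilities(x, train_x, train_y, k):
    """p_i(x) = (# of the k nearest training events in class i) / k."""
    # Sort training events by Euclidean distance to x and keep the first k.
    neighbours = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))[:k]
    counts = Counter(train_y[i] for i in neighbours)
    return {label: counts[label] / k for label in sorted(set(train_y))}

train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = [0, 0, 0, 1, 1]
print(knn_class_probabilities((0.5, 0.5), train_x, train_y, k=4))  # {0: 0.75, 1: 0.25}
```

Larger k gives smoother probability estimates at the cost of blurring fine structure of the decision boundary.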
OVERFITTING
What is the quality of classification on the training dataset when k = 1?

Answer: it is ideal (the closest neighbour of each event is the event itself).

Quality is lower when k > 1.
This doesn't mean k = 1 is the best;
it means we cannot use training events to estimate quality.
When a classifier's decision rule is too complex and captures details of the training data that are not relevant to the underlying distribution, we call this overfitting (more details tomorrow).
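The effect is easy to reproduce. In this hypothetical sketch the labels are pure noise, so there is nothing to learn; 1-NN still scores perfectly on its own training events, while on fresh events it stays near chance level. The helper names and the random data are illustrative assumptions.

```python
import math
import random

def knn_predict(x, train_x, train_y, k):
    """Majority vote among the k nearest training events (Euclidean distance)."""
    neighbours = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))[:k]
    votes = [train_y[i] for i in neighbours]
    return max(set(votes), key=votes.count)

def accuracy(xs, ys, train_x, train_y, k):
    """Fraction of events whose predicted label matches the true one."""
    return sum(knn_predict(x, train_x, train_y, k) == y for x, y in zip(xs, ys)) / len(ys)

random.seed(0)
# Labels are independent of x: there is no dependency to learn.
train_x = [(random.random(), random.random()) for _ in range(200)]
train_y = [random.randint(0, 1) for _ in range(200)]
test_x = [(random.random(), random.random()) for _ in range(200)]
test_y = [random.randint(0, 1) for _ in range(200)]

print(accuracy(train_x, train_y, train_x, train_y, k=1))  # 1.0: each event is its own neighbour
print(accuracy(test_x, test_y, train_x, train_y, k=1))    # close to 0.5
```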
KNN REGRESSOR
Regression with nearest neighbours is done by averaging the outputs of the k nearest neighbours:

$\hat{y} = \frac{1}{k} \sum_{j \in \mathrm{knn}(x)} y_j$
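A direct transcription of the averaging formula, assuming Euclidean distance; `knn_regress` and the toy data (roughly $y = 2x$) are hypothetical.

```python
import math

def knn_regress(x, train_x, train_y, k):
    """y_hat = (1/k) * sum of y_j over the k nearest training events."""
    neighbours = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))[:k]
    return sum(train_y[j] for j in neighbours) / k

# Hypothetical 1-D data following y = 2 * x.
train_x = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [0.0, 2.0, 4.0, 6.0, 20.0]
print(knn_regress((1.4,), train_x, train_y, k=3))  # 2.0, the mean of y at x = 1, 2, 0
```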
KNN WITH WEIGHTS
COMPUTATIONAL COMPLEXITY
Given that the dimensionality of the space is d and there are n training samples:
training time ~ O(1) (we only save a link to the data)
prediction time ~ O(n × d) for each sample
SPATIAL INDEX: BALL TREE
BALL TREE
training time ~ O(d × n log(n))
prediction time ~ O(d × log(n)) for each sample (typical case)
Another option exists: KD-tree.
OVERVIEW OF KNN
1. Awesomely simple classifier and regressor
2. Gives overly optimistic quality on training data
3. Quite slow, though optimizations exist
4. Struggles with high-dimensional data
5. Too sensitive to the scale of features
SENSITIVITY TO SCALE OF FEATURES
Euclidean distance:
$\rho(x, y)^2 = (x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_d - y_d)^2$

Change the scale of the first feature:

$\rho(x, y)^2 = (10x_1 - 10y_1)^2 + (x_2 - y_2)^2 + \dots + (x_d - y_d)^2$

$\rho(x, y)^2 \sim 100(x_1 - y_1)^2$, so the distance is dominated by the first feature.

Scaling of features frequently increases quality.
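The sketch below reproduces the factor-of-100 effect and shows one common remedy, standardization (zero mean, unit standard deviation per feature). Standardization is our assumed choice of scaling; the lecture does not prescribe a specific method. All names and data are hypothetical.

```python
import statistics

def squared_distance(x, y):
    """rho(x, y)^2 = sum_i (x_i - y_i)^2 (squared Euclidean distance)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def standardize(points):
    """Rescale every feature to zero mean and unit standard deviation (z-score)."""
    columns = list(zip(*points))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]

def rescale_first_feature(points, factor):
    """Multiply the first feature by `factor`, as if it were measured in other units."""
    return [(factor * p[0],) + tuple(p[1:]) for p in points]

data = [(1.0, 0.0), (0.0, 1.0), (2.0, 3.0), (3.0, 1.0)]
scaled = rescale_first_feature(data, 10.0)

print(squared_distance(data[0], data[1]))      # 2.0
print(squared_distance(scaled[0], scaled[1]))  # 101.0: the first feature dominates

# After standardization the change of units no longer matters.
z, z_scaled = standardize(data), standardize(scaled)
print(all(abs(u - v) < 1e-9 for p, q in zip(z, z_scaled) for u, v in zip(p, q)))  # True
```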

DISTANCE FUNCTION MATTERS
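Minkowski distance: $\rho_p(x, y) = \left( \sum_i |x_i - y_i|^p \right)^{1/p}$
Canberra distance: $\rho(x, y) = \sum_i \frac{|x_i - y_i|}{|x_i| + |y_i|}$
Cosine metric: $\rho(x, y) = \frac{\langle x, y \rangle}{|x|\,|y|}$ (strictly a similarity; the corresponding distance is $1 - \frac{\langle x, y \rangle}{|x|\,|y|}$)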
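The three functions above can be written directly from their formulas. Skipping Canberra terms where both coordinates are zero is our assumed convention (the formula is undefined there); the function names and example vectors are hypothetical.

```python
import math

def minkowski(x, y, p):
    """rho_p(x, y) = (sum_i |x_i - y_i|^p)^(1/p); p = 2 is Euclidean, p = 1 is Manhattan."""
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)

def canberra(x, y):
    """sum_i |x_i - y_i| / (|x_i| + |y_i|); terms with x_i = y_i = 0 are skipped."""
    return sum(abs(xi - yi) / (abs(xi) + abs(yi))
               for xi, yi in zip(x, y) if abs(xi) + abs(yi) > 0)

def cosine_similarity(x, y):
    """<x, y> / (|x| |y|); the corresponding distance is 1 - cosine_similarity."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return dot / (math.hypot(*x) * math.hypot(*y))

x, y = (1.0, 2.0, 3.0), (2.0, 4.0, 6.0)
print(minkowski(x, y, 1))       # 6.0
print(canberra(x, y))           # approximately 1.0 (each term equals 1/3)
print(cosine_similarity(x, y))  # approximately 1.0: y is a rescaled copy of x
```

The last line illustrates why the cosine metric ignores the overall scale of a vector, unlike the Minkowski family.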
x MINUTES BREAK
RECAPITULATION
1. Statistical ML: problems
2. ML in HEP
3. k nearest neighbours classifier and regressor.
MEASURING QUALITY OF BINARY CLASSIFICATION
The classifier's output in binary classification is a real variable.

Which classifier is better?

All of them are identical
ROC CURVE

These distributions have the same ROC curve:

(the ROC curve is the dependence of passed signal on passed background)
ROC CURVE DEMONSTRATION
ROC CURVE
Contains important information:
all possible combinations of signal and background efficiencies you can achieve by setting a threshold
Particular values of thresholds (and initial pdfs) don't matter; the ROC curve doesn't contain this information
ROC curve = information about the order of events:
s s b s b ... b b s b b

Comparison of algorithms should be based on information from the ROC curve
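A pure-Python sketch of building the ROC curve from classifier outputs: sort events by output and sweep the threshold. Events with equal outputs share one threshold, so a point is emitted only after the last of them. The function name, the toy labels (1 = signal, 0 = background) and outputs are hypothetical.

```python
def roc_curve_points(labels, scores):
    """Return (background efficiency, signal efficiency) pairs for every threshold."""
    n_sig = sum(labels)
    n_bck = len(labels) - n_sig
    # Only the order of events matters: sort by decreasing classifier output.
    order = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(order):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last event with this output (ties share a threshold).
        if i + 1 == len(order) or order[i + 1][0] != score:
            points.append((fp / n_bck, tp / n_sig))
    return points

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]  # order of events: s s b s b b
print(roc_curve_points(labels, scores))
# A monotonic transformation of the outputs keeps the order, hence the ROC curve:
print(roc_curve_points(labels, [s ** 3 for s in scores]) == roc_curve_points(labels, scores))  # True
```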
TERMINOLOGY AND CONVENTIONS
fpr (false positive rate) = background efficiency = b
tpr (true positive rate) = signal efficiency = s
ROC AUC (AREA UNDER THE ROC CURVE)

ROC AUC = P(x < y), where x and y are the predictions for a random background event and a random signal event, respectively.
Which classifier is better for triggers?
(They have the same ROC AUC.)
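The probabilistic definition $P(x < y)$ can be estimated by comparing every (background, signal) pair. Counting ties as 1/2 is a common convention that we assume here; the quadratic cost $O(n_\text{sig} \times n_\text{bck})$ is fine for illustration but not for large samples. Names and data are hypothetical.

```python
def roc_auc(labels, scores):
    """Estimate P(x < y): x = output for a random background event, y = for a random signal event."""
    sig = [s for s, l in zip(scores, labels) if l == 1]
    bck = [s for s, l in zip(scores, labels) if l == 0]
    # Count correctly ordered (background, signal) pairs; ties contribute 1/2.
    good = sum((x < y) + 0.5 * (x == y) for x in bck for y in sig)
    return good / (len(sig) * len(bck))

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]  # order of events: s s b s b b
print(roc_auc(labels, scores))  # 8/9: only the pair (b = 0.7, s = 0.6) is misordered
```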
STATISTICAL MACHINE LEARNING
The machine learning we use in practice is based on statistics.
1. Main assumption: the data is generated from a probability distribution $p(x, y)$.

2. Does a distribution of people / pages really exist?

3. In HEP, these distributions do exist.
OPTIMAL CLASSIFICATION. OPTIMAL BAYESIAN CLASSIFIER
Assuming that we know the real distribution $p(x, y)$, we reconstruct $p(y | x)$ using Bayes' rule:
$p(y | x) = \frac{p(x, y)}{p(x)} = \frac{p(y)\,p(x | y)}{p(x)}$

$\frac{p(y = 1 | x)}{p(y = 0 | x)} = \frac{p(y = 1)\,p(x | y = 1)}{p(y = 0)\,p(x | y = 0)}$

LEMMA (NEYMAN–PEARSON):
The best classification quality is provided by $\frac{p(y = 1 | x)}{p(y = 0 | x)}$ (optimal Bayesian classifier).

OPTIMAL BINARY CLASSIFICATION

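The optimal Bayesian classifier has the highest possible ROC curve.
Since the classification quality depends only on the order of events, and the ratio $\frac{p(y = 1 | x)}{p(y = 0 | x)} = \frac{p(y = 1 | x)}{1 - p(y = 1 | x)}$ is a monotonically increasing function of $p(y = 1 | x)$, $p(y = 1 | x)$ gives optimal classification quality too!

$\frac{p(y = 1 | x)}{p(y = 0 | x)} = \frac{p(y = 1)\,p(x | y = 1)}{p(y = 0)\,p(x | y = 0)}$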
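A small numerical check of this argument, assuming hypothetical "known" 1-D distributions $p(x | y = 0) = \mathcal{N}(0, 1)$, $p(x | y = 1) = \mathcal{N}(2, 1)$ and $p(y = 1) = 0.3$. Ranking events by the ratio or by $p(y = 1 | x)$ gives the same order, hence the same ROC curve. All names and numbers are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the 1-D normal distribution N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

P1 = 0.3  # hypothetical p(y = 1)

def likelihood_ratio(x):
    """p(y=1|x) / p(y=0|x) = p(y=1) p(x|y=1) / (p(y=0) p(x|y=0))."""
    return P1 * normal_pdf(x, 2.0, 1.0) / ((1 - P1) * normal_pdf(x, 0.0, 1.0))

def posterior_signal(x):
    """p(y=1|x) = r / (1 + r), a monotonic function of the ratio r."""
    r = likelihood_ratio(x)
    return r / (1 + r)

events = [-1.0, 0.5, 3.0, 1.2, 2.1]
by_ratio = sorted(events, key=likelihood_ratio)
by_posterior = sorted(events, key=posterior_signal)
print(by_ratio == by_posterior)  # True: same order of events, hence the same ROC curve
```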
FISHER'S QDA (QUADRATIC DISCRIMINANT ANALYSIS)
Reconstructing probabilities $p(x | y = 1)$, $p(x | y = 0)$ from data, assuming those are multidimensional normal distributions:
$p(x | y = 0) \sim \mathcal{N}(\mu_0, \Sigma_0)$

$p(x | y = 1) \sim \mathcal{N}(\mu_1, \Sigma_1)$
QDA COMPLEXITY
n samples, d dimensions
training takes $O(nd^2 + d^3)$

computing the covariance matrix: $O(nd^2)$

inverting the covariance matrix: $O(d^3)$
prediction takes $O(d^2)$ for each sample
$f(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$
QDA
simple decision rule
fast prediction
many parameters to reconstruct in high dimensions
data almost never has a Gaussian distribution
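A NumPy sketch of QDA under the Gaussian assumption: fit a mean and covariance per class, then compare log-densities. Estimating the class prior from class frequencies is our assumption; the toy data, function names, and parameters are hypothetical. For simplicity it calls `np.linalg.solve` per point ($O(d^3)$); reaching the $O(d^2)$ prediction cost requires inverting $\Sigma$ once at training time.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the mean vector and covariance matrix; the covariance costs O(n d^2)."""
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    return mu, sigma

def log_normal_pdf(x, mu, sigma):
    """log f(x) for the multivariate normal density N(mu, sigma)."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    # Solve Sigma z = diff instead of forming Sigma^{-1} explicitly (numerically safer).
    maha = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def qda_log_ratio(x, params0, params1, prior1):
    """log [p(y=1|x) / p(y=0|x)] with Gaussian class-conditional densities."""
    return (np.log(prior1) + log_normal_pdf(x, *params1)
            - np.log(1 - prior1) - log_normal_pdf(x, *params0))

rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], scale=[1.0, 1.0], size=(500, 2))  # background
X1 = rng.normal(loc=[2.0, 2.0], scale=[0.5, 2.0], size=(500, 2))  # signal
params0, params1 = fit_gaussian(X0), fit_gaussian(X1)
prior1 = len(X1) / (len(X0) + len(X1))

print(qda_log_ratio(np.array([2.0, 2.0]), params0, params1, prior1) > 0)    # True: signal-like
print(qda_log_ratio(np.array([-1.0, -1.0]), params0, params1, prior1) > 0)  # False: background-like
```

Working with log-densities avoids underflow of `exp` far from the class means.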
WHAT ARE THE PROBLEMS WITH THE GENERATIVE APPROACH?
Generative approach: trying to reconstruct $p(x, y)$, then using it to predict.
Real-life distributions can hardly be reconstructed,
especially in high-dimensional spaces.
So we switch to the discriminative approach: guessing $p(y | x)$ directly.
LINEAR DECISION RULE
Decision function is linear:
$d(x) = \langle w, x \rangle + w_0$

$\begin{cases} d(x) > 0, & \text{class } +1 \\ d(x) < 0, & \text{class } -1 \end{cases}$

This is a parametric model (finding parameters $w, w_0$).

FINDING OPTIMAL PARAMETERS
A good initial guess: find $w, w_0$ such that the classification error is minimal ($[\text{true}] = 1$, $[\text{false}] = 0$):

$\mathcal{L} = \sum_{i \in \text{events}} [y_i \neq \operatorname{sgn}(d(x_i))]$
Discontinuous optimization (arrrrgh!)
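A small sketch of the linear decision rule and the misclassification count shows why this objective is awkward: as the threshold $w_0$ varies, the count only jumps between integer values, so it is piecewise constant and gives no useful gradient. Mapping $d(x) = 0$ to class $-1$ is our assumed convention; names and data are hypothetical.

```python
def decision_function(x, w, w0):
    """d(x) = <w, x> + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def sign(v):
    """sgn with the assumed convention that d(x) = 0 goes to class -1."""
    return 1 if v > 0 else -1

def misclassification_count(X, y, w, w0):
    """Sum over events of [y_i != sgn(d(x_i))], with [true] = 1, [false] = 0."""
    return sum(yi != sign(decision_function(xi, w, w0)) for xi, yi in zip(X, y))

X = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0), (4.0, 2.0)]
y = [-1, -1, 1, 1]
for w0 in (-7.0, -6.5, -5.5, -1.0):
    print(w0, misclassification_count(X, y, (1.0, 1.0), w0))  # counts 2, 2, 0, 1
```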
Let's make the decision rule smooth:
$p_{+1}(x) = f(d(x)), \quad p_{-1}(x) = 1 - p_{+1}(x), \quad \text{where} \quad \begin{cases} f(0) = 0.5 \\ f(x) > 0.5 & \text{if } x > 0 \\ f(x) < 0.5 & \text{if } x < 0 \end{cases}$
LOGISTIC FUNCTION
A smooth step rule:
$\sigma(x) = \frac{e^x}{1 + e^x} = \frac{1}{1 + e^{-x}}$

PROPERTIES
1. monotonic, $\sigma(x) \in (0, 1)$
2. $\sigma(x) + \sigma(-x) = 1$
3. $\sigma'(x) = \sigma(x)(1 - \sigma(x))$
4. $2\sigma(x) = 1 + \tanh(x/2)$
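The four properties can be checked numerically. This sketch uses a central finite difference as a stand-in for $\sigma'(x)$ and an overflow-safe form of $\sigma$; the sample points and tolerances are our choices.

```python
import math

def sigmoid(x):
    """sigma(x) = 1 / (1 + e^{-x}), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

xs = [-3.0, -0.5, 0.0, 0.5, 3.0]
values = [sigmoid(x) for x in xs]
assert values == sorted(values)                               # property 1: monotonic
h = 1e-6
for x, s in zip(xs, values):
    derivative = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # numerical sigma'(x)
    assert 0.0 < s < 1.0                                      # property 1: range (0, 1)
    assert abs(s + sigmoid(-x) - 1.0) < 1e-12                 # property 2
    assert abs(derivative - s * (1.0 - s)) < 1e-8             # property 3
    assert abs(2 * s - (1.0 + math.tanh(x / 2))) < 1e-12      # property 4
print("all properties hold numerically")
```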
LOGISTIC FUNCTION
LOGISTIC REGRESSION
Optimizing the log-likelihood (with probabilities obtained from the logistic function):
$d(x) = \langle w, x \rangle + w_0$

$p_{+1}(x) = \sigma(d(x))$

$p_{-1}(x) = \sigma(-d(x))$

$\mathcal{L} = \frac{1}{N} \sum_{i \in \text{events}} -\ln(p_{y_i}(x_i)) = \frac{1}{N} \sum_{i \in \text{events}} L(x_i, y_i) \to \min$
Exercise: find the expression for $L(x_i, y_i)$ and plot it.
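To experiment with this objective, here is a sketch that minimizes the averaged loss with plain batch gradient descent. The optimizer is our choice (the lecture only says the loss is minimized), labels are assumed to be $\pm 1$ so that $p_{y}(x) = \sigma(y\,d(x))$, and the function names, learning rate, step count, and toy data are hypothetical.

```python
import math

def sigmoid(t):
    """Overflow-safe logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)

def log_sigmoid(t):
    """ln sigma(t), computed stably for large |t|."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))

def decision(x, w, w0):
    """d(x) = <w, x> + w0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w0

def log_loss(X, y, w, w0):
    """(1/N) sum_i -ln p_{y_i}(x_i), using p_{+1} = sigma(d), p_{-1} = sigma(-d)."""
    return -sum(log_sigmoid(yi * decision(xi, w, w0)) for xi, yi in zip(X, y)) / len(y)

def fit_logistic_regression(X, y, lr=0.5, n_steps=2000):
    """Batch gradient descent on log_loss, starting from w = 0, w0 = 0."""
    n, dim = len(y), len(X[0])
    w, w0 = [0.0] * dim, 0.0
    for _ in range(n_steps):
        grad_w, grad_w0 = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            # Derivative of -ln sigma(y d) with respect to d is -y * sigma(-y d).
            g = -yi * sigmoid(-yi * decision(xi, w, w0))
            grad_w = [gw + g * xj for gw, xj in zip(grad_w, xi)]
            grad_w0 += g
        w = [wj - lr * gw / n for wj, gw in zip(w, grad_w)]
        w0 -= lr * grad_w0 / n
    return w, w0

# Hypothetical, slightly overlapping classes (labels +1 / -1).
X = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.5), (1.5, 1.5), (2.5, 2.0), (3.0, 3.0), (2.0, 3.0), (1.4, 1.6)]
y = [-1, -1, -1, 1, 1, 1, 1, -1]

w, w0 = fit_logistic_regression(X, y)
print(log_loss(X, y, [0.0, 0.0], 0.0))  # ln 2, about 0.693, for the all-zero model
print(log_loss(X, y, w, w0))            # smaller after training
```

Because the loss is smooth and convex in $(w, w_0)$, a simple gradient method is enough here; production libraries use more sophisticated optimizers and regularization.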

DATA SCIENTIST PIPELINE

1. Experiments in an appropriate high-level language or environment
2. After the experiments are over, implement the final algorithm in a low-level language (C++, CUDA, FPGA)
The second step is not always needed.
SCIENTIFIC PYTHON
NumPy
vectorized computations in Python

Matplotlib
for drawing

Pandas
for data manipulation and analysis (based on NumPy)
SCIENTIFIC PYTHON
Scikit-learn
the most popular library for machine learning

SciPy
libraries for science and engineering

Root_numpy
convenient way to work with ROOT files
THE END
