Machine Learning Lecture - 2 and Lecture - 3

The document outlines 5 potential project areas for a machine learning course - environment and climate change, intelligent transportation systems, natural language processing, 5G/6G wireless networks, and healthcare/biology/bioinformatics. It provides examples of tasks and datasets for each project area, such as using climate data to predict local weather, applying machine learning techniques to beam selection in vehicle-to-infrastructure communication, and enabling computers to understand human language through natural language processing.

Uploaded by

Charmil Gandhi

CSE 523 : Machine Learning (ML)

Lecture : 2 and 3
Introduction to Machine Learning Course

Dhaval Patel, Ph.D


Assistant Professor,
School of Engineering and Applied Science,
Ahmedabad University, Gujarat, India

January 9-10, 2020


Outline

• About CSE 523 – A Course on Machine Learning
- Project Guidelines

• A Brief on Project Areas
- Environment and Climate Change
- Intelligent Transportation System (ITS)
- Natural Language Processing (NLP)
- 5G/6G Wireless Networks
- Biology/Bioinformatics/Healthcare

About CSE 523 – A Course on Machine Learning
- Project Guidelines

What is the project all about?

As part of the project component, which carries 35% weightage, students are
required to do a project. Five project areas have been identified, as listed below:

1. Environment and Climate change
2. Intelligent Transportation System (ITS)
3. Natural Language Processing (NLP)
4. 5G/6G Wireless Networks
5. Health care / Biology / Bioinformatics

ML Project Areas
- Environment and Climate change

Abstract
Climate change is one of the greatest challenges facing humanity, and
we, as machine learning experts, may wonder how we can help. Here
we describe how machine learning can be a powerful tool in reducing
greenhouse gas emissions and helping society adapt to a changing
climate. From smart grids to disaster management, we identify high
impact problems where existing gaps can be filled by machine learning,
in collaboration with other fields. Our recommendations encompass
exciting research questions as well as promising business
opportunities. We call on the machine learning community to join
the global effort against climate change.
https://www.climatechange.ai/

Task:

Using Machine Learning To Predict Local Weather

Data Set:

Charlotte, NC Climate Data from 2013 to 2018 (downloaded from the NOAA NCEI site - https://www.ncei.noaa.gov/)
How Machine Learning and AI Can Help in the Fight Against Climate Change?

Source: https://iopscience.iop.org/article/10.1088/1748-9326/ab4e55#erlab4e55f1
Adaptations:
Climate prediction

Data Description and Preparation → Data Analysis → Predictive Modeling
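
The three stages of the weather-prediction task (data preparation, analysis, predictive modeling) can be sketched as follows. This is only a minimal illustration: the NOAA file is not loaded, so a synthetic daily maximum-temperature series stands in for it, and the choice of a one-day-lag linear model, the helper names and the one-year hold-out are all assumptions rather than the method used in the lecture.

```python
import math
import random

def make_synthetic_tmax(n_days, seed=0):
    """Synthetic daily max temperatures (assumed deg C): yearly cycle plus noise.
    Stands in for the real NOAA data, which is not loaded here."""
    rng = random.Random(seed)
    return [18 + 12 * math.sin(2 * math.pi * d / 365.0) + rng.gauss(0, 2)
            for d in range(n_days)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Data preparation: pair each day's value with the next day's value.
tmax = make_synthetic_tmax(3 * 365)
x_all, y_all = tmax[:-1], tmax[1:]

# Chronological split: the last 365 pairs are held out for testing.
x_train, y_train = x_all[:-365], y_all[:-365]
x_test, y_test = x_all[-365:], y_all[-365:]

# Predictive modeling: tomorrow's value as a linear function of today's.
a, b = fit_line(x_train, y_train)
pred = [a * x + b for x in x_test]
mae_model = mean_abs_error(y_test, pred)

# Reference point: always predict the training mean.
train_mean = sum(y_train) / len(y_train)
mae_mean = mean_abs_error(y_test, [train_mean] * len(y_test))
print(f"MAE linear model: {mae_model:.2f}, MAE mean baseline: {mae_mean:.2f}")
```

Splitting chronologically rather than at random keeps future days out of the training set, which matters for any time-ordered data.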
ML Project Areas
- Intelligent Transportation Systems

V2I Architecture

Each sensor generates data

Many Sensors/car

➢ V2V and V2I communications : Sensor data exchange


➢ Applications: Safety, transportation operations, cargo, and infotainment

Source: P. Kumari, N. Gonzalez-Prelcic and R. W. Heath, "Investigating the IEEE 802.11ad Standard for Millimeter Wave Automotive Radar," in IEEE VTC Fall, 2015
Traffic prediction
Source: https://www.mn.uio.no/ifi/studier/masteroppgaver/nd/traffic-flow-prediction-with-deep-learning.html

Vehicle Trajectory Prediction
Source: N. Deo and M. M. Trivedi, "Multi-Modal Trajectory Prediction of Surrounding Vehicles with Maneuver based LSTMs," 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, 2018, pp. 1179-1184.

Learning based Channel Estimation
Source: H. Ye, G. Y. Li and B. F. Juang, "Deep Reinforcement Learning Based Resource Allocation for V2V Communications," in IEEE Transactions on Vehicular Technology, vol. 68, no. 4, pp. 3163-3173, April 2019.
Case Study: Machine Learning for Beam Selection in V2I
Methodology for Data Generation

[Figure: ten-step data-generation flowchart. Recoverable labels: Base.py in SUMO; Template. Route file in SUMO; Config.ak file in SUMO; Randomtrips.py file in SUMO; SUMO; GEM V2]

Source: A. Klautau, P. Batista, N. González-Prelcic, Y. Wang and R. W. Heath, "5G MIMO Data for Machine Learning: Application to Beam-Selection Using Deep Learning," 2018 Information Theory and Applications Workshop (ITA), San Diego,
The goal is to choose the best pair of beams for analog beamforming, with both
transmitter and receiver having antenna arrays with only one radio frequency chain.

Machine Learning Input Features
Ray Tracing Study Area: 337 x 202 m²
V2I Study Area: 23 x 250 m²
Grid Resolution: 1 x 1 m²

Matrix Q_s (grid resolution 1 x 1 m²)
Negative element: location is occupied
Positive element: location of receiver
Zero: position is not occupied
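
A minimal sketch of how such a position matrix could be built is shown below. The function name, the vehicle footprints and the use of exactly -1 and +1 as the negative and positive values are assumptions made for illustration; only the sign convention and the 23 x 250 grid come from the slide.

```python
def build_position_matrix(n_rows, n_cols, occupied_cells, receiver_cells):
    """Return an n_rows x n_cols grid: -1 = occupied, +1 = receiver, 0 = free."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for r, c in occupied_cells:
        grid[r][c] = -1
    for r, c in receiver_cells:
        grid[r][c] = 1
    return grid

# V2I study area of 23 x 250 m^2 at 1 x 1 m^2 resolution -> 23 x 250 cells.
# The two vehicle footprints below are made-up illustration values.
truck = [(r, c) for r in range(5, 8) for c in range(40, 52)]
receiver = [(r, c) for r in range(12, 14) for c in range(100, 105)]

Q_s = build_position_matrix(23, 250, truck, receiver)

# Flatten row by row so the matrix can be fed to a classifier as one vector.
features = [value for row in Q_s for value in row]
```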

Steps to Perform Classification
Step 1: Generate and validate the data set
Step 2: Divide into training and testing sets
Step 3: Import the classifier in Python scripts
Step 4: Provide the input features file
Step 5: In the training and test features, blockers (trucks) have the value -1 and non-blockers the value 1
Step 6: Convert the output to a single number (the class label) and eliminate beam pairs that do not appear
Step 7: Iterate over classifiers

Classifier             Accuracy (%), All Data    Accuracy (%), Only NLOS
Linear SVM             31                        11
Decision tree          54                        28
Deep neural network    65                        37
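
Step 6 (turning a beam pair into a single class label and dropping pairs that never occur) can be sketched as below. The function name, the flattening rule `tx * n_rx_beams + rx` and the example pairs are assumptions for illustration, not the code used in the case study.

```python
def encode_beam_pairs(pairs, n_rx_beams):
    """Map (tx_index, rx_index) pairs to compact class labels.

    Each pair is first flattened to tx_index * n_rx_beams + rx_index, then
    only the values that actually occur are renumbered 0..K-1.
    """
    flat = [tx * n_rx_beams + rx for tx, rx in pairs]
    seen = sorted(set(flat))
    to_label = {value: label for label, value in enumerate(seen)}
    labels = [to_label[value] for value in flat]
    return labels, seen

# Made-up best beam pairs for four channel realizations.
best_pairs = [(3, 1), (0, 2), (3, 1), (7, 0)]
labels, kept = encode_beam_pairs(best_pairs, n_rx_beams=8)
```

Renumbering only the observed pairs keeps the classifier's output layer as small as the data allows.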
Project Areas
- Natural Language Processing

Source: https://sigmoidal.io/boosting-your-solutions-with-nlp/

NLP is a branch of artificial intelligence focused on enabling computers to
understand and interpret human language.

Source: https://ontotext.com/top-5-semantic-technology-trends-2017/
Word2Vec representations of words projected onto a two-dimensional space.

Source: Google Duplex: A.I. Assistant Calls Local Businesses To Make Appointments

Word Count in a Speech

Source: https://tfipost.com/2019/08/pm-modi-optimistic-independence-day-speech-pakistan-imran-khan-01/

Case study: Text Classification


Dataset: an Amazon review data set with 10,000 rows of text, each classified as
“Label 1” or “Label 2”. The data set has two columns, “Text” and “Label”.

Text, Label (Review-1): Good

Stunning even for the non-gamer: This sound track was beautiful! It paints the scenery in
your mind so well I would recommend it even to people who hate video game music! I have
played the game Chrono Cross but out of all of the games I have ever played it has the best
music! It backs away from crude keyboarding and takes a fresher step with grate guitars and
soulful orchestras. It would impress anyone who cares to listen! ^_^,__label__2

Text, Label (Review-2): Bad

" The Worst!: A complete waste of time. Typographical errors, poor grammar, and a totally
pathetic plot add up to absolutely nothing. I'm embarrassed for this author and very
disappointed I actually paid for this book.",__label__1

Step 1: Add the libraries

Step 2: Set the Random seed

Step 3: Read the dataset

Step 4: Data Pre-processing

1. Remove Blank rows in Data, if any


2. Change all the text to lower case
3. Word Tokenization
4. Remove Stop words
5. Remove Non-alpha text
6. Word Lemmatization
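
The six pre-processing steps above can be sketched in plain Python as follows. This is a rough illustration under stated assumptions: the stop-word list is a tiny made-up sample, and `crude_lemma` is a deliberately simple stand-in (it only strips a plural "s") for a real lemmatizer, which would normally come from an NLP library.

```python
import re

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "it", "this", "to",
              "of", "for", "in", "i", "so"}

def crude_lemma(word):
    """Very rough stand-in for lemmatization: strips a plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def preprocess(rows):
    """Apply pre-processing steps 1-6 to a list of raw text rows."""
    cleaned = []
    for text in rows:
        if text is None or not text.strip():              # 1. drop blank rows
            continue
        text = text.lower()                                # 2. lower case
        tokens = re.findall(r"[a-z]+|[^a-z\s]+", text)     # 3. tokenize
        tokens = [t for t in tokens if t not in STOP_WORDS]  # 4. stop words
        tokens = [t for t in tokens if t.isalpha()]        # 5. non-alpha text
        cleaned.append([crude_lemma(t) for t in tokens])   # 6. lemmatize
    return cleaned

rows = ["This sound track was beautiful!", "   ",
        "The games have 2 soulful orchestras."]
tokens_per_row = preprocess(rows)
```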


Step 5: Prepare Train and Test dataset

Step 6: Encoding
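
Steps 5 and 6 can be sketched as below. The 70/30 split, the seed value and both function names are assumptions for illustration; the label strings follow the `__label__1` / `__label__2` format seen in the review examples.

```python
import random

def train_test_split(texts, labels, test_fraction=0.3, seed=500):
    """Shuffle indices with a fixed seed and hold out a fraction for testing."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    x_train = [texts[i] for i in train_idx]
    x_test = [texts[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test

def encode_labels(labels):
    """Map label strings to integers 0..K-1 in sorted order."""
    classes = sorted(set(labels))
    mapping = {name: i for i, name in enumerate(classes)}
    return [mapping[name] for name in labels], mapping

# Ten placeholder reviews with alternating labels.
texts = [f"review {i}" for i in range(10)]
labels = ["__label__1" if i % 2 == 0 else "__label__2" for i in range(10)]

x_train, x_test, y_train, y_test = train_test_split(texts, labels)
y_train_enc, mapping = encode_labels(y_train)
```

In practice the mapping learned on the training labels is reused for the test labels, so both sets share the same integer codes.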

Step 7: Word Vectorization

Word vectorization is the process of turning a collection of text documents into numerical feature vectors.

One such method is term frequency-inverse document frequency (TF-IDF).

Term Frequency: summarizes how often a given word appears within a document.

Inverse Document Frequency: downscales words that appear often across documents.

Vectorized words
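
A minimal TF-IDF sketch is given below, assuming the plain textbook definitions tf = (count of term in document) / (document length) and idf = log(N / df). Library implementations often use smoothed or normalized variants, so their numbers will differ from this one; the function name and the toy documents are illustrative.

```python
import math
from collections import Counter

def tfidf_vectorize(docs):
    """docs: list of token lists. Returns (vocabulary, list of vectors).

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df = number of documents containing the term
    """
    n_docs = len(docs)
    vocab = sorted({tok for doc in docs for tok in doc})
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors

docs = [["good", "music", "good"], ["bad", "plot"], ["good", "plot"]]
vocab, vectors = tfidf_vectorize(docs)
```

Words that occur in most documents, such as "good" here, get a small idf and therefore a small weight even when they are frequent within one document.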

Step 8: Use ML algorithm to predict the outcome on test dataset
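
The slide does not name the algorithm, so the sketch below uses multinomial Naive Bayes purely as an assumed example of Step 8. For brevity it works on raw token counts rather than TF-IDF vectors, and the tiny training and test sets are made up.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Multinomial Naive Bayes with add-one smoothing on token counts."""
    vocab = {tok for doc in docs for tok in doc}
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    model = {"log_prior": {}, "log_like": {}}
    for label, n in class_docs.items():
        model["log_prior"][label] = math.log(n / len(labels))
        total = sum(token_counts[label].values()) + len(vocab)
        model["log_like"][label] = {
            tok: math.log((token_counts[label][tok] + 1) / total)
            for tok in vocab}
    return model

def predict(model, doc):
    """Return the label with the highest log-posterior; unseen tokens are skipped."""
    scores = {}
    for label, prior in model["log_prior"].items():
        like = model["log_like"][label]
        scores[label] = prior + sum(like[t] for t in doc if t in like)
    return max(scores, key=scores.get)

# Made-up data: 1 = good review, 0 = bad review.
train_docs = [["beautiful", "music"], ["best", "music"],
              ["worst", "plot"], ["waste", "time"]]
train_labels = [1, 1, 0, 0]
model = train_naive_bayes(train_docs, train_labels)

test_docs = [["beautiful", "plot", "music"], ["worst", "waste"]]
test_labels = [1, 0]
predictions = [predict(model, d) for d in test_docs]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
```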

Project Areas
- 5G/6G Wireless Network
Why Machine Learning for wireless networks? (1/2)

• Increasing antennas in massive MIMO has changed channel properties
• Mathematical complexity for scenarios like underwater communication
• Molecular communication

Derya Malak, Ozgur B. Akan, “Molecular communication NANO networks inside human body,” Elsevier Nano Communication Networks, Volume 3, Issue 1, 2012, Pages 19-35.
Case study: ANN based Spectrum Sensing for Cognitive Radio Network

Fixed spectrum access v/s Dynamic spectrum access
- Fixed spectrum access: spectrum bands are allocated/assigned statically
- Dynamic spectrum access: spectrum bands are assigned dynamically

Unlicensed Band v/s Licensed Band
- Unlicensed band: over-crowded
- Licensed band: under-utilized

Source: M. López-Benítez et al., “Spectral occupation measurements and blind standard recognition sensor for cognitive radio networks,” Proc. 4th Int’l. Conf. Cognitive Radio Oriented Wireless Networks and Comms. (CrownCom 2009), Hannover, Germany, June 22-24, 2009.
Background

Analogy:
Road → Unlicensed band
BRTS route → Licensed band
Vehicles → Channel users

Source: Google Images (Shivranjini cross roads, Ahmedabad)



Solution: Opportunistic Spectrum Access (OSA)

Opportunity: vacancy of the Primary User (PU), i.e., hunt for white space for
the needy Secondary User through a technique called Spectrum Sensing.

Spectrum Sensing
- Parametric
- Non-parametric
Why Machine Learning for CRN?

The main task of any machine learning algorithm is:

[Figure not recovered. Surviving labels: "Or", "PU present", "PU absent"]

From the CRN perspective, the task is to identify whether the PU is present or absent.

Thus, machine learning can be used to address this binary classification problem.
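
As a point of reference for this binary decision, classical energy detection can be sketched as below: the average energy of a chunk of received samples is compared with a threshold. The chunk length, the threshold, the unit-variance Gaussian noise and the sinusoidal PU signal are all assumed values chosen for illustration.

```python
import math
import random

def chunk_energy(samples):
    """Average energy of one chunk: (1/N) * sum of squared samples."""
    return sum(s * s for s in samples) / len(samples)

def energy_detect(samples, threshold):
    """Return 1 (PU present) if the chunk energy exceeds the threshold, else 0."""
    return 1 if chunk_energy(samples) > threshold else 0

rng = random.Random(1)
N = 100  # assumed number of samples per chunk

# Noise-only chunk (PU absent) and signal-plus-noise chunk (PU present).
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]
signal = [2.0 * math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
received = [s + w for s, w in zip(signal, noise)]

threshold = 2.0  # assumed; in practice set from a target false-alarm rate
decisions = [energy_detect(noise, threshold), energy_detect(received, threshold)]
print(decisions)
```

A learned classifier replaces the fixed threshold with a decision rule fitted to labeled chunks, which is what makes the problem a binary classification task.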

ANN Hybrid Sensing Scheme

❑ The scheme is a combination of classical Energy Detection, the Likelihood Ratio Test statistic (LRS-G2) and an Artificial Neural Network (ANN).
❑ Features:
1. Energy: the simplest and most efficient non-parametric sensing feature.
2. LRS-G2 Zhang Statistic: a non-parametric sensing feature with the highest statistical power for the normality test in comparison with other goodness-of-fit based tests.

N is the sample size and F_0(y) is the known cumulative distribution function (CDF) of noise.

❑ Four Features (Input to ANN): (1) Sample’s Energy (2) Sample’s Zhang Statistic (3) Previous Sample’s Energy (4) Previous Sample’s Zhang Statistic
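
Assembling the four-feature input can be sketched as below. The sketch assumes that the energy and the Zhang statistic have already been computed for each sensing sample (the statistic's formula is not reproduced here), and the function name and numbers are made up for illustration.

```python
def build_feature_vectors(energies, zhang_stats):
    """Build [E, Z, E_prev, Z_prev] for every sensing sample after the first.

    energies[i] and zhang_stats[i] are the two features of sample i; the
    first sample has no predecessor, so it yields no vector.
    """
    if len(energies) != len(zhang_stats):
        raise ValueError("feature lists must have equal length")
    return [[energies[i], zhang_stats[i], energies[i - 1], zhang_stats[i - 1]]
            for i in range(1, len(energies))]

E = [1.02, 0.97, 3.10, 2.95]     # made-up energies
Z = [40.0, 38.5, 120.0, 115.0]   # made-up statistic values
X = build_feature_vectors(E, Z)
```

Including the previous sample's features lets the network exploit the fact that channel occupancy tends to persist from one sensing interval to the next.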

[Figure: block diagram of the sensing scheme. Recoverable labels: Signal at the desired SNR, AWGN, Chunks of N samples, Feature Extraction (E, E_P, Z, Z_P), Testing Data set, NN, outputs Pd and Pf]

Numerical Result: Pd vs SNR

Radio technology: DCS-1800 DL, False alarm: 0.035, N = 100

Project Areas
- Environment and Health Care
Application of Machine Learning : Health Care

Project Areas
- Biology/Bioinformatics

Machine learning in bioinformatics

https://www.genomicseducation.hee.nhs.uk/
Stroke diagnosis
Microarrays

Gene prediction
Molecular Classification of Cancer by Gene Expression Monitoring using Support Vector Machine (SVM)

The goal is to classify leukemia patients as having acute myeloid leukemia (AML)
or acute lymphoblastic leukemia (ALL) using the SVM algorithm.

Source: https://towardsdatascience.com/explore-the-world-of-bioinformatics-with-machine-learning-47c62c482aaf
[Figures: Train Data; Test Data]

About the dataset:

1. Each row represents a different gene.
2. Columns 1 and 2 are descriptions of that gene.
3. Each numbered column is a patient in the label data.
4. Each patient has 7129 gene expression values, i.e., one value for each gene.
5. The training data contain gene expression values for patients 1 through 38.
6. The test data contain gene expression values for patients 39 through 72.
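
Because genes are rows and patients are columns, the table is usually transposed before training so that each patient becomes one feature vector. A minimal sketch, with a made-up toy table and a hypothetical function name:

```python
def patients_as_rows(gene_rows, n_description_cols=2):
    """gene_rows: one list per gene = descriptions followed by patient values.
    Returns one list per patient holding that patient's value for every gene."""
    values = [row[n_description_cols:] for row in gene_rows]
    return [list(col) for col in zip(*values)]

# Toy table: 3 genes x 4 patients (the real data has 7129 genes).
gene_rows = [
    ["desc A", "acc A", 10, 20, 30, 40],
    ["desc B", "acc B", 11, 21, 31, 41],
    ["desc C", "acc C", 12, 22, 32, 42],
]
X = patients_as_rows(gene_rows)
```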

Processing Steps:

1. Read datasets
2. Obtain normalized data
3. Dimensionality reduction
4. Hyperparameter optimization
5. SVM classification model
6. Confusion matrix, visualized as a heat map
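
Two of these steps, normalization and the confusion matrix, can be sketched in plain Python as below. The sketch assumes z-score standardization as the normalization (the slide does not say which one is used) and uses made-up numbers; the SVM, dimensionality reduction and hyperparameter search would normally come from a machine learning library and are not shown.

```python
import math

def fit_standardizer(rows):
    """Per-feature mean and standard deviation from the training rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # A constant feature has zero deviation; use 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return means, stds

def standardize(rows, means, stds):
    """Apply (value - mean) / std feature by feature."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def confusion_matrix(y_true, y_pred, n_classes=2):
    """matrix[t][p] = number of samples with true class t predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

train = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
means, stds = fit_standardizer(train)
train_z = standardize(train, means, stds)
test_z = standardize([[3.0, 12.0]], means, stds)  # reuses training statistics

# Made-up true and predicted class labels for five test patients.
cm = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
```

Fitting the normalization on the training patients only and reusing it on the test patients keeps test information out of the model.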

Class Label 0 - acute myeloid leukemia
Class Label 1 - acute lymphoblastic leukemia

Thank you !!
http://profpatel.in/
