Foundation of Data Science

BDA

Uploaded by

lekha.cce
Syllabus BIG DATA (ME)

SEMESTER - II
24PBDPC201   FOUNDATION OF DATA SCIENCE   L T P C
                                          3 1 0 4
SDG NO. 4

OBJECTIVES:
• Apply fundamental algorithmic ideas to process data.
• Learn to turn hypotheses and data into actionable predictions.
• Document and transfer the results, and communicate the findings effectively using
visualization techniques.

UNIT I INTRODUCTION TO DATA SCIENCE 9


Data science process – roles, stages in data science project – working with data from files –
working with relational databases – exploring data – managing data – cleaning and sampling for
modeling and validation – introduction to NoSQL.

UNIT II MODELING METHOD 9


Choosing and evaluating models – mapping problems to machine learning, evaluating clustering
models, validating models – cluster analysis – K-means algorithm, Naïve Bayes – memorization
methods – linear and logistic regression – unsupervised methods.

UNIT III INTRODUCTION TO R 9


Reading and getting data into R – ordered and unordered factors – arrays and matrices – lists
and data frames – reading data from files – probability distributions – statistical models in R –
manipulating objects – data distribution.

UNIT IV MAP REDUCE 9


Introduction – distributed file system – algorithms using MapReduce – matrix-vector
multiplication by MapReduce – Hadoop – understanding the MapReduce architecture – writing
Hadoop MapReduce programs – loading data into HDFS – executing the Map phase – shuffling
and sorting – Reduce phase execution.
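
Illustration (not part of the official syllabus text): the map, shuffle-and-sort, and reduce phases listed in this unit can be sketched for matrix-vector multiplication as below. This is a minimal single-machine sketch in Python, not Hadoop code. The function names (map_phase, shuffle_and_sort, reduce_phase, matrix_vector_multiply) are hypothetical, and it assumes the whole vector is available to every mapper.

```python
from collections import defaultdict


def map_phase(matrix_entries, vector):
    """Map: for each non-zero entry (i, j, m_ij), emit the pair (i, m_ij * v_j)."""
    for i, j, m_ij in matrix_entries:
        yield i, m_ij * vector[j]


def shuffle_and_sort(pairs):
    """Shuffle and sort: group emitted values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: sum the partial products collected for each row index."""
    return {row: sum(values) for row, values in grouped}


def matrix_vector_multiply(matrix_entries, vector):
    """Chain the three phases; on a cluster each phase would run distributed."""
    return reduce_phase(shuffle_and_sort(map_phase(matrix_entries, vector)))


# Example: M = [[1, 2], [3, 4]] stored as (row, column, value) triples, v = [10, 20]
entries = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
result = matrix_vector_multiply(entries, [10, 20])
print(result)
```

Storing the matrix as (row, column, value) triples mirrors how a sparse matrix might be laid out as records in a distributed file; on a real cluster the framework, not user code, performs the grouping step.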

UNIT V DELIVERING RESULTS 9


Documentation and deployment – producing effective presentations – introduction to graphical
analysis – plot() function – displaying multivariate data – matrix plots – multiple plots in one
window – exporting graphs – using graphics parameters – case studies.
TOTAL: 45 PERIODS
TEXT BOOKS:
1. Nina Zumel, John Mount, “Practical Data Science with R”, Manning Publications, 2014.

REFERENCES:
1. Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, “Mining of Massive
Datasets”, Cambridge University Press, 2014.
2. Mark Gardener, “Beginning R - The Statistical Programming Language”,
John Wiley & Sons, Inc., 2012.
3. W. N. Venables, D. M. Smith and the R Core Team, “An Introduction to R”,
2013.
4. Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort, Abhijit Dasgupta,
“Practical Data Science Cookbook”, Packt Publishing Ltd., 2014.
5. Nathan Yau, “Visualize This: The FlowingData Guide to Design, Visualization,
and Statistics”, Wiley, 2011.
6. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop
Solutions”, Wiley, ISBN: 9788126551071, 2015.

WEB REFERENCES:
1. http://www.johndcook.com/R_language_for_programmers.html
2. http://bigdatauniversity.com/
3. http://home.ubalt.edu/ntsbarsh/stat-data/topics.htm#rintroduction

ONLINE RESOURCES:
1. https://freevideolectures.com/search/foundation-of-data-science/
2. https://www.simplilearn.com/big-data-and-analytics/senior-data-scientist-masters-program-training

OUTCOMES:
Upon completion of the course, the student should be able to
1. Obtain, clean/process and transform data.
2. Analyze and interpret data using an ethically responsible approach.
3. Use appropriate models of analysis, assess the quality of input, derive insight from
results, and investigate potential issues.

4. Apply computing theory, languages and algorithms, as well as mathematical and statistical
models, and the principles of optimization to appropriately formulate and use data
analyses.
5. Formulate and use appropriate models of data analysis to uncover hidden solutions to
business-related challenges.
