Data Science Syllabus

This document outlines the topics that will be covered across 5 units on data science. Unit 1 introduces data processing architectures and challenges of big data analysis. It also discusses providing structure to unstructured data through machine translation and indexing. Unit 2 covers data types, feature engineering, and analytic techniques. Unit 3 focuses on simple analytic techniques like distributions and relationships as well as deeper analysis like clustering and modeling. Unit 4 applies techniques like text mining and processing to information retrieval. Finally, Unit 5 uses a case study to examine challenges of large scale data and frameworks like MapReduce and Hadoop. Reading materials with more details on these topics are also provided.

Unit 1
Introduction
Introduction: Data Processing Architectures, components and processes, Data Stores and Data kind, Challenges: "Big Data" and otherwise
Special Considerations in Big Data Analysis: Background, Theory in Search of Data, Data in Search of Theory, Overfitting, Bigness Bias, Too Much Data, Fixing Data, Data Subsets in Big Data: Neither Additive nor Transitive, Additional Big Data Pitfalls.
Providing Structure to Unstructured Data: Background, Machine Translation, Autocoding, Indexing and Term Extraction
Unit 2
Data and Features
Component Parts of Data Science: Data Types, Classes of Analytic Techniques, Learning Models, Execution Models, Fractal Analytic Model, Analytic Selection Process: Implementation Constraints
Feature Engineering: Feature Selection, Data Veracity, Application of Domain Knowledge, Curse of Dimensionality
Unit 3
Data and Analysis
Simple Analytic Techniques: Background, Look at the Data, Data Range, Denominator, Frequency Distributions, Mean and Standard Deviation, Estimation Only Analyses
Deep Dive into Analysis: Background, Analytic Tasks, Clustering, Classifying, Recommending, and Modelling, Data Reduction, Normalising and Adjusting Data, Find Relationships Not Similarities
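As a rough illustration of the simple analytic techniques listed for Unit 3 (data range, frequency distributions, mean and standard deviation), the following is a minimal sketch using only the Python standard library. The function name `summarize` and the sample values are hypothetical and not part of the syllabus; the population standard deviation is an assumed choice, and a sample standard deviation may be preferred for some exercises.

```python
from collections import Counter
import statistics


def summarize(values):
    """Return the range, frequency distribution, mean and standard deviation.

    Assumption: the population standard deviation (pstdev) is used here.
    """
    return {
        "range": (min(values), max(values)),
        "frequencies": Counter(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.pstdev(values),
    }


# Hypothetical sample data, for illustration only.
summary = summarize([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```

Looking at the range and the frequency counts before the mean is in the spirit of the "Look at the Data" topic: outliers and skew are often visible there first.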
Unit 4
Applying Nuances of Data Science to Text Processing And Information Retrieval
Text Mining: Definition, Generic architecture, Text Mining Operations, Frequent Itemset Mining, Categorization Document Representation, Clustering and categorization, Bayesian Classifier
Text Processing: Tokenization, Stem, Stop, nGram, categorization, information extraction
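The text processing steps named in Unit 4 (tokenization, stop-word removal, stemming and n-grams) can be sketched as below. This is an illustrative sketch only: the stop-word list is a tiny hypothetical sample, `crude_stem` is a toy suffix-stripping rule rather than a real stemmer such as Porter's, and all function names are assumptions rather than anything prescribed by the syllabus.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


terms = preprocess("The miners are mining texts")
print(terms)             # ['miner', 'min', 'text']
print(ngrams(terms, 2))  # [('miner', 'min'), ('min', 'text')]
```

The output also shows why crude stemming is lossy: "mining" collapses to "min", which is one reason real stemmers use more careful rules.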
Unit 5
Big nature of Data Case study
MapReduce, The Paper: Programming model, Types and Examples, Implementation and Execution Architecture, Partitioning, types, Combiners, Data Locality
Hadoop: Challenges at Large Scale and the Hadoop Approach, HDFS, MapReduce in Hadoop
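The MapReduce programming model covered in Unit 5 can be sketched with a single-process word count. This is a minimal sketch for intuition only: it runs in one Python process and does not use Hadoop or any distributed runtime, and the names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical. Partitioning, combiners and data locality are deliberately left out.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(documents, map_fn, reduce_fn):
    """Run map, group by key (the 'shuffle'), then reduce, all in memory."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    # Each key is reduced independently, which is what lets a real
    # framework spread the reduce work across machines.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


# Hypothetical input documents, for illustration only.
docs = {"d1": "big data big clusters", "d2": "data locality"}
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'big': 2, 'clusters': 1, 'data': 2, 'locality': 1}
```

Keeping the map and reduce functions free of shared state is the design point: the framework, not the user code, decides where and in what order they run.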
Reading Material: (In no particular order of precedence)
1. Principles of Big Data: Preparing, Sharing and Analyzing Complex Information, Jules J Berman, First Edition, MK Publishers, 2013.
2. The Field Guide to Data Science:
    http://www.boozallen.com/media/file/TheFieldGuidetoDataScience.pdf
3. Understanding Big Data:
    ftp://129.35.224.12/software/tw/Defining_Big_Data_through_3V_v.pdf
4. Ghemawat et al., Google, MapReduce: Simplified Data Processing on Large Clusters
    http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduceosdi04.pdf
5. Hadoop Tutorial:
    http://developer.yahoo.com/hadoop/tutorial/
6. Scalable SQL and NoSQL Data Stores
    http://www.sigmod.org/publications/sigmodrecord/1012/pdfs/04.surveys.cattell.pdf
7. Oracle Information Architecture
    http://www.oracle.com/technetwork/topics/entarch/articles/oeabigdataguide1522052.pdf
8. Challenges and opportunities with Big Data
    http://www.purdue.edu/discoverypark/cyber/assets/pdfs/BigDataWhitePaper.pdf
9. Feature Engineering in Text Problems
    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.36.9770&rep=rep1&type=pdf
10. Clustering Explained (Read at least till Vector Space Model and K Means)
    http://www.iula.upf.edu/materials/040701wanner.pdf
    K means broken down
    http://www.engineering.uiowa.edu/~ie_155/Lecture/Kmeans.pdf
11. K Means explained (especially the reasoning for k means)
    http://www.croce.ggf.br/dados/K%20mean%20Clustering1.pdf
12. Naive Bayes Broken Down
    http://software.ucv.ro/~cmihaescu/ro/teaching/AIR/docs/Lab4NaiveBayes.pdf
13. Text Mining Slides
    http://www.vscse.org/summerschool/2013/Abbott.pdf
14. Text Mining Handbook (Chapters 1 and 2, 4 and 5)
    http://www.roelsbeestenboel.nl/text.pdf

Other Interesting Reading Material:
1. Feature Selection for High Dimensional Data: A Pearson Redundancy Based Filter
    http://kzi.polsl.pl/~jbiesiada/prace/selekcja/07Wroclaw.pdf
2. On the Role of Feature Selection in Machine Learning: Thesis on Feature Engineering
    http://www.cs.huji.ac.il/labs/learning/Theses/Navot_PhD.pdf
3. Feature Selection Methods (Good Thesis)
    http://www.cs.cmu.edu/~kdeng/thesis/feature.pdf
    http://www.dccia.ua.es/~boyan/papers/TesisBoyan.pdf
    A survey:
    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.1337&rep=rep1&type=pdf
4. Dimensionality Reduction:
    http://infolab.stanford.edu/~ullman/mmds/ch11.pdf
5. Bayes Explained (Slides) (Very useful)
    http://www.stanford.edu/class/cs124/lec/naivebayes.pdf
6. Naive Bayes Classifiers with examples (Very Intuitive Explanation)
    http://www.cs.ucr.edu/~eamonn/CE/Bayesian%20Classification%20withInsect_examples.pdf
