Data Science
VIRTUAL INTERNSHIP
Submitted for partial fulfillment of the requirements for the award of the degree
of
Bachelor of Technology in COMPUTER SCIENCE AND ENGINEERING
By:
MYNENI JWALITHA
22KP1A0586
Date:
Place: Signature of Student:
NRI INSTITUTE OF TECHNOLOGY
Department of Computer Science and Engineering
CERTIFICATE
This certificate attests that the following report accurately represents the work
completed by MYNENI JWALITHA, Registration Number- 22KP1A0586,
during the academic year 2023-2024, covering the time period from 15/05/2024
to 25/06/2024, as part of the DATA SCIENCE AND MACHINE LEARNING
VIRTUAL INTERNSHIP PROGRAMME.
Signature of HOD
The Data Science and Machine Learning Virtual Internship program equips participants
with a comprehensive understanding of the dynamic field of data science and its practical
applications. Covering essential topics such as data preprocessing, exploratory data
analysis, feature engineering, and model evaluation, the program emphasizes solving
real-world problems through data-driven approaches. Participants worked with data
manipulation, visualization techniques, and statistical modeling to extract insights from
complex datasets, and gained hands-on experience with both foundational and advanced
machine learning algorithms, including supervised and unsupervised learning methods, to
build predictive models and generate actionable insights.
The goal of the internship was to empower participants with a strong foundation in data
science and machine learning, enabling them to extract insights, build intelligent systems,
and contribute to advancements in the field. Ultimately, the program aimed to cultivate
skilled data practitioners capable of addressing challenges across industries.
The internship provided participants with hands-on experience in using key tools and
technologies like Python, Jupyter Notebooks, Scikit-learn, TensorFlow, and Pandas to
design and implement machine learning solutions, optimize models, and evaluate their
performance. It also emphasized understanding the complete lifecycle of a machine
learning project, covering stages from problem formulation and data collection to
deployment. Through real-world case studies and projects, participants explored practical
applications in areas such as classification, regression, clustering, and recommendation
systems, gaining valuable insights into solving complex problems with data-driven
approaches.
LETTER OF UNDERTAKING
To
The Principal
NRI Institute of Technology, Visadala
Guntur District.
Subject: Submission of Internship Report on Data Science and Machine Learning Virtual
Internship organized by ExcelR Edutech Pvt. Ltd.
Dear Sir,
This report would not have been possible without your guidance and support, for which I am
sincerely thankful. Working for six weeks on the Data Science and Machine Learning
Virtual Internship was an incredible learning opportunity, equipping me with valuable skills
and knowledge. Preparing this report has also been a rewarding experience, and I will be
available for any clarification, if required.
Therefore, I humbly request you to kindly accept my Internship Report and oblige.
Yours Obediently,
MYNENI JWALITHA
22KP1A0586
Jwalithamyneni995@gmail.com
CERTIFICATE OF INTERNSHIP
ACKNOWLEDGEMENT
We take this opportunity to express our deepest gratitude and appreciation to all those who
made this internship work easier with words of encouragement, motivation, discipline, and
faith by offering guidance and support, which helped us successfully complete this
internship.
First and foremost, we express our deep gratitude to Mr. Alapati Raja, Chairman, NRI
Institute of Technology, for providing the necessary facilities throughout the Computer
Science & Engineering program.
We express our sincere thanks to Dr. Kotha Srinivas, Principal, NRI Institute of
Technology, for his constant support and cooperation throughout the Computer Science &
Engineering program.
We express our sincere gratitude to Mr. K. Nageswara Rao, Professor & Head of the
Department, Computer Science & Engineering, NRI Institute of Technology, for his
constant encouragement, motivation, and invaluable guidance.
We also extend our heartfelt thanks to our Internship Coordinator, ExcelR, and our
Internship SPOC, Kranti Ma'am, for their insightful advice, motivating suggestions, and
unwavering support throughout this internship.
Lastly, we would like to take this opportunity to thank the teaching and non-teaching staff
of the Department of Computer Science & Engineering, NRI Institute of Technology,
for their invaluable help and support.
MYNENI JWALITHA
22KP1A0586
Table of Contents
Data Science and Machine Learning Virtual Internship

Module 1: Introduction to Data Science
Date: 15/5/2024 to 21/5/2024
Module Contents:
1. Overview of Data Science
2. Key Concepts and Terminologies in Data Science
3. Data Science Lifecycle
4. Introduction to Data Visualization and Exploratory Data Analysis (EDA)
5. Tools and Technologies in Data Science (Python, Jupyter Notebooks, Pandas, Matplotlib)

Module 2: Data Preprocessing and Feature Engineering
Date: 22/5/2024 to 28/5/2024
Module Contents:
1. Introduction to Data Preprocessing
2. Handling Missing Data
3. Data Transformation (Scaling, Normalization, Encoding)
4. Feature Selection and Feature Engineering Techniques
5. Outlier Detection and Treatment
Big Data, Data Mining, and Data Analytics: Big data refers to datasets too large or
complex for traditional data-processing methods to handle. Data mining involves exploring
large datasets to find patterns. Data analytics is the process of analyzing data to find
actionable insights.
Types of Data:
o Structured Data: Organized data like spreadsheets or databases with a
well-defined format.
o Unstructured Data: Data that doesn't follow a specific format, like text
documents, images, and videos.
o Semi-Structured Data: Data that doesn't fully conform to a structure but has
some organizational properties, such as JSON files.
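The three data types described above can be illustrated with a small sketch. It uses only the Python standard library, and every name and sample value in it (the students, skills, and note text) is a made-up assumption for illustration, not data from the internship.

```python
import csv
import io
import json

# Structured: every row shares the same fixed columns (hypothetical sample data).
csv_text = "name,age\nAsha,21\nRavi,22\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: JSON records carry keys, but the fields can differ per record.
json_text = '[{"name": "Asha", "skills": ["python"]}, {"name": "Ravi", "city": "Guntur"}]'
records = json.loads(json_text)

# Unstructured: free text has no predefined fields; any structure must be derived.
note = "Asha joined the internship in May and enjoyed the EDA sessions."
word_count = len(note.split())

structured_columns = sorted(rows[0].keys())
semi_structured_keys = [sorted(r.keys()) for r in records]

print(structured_columns)    # same columns for every row
print(semi_structured_keys)  # keys vary from record to record
print(word_count)            # one simple feature derived from raw text
```

The CSV rows all expose the same columns, the JSON records expose different keys, and the free-text note yields structure only after processing (here, a simple word count).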
• Data Collection: Gathering raw data from various sources such as sensors, databases,
APIs, and web scraping.
• Data Cleaning: Handling missing values, outliers, and duplicates to ensure
high-quality data.
• Data Analysis: Applying statistical methods to explore data and uncover patterns or
insights.
• Data Visualization: Presenting data in graphical formats like charts, graphs, and
dashboards to make it easier to understand.
• Data Interpretation: Drawing conclusions from the analysis and using them for
decision-making.
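The stages listed above can be sketched end to end on a toy dataset. This is a minimal illustration under stated assumptions: the sensor readings are hypothetical, the helper name `clean` is invented here, and the valid range of 0 to 50 and the median fill are arbitrary choices for the example rather than methods prescribed by the internship. In practice a library such as Pandas would usually handle these steps.

```python
from statistics import mean, median

# Data collection: hypothetical raw sensor readings; None marks a missing value.
raw = [
    {"id": 1, "temp": 21.5},
    {"id": 2, "temp": None},   # missing value
    {"id": 3, "temp": 22.0},
    {"id": 3, "temp": 22.0},   # duplicate record
    {"id": 4, "temp": 95.0},   # outlier
    {"id": 5, "temp": 20.5},
]

def clean(records, low=0.0, high=50.0):
    """Drop duplicate ids, fill missing temps with the median, drop out-of-range temps."""
    seen, unique = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(dict(r))  # copy so the raw data stays untouched
    fill = median(r["temp"] for r in unique if r["temp"] is not None)
    for r in unique:
        if r["temp"] is None:
            r["temp"] = fill
    # Assumed valid range; readings outside it are treated as outliers.
    return [r for r in unique if low <= r["temp"] <= high]

# Data cleaning
cleaned = clean(raw)

# Data analysis: simple summary statistics
temps = [r["temp"] for r in cleaned]
summary = {"count": len(temps), "mean": mean(temps), "min": min(temps), "max": max(temps)}

# Data interpretation: turn the numbers into a statement a decision-maker can use
print(f"{summary['count']} valid readings, average temperature {summary['mean']:.2f}")
```

The median is used for filling because it is less affected by the outlier than the mean would be; other strategies (dropping the record, or using a model-based estimate) are equally valid depending on the data.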
Data Visualization: The graphical representation of data to identify trends, patterns, and
insights. Tools like Matplotlib, Seaborn, and Tableau are commonly used.
• Types of Data Visualizations: Bar charts, line graphs, scatter plots, pie charts,
histograms, and box plots.
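To make two of these chart types concrete, the sketch below computes the numbers that a histogram and a box plot display, and prints a rough text version of the histogram. It assumes a hypothetical list of exam scores and an invented helper `histogram_counts`; a plotting library such as Matplotlib would normally draw the actual charts.

```python
from statistics import quantiles

# Hypothetical exam scores used purely for illustration.
scores = [35, 42, 47, 51, 55, 58, 61, 64, 68, 72, 77, 93]

def histogram_counts(values, start, width, bins):
    """Count values per equal-width bin; assumes every value is >= start."""
    counts = [0] * bins
    for v in values:
        idx = min(int((v - start) // width), bins - 1)  # clamp into the last bin
        counts[idx] += 1
    return counts

# Histogram: how many scores fall into each 20-point range starting at 30.
counts = histogram_counts(scores, start=30, width=20, bins=4)
for i, c in enumerate(counts):
    lo = 30 + i * 20
    print(f"{lo:3d}-{lo + 19:3d} | {'#' * c}")

# Box plot: the box spans Q1 to Q3, with a line at the median (Q2).
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1
# Common convention: points beyond 1.5 * IQR from the box are drawn as outliers.
outliers = [s for s in scores if s < q1 - 1.5 * iqr or s > q3 + 1.5 * iqr]
print(f"Q1={q1}, median={q2}, Q3={q3}, outliers={outliers}")
```

A histogram therefore summarizes the shape of a distribution, while a box plot compresses it into quartiles and flags unusual points.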
Conclusion
The Data Science and Machine Learning internship has been a transformative experience,
offering substantial growth in both theoretical understanding and practical application within
this dynamic field. Through a series of meticulously designed modules, I acquired profound
insights into key concepts such as data preprocessing, exploratory data analysis (EDA), feature
engineering, and model selection. The Fundamentals of Data Science module laid a robust
foundation, guiding me through the lifecycle of data from collection and cleaning to
visualization and analysis. Subsequently, the Machine Learning Fundamentals module
expanded my knowledge of algorithms like regression, classification, and clustering, allowing
me to implement these techniques effectively on real-world datasets.
The advanced modules on Model Optimization and Evaluation further honed my skills in fine-
tuning models for enhanced accuracy and performance. I gained hands-on expertise in
hyperparameter optimization, model selection, and performance evaluation using metrics like
accuracy, precision, recall, and F1-score. Additionally, practical exposure to real-world
datasets presented invaluable learning opportunities, such as overcoming data processing
challenges and applying machine learning algorithms to address complex problems. This
internship has not only strengthened my theoretical acumen but also provided indispensable
hands-on experience, establishing a solid platform for tackling future challenges in Data
Science and Machine Learning.