CAPSTONE PROJECT
A capstone project is the final project of an academic program, typically integrating
all of the learning from the program. Students research a topic independently
to develop a deep understanding of the subject matter. It gives the student an
opportunity to integrate all of their knowledge and demonstrate it through a
comprehensive project.
Objectives of the Capstone Project:
Application of Learning: The goal is to apply theoretical
knowledge to practical, real-world issues. This
demonstrates your ability to translate academic concepts
into actionable solutions.
Example: If you’ve learned about neural networks, you
should be able to apply them to a project such as image
classification.
Communicating Solutions: It’s important to present
your findings in a way that non-technical stakeholders
can understand. Explaining complex algorithms in
simple, clear language is key.
Example: When explaining a model’s predictions to a
business audience, you would avoid jargon like
“backpropagation” and instead focus on how the model
benefits the business.
Choosing the Right Algorithm: You need to analyze the
problem carefully to determine the most appropriate
algorithm to solve it.
Example: For predicting stock prices (a regression task),
you might choose linear regression or a more complex
algorithm like a neural network, depending on the dataset
and problem complexity (a minimal regression sketch follows this list).
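As an illustrative (not prescriptive) sketch, the snippet below fits a simple linear regression with scikit-learn on a few made-up price values; any real stock-prediction project would need far more data and features.

```python
# A minimal, illustrative sketch (not a full stock-prediction system):
# fit a simple linear regression on a few made-up price values.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical feature: previous day's closing price; target: next day's close
X = np.array([[101.2], [102.5], [103.1], [104.0], [105.3]])
y = np.array([102.5, 103.1, 104.0, 105.3, 106.1])

model = LinearRegression()
model.fit(X, y)

# Predict the next close from the latest observed close (illustration only)
print(model.predict(np.array([[106.1]])))
```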
THESE ARE SOME SIMPLE CAPSTONE PROJECT IDEAS
1. Stock Prices Predictor
2. Develop A Sentiment Analyzer
3. Movie Ticket Price Predictor
4. Student Results Predictor
5. Human Activity Recognition using Smartphone Dataset
6. Classifying humans and animals in a photo
1. Understanding The Problem
Artificial Intelligence is perhaps the most transformative technology available
today. At a high level, every AI
project follows these six steps:
1. Problem definition - Clearly define the issue you’re addressing.
2. Data gathering - Collect the right data for training your model.
3. Feature definition - Identify the key factors (features) that influence the
outcome.
4. Al model construction - Build and train a suitable AI model.
5. Evaluation & refinements - Assess the model’s performance and make
improvements.
6. Deployment - Implement the solution in a real-world setting.
2. Decomposing The Problem Through DT Framework
Design Thinking is a design methodology that provides a
solution-based approach to solving problems. It's extremely
useful in tackling complex problems that are ill-defined or
unknown.
The five stages of Design Thinking are as follows:
1. Empathize
Observe consumers to gain a deeper understanding of the problem
Observation must be made with empathy
Use the 5W1H method to ask the right questions:
Who, What, When, Where, Why & How
Empathy Map
It is a collaborative visualization used to clarify
our understanding of a specific type of user.
2. Define
Define the problem statement
Determine the cause of the problem
Brainstorm to generate possible solutions
Select the most suitable solution
3. Ideate
Gather ideas to solve the problem you defined
Brainstorm to arrive at various creative solutions
4. Prototype
A prototype is a simple experimental model for a proposed solution
Build representations (charts, models) of one or more ideas
5. Test
Test the prototype and gain user feedback
Iterate (design thinking is an iterative process)
Problem decomposition steps
1. Understand the problem and then restate the problem in your own
words
Know what the desired inputs and outputs are
Ask questions for clarification
2. Break the problem down into a few large pieces.
3. Break complicated pieces down into smaller pieces. Keep doing this until
all of the pieces are small.
4. Code one small piece at a time.
Think about how to implement it
Write the code/query
Test it ... on its own.
Fix problems, if any (see the sketch below)
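A hypothetical sketch of this workflow in Python: each small piece is its own function, tested on its own before the pieces are combined (the function names and data are invented for illustration).

```python
# Hypothetical illustration of "code one small piece at a time":
# each piece is a small function that can be written and tested on its own.

def clean_score(raw):
    """Convert a raw score string like ' 85 ' to an int, clamped to 0-100."""
    value = int(raw.strip())
    return max(0, min(100, value))

def average(scores):
    """Average a list of numeric scores."""
    return sum(scores) / len(scores)

# Test each small piece on its own
assert clean_score(" 85 ") == 85
assert clean_score("120") == 100
assert average([80, 90, 100]) == 90

# Only then combine the pieces into the larger solution
raw_scores = ["85", " 78", "92 "]
print(average([clean_score(s) for s in raw_scores]))
```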
3. Analytic Approach
Those who work in the domain of AI and Machine Learning solve problems and answer
questions through data every day. They build models to predict outcomes or discover
underlying patterns, all to gain insights leading to actions that will improve future outcomes.
Pick the analytic approach based on the type of question:
Descriptive
• Current status
Diagnostic (Statistical Analysis)
• What happened?
• Why is this happening?
Predictive (Forecasting)
• What if these trends continue?
• What will happen next?
Prescriptive
• How do we solve it?
• If the question is to determine probabilities of an action,
then a predictive model might be used.
• If the question is to show relationships, a descriptive
approach may be required.
• Statistical analysis applies to problems that require counts:
if the question requires a yes/no answer, then a
classification approach to predicting a response would be
suitable.
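For example, a minimal sketch of such a classification approach, using scikit-learn's logistic regression on made-up "hours studied vs. pass/fail" data:

```python
# A minimal sketch of a classification approach to a yes/no question,
# using logistic regression on made-up "hours studied -> pass/fail" data.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])  # hours studied
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])                  # 0 = fail, 1 = pass

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict([[4.5]]))        # the yes/no answer (predicted class)
print(clf.predict_proba([[4.5]]))  # the probability behind that answer
```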
DATA REQUIREMENT
• If the issue at hand is "a recipe," so to speak, and data is
"an ingredient," the data scientist must determine the following:
1. which ingredients are required?
2. how to source or collect them?
3. how to understand or work with them?
4. and how to prepare the data to meet the desired
outcome?
• Prior to undertaking the data collection and data preparation stages of the
methodology, it's vital to define the data requirements for decision-tree
classification. This includes identifying the necessary data content,
formats and sources for initial data collection.
• In this phase the data requirements are revised and decisions are made
as to whether or not the collection requires more or less data. Once the
data ingredients are collected, the data scientist will have a good
understanding of what they will be working with.
• Techniques such as descriptive statistics and visualization can be
applied to the data set to assess its content, quality, and initial insights
(a small sketch follows this list). Gaps in the data will be identified, and
plans to either fill them or make substitutions will have to be made.
• In essence, the ingredients are now sitting on the cutting board.
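A small sketch, using pandas on made-up data, of how descriptive statistics can surface the content, quality, and gaps mentioned above:

```python
# A small sketch (made-up data) of using descriptive statistics to assess
# the content, quality, and gaps in a dataset before modelling.
import pandas as pd

df = pd.DataFrame({
    "hours_studied": [5, 3, None, 8, 2],
    "test_score":    [85, 78, 92, None, 80],
})

print(df.describe())    # summary statistics: count, mean, std, min, max, ...
print(df.isna().sum())  # gaps in the data that must be filled or substituted

# One simple substitution strategy: fill gaps with the column mean
df_filled = df.fillna(df.mean(numeric_only=True))
print(df_filled)
```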
MODELING APPROACH
Modeling
• In what way can the data be visualized to get to the answer that is required?
Evaluation
• Does the model used really answer the initial question or does it need to be
adjusted?
• Data Modeling focuses on developing models that are either
descriptive or predictive.
• An example of a descriptive model might examine things like:
if a person did this, then they're likely to prefer that.
• A predictive model tries to yield yes/no, or stop/go type
outcomes. These models are based on the analytic approach
that was taken, either statistically driven or machine learning
driven.
• The data scientist will use a training set for predictive modelling. A
training set is a set of historical data in which the outcomes are already
known. The training set acts like a gauge to determine if the model
needs to be calibrated. In this stage, the data scientist will play around
with different algorithms to ensure that the variables in play are
actually required.
• The success of data compilation, preparation and modelling, depends
on the understanding of the problem at hand, and the appropriate
analytical approach being taken. The data supports the answering of
the question, and like the quality of the ingredients in cooking, sets the
stage for the outcome.
Constant refinement, adjustments and tweaking are necessary
within each step to ensure the outcome is one that is solid. The
framework is geared to do 3 things:
• First, understand the question at hand.
• Second, select an analytic approach or method to solve the
problem.
• Third, obtain, understand, prepare, and model the data.
The end goal is to move the data scientist to a point where a data
model can be built to answer the question.
Model Validation Techniques
Train-Test Split:
• In this technique, you split the data into two parts: a training
set and a test set. You train the model on the training data
and then test its performance on the test data. The
performance is usually measured using metrics like accuracy,
precision, or RMSE.
• Common Ratios: The typical split is 80% training and 20%
testing, but other ratios like 70-30% or even 50-50% can be
used depending on the dataset size.
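A minimal sketch of an 80/20 train-test split using scikit-learn, on a small synthetic dataset (the feature, target, and noise values are invented for illustration):

```python
# A minimal sketch of an 80/20 train-test split with scikit-learn,
# on a small synthetic dataset (feature, target, and noise are invented).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X = np.arange(1, 21).reshape(-1, 1)              # e.g. hours studied
y = 5 * X.ravel() + np.random.normal(0, 2, 20)   # e.g. noisy test scores

# 80% of the rows for training, 20% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print(mean_squared_error(y_test, model.predict(X_test)))  # test-set MSE
```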
Cross-Validation:
• Cross-validation involves splitting the data into ‘k’ folds, training
the model on some folds, and testing it on the remaining fold.
This process repeats for each fold, and the results are
averaged for more reliable performance metrics.
• Example: In a 5-fold cross-validation, the data is divided into 5
equal parts. Each time, 4 parts are used for training, and 1 part
is used for testing. This is repeated 5 times, and the model’s
performance is averaged across all runs.
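A minimal sketch of 5-fold cross-validation with scikit-learn's cross_val_score, reusing the same kind of synthetic data as above:

```python
# A minimal sketch of 5-fold cross-validation with scikit-learn:
# every fold takes one turn as the test set, and the scores are averaged.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

X = np.arange(1, 21).reshape(-1, 1)
y = 5 * X.ravel() + np.random.normal(0, 2, 20)

scores = cross_val_score(LinearRegression(), X, y,
                         cv=5, scoring="neg_mean_squared_error")
print(-scores)         # MSE on each of the 5 folds
print(-scores.mean())  # averaged for a more reliable estimate
```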
Model Quality Metrics – Measuring Success
When building AI models, especially regression models, it’s crucial to evaluate
how well your model’s predictions match the actual data. Two commonly used
metrics for this purpose are Mean Squared Error (MSE) and Root Mean
Squared Error (RMSE).
1. Mean Squared Error (MSE)
• Definition: MSE measures the average of the squares of the errors—that is,
the average squared difference between the predicted values and the actual
values.
• Formula: MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)², where yᵢ is the actual value, ŷᵢ is the
predicted value, and n is the number of observations.
Interpretation:
• A lower MSE indicates better model performance; it means the
predictions are closer to the actual values.
• Since errors are squared, larger errors have a disproportionately
large effect on MSE, making it sensitive to outliers.
Example:
• Suppose we’re predicting the test scores of students based on the
number of hours they studied.
• Actual Test Scores: [85, 78, 92, 75, 80]
• Predicted Test Scores: [83, 76, 95, 70, 82]
Calculating MSE:
Calculate the squared errors:
Student 1: (83 – 85)² = (-2)² = 4
Student 2: (76 – 78)² = (-2)² = 4
Student 3: (95 – 92)² = (3)² = 9
Student 4: (70 – 75)² = (-5)² = 25
Student 5: (82 – 80)² = (2)² = 4
Sum the squared errors:
Total = 4 + 4 + 9 + 25 + 4 = 46
Calculate MSE:
MSE = 46/5 = 9.2
Interpretation:
• The Mean Squared Error is 9.2, indicating the average squared difference
between the predicted and actual test scores.
2. Root Mean Squared Error (RMSE)
Definition: RMSE is the square root of the MSE. It provides the error metric in
the same units as the target variable, making it more interpretable.
• Formula: RMSE = √MSE = √[(1/n) Σᵢ (yᵢ − ŷᵢ)²]
Example:
Using the MSE calculated above:
1. Calculate RMSE:
RMSE = √9.2 ≈ 3.033
Interpretation:
•The RMSE of approximately 3.033 means that, on average, the model’s
predictions are about 3 points off from the actual test scores.
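The worked example above can be checked with a few lines of NumPy (same actual and predicted scores as in the example):

```python
# Checking the worked example above with NumPy: MSE = 9.2, RMSE ≈ 3.033.
import numpy as np

actual    = np.array([85, 78, 92, 75, 80])
predicted = np.array([83, 76, 95, 70, 82])

mse = np.mean((predicted - actual) ** 2)
rmse = np.sqrt(mse)

print(mse)   # 9.2
print(rmse)  # ≈ 3.033
```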