AI Project Cycle and Problem Scoping
PROBLEM: Attendance takes a lot of time in schools and offices.
SOLUTION: Develop attendance software.
According to [Link], a project is a piece of planned work or an activity that is
finished over a period of time and intended to achieve a particular purpose.
The purpose of the project is translated into achievable goals.
This sounds familiar. Even if it does not, let us look at a farewell party as a small project
that needs planning to execute.
Planning the farewell party needs to answer certain questions which may begin with why, what,
when, where, who and how, etc.
For example, why is the party being held? Who are the hosts? Who are the guests? What is the time
and venue? Who will arrange what?, etc.
Form teams of 4 or 5 classmates and come up with a plan for the school’s farewell party covering
the following:
Budget arrangements – how much, source of money, etc.
Date, time and venue.
Activities in the party – opening, entertainment, music, dinner, gifts, speeches, closing, etc.
List of event managers – not more than 8 persons with one of them the captain of the managers’
team.
Responsibilities of event managers before, during and after the party.
Discuss your plan with others and your teacher.
ACTIVITY: PROJECT SCOPING THE SCHOOL FAREWELL PARTY
Scoping covers all activities that are to be done in a project to achieve the intended goals. Think
again about the school farewell party and list the goals to make the party successful and the tasks to
be done to accomplish each goal. Remember, here, goal means what to achieve and task means what
to do to achieve it. For example:
Goal: Invite all guests through an invitation card.
Tasks: Prepare or arrange for invitation cards, get the cards ready, distribute or send cards to all
guests.
2.2 AI Project Cycle: An Introduction
Learning Outcomes
Learn problem scoping and ways to set goals for an AI project.
Identify stakeholders involved in the problem scoped.
Implement the 4W framework to create a problem statement template.
A problem scope is a mutual understanding among all stakeholders about what is to be done to solve
the problem.
Problem scoping is the very first stage of an AI project. Trying to see or define what is to be done to
solve a problem comes under problem scoping; once it is defined, it is called the problem scope.
Here, stakeholders include those associated with the project as well as the beneficiaries of the project
and other related people like financiers, any sponsors, etc.
Problem scoping gives a clear vision of the problem. It also distinctly defines the outcomes of the
entire problem-solving exercise. So, the first thing needed for defining a problem scope is to answer the
question: What are we trying to achieve after solving the problem?
A proper problem scoping achieves the following:
Identification of the stakeholders.
Clearly defined problem through a problem statement.
List of achievable goals.
Inputs required in solving the problem.
Resources (people, infrastructure, etc.) required to solve the problem.
Financial requirements (budgeting).
Delivery deadline of the solution.
Challenges in the way of problem solving.
Who?
Primarily, as solution developers or providers, we should be familiar with the people who are facing the
problem and the people who will be affected by its solution, directly or indirectly. All such people are
called stakeholders. Stakeholders include people in various capacities, such as operators of the system
for which the problem needs to be solved, users of the system, investors, managers and owners of the
system. As solution developers, we must be familiar with the primary stakeholders, such as operators
and beneficiaries or users of the system.
There are many sources and means of gathering information about the stakeholders, such as the website
of the business and meetings and interviews with the persons involved.
Let us understand this through a real-life scenario.
A college library has several thousand books. The college administration observed in the monthly
feedback from the students that it takes a lot of time for the students to browse through the books and
select the right ones, especially for studies. The college already has an online library management system
in the form of a mobile app where students can search for books. But it is still difficult to browse several
categories of books and find one.
As a solution provider, answer the following questions based on the above scenario:
1. Who are the stakeholders affected by the problem?
2. What details do you have about these stakeholders?
3. What ways do you suggest to gather more information about the stakeholders?
What?
Identifying the problem: The scope of the problem cannot be defined until the problem is understood
completely and correctly. Getting familiar with all aspects of the problem is the prerequisite to defining
the problem scope. A problem which is identified and understood can be described in writing. This is
called the problem statement. The problem statement is short and includes the problem description as
well as the proposed solution. Some samples of problem statements are given below:
How can I improve on my preparation for exams?
The lab time for students needs to be optimised by minimising the activities which can
be done before or after the lab.
How can factory production be increased and wasted man-hours be utilised?
Let us identify the problem in the Library Management System case study.
A college library has several thousand books. The college administration observed in the monthly
feedback from the students that it takes a lot of time for the students to browse through the books and
select the right ones, especially for studies. The college has an online library management system app
where students can search for books. But it is still difficult to browse several categories of books.
In this scenario, describe the problem faced by the college in your own words.
Write a short and concise problem statement.
What is the evidence that the problem is real?
Identify the goals to achieve: Once the problem is understood, it is easy to set the goals. Describe the
goals in clear language with specific details. For example:
Develop the handwriting analyser within 60 days to analyse 500 samples in 1 minute.
Create the burglar alarm system in 30 days that raises a 120-decibel alarm audible within a
500-metre radius.
Prepare 10,000 records of transactions by this weekend to train the 3 AI algorithms within a span of
4 weeks.
Goals should be set to cover the entire purpose of the problem-scoping exercise.
A college library has several thousand books. The college administration observed in the monthly
feedback from the students that it takes a lot of time for the students to browse through the books and
select the right ones, especially for studies. The college already has an online library management system
in the form of a mobile app where students can search for books. But it is still difficult to browse several
categories of books and find one. The IT consultant suggested an intelligent book recommendation
system that must “know” the reading habits, preferences, and frequency of issuing books of the
students and, on that basis, recommend the top 5 most relevant books to each student.
Such a system could also help the college buy the most useful new books every year and classify the
least read books.
List the specific goals to be achieved by solving this problem. Assume that the college has given you
60 days to deliver the solution. You are free to judiciously allocate the number of days in which each
goal should be achieved.
Where?
This question defines the context of the problem. It shows the exact area or boundary where the problem
is occurring. This helps you identify the location of the problem. It clearly shows you when and where
exactly the real problem arises and helps you pinpoint the affected area of the system.
In the Library Management System problem identified earlier, describe the context in which
the problem occurs.
Why?
This question addresses the rationale of the solution. It describes the benefits to be drawn from the
implemented solution to the problem at hand. It also helps you describe how valuable the solution will be
to the stakeholders. The answer to this question must inform the stakeholders how the situation will be
improved after implementing the solution.
... are facing a problem that it takes too long to find the desired book in the library. (WHAT)
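The four questions can be collected into a single problem statement. The sketch below is a minimal illustration in Python: the class name, the field names and the sentence pattern in `render` are assumptions made for this example, and the Where and Why answers for the library are sample answers, not ones given in the text.

```python
from dataclasses import dataclass


@dataclass
class ProblemStatement:
    """The four answers gathered during problem scoping."""
    who: str    # stakeholders facing the problem
    what: str   # the problem itself
    where: str  # context in which the problem occurs
    why: str    # benefit expected from the solution

    def render(self) -> str:
        # Assumed sentence pattern, used here only for illustration.
        return (
            f"Our {self.who} are facing a problem that {self.what} "
            f"when {self.where}. An ideal solution would {self.why}."
        )


library = ProblemStatement(
    who="students of the college",
    what="it takes too long to find the desired book in the library",
    where="browsing several categories of books in the library app",  # sample answer
    why="recommend the most relevant books to each student",          # sample answer
)
print(library.render())
```

Writing the answers as separate fields makes it easy to see at a glance which of the four questions is still unanswered.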
Case Study: The digital screener
Edusoft Academy, a professional training institute, is going to conduct an all-India admission test. In
the last admission test they faced the problem of unauthorised candidates appearing for the admission
test at various centres. They need a solution that raises an alarm if the person entering the
examination hall is neither an authorised candidate nor an authorised staff member. The academy has
the photographs of all the candidates appearing for the admission test and of all their authorised staff.
The photographs are in hard copy (printed). Let us fill the problem statement template for this scenario.
WHO – Our STAKEHOLDERS:
1. Edusoft Academy invigilators and staff involved in the test process.
2. Candidates – the admission aspirants.
3. Edusoft Academy management.
2.3 AI Project Cycle: Data Acquisition
Learning Outcomes
Learn about data, data features and data formats.
Work around the scenarios to think of the ways to acquire data.
Create a System Map of the problem area/context.
Data is the biggest asset for a business, society and economy today.
Data (singular: datum), as we know, are raw facts that on their own do not make any sense. When
a set of data are related logically in a context, they generate information which is meaningful and useful
for a purpose. For example, Raj, A, 9, 16 are some data values which make no sense as
such. But if we look at them in the context of a school and relate them together, they make a piece
of information – Raj is 16 years old and studies in class 9, section A.
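The difference between raw data and information can be shown in a few lines of Python. This is only a sketch: the field names `name`, `section`, `class` and `age` are illustrative labels chosen for the example.

```python
# Raw data values: on their own they carry no meaning.
raw_values = ["Raj", "A", 9, 16]

# The same values related to each other in the context of a school.
# The field names are illustrative labels, not taken from the text.
student = {"name": "Raj", "section": "A", "class": 9, "age": 16}

information = (
    f"{student['name']} is {student['age']} years old and studies in "
    f"class {student['class']}, section {student['section']}."
)
print(information)
```

The values are the same in both cases; only the labels, which supply the context, turn them into information.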
Data Features
Not every piece of data is the same. If you consider the previous example, Raj and A are text while
9 and 16 are numbers. If you look at various systems, you will find basically three features of
data, which are also called data types:
Characters or individual letters, symbols, marks. E.g. a, A, @, *, ! etc.
Strings of letters, also called text. E.g. “India”, “Ravi”, “House”.
Numbers. E.g. 10, 1, -9, 200.
There are variations to these three basic data types:
Phrases and sentences – variations of strings.
Numbers with decimal places.
Dates.
Data Formats
How data is presented or stored is determined by various formats.
A system may fix the number of decimal places that a number can store. For example, monetary
figures have 2 decimal places while scientific notations may have more than 10 decimal places.
Dates can be presented in various formats like 12-29-2022, 29-Dec-2022 and 29-12-2022, etc.
Text can be in various cases like UPPERCASE, lowercase, etc.
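The same value can be presented in several formats. The following sketch uses Python's standard `datetime` module and string formatting to reproduce the formats mentioned above; the variable names and the sample amount are chosen for this example, and the month abbreviation assumes the default English locale.

```python
from datetime import date

d = date(2022, 12, 29)

# One date, three presentations.
date_formats = {
    "month-day-year": d.strftime("%m-%d-%Y"),
    "day-month name-year": d.strftime("%d-%b-%Y"),
    "day-month-year": d.strftime("%d-%m-%Y"),
}

# A monetary figure stored with exactly 2 decimal places.
amount = 1234.5
money = f"{amount:.2f}"

# The same text in different cases.
text = "India"
upper, lower = text.upper(), text.lower()

for label, value in date_formats.items():
    print(label, "->", value)
print(money, upper, lower)
```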
objects such as the model of a house. Complex data types are presented in a variety of formats like audio
is coded as mp3, wav etc.; video as mpeg, mp4, etc.; images as jpg, gif, png, etc.
Data Quality
Data which is relevant to the context in which the problem is being solved is said to be quality data or
useful data. Data quality is defined by the following features of data:
Relevance: Data should not be out of context. That is why it is important to identify the
context of the problem during the problem scoping stage. This ensures that only the data values relevant
to the problem are acquired.
Age: Data should not be too historic or too recent. There has to be a balance. For example, if data for
Test matches is being used to train the machine while predictions are to be made for T20 matches, then
the data is too old for this.
Accuracy: Data values should be correct and in the proper format. For example, if predictions are to be
made against the opponent team from Sri Lanka and the name Sri Lanka is misspelt in several records,
then those records may be left out of the testing data.
Volume: The higher the volume of data, the better the training of the machine. That is why the AI
algorithms of e-commerce and social media websites get more intelligent day by day, minute by minute:
they have a lot of data to learn from every day.
Richness: Richness refers to the variety of data values in the data set. This directly relates to the volume
of data. There are chances of having a variety of data values in a bulk lot of data. For example, more values
of centuries hit, sixers scored, duck (zero) score, number of times not out will make a rich data set and a
robust training of the machine instead of plain total score values.
Format: In many AI applications, different data formats also help in better training of the machine.
For example, an AI system using natural language processing needs letters, text, symbols and voice in
different notations, accents, tones, semantics, etc., and a face recognition AI system needs a variety of
image shots of the same person, while our example of score prediction needs only numbers and,
occasionally, text (name of the country, bowler, etc.).
Data source: Data accuracy depends on the source from which the data is collected. For example, data
collected from the public domain, like the Internet, may not be authentic, while data collected from an
authorised source such as a government or certified organisation is more likely to be authentic.
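The accuracy feature described above can be illustrated with a short Python sketch. The match records, the misspelling and the list of known team names are all hypothetical values made up for this example.

```python
# Hypothetical match records; the third one has a misspelt opponent name.
records = [
    {"opponent": "Sri Lanka", "runs": 250},
    {"opponent": "Sri Lanka", "runs": 198},
    {"opponent": "Sri Lnka", "runs": 305},
    {"opponent": "Australia", "runs": 221},
]

# An exact-match filter silently leaves out the misspelt record.
selected = [r for r in records if r["opponent"] == "Sri Lanka"]

# A simple accuracy check: flag names that are not in a list of known teams.
known_teams = {"Sri Lanka", "Australia"}
suspect = [r for r in records if r["opponent"] not in known_teams]

print(len(selected), "records selected;", len(suspect), "record needs correction")
```

Three records belong to Sri Lanka, but only two are selected. Checking values against a trusted list before training is one simple way to catch such errors.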
Let us apply our learning so far about the data and its features.
A college library has several thousand books. The college administration observed in the monthly feedback
from the students that it takes a lot of time for the students to browse through the books and select the
right ones, especially for studies. The college already has an online library management system in the
form of a mobile app where students can search for books. But it is still difficult to browse several
categories of books and find one. The IT consultant suggested an intelligent book recommendation
system that must “know” the reading habits, preferences, and frequency of issuing books of the students
and, on that basis, recommend the top 5 most relevant books to each student. Such a system could
also help the college buy the most useful new books every year and classify the least read books.
The most popular books issued category-wise.
Grouping of students who issued similar books.
The least read books category-wise.
The books which make less than 5% of all the books issued.
In the above scenario, list the sample data values, their data types and possible data formats. Also list
some examples of irrelevant data.
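To see how such data values might be used, here is a minimal Python sketch that derives three of the listed outputs from an issue log. The book titles, categories and counts are invented for illustration, and a real system would read them from the library's records instead.

```python
from collections import Counter

# Hypothetical issue log: one (book, category) entry per book issued.
issues = (
    [("Physics Vol 1", "Science")] * 30
    + [("World History", "History")] * 15
    + [("Organic Chemistry", "Science")] * 4
    + [("Selected Poems", "Literature")] * 1
)

# Number of times each book was issued, keyed by (category, book).
issue_counts = Counter((category, book) for book, category in issues)

most_popular = {}  # category -> (book, times issued)
least_read = {}    # category -> (book, times issued)
for (category, book), count in issue_counts.items():
    if category not in most_popular or count > most_popular[category][1]:
        most_popular[category] = (book, count)
    if category not in least_read or count < least_read[category][1]:
        least_read[category] = (book, count)

# Books which make up less than 5% of all the books issued.
total = len(issues)
rare_books = [book for (_, book), n in issue_counts.items() if n / total < 0.05]

print(most_popular)
print(least_read)
print(rare_books)
```

Here the book title and category are text values and the issue count is a number, which is the kind of classification the exercise asks for.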
Data Acquisition
After understanding the data, its features and quality features, let us get to the practical aspect of
it – acquiring data to train the machine (machine learning).
Data acquisition has two aspects:
Data sources.
Data acquisition process.
Data sources
Depending on the context of the problem as considered during problem scoping, there could be
different sources that may provide training data. Some common examples are:
Database of the company for which solution is being developed.
Customer reviews and feedback.
Business documents – financial statements, business transactions, agreements, etc.
Web page content.
Live data – video recording, satellite imagery, images captured by webcam, chat text, phone calls,
video chat stream, CCTV feeds, weather data, etc.
Raw, flat files – plain text files, comma separated values (csv) files, spreadsheets, maps, images, hard
copies (books, reports), tables, etc.
Software applications – they generate some data of their own while working, which may sometimes be
useful. For example, a Windows server maintains the log-in information of users in a file. Other
examples are the registry of an operating system, which has details of the software and hardware
installed on the computer system, or simply the virus database of an anti-virus application.
Data acquisition process
Depending on the source of data, there are different methods
or processes of acquiring the data. Let us have a look at different
possible ways of data acquisition:
Certain data sources retain the data in such an organised fashion that
they can be acquired or collected very easily. For example, databases
store data in tables which is very well organised. Spreadsheets also
store most of the data in tabular format.
Databases also allow us to generate data sets by applying queries
on them. Many software applications allow data to be exported in various
formats which are easier to process.
Data acquisition needs more effort and more sophisticated methods when
the data is not organised in a particular format. For example,
images, plain text, audio and video are complex data types to acquire and need different tools
of compilation such as scanners, optical readers, sensors, etc.
Certain kinds of data can be generated directly in the form of hard copy. For example, a call data
printout from an EPBX machine or call connection exchange.
Another way of acquiring data is through online survey, feedback and review forms. All the data
entered in such forms can be collected in the form of a spreadsheet or CSV files.
Live data is acquired via the device involved such as
webcam, CCTV, Chatbot interface, satellite, sensors in
medical equipment, etc.
Programming interfaces are pieces of code which help one
application to connect with another. Such an interface is called an
Application Programming Interface (API). For example, a
Python program may use a Java API to import data via a
Java program.
Web scraping or web harvesting is
the technique that lets us collect the data of a website in an
organised format such as a table, CSV file or spreadsheet.
Scanning symbols, bar codes and QR codes, etc.
The simple, traditional method is using pen and paper to collect
the data in printed forms filled in by hand. Such documents can
be scanned with an Optical Character Recognition (OCR) device and the soft copy can then be
processed to extract the data.
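Two of the acquisition methods above, reading form responses collected as CSV and generating a data set by querying a database, can be sketched with Python's standard `csv` and `sqlite3` modules. The table name, column names and sample rows are assumptions made for this example, and an in-memory database stands in for the real one.

```python
import csv
import io
import sqlite3

# 1. Survey or feedback form responses collected as CSV (hypothetical content).
csv_text = "student,book,rating\nS1,Physics Vol 1,5\nS2,World History,4\n"
survey_rows = list(csv.DictReader(io.StringIO(csv_text)))

# 2. A database queried to generate a data set (in-memory SQLite for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (student TEXT, book TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO issues VALUES (?, ?, ?)",
    [
        ("S1", "Physics Vol 1", "Science"),
        ("S2", "Physics Vol 1", "Science"),
        ("S2", "World History", "History"),
    ],
)
query = (
    "SELECT book, COUNT(*) FROM issues "
    "GROUP BY book ORDER BY COUNT(*) DESC, book"
)
issue_counts = conn.execute(query).fetchall()
conn.close()

print(survey_rows[0])
print(issue_counts)
```

Note that values read from a CSV file arrive as text, so a rating such as 5 must be converted to a number before it is used in calculations.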
Consider the Library Management System problem you have read earlier and list possible sources
of data relevant for AI-powered book recommendation algorithm. Also, list some possible ways to
acquire the data.
System Map of an AI System
When we analyse a problem area for scoping, we identify various elements that comprise the context
of the problem area. A system map is a visual tool that shows the relationships among the various
elements of a problem area in graphical form. It helps us think of a possible solution to the problem
easily. A system map shows the interconnections of the elements of a system and helps us understand
complex issues easily. The System Map for Library MS scenario is given here:
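Apart from a drawing, the interconnections in a system map can also be written down as a simple graph. The Python sketch below is not the map from the figure: the element names and links are hypothetical, chosen only to show how elements and their relationships can be recorded and followed.

```python
# Each element maps to the elements it directly influences.
# Element names and links are hypothetical, for illustration only.
system_map = {
    "students": ["book searches", "books issued"],
    "book searches": ["time spent browsing"],
    "books issued": ["reading history"],
    "reading history": ["book recommendations"],
    "book recommendations": ["time spent browsing", "books issued"],
    "time spent browsing": [],
}


def influenced_by(element, graph):
    """Return every element reachable from `element` by following links."""
    seen, stack = set(), [element]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


print(sorted(influenced_by("students", system_map)))
```

Following the links from one element shows which other parts of the system a change there may affect.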
Case Study: The digital screener
Create a simple System Map of the Edusoft Academy scenario.