IS102
Computer as an Analysis Tool
Week 6: It's About Time
Reading:
Chapter 7: Processes and Time
Instructor:
Associate Professor Guo Zhiling
School of Information Systems
zhilingguo@[Link]
Agenda Today
Simulating More Data Points
Using Frequency Bins
Using Re-sampling
Using Inverse Distribution
Lesson Outcome
Date and Time Management in Excel
Modeling Queuing Systems using Simulations
Macro Recording
Exercises
EX57: [Link]
EX58: [Link]
Simulating More Data Points
Method 1: Using Frequency Bins
Build a CRF table, then use RAND() with LOOKUP() to match the
CRF and return the simulated data X. The CRF table can be built from:
(i) frequency counts, (ii) percentages of occurrence, or
(iii) distribution functions
Method 2: Using Re-sampling
Discrete Data: SMALL(array, RANDBETWEEN(1,N))
Continuous Data: PERCENTILE(array, RAND())
Method 3: Using Inverse Distribution
Discrete distribution (Uniform, Poisson, Binomial)
Continuous distribution (Uniform, Exponential, Normal)
Method 1: Frequency Bins
We first compute the cumulative relative frequency (CRF) of
data X and build the CRF data
We then use RAND() to generate a random number
between 0 and 1
This number is compared with the CRF table using
LOOKUP(), which returns the simulated data X
In the example, the value 0 has the widest CRF interval, so a
bigger range of values from RAND() falls within it and 0 has
the highest probability of turning up
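The frequency-bin method above can be sketched in Python. This is a minimal illustration rather than the course's spreadsheet: the function names and the sample values and counts are hypothetical, and `bisect` stands in for Excel's LOOKUP() against the CRF table.

```python
import bisect
import random

def build_crf(counts):
    """Cumulative relative frequencies for a list of frequency counts."""
    total = sum(counts)
    crf, running = [], 0
    for count in counts:
        running += count
        crf.append(running / total)
    return crf

def simulate_from_bins(values, crf, u=None):
    """Return the value whose CRF interval contains u.

    u defaults to a fresh random number in [0, 1), like RAND().
    """
    if u is None:
        u = random.random()
    index = bisect.bisect_right(crf, u)
    return values[min(index, len(values) - 1)]

# Hypothetical data: the value 0 was observed 5 times, 1 three times, 2 twice.
values = [0, 1, 2]
counts = [5, 3, 2]
crf = build_crf(counts)  # upper bounds of the intervals for 0, 1 and 2
sample = [simulate_from_bins(values, crf) for _ in range(1000)]
```

Because the value 0 owns the widest interval (half of the range from 0 to 1), it should appear in roughly half of the simulated draws.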
Method 2: Resampling
Resampling discrete data
Use Randbetween(1,N) to generate a position number k.
This number is used to return the kth value in the raw
data collection as if the raw data is already sorted in
ascending order.
X=SMALL(array,k)
X=SMALL(raw data, RANDBETWEEN(1,N))
This ensures that higher-frequency results are more likely
to occur than lower-frequency results.
An Example: Resampling Discrete Data
[Table: sorted data for N = 10 data points, the position drawn by RANDBETWEEN(1,10), and the frequency of each number]
Because the position
number is generated
randomly, higher
frequency result (2)
will occur more often
than lower frequency
result (4).
Result 2 has the
highest probability of
turning up.
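A minimal Python sketch of discrete resampling, assuming a hypothetical raw data set of ten values in which 2 is the most frequent result and 4 the least. `random.randint` plays the role of RANDBETWEEN(1,N) and indexing into the sorted list plays the role of SMALL(array,k).

```python
import random
from collections import Counter

def resample_discrete(raw, k=None):
    """Return the k-th smallest raw value; k defaults to a random position 1..N."""
    ordered = sorted(raw)
    if k is None:
        k = random.randint(1, len(ordered))  # inclusive on both ends, like RANDBETWEEN
    return ordered[k - 1]

# Hypothetical raw data: 2 appears five times, 4 appears once.
raw = [2, 1, 2, 4, 2, 3, 1, 2, 3, 2]

# Every position 1..10 is equally likely, so count what each position returns.
by_position = Counter(resample_discrete(raw, k) for k in range(1, len(raw) + 1))
```

Five of the ten positions return 2 and only one returns 4, which is why the more frequent result turns up more often.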
Method 2: Resampling
Resampling continuous data
Use Rand() to generate a random number between 0
and 1, to represent the percentile value k.
This percentile value k is used to return the
corresponding percentile number in the raw data
collection.
X=PERCENTILE(array,k)
X=PERCENTILE(raw data, RAND())
PERCENTILE() sorts and interpolates among the raw
data using the number returned by RAND().
The newly generated data X may be DIFFERENT from
the raw data due to interpolation.
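The interpolation step can be illustrated with a short Python sketch. It assumes the inclusive percentile rule (position k*(N-1) in the sorted data, with linear interpolation between neighbours), which is how Excel's PERCENTILE() is documented to behave; the function name and sample data are hypothetical.

```python
import math
import random

def percentile_inc(data, k):
    """k-th percentile (0 <= k <= 1) with linear interpolation between sorted values."""
    ordered = sorted(data)
    position = k * (len(ordered) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

# Hypothetical raw data (for example, service times in minutes).
raw = [40, 10, 30, 20]

midpoint = percentile_inc(raw, 0.5)                 # falls between 20 and 30
simulated = percentile_inc(raw, random.random())    # like PERCENTILE(raw, RAND())
```

Here `midpoint` is 25, a value that does not appear in the raw data, which shows how interpolation can produce new values.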
Method 3: Simulating Data from Distribution
Uniform
X = RANDBETWEEN(min, max)
X = min + RAND()*(max - min)
Normal
X = NORMINV(RAND(), mean, std)
returns the X for a given cumulative probability RAND()
Z = NORMSINV(RAND())
returns the Z for a given cumulative probability RAND()
Exponential
X = (-Mean)*LN(RAND())
or X = (-Mean)*LN(1-RAND())
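The three inverse-distribution formulas can be sketched in Python as follows. The function names are hypothetical, and `statistics.NormalDist.inv_cdf` is used as a stand-in for NORMINV(). Each function takes the random number u explicitly so the mapping from probability to value is easy to check.

```python
import math
import random
from statistics import NormalDist

def sim_uniform(low, high, u):
    """Continuous uniform: X = min + u*(max - min)."""
    return low + u * (high - low)

def sim_normal(mean, std, u):
    """Normal: X for a given cumulative probability u (0 < u < 1), like NORMINV."""
    return NormalDist(mean, std).inv_cdf(u)

def sim_exponential(mean, u):
    """Exponential: X = -mean * ln(1 - u); 1 - u avoids ln(0) when u is in [0, 1)."""
    return -mean * math.log(1 - u)

# One simulated value from each distribution (parameters are illustrative only).
u = random.random()
x_uniform = sim_uniform(2, 6, u)
x_exponential = sim_exponential(4, u)
x_normal = sim_normal(10, 2, max(u, 1e-12))  # inv_cdf rejects u == 0
```

Using `1 - u` or `u` inside the logarithm gives the same distribution, because both are uniform on the interval from 0 to 1.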
Normal Distribution
NORMDIST(x, mean, standard_dev, cumulative)
x: value of interest
cumulative = true returns CDF, false returns PDF
X = NORMINV(RAND(), mean, std)
NORMSDIST(z) = standard normal
mean = 0, standard_dev=1
z: value of interest; only returns CDF
Z = NORMSINV(RAND())
Different Distribution Functions
[Figure: Poisson distribution, Probability Mass Function (PMF) and Cumulative Distribution Function (CDF)]
[Figure: Exponential distribution, Probability Density Function (PDF) and Cumulative Distribution Function (CDF)]
Distribution Functions
Poisson
POISSON(x, mean, cumulative)
x: number of events; mean: expected value
cumulative = true returns CDF, false returns PMF
Example: number of customers who arrive in a store
every hour; number of emails you receive every day
Exponential Distribution
EXPONDIST(x, lambda, cumulative)
x: value of interest; lambda: 1/mean
cumulative = true returns CDF, false returns PDF
X = (-Mean)*LN(RAND()) or X = (-Mean)*LN(1-RAND())
Example: customer inter-arrival time; email inter-arrival
time; fish/bus inter-arrival time
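As a rough cross-check of what POISSON() and EXPONDIST() return, the textbook formulas can be written directly in Python. The function names are hypothetical, and the last lines confirm that the inverse formula X = -mean*LN(1-RAND()) really inverts the exponential CDF.

```python
import math

def poisson_pmf(x, mean):
    """P(exactly x events), like POISSON(x, mean, FALSE)."""
    return math.exp(-mean) * mean ** x / math.factorial(x)

def poisson_cdf(x, mean):
    """P(at most x events), like POISSON(x, mean, TRUE)."""
    return sum(poisson_pmf(i, mean) for i in range(x + 1))

def expon_pdf(x, lam):
    """Exponential density, like EXPONDIST(x, lambda, FALSE)."""
    return lam * math.exp(-lam * x)

def expon_cdf(x, lam):
    """P(X <= x), like EXPONDIST(x, lambda, TRUE)."""
    return 1 - math.exp(-lam * x)

# Round trip: pick a cumulative probability, invert it, and evaluate the CDF again.
mean_gap = 4.0                      # illustrative mean inter-arrival time
u = 0.3
x = -mean_gap * math.log(1 - u)     # inverse of the exponential CDF
recovered_u = expon_cdf(x, 1 / mean_gap)
```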
Date and Time Management
Date & Time Management in Excel
2 date systems in Excel (Excel Options > Advanced)
1900 date system (default)
1904 date system
1900 date system
1st Jan 1900 = 1
29th Feb 1900 = 60 (this date does not exist!)
1st Mar 1900 = 61
1904 date system
1st Jan 1904 = 0
29th Feb 1904 = 59
1st Mar 1904 = 60
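The two date systems can be mimicked in Python to see where the serial numbers come from. This is a sketch with hypothetical function names; it treats serial 60 in the 1900 system as an error because 29th Feb 1900 is not a real date.

```python
from datetime import date, timedelta

def serial_to_date_1900(serial):
    """Convert a whole-number serial in the 1900 date system to a calendar date."""
    if serial == 60:
        raise ValueError("serial 60 is the non-existent 29 Feb 1900")
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom leap day, serials run one day ahead of the calendar.
    return date(1899, 12, 30) + timedelta(days=serial)

def serial_to_date_1904(serial):
    """Convert a whole-number serial in the 1904 date system (1 Jan 1904 = 0)."""
    return date(1904, 1, 1) + timedelta(days=serial)

first_day_1900 = serial_to_date_1900(1)
first_march_1900 = serial_to_date_1900(61)
leap_day_1904 = serial_to_date_1904(59)
```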
Date/Day Functions
TODAY()
returns the current date
YEAR(serial_number)
returns the year corresponding to the serial number
Examples:
Wrong: YEAR(14-Jan-05)
OK: YEAR("14-Jan-05")
OK: YEAR(B15) where B15 has value 14-Jan-05
OK: YEAR(39014) = 2006
MONTH(serial_number)
returns the month corresponding to the serial number
How many serial numbers return the same year?
The same month? The same day?
Date/Day Functions
DAY(serial_number)
returns the day corresponding to the serial number
Examples:
MONTH(39014) = 10, DAY(39014) = 24
So, in fact, 39014 is 24th Oct 2006
DATE(year, month, day)
returns a serial number
WEEKDAY(serial_number, return_type)
Subtracting
WRONG: 14-Jan-05 - 23-Sep-04
OK: "14-Jan-05" - "23-Sep-04" = 113
OK: DATE(2005,1,14) - DATE(2004,9,23) = 113
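The same date arithmetic can be checked in Python, where subtracting two dates gives a number of days just as subtracting two serial numbers does in Excel. The helper names are hypothetical, and `excel_weekday` assumes WEEKDAY's default return type (Sunday = 1 through Saturday = 7).

```python
from datetime import date

def days_between(later, earlier):
    """Number of days from earlier to later, like subtracting two serial numbers."""
    return (later - earlier).days

def excel_weekday(d):
    """Day of week numbered Sunday = 1 ... Saturday = 7."""
    return d.isoweekday() % 7 + 1

d1 = date(2005, 1, 14)
d2 = date(2004, 9, 23)

gap = days_between(d1, d2)
parts = (d1.year, d1.month, d1.day)   # like YEAR(), MONTH(), DAY()
weekday = excel_weekday(d1)           # 14 Jan 2005 was a Friday
```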
Time Functions
Time is stored as the fractional part of the serial
number, that is, the digits to the right of the decimal
point
NOW()
returns the current date and time
E.g. If now is 8:00AM, 25th Dec 2005, then the value of
NOW() is 38711.3333333 where
38711 is the day 25th Dec 2005
0.3333333 is the time of the day, which is 1/3 of the day
So, 8:00AM, 26th Dec 2005 is 38712.3333333
What is the fractional part of noon time?
Time Functions
TIME(hour, minute, second)
returns the serial number to the right of the decimal point in
the format [Link]
HOUR(serial_number)
MINUTE(serial_number)
SECOND(serial_number)
returns the hour, minute and second of a serial number
respectively
HOUR(39461.847) = 20
0.847 is in fact [Link]pm on any day
MINUTE(39461.847) = 19
SECOND(39461.847) = 41
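A short Python sketch of how the fractional part maps to a time of day. The function names are hypothetical, and rounding to the nearest second is an assumption chosen to reproduce the slide's values.

```python
def time_fraction(hour, minute, second):
    """Fraction of a day for a given time, like TIME(hour, minute, second)."""
    return (hour * 3600 + minute * 60 + second) / 86400

def split_time(serial):
    """Return (hour, minute, second) for the fractional part of a serial number."""
    fraction = serial - int(serial)
    total_seconds = round(fraction * 86400) % 86400  # nearest second, wrap at midnight
    hour, remainder = divmod(total_seconds, 3600)
    minute, second = divmod(remainder, 60)
    return hour, minute, second

eight_am = time_fraction(8, 0, 0)    # one third of a day
noon = time_fraction(12, 0, 0)       # half of a day
evening = split_time(39461.847)
```

The fractional part 0.847 corresponds to 73,180.8 seconds after midnight, which rounds to hour 20, minute 19, second 41.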
Management of Waiting Lines
Queues arise when the short term demand for service
exceeds the capacity
Most often caused by random variation in service times and the
times between customer arrivals
Queuing models are used to:
Describe the behavior of queuing systems
Determine the level of service to provide
Evaluate alternate configurations for providing service
Simulation is often used to analyze more complex
queuing systems
[Figure labels: interarrival time]
Observing Queues: Recording Arrivals
[Figure: timeline for one server marking arrival time, service start time and service end time, with inter-arrival time, wait time and service time as the intervals]
We construct simple macros and assign to buttons to conveniently
record arrival time, service start time and service end time.
Then we compute the inter-arrival time, wait time and service time.
[Link]
Observing Queues: Recording Arrivals
Timer
designed for Queue observation and analysis:
records times
customer arrival time
service-start time
service-end time
tabulates intermediate variables
inter-arrival time
Service time
waiting time
system times [=service + waiting]
Timer Macro to record arrival time
[Screenshot callouts: Steps 1 & 2; Step 7; Step 8; Step 9 (ON); Step 10; Step 13 (OFF); Step 14]
1. D3 = NOW(), change to time format
[Link] Array C8:E8 with link to D3
[Link] Array F8:I8 with formulas
[Link] cells
15. Create buttons & assign macros to them
[Link]/Macro/Record New Macro
[Link] F9 (to activate Calculate)
7. Select C7 and copy
8. Select C6 and key Ctrl-DownArrow
9. Select Rel. Ref in Macro Record Control Panel
10. Key DownArrow
11. PasteSpecial Value&NumberFormat
12. Key Esc
13. Unselect Rel. Ref in Macro Record Stop
14. Stop Macro Record
Observing Queues: Recording Arrivals
Clicker
Timer has limitations: First-Come-First-Serve, Single Server
Clicker is an adaptation of Timer
Counts "arrivals"
records arrival times of up to 3 types of customers
by clicking an appropriate button, one for each type of customer
tabulates their cumulative frequency counts for given time
intervals ("bins")
change the time bins to the correct date to capture time stamps
Sample Applications:
Count number of vehicles using a stretch of road
E.g. Motorcars, Motorcycles, Buses
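The tabulation step can be sketched in Python as follows: given recorded arrival times and the upper edge of each time bin, count how many arrivals fall at or before each edge. The function name and the sample times (minutes since the start of observation) are hypothetical and do not come from the Clicker workbook.

```python
import bisect

def cumulative_counts(arrival_times, bin_edges):
    """Cumulative number of arrivals at or before each bin's upper edge."""
    ordered = sorted(arrival_times)
    return [bisect.bisect_right(ordered, edge) for edge in bin_edges]

# Hypothetical time stamps for one type of vehicle, in minutes.
motorcars = [1, 4, 6, 6, 9, 14]
bin_edges = [5, 10, 15]

running_totals = cumulative_counts(motorcars, bin_edges)
# Differences between consecutive totals give the count inside each bin.
per_bin = [b - a for a, b in zip([0] + running_totals, running_totals)]
```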
Simulating Queues
Given observed/historical/raw data, how can we
model and analyse waiting lines?
Use of Re-sampling
Simulate data for inter-arrival and service times via resampling of raw data through observed distributions e.g.
Inverse of Exponential Distribution: -Average*Ln(Rand())
Inverse of Empirical Distribution: Percentile(DataArray,Rand())
Single Server Queues vs Multi Server Queues
checkout lines, fast food outlets
Simulating Queues
Some definitions (1 Server)
Inter-Arrival Time: obtained using the inverse exponential distribution
Service Time: re-sampled from historical data using the Percentile function
Arrival Time = previous Arrival Time + Inter-Arrival Time
Service Start Time = Arrival Time if no one is in the queue, else End Time of the last customer
Service End Time = Service Start Time + Service Time
Wait Time = Start Time - Arrival Time
System Time = End Time - Arrival Time
System Length (no. in queue) = no. of Service End Times > Arrival Time
XDB Bank
Columns: Customer | Inter-Arrival Time | Service Time | Arrival Time | Service Start | Service End | Wait Time | System Time | System Length
Inter Arrival Time = -mean*LN(RAND()), simulated using the
inverse of the Exponential distribution
Service Time = PERCENTILE (service time array, RAND())
Arrival Time = Arrival Time of previous customer + Inter Arrival Time
Service Start Time = MAX (previous customer service end time, arrival
time)
Service End Time = Service Start Time + Service Time
Wait Time = Service Start Time - Arrival Time
System Time = Wait Time + Service Time or End Time - Arrival Time
System queue length (number of customer in the system at arrival time)
= COUNTIF (all previous customer End Time > Arrival Time)
Traffic intensity = mean service time / mean inter-arrival time
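The column formulas above can be traced in a small Python sketch of a single-server queue. It is an illustration under stated assumptions, not the XDB Bank workbook: the function names are hypothetical, and the inter-arrival and service times are passed in as plain lists (in practice they would be simulated with the exponential and percentile formulas) so that the results can be checked by hand.

```python
def simulate_single_server(inter_arrivals, service_times):
    """Return one row per customer with the queue columns computed in order."""
    rows, end_times = [], []
    arrival = 0.0
    for gap, service in zip(inter_arrivals, service_times):
        arrival += gap                                   # previous arrival + inter-arrival
        previous_end = end_times[-1] if end_times else 0.0
        start = max(previous_end, arrival)               # wait if the server is busy
        end = start + service
        length = sum(1 for e in end_times if e > arrival)  # like COUNTIF(ends, ">" & arrival)
        rows.append({
            "arrival": arrival, "start": start, "end": end,
            "wait": start - arrival, "system": end - arrival, "length": length,
        })
        end_times.append(end)
    return rows

def traffic_intensity(inter_arrivals, service_times):
    """Mean service time divided by mean inter-arrival time."""
    mean_service = sum(service_times) / len(service_times)
    mean_gap = sum(inter_arrivals) / len(inter_arrivals)
    return mean_service / mean_gap

# Hypothetical inputs in minutes.
rows = simulate_single_server([2, 1, 1], [3, 3, 1])
rho = traffic_intensity([2, 1, 1], [3, 3, 1])
```

With these inputs the third customer arrives at minute 4, finds two customers still in the system, starts service at minute 8 and waits 4 minutes. A traffic intensity above 1, as here, means customers arrive faster than one server can handle them.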
Concatenate() or &
We can concatenate text together to form longer text by
using the function concatenate() or the & sign
Concatenating text is necessary when you need to enter
criteria as text
Example 1:
Concatenate("Microsoft", " ", "Excel") = "Microsoft Excel"
"Microsoft" & " " & "Excel" = "Microsoft Excel"
Example 2:
Given that cell A2 stores the number 20 and cell A3 stores
the text "Apples"
Concatenate(A2, " ", A3) = "20 Apples"
A2 & " " & A3 = "20 Apples"
Excel automatically converts numbers to text when
concatenating
Countif()
Countif(range, criteria)
Returns the number of cells that satisfy the evaluation
criteria
Range is the range of cells from which you want to count
Criteria is input as text (e.g., "<=" & D2)
[Figure: COUNTIF example with the range and the criteria labelled; the formula returns 3]
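To show why the criteria is built by concatenating text, here is a simplified Python imitation of COUNTIF(). It is a sketch only: the `countif` function is hypothetical, handles numeric comparisons only, and ignores the wildcard and text-matching behaviour of the real Excel function.

```python
import operator

# Two-character operators are listed first so "<=" is not read as "<".
_OPERATORS = [("<=", operator.le), (">=", operator.ge), ("<>", operator.ne),
              ("<", operator.lt), (">", operator.gt), ("=", operator.eq)]

def countif(values, criteria):
    """Count the values that satisfy a text criteria such as '<=7'."""
    for symbol, compare in _OPERATORS:
        if criteria.startswith(symbol):
            threshold = float(criteria[len(symbol):])
            return sum(1 for v in values if compare(v, threshold))
    return sum(1 for v in values if v == float(criteria))  # bare number means equality

# Hypothetical range of cells and a threshold playing the role of cell D2.
cells = [1, 5, 7, 9, 12]
d2 = 7
count = countif(cells, "<=" + str(d2))   # same idea as "<=" & D2
```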
XDB Bank c-Server
When there is more than one server
Model same as before except customer service start time can be
earlier
If there are c number of servers, the customer start time will be
the cth largest customer service end time of all previous customer
service end times.
Example (3 servers): service end times are 10.00am (Server 1), 10.05am (Server 2) and 10.15am (Server 3); the 3rd largest is 10.00am
Some definitions (Multiple Servers)
If no. of people in queue < no. of servers,
Service Start Time = Arrival Time
Else if no. of people in queue >= no. of servers,
Service Start Time = time any server becomes available
= nth largest End Time where n = no. of servers
Observing Queues: Processing Queues
=IF(J20>=$E$4,LARGE($G$14:G19,$E$4),E20)
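The logic of the formula above can be sketched in Python. The cell mapping is an assumption read off the formula rather than confirmed by the workbook: J20 is taken to be the number of customers in the system, $E$4 the number of servers, column G the previous service end times and E20 the arrival time. The function name is hypothetical.

```python
import heapq

def service_start(arrival, previous_ends, servers):
    """Service start time for a new customer in a queue with several servers."""
    in_system = sum(1 for end in previous_ends if end > arrival)
    if in_system >= servers:
        # All servers are busy: wait for the c-th largest end time, like LARGE(range, c).
        return heapq.nlargest(servers, previous_ends)[-1]
    return arrival  # a server is free, so service starts on arrival

# Hypothetical end times in minutes after midnight: 10.00am, 10.05am, 10.15am.
ends = [600, 605, 615]

busy_start = service_start(598, ends, 3)   # arrives 9.58am, all three servers busy
free_start = service_start(602, ends, 3)   # arrives 10.02am, Server 1 already free
```

In the first case the customer starts at 10.00am, the 3rd largest end time; in the second the customer starts immediately on arrival.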
Takeaways
Timer & Clicker
Time-based Simulations
Ways to count arrivals
Application of Time functions
Application of Macro Recording
XDB Bank
Use of observed distribution (Exponential, Empirical)
from raw/historical data to generate simulation of future
trials
Reminders
Project Scope Confirmation with Prof. before the
Recess week
Start working on your project
Next class
Review of Week 1 to Week 7 lessons