Poisson model application to real life data

Location: Road in front of Kerr gate
Section studied
CIVE 460 – Spring 2022-2023
Instructor: Dr. Maya Abou Zeid
Assistant Instructor: Ms. Arwa Awad and Ms. Mariam Saber
Group 5: Lawali Franck G. Ki, Jimmy Kiuya, Elizabeth Mukinya
Date: 16/03/2022
Abstract
To understand the arrival pattern of pedestrians at a road section, this experiment was
designed to observe and record pedestrian arrivals behind the Kerr dormitories, on Bliss
Street. The aim was to determine whether the pedestrian arrival pattern can be modeled
using a Poisson process. Data collection took place over 30 minutes (3:00 PM to 3:30 PM),
a period assumed to be non-peak, with arrivals counted in consecutive 20-second intervals.
The arrival pattern of the pedestrians appeared to be random, and the data collected were
analyzed using Google Colab. We compared the observed frequency of arrival with the
theoretical frequency given by the Poisson probability formula to check whether the values
are close enough to conclude that the observed pattern of pedestrian arrivals follows a
Poisson distribution. This comparison was done with the chi-squared test statistic, which
was found to be equal to 11.442. Based on a critical value of 9.488 at a 95% confidence
level, we rejected the null hypothesis that the "Pedestrian arrival pattern can be modeled
using a Poisson process", since the test statistic was greater than the critical value. The
data collected and analyzed may have been affected by the assumptions that were made.
Ultimately, it is important to collect multiple sets of data on different days and to make
sure the time of collection falls in non-peak hours in order to avoid misleading results.
Table of Contents
Abstract
Table of Contents
List of Tables
List of Figures
1. Introduction and Theory
2. Methodology
2.1. Data Collection and Site Conditions
2.2. Analytical Methods, Fundamental Principles and Equations
2.2.1. Analytical Methods
2.2.2. Fundamental Principles
2.2.3. Equations
2.3. Assumptions
3. Results and Calculations
3.1. Results
3.2. Sample calculation
4. Discussion
5. Conclusion and Recommendation
Team Members’ Contribution to Report
References
Appendix
List of Tables
Table 1: Team Members’ Contribution to Report
Table 2: Raw Data Collected from the Field
List of Figures
Figure 1: Road Section and Point of Observation
Figure 2: Location of Observation on Google Maps taken from class notes
Figure 3: Summary for Observed Frequency and Calculation of the Average Arrival Rate
Figure 4: Summary Table for Theoretical Probability and Theoretical Frequency of Intervals
Figure 5: Chi-Squared Calculation Summary. Test Statistic = 11.44
Figure 6: Bar Plot of Theoretical versus Observed Frequency of Interval
Figure 7: Chi-Squared Distribution Table (Njudang, n.d.)
1. Introduction and Theory
In this experiment, the objective is to reject or fail to reject the hypothesis that the
number of pedestrians passing a specific point on Bliss Street, behind the Kerr dormitories,
can be modeled with a Poisson distribution. The premise is that the arrival of pedestrians
at this section of the road can be thought of as random. The experiment then aims to check
whether the arrival of pedestrians follows a Poisson distribution or should be modeled with
another random distribution. The experiment was designed to collect data on the number of
pedestrians passing in each 20-second interval over a total of 30 minutes, and to compare
the results with the theoretical values expected from a Poisson model. To count the number
of pedestrians arriving, a team is posted at a point on the road and manually counts the
pedestrians that pass in front of them. Knowing the number of pedestrians arriving per 20
seconds, it is possible to calculate the total number of pedestrians observed during the 30
minutes, then the probability of a specific number of pedestrians arriving in 20 seconds,
and finally to compare it with the theoretical probabilities of arrival. In this experiment,
we perform this comparison using a test statistic that we compute and a critical value
obtained from the chi-squared distribution table. All calculations and analyses are done in
Python code on Google Colab.
This lab report details the process of collecting and analyzing the data in order to
answer whether the above hypothesis holds. The paper starts by describing how the data
were collected and the setting in which we conducted the experiment. Along with that, it
describes the tools used to analyze the data and the methods of analysis adopted.
Afterward, it presents and explains the results and the data collected, which are
subsequently used as a basis for discussion and for answering our initial question through
statistical tests. The paper then concludes with the major findings and recommendations
for future experiments.
2. Methodology
2.1. Data Collection and Site Conditions
The materials used were a sheet of A4 paper, a pen or pencil, and a phone to use as a clock.
The team executed the experiment using the following steps:
- Locate a point on the road section of interest and stand where, as an observer, you will
  not interact with the passing pedestrians.
- On the sheet of paper, create a table with the interval numbers required to cover the 30
  minutes of counting in one column and the corresponding number of pedestrians arriving
  in the other column. The number of intervals is obtained by dividing the total observation
  time by the duration of each interval (30 min × 60 sec/min ÷ 20 sec = 90 intervals).
- Make sure to set a clock using a phone or a stopwatch.
- Divide the team so that one person is in charge of keeping the time and recording the
  number of pedestrians per interval on the sheet of paper, and the others are in charge of
  counting.
- When everything is set, start the clock and observe the number of pedestrians for 30
  minutes without interruption.
- Only count people passing in front of you on foot, both adults and children. Exclude
  anyone using a vehicle with two or more wheels.
- Perform the count in both directions of the road section.
Table 2 in the appendix shows the raw data collected from the above steps.
The experiment was performed on Friday, February 3 from 3:00 PM to 3:30 PM on a
section of Bliss Street behind the Kerr dormitories at AUB. See Figures 1 and 2 for more
details on the road and the observation point. The temperature varied from 12 degrees
Celsius to 15 degrees Celsius with light rain showers (Lebanon Historical Past Weather,
n.d.). To collect the data effectively, the team was organized as follows:
- Counting on one side of the road: Elizabeth Mukinya
- Counting on the other side of the road: Lawali Franck Ghislain Ki
- Timekeeping and data recording: Jimmy Kiuya
Figure 1: Road Section and Point of Observation
Figure 2: Location of Observation on Google Maps taken from class notes
2.2. Analytical Methods, Fundamental Principles and Equations
2.2.1. Analytical Methods
Every pedestrian that passed in front of the team was counted in intervals of 20 seconds
for a total time of 30 minutes. The total number of intervals amounted to 90. For easy
manipulation and use of the data, the team transferred the raw data to an Excel sheet to be
used later in the analysis.
The analysis of the data was done in Google Colab. The code used in analyzing the data
follows the main sequence below:
- Importing the data and inspecting it.
- Grouping the data according to the number of pedestrians arriving and the corresponding
  frequencies.
- Computing the arrival rate.
- Adding new columns and calculating the theoretical probabilities and frequencies of
  arrival using the formulas provided below.
- Merging the last rows of our data frame that have theoretical frequencies < 5, as required
  by the procedure for chi-squared testing.
- Plotting the number of intervals versus the number of pedestrians arriving.
- Performing the chi-squared goodness-of-fit test on the merged data and saving its value
  for analysis and interpretation.
- Using the table from Figure 7 in the appendix to determine the critical value for the
  Poisson model based on our data, with k − 1 = 5 − 1 = 4 degrees of freedom. If the critical
  value is less than the obtained chi-squared value, the initial hypothesis is rejected.
  Otherwise, we fail to reject the initial hypothesis.
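The grouping and merging steps in the sequence above can be sketched in plain Python as follows. This is only an illustration, not the team's Colab notebook: the list `counts` is transcribed from Table 2, the helper name `merge_tail` is hypothetical, and pooling every category with four or more arrivals is an assumption inferred from the five categories (k = 5) used in the test.

```python
from collections import Counter

# Pedestrians counted in each 20-second interval, transcribed from Table 2.
counts = [
    1, 1, 1, 0, 3, 0, 3, 4, 0, 1, 3, 0, 2, 7, 2, 0, 0, 0, 3, 3,
    2, 1, 1, 0, 0, 3, 1, 2, 1, 3, 1, 0, 0, 0, 1, 3, 2, 4, 0, 1,
    2, 5, 2, 0, 3, 0, 0, 5, 1, 1, 0, 0, 0, 0, 2, 2, 4, 2, 1, 1,
    1, 3, 0, 1, 0, 1, 4, 1, 0, 5, 0, 1, 3, 1, 0, 2, 5, 2, 0, 4,
    2, 2, 1, 3, 4, 2, 2, 2, 5, 5,
]

n_intervals = len(counts)          # number of 20-second intervals
total_pedestrians = sum(counts)    # pedestrians seen over the 30 minutes
observed = Counter(counts)         # {n: number of intervals with n arrivals}


def merge_tail(freq, first_merged):
    """Pool every category n >= first_merged into one final category."""
    head = [freq.get(n, 0) for n in range(first_merged)]
    tail = sum(v for n, v in freq.items() if n >= first_merged)
    return head + [tail]


# Assumed grouping: n = 0, 1, 2, 3 and a pooled "4 or more" category.
merged_observed = merge_tail(observed, first_merged=4)

print(sorted(observed.items()))  # [(0, 26), (1, 22), (2, 17), (3, 12), (4, 6), (5, 6), (7, 1)]
print(merged_observed)           # [26, 22, 17, 12, 13]
```

Pooling the sparse upper categories keeps every expected frequency in the test reasonably large, which is the purpose of the merging step.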
2.2.2. Fundamental Principles
To understand the arrival pattern of pedestrians at a specific section of a road, it is
important to be able to fit it to a theoretical distribution that allows us to calculate
probabilities and predict the performance of that road. The model used in this experiment
is the Poisson model. The Poisson distribution is a discrete probability distribution that
gives the probability of a given number of events occurring in a fixed interval, given the
average rate of occurrence. In theory, the Poisson model can be used to predict the arrival
pattern of pedestrians or vehicles in lightly congested traffic regimes. Indeed, the arrival
of pedestrians can be considered a non-uniform, random process because any point in time is
as likely as any other to see a pedestrian arrive, and the arrival of one pedestrian does
not affect the probability of the arrival of another. However, a question remains:
"which kind of random and discrete distribution does the arrival pattern of pedestrians
follow?". Based on the theory above, if the arrival pattern of pedestrians corresponds to a
lightly congested regime, it is possible to model it with a Poisson distribution. The general
requirements for applying the Poisson model are the following:
- The arrival of one pedestrian does not influence the arrival of other pedestrians.
- The data modeled should be discrete, and we should be able to calculate the probability
  of an event.
- Events should not occur at the same time.
Given the above, it is possible to hypothesize that the arrival pattern of pedestrians
can be modeled with a Poisson distribution. This hypothesis is called the null hypothesis.
The alternative hypothesis is that the arrival pattern of pedestrians cannot be modeled
with a Poisson distribution. To check whether the null hypothesis can be rejected, it is
necessary to collect experimental data and compare these data with the theoretical values
of a Poisson model. The decision to reject or fail to reject our initial hypothesis is based
on the chi-squared goodness-of-fit test. The test measures how far the values collected in
an experiment are from the theoretical values.
2.2.3. Equations
After collecting the number of pedestrians passing a point per interval of time, the
following equations are used to analyze the data. The total number of pedestrians
is obtained by summing the counts of all 20-second intervals. The observed frequency of
intervals is obtained by counting the number of intervals that have the same number of
pedestrians arriving.
The theoretical probability is obtained as follows:
$$P(n) = \frac{(\lambda t)^{n} e^{-\lambda t}}{n!} \qquad \text{(Eq. 1)}$$
P(n) = probability of n pedestrians arriving in one interval
n = number of pedestrians
λ = average rate of arrival of pedestrians (ped/sec)
t = duration of one interval (sec)
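As a minimal sketch, Eq. 1 can be written as a small Python function. The function name `poisson_pmf` is hypothetical, and the inputs used in the example (λ = 0.085 ped/sec, t = 20 sec) are the values reported in Section 3.1.

```python
import math


def poisson_pmf(n, lam, t):
    """Eq. 1: probability of exactly n arrivals in an interval of t seconds."""
    mu = lam * t  # expected number of arrivals in one interval
    return mu ** n * math.exp(-mu) / math.factorial(n)


# Probability of an empty 20-second interval at 0.085 ped/sec.
p0 = poisson_pmf(0, lam=0.085, t=20)
print(round(p0, 3))  # 0.183
```

Summing `poisson_pmf(n, 0.085, 20)` over all n gives 1, which is a quick sanity check on the implementation.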
The average rate of arrival of pedestrians (λ) is given by:
$$\lambda = \frac{\text{Total number of pedestrians}}{\text{Total time}} \qquad \text{(Eq. 2)}$$
The theoretical expected frequency of intervals is given by:
$$F_t = P(n) \times I \qquad \text{(Eq. 3)}$$
F_t = theoretical frequency of intervals with n pedestrians arriving
I = total number of intervals
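Eq. 2 and Eq. 3 can be chained in a few lines of Python. This is a sketch under stated assumptions: the total of 153 pedestrians is the sum of the counts in Table 2, and the variable and function names are illustrative rather than taken from the team's code.

```python
import math

total_pedestrians = 153                # sum of the counts in Table 2
total_time = 30 * 60                   # observation time in seconds
interval = 20                          # duration of one interval in seconds
n_intervals = total_time // interval   # I in Eq. 3

lam = total_pedestrians / total_time   # Eq. 2, in ped/sec
mu = lam * interval                    # expected arrivals per interval


def theoretical_frequency(n):
    """Eq. 3: expected number of intervals with exactly n arrivals."""
    p_n = mu ** n * math.exp(-mu) / math.factorial(n)  # Eq. 1
    return p_n * n_intervals


print(round(lam, 3))  # 0.085
for n in range(5):
    print(n, round(theoretical_frequency(n), 2))
```

For n = 0 this gives roughly 16.44 expected intervals, against the 26 empty intervals actually observed.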
To determine the experimental chi-squared value, we use the equation:
$$\chi^{2} = \sum_{i=0}^{k-1} \frac{(O_i - t_i)^{2}}{t_i} \qquad \text{(Eq. 4)}$$
i = index for the number of pedestrians in an interval
O_i = observed frequency of intervals in category i
t_i = theoretical frequency of intervals in category i (see footnote 1)
k = number of categories for observed and theoretical frequencies
The critical value is obtained by using the table of Figure 7 in the appendix.
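The test statistic of Eq. 4 and the comparison with the critical value can be sketched as below. Several things here are assumptions rather than the team's confirmed procedure: the observed frequencies are tallied from Table 2 with all intervals of four or more arrivals pooled into one category, the pooled category's theoretical frequency is taken as whatever remains of the 90 intervals, and the function name `chi_squared` is hypothetical. Under these assumptions the sketch reproduces the reported value of 11.442.

```python
import math


def chi_squared(observed, expected):
    """Eq. 4: sum of (O_i - t_i)^2 / t_i over all categories."""
    return sum((o - t) ** 2 / t for o, t in zip(observed, expected))


# Observed interval frequencies for n = 0, 1, 2, 3 and a pooled n >= 4 category.
observed = [26, 22, 17, 12, 13]

n_intervals = 90
mu = 0.085 * 20  # lambda * t, expected arrivals per interval

# Theoretical frequencies for n = 0..3 (Eq. 1 and Eq. 3).
expected = [n_intervals * mu ** n * math.exp(-mu) / math.factorial(n) for n in range(4)]
# Assumption: the pooled category takes the remaining theoretical frequency.
expected.append(n_intervals - sum(expected))

stat = chi_squared(observed, expected)
critical = 9.488  # chi-squared table value used in the report (df = 4, 95%)

print(round(stat, 3), stat > critical)  # 11.442 True
```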
2.3. Assumptions
To collect and analyze our data, the following assumptions are made:
- All the pedestrians arriving were counted during the data collection.
- Pedestrian arrivals are independent of one another.
- An average rate of pedestrian passage can be computed.
- The time of the count is a non-peak hour when the traffic regime is lightly congested.
- The pattern of arrival on that afternoon is representative of every other afternoon.
3. Results and Calculations
3.1. Results
The results of the data collection are provided in Table 2 in the appendix.
The following calculations were performed in Google Colab to understand and analyze the
data. To obtain the average rate of arrival (λ), we divided the total number of observed
pedestrians by the total observation time (30 minutes = 1800 seconds). Figure 3 shows the
Python code for the described calculation and the result. The average rate of arrival was
found to be 0.085 pedestrians/second.
Footnote 1: A category here refers to a specific number of pedestrians arriving in an
interval; all time intervals with that same number of arrivals fall in the same category.
Figure 3: Summary for Observed Frequency and Calculation of the Average Arrival Rate.
To calculate the theoretical probability of n pedestrians arriving in a 20-second interval,
(Eq. 1) was used with t = 20 sec and λ = 0.085 pedestrians/sec. Subsequently, to obtain the
theoretical frequency of intervals, we defined a function that multiplies the theoretical
probability by the total number of intervals (90) (see the notation in Eq. 5). Figure 4 shows
a summary of the theoretical probabilities and the theoretical frequencies of intervals.
$$t_i = P(i) \times (\text{total number of intervals}) \qquad \text{(Eq. 5)}$$
Figure 4: Summary Table for Theoretical Probability and Theoretical Frequency of Intervals
The chi-squared test was done as shown in Figure 5 using (Eq. 4). Figure 6 shows the bar
plot of the observed frequencies of intervals against the theoretical frequencies, giving a
visual impression of how the experimental data differ from the theoretical values. To
determine the critical value for the chi-squared test, we used a confidence level of 95%
(significance level α = 0.05), with the degrees of freedom (df) equal to the number of
categories − 1. In our case df = 4. Therefore, the critical value obtained from Figure 7 in
the appendix is 9.488.
Figure 5: Chi-Squared Calculation Summary. Test Statistic = 11.44
Figure 6: Bar Plot of Theoretical versus Observed Frequency of Interval
3.2. Sample calculation
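First row (n = 0 pedestrians, observed in 26 intervals)
Total observed pedestrians = 0 × 26 = 0
Theoretical probability:
$$P(0) = \frac{(0.085 \times 20)^{0} e^{-0.085 \times 20}}{0!} = 0.1827 \approx 0.183$$
Theoretical frequency of intervals:
$$t_0 = 0.1827 \times 90 = 16.44$$
Chi-squared contribution:
$$\chi_0^{2} = \frac{(O_0 - t_0)^{2}}{t_0} = \frac{(26 - 16.44)^{2}}{16.44} = 5.56$$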
4. Discussion
Let H0 be the null hypothesis that the observed data can be described by a Poisson
distribution, and let H1 be the alternative hypothesis that the observed data cannot be
described by a Poisson distribution.
With the test statistic χ² = 11.44 and the critical value χ²α = 9.488, comparing the two
values shows that the test statistic is larger than the critical value (χ² > χ²α). This means
that the observed and theoretical frequencies are significantly different; hence the
pedestrian arrival pattern in our experiment cannot be described by the Poisson distribution
model, and we reject the null hypothesis H0.
The significance of this result is that the sample of data we collected does not support
using the Poisson model to draw general conclusions about pedestrian arrivals at this
location. Several factors may have affected the data collection and could explain the
discrepancy. The Poisson model assumes lightly congested, non-peak conditions. However,
given that the experiment was conducted from 3:00 PM to 3:30 PM, it is possible that
students and professors leaving campus for their homes put the flow regime at its peak. In
that case, the Poisson model would not be the best representation of the arrival pattern of
pedestrians.
In addition, given that Friday afternoons mark the beginning of the weekend for most,
many people may have been out on the streets to spend time with friends and family. This
factor could also have created denser traffic. Furthermore, errors and assumptions could
have led to inaccurate data, which would also make the Poisson distribution model a poor
fit for the data. Below are some of the errors that might have occurred in our experiment:
- Incorrect timing of the start and end of an interval, probably due to distractions.
- Pedestrians passing right at the boundary between two intervals could have been double
  counted or not counted.
- Distractions, which can lead to incorrect counting.
- Pedestrians passing behind high vehicles on the road and therefore going uncounted.
- Pedestrians passing back and forth.
5. Conclusion and Recommendation
We rejected the null hypothesis made at the start of the experiment, "arrival patterns
of pedestrians can be modeled using a Poisson process". This outcome is likely due in part
to a methodology that did not account for enough parameters and to an insufficient set of
data. The aim was to conduct this study during non-peak hours for a 30-minute duration, in
20-second intervals, counting the arrival of pedestrians at a designated spot along Bliss
Street, behind the Kerr dormitories. The peak hours for pedestrians run from 8:00 AM to
10:00 AM and from 3:00 PM to 6:00 PM. The experiment was intended to be conducted during
non-peak hours to capture a random arrival pattern, but it fell within the afternoon peak.
As a recommendation, repeating the experiment at different times of the day within the
non-peak hours would give a clearer basis for the analysis of the data.
We assumed that the data collected on that particular day represent the patterns of all
other days, meaning that we had only one data set, which is not enough to draw a
statistical conclusion. As a recommendation, the experiment ought to be conducted multiple
times (three times is standard) on different days and include other parameters in the
observation such as the weather (temperature of a given day, wind, precipitation, etc.).
The more data we collect, the easier it gets to observe and analyze the pedestrian arrival
pattern.
Team Members’ Contribution to Report
Table 1: Team Members’ Contribution to Report
Team Member            Tasks
Jimmy Kiuya            Abstract; Results and calculations; Conclusion
Elizabeth Mukinya      Methodology; Discussion
Lawali Franck G. Ki    Introduction; Python code; Report layout and referencing
References

Lebanon historical past weather. (n.d.). Weather25.com. Retrieved February 9, 2023, from
https://www.weather25.com/asia/lebanon?page=past-weather

Njudang, E. (n.d.). Chi-Square Distribution Table. Retrieved February 11, 2023, from
https://www.academia.edu/36551188/Chi_Square_Distribution_Table
Appendix
Table 2: Raw Data Collected from the Field
Interval    Number of Pedestrians Arriving
1           1
2           1
3           1
4           0
5           3
6           0
7           3
8           4
9           0
10          1
11          3
12          0
13          2
14          7
15          2
16          0
17          0
18          0
19          3
20          3
21          2
22          1
23          1
24          0
25          0
26          3
27          1
28          2
29          1
30          3
31          1
32          0
33          0
34          0
35          1
36          3
37          2
38          4
39          0
40          1
41          2
42          5
43          2
44          0
45          3
46          0
47          0
48          5
49          1
50          1
51          0
52          0
53          0
54          0
55          2
56          2
57          4
58          2
59          1
60          1
61          1
62          3
63          0
64          1
65          0
66          1
67          4
68          1
69          0
70          5
71          0
72          1
73          3
74          1
75          0
76          2
77          5
78          2
79          0
80          4
81          2
82          2
83          1
84          3
85          4
86          2
87          2
88          2
89          5
90          5
Figure 7: Chi-Squared Distribution Table (Njudang, n.d.)