Poisson model application to real life data

Location: Road in front of Kerr gate (section studied)
CIVE 460 - Spring 2022-2023
Instructor: Dr. Maya Abou Zeid
Assistant Instructors: Ms. Arwa Awad and Ms. Mariam Saber
Group 5: Lawali Franck G. Ki, Jimmy Kiuya, Elizabeth Mukinya
Date: 16/03/2022

Abstract

To understand the arrival pattern of pedestrians at a road section, this experiment was designed to observe and record pedestrian arrivals behind the Kerr dormitories, on Bliss Street. The aim was to determine whether the pedestrian arrival pattern can be modeled using a Poisson process. Data collection took place over a 30-minute observation period intended to fall in a non-peak hour (3:00 PM to 3:30 PM), with arrivals counted in consecutive 20-second intervals. The arrival pattern of the pedestrians appeared to be random, and the data collected was analyzed using Google Colab. We compared the observed frequencies of arrival with the theoretical frequencies given by the Poisson probability formula, to check whether the values are close enough to conclude that the observed arrivals follow a Poisson distribution. The comparison used the chi-squared test statistic, which was found to be equal to 11.442. Since this exceeds the critical value of 9.488 at the 95% confidence level, we rejected the null hypothesis that the pedestrian arrival pattern can be modeled using a Poisson process. The data collected and analyzed may have been affected by the assumptions that were made. Ultimately, it is important to collect multiple sets of data on different days and to make sure the collection times fall in non-peak hours, in order to avoid misleading results.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
1. Introduction and Theory
2. Methodology
2.1. Data Collection and Site Conditions
2.2. Analytical Methods, Fundamental Principles and Equations
2.2.1. Analytical Methods
2.2.2. Fundamental Principles
2.2.3. Equations
2.3. Assumptions
3. Results and Calculations
3.1. Results
3.2. Sample calculation
4. Discussion
5. Conclusion and Recommendation
Team Members' Contribution to Report
References
Appendix

List of Tables

Table 1: Team Members' Contribution to Report
Table 2: Raw Data Collected from the Field

List of Figures

Figure 1: Road Section and Point of Observation
Figure 2: Location of Observation on Google Maps taken from class notes
Figure 3: Summary for Observed Frequency and Calculation of the Average Arrival Rate
Figure 4: Summary Table for Theoretical Probability and Theoretical Frequency of Intervals
Figure 5: Chi-Squared Calculation Summary. Test Statistic = 11.44
Figure 6: Bar Plot of Theoretical versus Observed Frequency of Interval
Figure 7: Chi-Squared Distribution Table (Njudang, n.d.)

1. Introduction and Theory

In this experiment, the objective is to reject or fail to reject the hypothesis that the number and arrival sequence of pedestrians passing a specific point on Bliss Street, behind the Kerr dormitories, can be modeled by a Poisson distribution. The premise is that pedestrian arrivals at this section of the road can be treated as random. The experiment then checks whether the arrivals follow a Poisson distribution or should be modeled with another random distribution.

The experiment is designed to collect data on the number of pedestrians passing in each 20-second interval over a total of 30 minutes, and to compare the results with the theoretical values expected from a Poisson model. To count the arrivals, a team is posted at a point on the road and manually counts the pedestrians that cross in front of them. From the number of pedestrians arriving in each 20-second interval, it is possible to calculate the total number of pedestrians observed during the 30 minutes, then the probability of a specific number of pedestrians arriving in 20 seconds, and finally to compare it with the theoretical probabilities of arrival. We perform this comparison using a test statistic computed from the data and a critical value obtained from the chi-square distribution table. All calculations and analyses are done in Python on Google Colab.

This lab report details the process of collecting and analyzing the data to determine whether the above hypothesis holds. The paper starts by describing how the data was collected and the setting in which the experiment was conducted. It also describes the tools used to analyze the data and the methods of analysis adopted.
Afterward, it presents and explains the results and the data collected, which then serve as the basis for the discussion and for answering our initial question through statistical tests. The paper concludes with the major findings and recommendations for future experiments.

2. Methodology

2.1. Data Collection and Site Conditions

The material used included a sheet of A4 paper, a pen or pencil, and a phone to use as a clock. The team executed the experiment using the following steps:

- Locate a point on the road section of interest and stand where, as an observer, you will not interact with the passing pedestrians.
- On the sheet of paper, create a table with the interval numbers required to cover the 30 minutes of counting in one column and the corresponding number of pedestrians arriving in the other column. The number of intervals is obtained by dividing the total observation time by the length of each interval (30 min × 60 sec/min ÷ 20 sec = 90 intervals).
- Set a clock using a phone or a stopwatch.
- Divide the team so that one person is in charge of keeping the time and recording the number of pedestrians per interval on the sheet of paper, and the others are in charge of counting.
- When everything is set, start the clock and count the pedestrians for 30 minutes without interruption.
- Only count people crossing in front of you, which includes adults and children. Exclude anyone using a vehicle with two or more wheels.
- Perform the count in both directions of the road section.

Table 2 in the appendix shows the raw data collected from the above steps.

The experiment was performed on Friday, February 3 from 3:00 PM to 3:30 PM on a section of Bliss Street behind the Kerr dormitories at AUB. See Figures 1 and 2 for more details on the road and the observation point. The temperature varied from 12 to 15 degrees Celsius, with light rain showers (Lebanon Historical Past Weather, n.d.).
To collect the data effectively, the team was organized as follows:

- Counting on one side of the road: Elizabeth Mukinya
- Counting on the other side of the road: Lawali Franck Ghislain Ki
- Timekeeping and data recording: Jimmy Kiuya

Figure 1: Road Section and Point of Observation

Figure 2: Location of Observation on Google Maps taken from class notes

2.2. Analytical Methods, Fundamental Principles and Equations

2.2.1. Analytical Methods

Every pedestrian that passed in front of the team was counted, in intervals of 20 seconds, for a total time of 30 minutes. The total number of intervals amounted to 90. For easy manipulation and use of the data, the team transferred the raw data to an Excel sheet to be used later in the analysis. The analysis of the data was done in Google Colab. The code used to analyze the data follows this main sequence:

- Importing the data and getting more information about it.
- Grouping the data according to the number of pedestrians arriving and the corresponding frequencies.
- Computing the arrival rate.
- Adding new columns and calculating the theoretical probabilities and frequencies of arrival using the formulas in Section 2.2.3.
- Merging the last rows of the data frame that have theoretical frequencies below 5, as required by the chi-square testing procedure.
- Plotting the number of intervals versus the number of pedestrians arriving.
- Performing the chi-square goodness-of-fit test on the merged data and saving its value for analysis and interpretation.
- Using the table from Figure 7 in the appendix to determine the critical value for the Poisson model based on our data, with degrees of freedom k - 1 = 5 - 1 = 4. If the critical value is less than the obtained chi-squared value, the initial hypothesis is rejected. Otherwise, we fail to reject the initial hypothesis.

2.2.2. Fundamental Principles

To understand the arrival pattern of pedestrians at a specific section of a road, it is important to be able to fit it to a theoretical distribution that allows us to calculate probabilities and predict the performance of that road. The model used in this experiment is the Poisson model. The Poisson model is a discrete probability distribution that gives the probability of a given number of events occurring in a fixed interval, given the average rate of occurrence. In theory, the Poisson model can be used to predict the arrival pattern of pedestrians or vehicles in lightly congested traffic regimes. Indeed, the arrival of pedestrians can be considered a non-uniform or random process, because any point in time is as likely as any other to see a pedestrian arrive and the arrival of one pedestrian does not affect the probability of the arrival of another. However, a question remains: "which kind of random and discrete distribution does the arrival pattern of pedestrians follow?". Based on the theory above, if the arrival pattern of pedestrians corresponds to a lightly congested regime, it is possible to model it with a Poisson distribution.

The general requirements for applying the Poisson model are the following:

- The arrival of one pedestrian does not influence the arrival of other pedestrians.
- The data modeled should be discrete, and it should be possible to calculate the probability of an event.
- Events should not occur at the same time.

Given the above, it is possible to hypothesize that the arrival pattern of pedestrians can be modeled with a Poisson distribution. This hypothesis is called the null hypothesis. The alternative hypothesis is that the arrival pattern of pedestrians cannot be modeled with a Poisson distribution. To check whether the null hypothesis can be rejected, it is necessary to collect experimental data and compare these data to the theoretical values of a Poisson model.
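To make the "theoretical values of a Poisson model" concrete, the following minimal sketch computes the probability of observing n arrivals in one counting interval. It is an illustration only, not the code used in the report: the function name `poisson_pmf` is hypothetical, and the rate and interval length are the values reported in Section 3.1, taken here as given.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    """Probability of exactly n arrivals in one interval with the given mean count."""
    return mean ** n * math.exp(-mean) / math.factorial(n)


# Assumed for illustration: rate and interval length as reported in Section 3.1.
rate_per_sec = 0.085   # pedestrians per second
interval_sec = 20      # seconds per counting interval
mean_per_interval = rate_per_sec * interval_sec  # about 1.7 pedestrians per interval

for n in range(6):
    print(f"P({n}) = {poisson_pmf(n, mean_per_interval):.4f}")

# The probabilities over all n sum to 1; the first 20 terms are enough here.
total = sum(poisson_pmf(n, mean_per_interval) for n in range(20))
print(f"sum of P(0..19) = {total:.6f}")
```

Multiplying each probability by the number of intervals gives the theoretical frequency that is later compared with the observed one.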
The decision to reject or fail to reject our initial hypothesis is based on the chi-squared goodness-of-fit test. The test measures how far the values collected in an experiment are from the theoretical values.

2.2.3. Equations

After collecting the number of pedestrians passing a point per interval of time, the following method and equations are used to analyze the data. The total number of pedestrians is obtained by summing the counts of all 20-second intervals. The observed frequency of intervals is obtained by counting the number of intervals that have the same number of pedestrians arriving. The theoretical probability is obtained as follows:

$$P(n) = \frac{(\lambda t)^{n} e^{-\lambda t}}{n!} \qquad \text{(Eq. 1)}$$

P(n) = probability of n pedestrians arriving
n = number of pedestrians
λ = average rate of arrival of pedestrians in ped/sec
t = time in one interval (sec)

The average rate of arrival of pedestrians (λ) is given by:

$$\lambda = \frac{\text{Total number of pedestrians}}{\text{Total time}} \qquad \text{(Eq. 2)}$$

The theoretical expected frequency of intervals is given by:

$$F_t = P(n) \times I \qquad \text{(Eq. 3)}$$

F_t = theoretical frequency of n pedestrians arriving
I = total number of intervals

To determine the experimental chi-squared value, we use the equation:

$$\chi^{2} = \sum_{i=0}^{k-1} \frac{(O_i - t_i)^{2}}{t_i} \qquad \text{(Eq. 4)}$$

i = index for number of pedestrians in an interval
O_i = observed frequency of intervals in category i
t_i = theoretical frequency of intervals in category i
k = number of categories for observed and theoretical frequencies

Note: a category here refers to a specific number of pedestrians arriving, shared by different time intervals.

The critical value is obtained by using the table of Figure 7 in the appendix.

2.3. Assumptions

To collect and analyze our data, the following assumptions are made:

- All the pedestrians arriving were counted during the data collection.
- The arrivals of pedestrians are independent of each other.
- An average rate of pedestrian passage can be computed.
- The time of the count is a non-peak hour, when the traffic regime is lightly congested.
- The pattern of arrival of that afternoon is representative of every other afternoon.

3. Results and Calculations

3.1. Results

The results of the data collection are provided in Table 2 in the appendix. The following calculations were carried out in Google Colab to analyze the data.

To obtain the average rate of arrival (λ), we divided the total number of observed pedestrians by the total observation time (30 minutes = 1800 seconds). Figure 3 shows the Python code for this calculation and the result. The average rate of arrival was found to be 0.085 pedestrians/second.

Figure 3: Summary for Observed Frequency and Calculation of the Average Arrival Rate.

To calculate the theoretical probability of n pedestrians arriving in a 20-second interval, (Eq. 1) was used with t = 20 sec and λ = 0.085 pedestrians/sec. Subsequently, to obtain the theoretical frequency of intervals, we defined a function that multiplies the theoretical probability by the total number of intervals (90) (see notation in Eq. 5). Figure 4 shows the summary of the values of the theoretical probabilities and the theoretical frequencies of intervals.

$$t_i = P(i) \times (\text{total number of intervals}) \qquad \text{(Eq. 5)}$$

Figure 4: Summary Table for Theoretical Probability and Theoretical Frequency of Intervals

The chi-squared test was done as shown in Figure 5 using (Eq. 4). Figure 6 shows the bar plot of the observed frequencies of intervals against the theoretical frequencies, to give a visual sense of how the experimental data differ from the theoretical values. To determine the critical value for the chi-squared test, we used a confidence level of 95%, with degrees of freedom (df) equal to the number of categories minus 1. In our case df = 4. Therefore, the critical value obtained from Figure 7 in the appendix is 9.488.

Figure 5: Chi-Squared Calculation Summary. Test Statistic = 11.44

Figure 6: Bar Plot of Theoretical versus Observed Frequency of Interval

3.2. Sample calculation

First row (n = 0 pedestrians, observed in 26 intervals):

$$\text{Total observed pedestrians} = 0 \times 26 = 0$$

$$P(0) = \frac{(0.085 \times 20)^{0} \, e^{-0.085 \times 20}}{0!} \approx 0.1827$$

$$t_0 = 0.1827 \times 90 \approx 16.44$$

$$\chi_0^{2} = \frac{(O_0 - t_0)^{2}}{t_0} = \frac{(26 - 16.44)^{2}}{16.44} \approx 5.56$$

4. Discussion

Let H0 be the null hypothesis that the observed data can be described by a Poisson distribution, and let H1 be the alternative hypothesis that the observed data cannot be described by the Poisson distribution. With the test statistic χ² = 11.44 and the critical value χ²_α = 9.488, the test statistic is larger than the critical value (χ² > χ²_α). This means that the observed and theoretical frequencies are significantly different; hence the pedestrian arrival pattern in our experiment cannot be described by the Poisson distribution model, and the null hypothesis is rejected.

The significance of this result is that we cannot use the sample of data that we collected to draw general conclusions about pedestrian arrivals using the Poisson model. Several factors may have affected the data collection and could explain the discrepancy. The Poisson model is expected to hold during non-peak hours of observation. However, given that the experiment was conducted from 3:00 PM to 3:30 PM, it is possible that, with students and professors leaving campus for their homes, the flow regime was at its peak. As such, the Poisson model would not be the best representation of the arrival pattern of pedestrians. In addition, given that Friday afternoons mark the beginning of the weekend for most, many people may have been using the streets to spend time with friends and family. This factor could also have created denser traffic.

Furthermore, errors and assumptions could have led to inaccurate data. Given that, the Poisson distribution model may not be the best fit for the data.
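The test described above can be retraced with a short script. The sketch below is not the code used in the report; it assumes the observed frequencies tallied from Table 2, and it assumes that the merged last category is "4 or more pedestrians" with its theoretical probability taken as the remainder 1 - [P(0) + P(1) + P(2) + P(3)]. Under those assumptions it arrives at the reported statistic.

```python
import math

# Observed number of intervals with 0, 1, 2, 3 and >= 4 pedestrians,
# tallied from the 90 intervals in Table 2 (the >= 4 class merges 4, 5 and 7).
observed = [26, 22, 17, 12, 13]

n_intervals = sum(observed)      # 90 intervals
mean = 0.085 * 20                # lambda * t, about 1.7 pedestrians per interval

# Poisson probabilities for 0..3 (Eq. 1), then the remaining tail for ">= 4".
# Treating the tail as a remainder is an assumption of this sketch.
probs = [mean ** n * math.exp(-mean) / math.factorial(n) for n in range(4)]
probs.append(1.0 - sum(probs))

expected = [p * n_intervals for p in probs]                        # Eq. 3
terms = [(o - t) ** 2 / t for o, t in zip(observed, expected)]     # Eq. 4 terms
chi_squared = sum(terms)

critical_value = 9.488           # chi-squared table, df = 4, 95% confidence level

for label, o, t, c in zip(["0", "1", "2", "3", ">=4"], observed, expected, terms):
    print(f"{label:>3}: observed={o:2d}  expected={t:6.2f}  term={c:5.2f}")
print(f"chi-squared = {chi_squared:.3f}")
print("reject H0" if chi_squared > critical_value else "fail to reject H0")
```

The report uses df = k - 1 = 4. Because λ is estimated from the same data, some texts subtract one further degree of freedom (df = 3, critical value 7.815 at the same level); the statistic exceeds both values, so the decision would be the same.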
Below are some of the errors that might have occurred in our experiment:

- Incorrect timing of the start and end of an interval, probably due to distractions. Pedestrians passing just at the end or the start of an interval could have been double counted or not counted.
- Distractions, which can lead to incorrect counting.
- Pedestrians passing behind tall vehicles on the road, and hence passing uncounted.
- Pedestrians passing back and forth.

5. Conclusion and Recommendation

We rejected the null hypothesis stated at the start of the experiment, that "arrival patterns of pedestrians can be modeled using a Poisson process". This outcome may be partly due to a methodology that did not account for additional parameters and to an insufficient set of data.

The aim was to conduct this study during non-peak hours for a 30-minute duration, in 20-second intervals, counting the arrival of pedestrians at a designated spot along Bliss Street, near Kerr. The peak hours for pedestrians run from 8:00 AM to 10:00 AM and from 3:00 PM to 6:00 PM. The experiment was intended to be conducted during the non-peak hours of the day to capture a random arrival pattern, but that was not the case. As a recommendation, repeating the experiment at different times of day within the non-peak hours would give a clearer basis for the analysis of the data.

We assumed that the data collected on that particular day represent the patterns of all other days, meaning that we only had one data set, which in statistics is not enough to draw a conclusion. As a recommendation, the experiment ought to be conducted multiple times (three times is standard) on different days and include other parameters in the observation, such as the weather (temperature of a given day, wind, precipitation, etc.).
The more data we collect, the easier it gets to observe and analyze the pedestrians' arrival pattern.

Team Members' Contribution to Report

Table 1: Team Members' Contribution to Report

Team member | Tasks
Jimmy Kiuya | Abstract; Results and calculations; Conclusion
Elizabeth Mukinya | Methodology; Discussion
Lawali Franck G. Ki | Introduction; Python code; Report layout and referencing

References

Lebanon historical past weather. (n.d.). Weather25.Com. Retrieved February 9, 2023, from https://www.weather25.com/asia/lebanon?page=past-weather

Njudang, E. (n.d.). Chi-Square Distribution Table. Retrieved February 11, 2023, from https://www.academia.edu/36551188/Chi_Square_Distribution_Table

Appendix

Table 2: Raw Data Collected from the Field

Intervals | Number of Pedestrians Arriving (one value per interval, in order)
1-10 | 1 1 1 0 3 0 3 4 0 1
11-20 | 3 0 2 7 2 0 0 0 3 3
21-30 | 2 1 1 0 0 3 1 2 1 3
31-40 | 1 0 0 0 1 3 2 4 0 1
41-50 | 2 5 2 0 3 0 0 5 1 1
51-60 | 0 0 0 0 2 2 4 2 1 1
61-70 | 1 3 0 1 0 1 4 1 0 5
71-80 | 0 1 3 1 0 2 5 2 0 4
81-90 | 2 2 1 3 4 2 2 2 5 5

Figure 7: Chi-Squared Distribution Table (Njudang, n.d.)
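As a cross-check on Table 2, the raw counts can be tallied directly. The sketch below is an illustration in plain Python, not the Colab notebook used for the report, and its variable names are hypothetical. It recomputes the observed frequency of each count, the total number of pedestrians, and the average arrival rate of Eq. 2.

```python
from collections import Counter

# Pedestrians counted in each of the 90 consecutive 20-second intervals (Table 2).
counts = [
    1, 1, 1, 0, 3, 0, 3, 4, 0, 1,
    3, 0, 2, 7, 2, 0, 0, 0, 3, 3,
    2, 1, 1, 0, 0, 3, 1, 2, 1, 3,
    1, 0, 0, 0, 1, 3, 2, 4, 0, 1,
    2, 5, 2, 0, 3, 0, 0, 5, 1, 1,
    0, 0, 0, 0, 2, 2, 4, 2, 1, 1,
    1, 3, 0, 1, 0, 1, 4, 1, 0, 5,
    0, 1, 3, 1, 0, 2, 5, 2, 0, 4,
    2, 2, 1, 3, 4, 2, 2, 2, 5, 5,
]

interval_sec = 20
total_time_sec = len(counts) * interval_sec         # 90 intervals * 20 s = 1800 s
total_pedestrians = sum(counts)
arrival_rate = total_pedestrians / total_time_sec   # Eq. 2, pedestrians per second

# Number of intervals in which exactly n pedestrians arrived.
observed_frequency = Counter(counts)

print(f"intervals: {len(counts)}")
print(f"total pedestrians: {total_pedestrians}")
print(f"arrival rate: {arrival_rate:.3f} ped/sec")
for n in sorted(observed_frequency):
    print(f"{n} pedestrians: {observed_frequency[n]} intervals")
```

For the data in Table 2 this gives a rate of 0.085 ped/sec, in agreement with Section 3.1, and 26 intervals with no arrivals, as used in the sample calculation.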