Matthew Linnell
8 December 2023

Changes in Camera Resolution Over Time

Abstract

To examine whether cameras have increased in performance over time, the specifications of over a thousand cameras were analyzed to see if maximum resolution has increased. By dividing the data into two samples, one for cameras released after 2003 and another for cameras released before or during 2003, we can measure whether newer cameras actually perform better than their predecessors. This analysis shows that the increase in maximum resolution between these two groups is statistically significant, which can be used as an indicator that camera performance has increased over time.

Introduction

Most technologies used by humans have been steadily improving over time, especially since the industrial revolution. This is a natural result of human progress and invention in the pursuit of an increased standard of living. Cameras are an interesting part of this progression because they themselves have improved over time, but, more importantly, their progression and wider distribution have been key to capturing and remembering the progress that humans have made. The continued progression of camera technology will also play an important role in the development of other technologies such as autonomous vehicles.

In order to measure the progress of camera technology, I used the data found in Kaggle’s 1000 Cameras Dataset. This dataset contains the specifications of over 1000 camera models that were released between 1994 and 2007. It includes attributes such as release year, maximum and minimum resolutions, focus range, and included storage. For this analysis, I will focus on the release year to measure age and the maximum resolution to measure camera performance. Maximum resolution simply describes the number of pixels that a camera is capable of using while capturing a picture.
The more pixels that are used (higher resolution), the clearer and sharper the image will generally appear. In this report, I will test the hypothesis that newer cameras have a higher maximum resolution than older cameras.

Methods

The population being analyzed consists of all cameras that have ever been made. In order to see the effect that time has had, I treated the dataset as two independent samples. While this is a simplification, it allows us to view the increase in performance of newer cameras. The 1000 Cameras Dataset is extensive, but it does not include the statistics of every camera ever made. This means that the data should be viewed as a sample of a larger population or, in this case, as samples of two distinct populations: cameras released before or during 2003, and cameras released after 2003. These two groups can be used to represent “old” and “new” cameras respectively. 2003 was chosen as the cut-off so that the number of data points would be roughly even between the two groups.

Camera performance is not an easily quantifiable attribute. I used a camera’s maximum resolution as a representation of its performance. While this is certainly not a comprehensive measure, it does help us understand the best that a camera can do in terms of photo quality.

In order to compare the performance of each group of cameras, I calculated the mean (the central tendency) and standard deviation (the spread) of each sample. This allowed me to fit a normal distribution to the data from each group, which is reasonable because the data is approximately normally distributed, as seen in Figure 1. I then used those distributions to perform a two-sample independent t-test of the null hypothesis that older cameras have, on average, a maximum resolution at least as high as that of newer cameras.

Figure 1: Probability mass functions for old and new cameras as a function of maximum resolution.
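The grouping and test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for this report: the function names and the input format (two parallel lists of release years and maximum resolutions) are hypothetical, the test uses the unequal-variance (Welch) form of the t statistic, and the p-value comes from a normal approximation to the t distribution, which is only reasonable when each group has hundreds of cameras.

```python
from math import erfc, sqrt
from statistics import mean, variance


def split_by_year(years, resolutions, cutoff=2003):
    """Return (old, new): resolutions of cameras released <= cutoff and > cutoff."""
    old = [r for y, r in zip(years, resolutions) if y <= cutoff]
    new = [r for y, r in zip(years, resolutions) if y > cutoff]
    return old, new


def one_sided_test(new, old):
    """Test H0: mean(new) <= mean(old) against H1: mean(new) > mean(old).

    Returns (t, p). The t statistic is the Welch (unequal-variance) form.
    The p-value is an ASSUMPTION-laden approximation: it uses the standard
    normal upper tail in place of the t distribution (fine for large samples).
    """
    # Standard error of the difference between the two sample means.
    se = sqrt(variance(new) / len(new) + variance(old) / len(old))
    t = (mean(new) - mean(old)) / se
    # Upper-tail probability of a standard normal; erfc keeps precision
    # for very small p-values where 1 - cdf(t) would round to zero.
    p = 0.5 * erfc(t / sqrt(2))
    return t, p
```

With the real dataset, the two lists would come from its release-year and maximum-resolution columns. A statistics library such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False` and `alternative="greater"`) would give the exact t-distribution p-value instead of the approximation used here.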
Results

I found that the two samples had a difference in mean maximum resolution of 1044p (Table 1). This difference is the effect between the samples that is to be tested. It was tested against the previously stated null hypothesis that cameras released before or during 2003 have, on average, a maximum resolution at least as high as cameras released after 2003. Using the normal distributions that had been fit to the sample data, the p-value was found analytically to be 7.52e-142. The p-value is the probability of seeing an effect at least this large if the null hypothesis were true. This value is so low that it can be treated as effectively 0.

Table 1: Descriptive statistics of the maximum resolutions of old cameras and new cameras.

              Mean (p)    Standard Deviation (p)
New Cameras   2915        535
Old Cameras   1871        585

Discussion

A p-value represents how likely it would be to observe data at least as extreme as the sample if the null hypothesis were true. The p-value found for the hypothesis test was so low that a difference this large is extremely unlikely to have arisen by chance under the null hypothesis, so the null hypothesis can be rejected. This means that the difference in maximum resolution between older and newer cameras is statistically significant and is very unlikely to be an artifact of sampling.

This result is not surprising to anyone who regularly uses even a phone camera. Comparing the capabilities of a phone camera from the early 2000s, such as a Nokia 7650, to the latest iPhone makes the effect immediately obvious in a qualitative way. While phone cameras were not the focus of this analysis, this is a good way to understand the results because phone cameras are often the only cameras that the average person uses on a day-to-day basis.

Even within the context of this dataset, the result of the t-test could have been predicted by inspecting the distributions of the samples, as shown in Figure 2. A quick way to check if the effect between two samples could be real is to inspect the amount that the distributions overlap.
Figure 2 shows that the distributions of the two samples do not overlap very much. This indicates that the two groups likely have a significant difference between them, and the t-test confirms this.

Figure 2: Normal distributions of maximum resolution fitted to the data for cameras released before or during 2003 and cameras released after 2003.

While this analysis may seem very conclusive, its limitations should be noted in order to better understand how the results should be interpreted. One such limitation is that the analysis only supports the alternative hypothesis that cameras released after 2003 have a higher maximum resolution than cameras released before or during 2003. While this result can reasonably be expected to hold for cameras outside the sample, the dataset only covers 1994 to 2007, which introduces some uncertainty. This does not invalidate the findings of this study, but it does reduce the certainty of the results.

Based on this analysis alone, we cannot definitively conclude that cameras have increased in performance over time. As previously stated, maximum resolution is not a comprehensive measure of camera performance, and improvement from one time period to another does not necessarily mean continuous growth. So, while this analysis is a strong indicator, it cannot be considered a proof of increasing performance because there are more factors to consider. However, by combining these findings with other research into the progression of cameras, one can conclude that camera technology has improved as time has gone on.
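As a supplement to the visual overlap check discussed with Figure 2, the overlap can also be expressed as a number. The sketch below is an illustration rather than part of the original analysis. It rebuilds the two fitted normal distributions from the means and standard deviations in Table 1 and uses Python's standard-library `NormalDist.overlap` to compute the shared area under the two curves. The standardized difference assumes equal group sizes, which is only approximately true here since the cut-off was chosen to make the groups roughly even.

```python
from math import sqrt
from statistics import NormalDist

# Means and standard deviations taken from Table 1 (units: p).
old = NormalDist(mu=1871, sigma=585)
new = NormalDist(mu=2915, sigma=535)

# Overlapping coefficient: area shared by the two density curves,
# from 0 (no overlap) to 1 (identical distributions).
overlap = old.overlap(new)

# Standardized mean difference. ASSUMPTION: equal group sizes, so the
# pooled variance is the simple average of the two variances.
pooled_sd = sqrt((old.variance + new.variance) / 2)
effect_size = (new.mean - old.mean) / pooled_sd

print(f"overlap = {overlap:.2f}, standardized difference = {effect_size:.2f}")
```

A smaller overlap and a larger standardized difference both point the same way as the t-test, but neither replaces it, because they ignore the sample sizes.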