
Matthew Linnell
8 December 2023
Changes in Camera Resolution Over Time
Abstract
In an effort to show that cameras have increased in performance over time, the
specifications of over a thousand cameras were analyzed to see if maximum resolution has
increased. By dividing the data into two samples, one for cameras released after 2003 and
another for cameras released before or during 2003, we can measure whether newer cameras
actually perform better than their predecessors. This analysis shows that the increase in
performance between these two groups is statistically significant. This can be used as an
indicator that camera performance has increased over time.
Introduction
Most technologies used by humans have been steadily improving over time, especially
since the industrial revolution. This is a natural result of human progress and invention in the
pursuit of an increased standard of living. Cameras are an interesting part of this progression
because they themselves have improved over time, but, more importantly, their progression and
wider distribution have been key to capturing and remembering the progress that humans have
made. The continued progression of camera technology will also play an important role in the
development of other technologies such as autonomous vehicles.
In order to measure the progress of camera technology, I used the data found in Kaggle’s
1000 Cameras Dataset. This dataset contains the specifications of over 1000 camera models that
were released between 1994 and 2007, including release year, maximum and minimum
resolutions, focus range, and included storage. For this
analysis, I will be focusing on the release year to measure age and the maximum resolution to
measure camera performance. Maximum resolution simply describes the number of pixels that a
camera is capable of using while capturing a picture. The more pixels that are used (higher
resolution), the clearer and sharper the image will appear. In this report, I will test the hypothesis
that newer cameras have a higher maximum resolution than older cameras.
Methods
The population being analyzed consists of all cameras that have ever been made. In order
to see the effect that time has had, I treated the dataset as two independent
samples. While this is only partially correct, it allows us to view the increase in performance of
newer cameras.
While the 1000 Cameras Dataset is extensive, it obviously does not include the statistics
of every camera ever made. This means that the data should be viewed as a sample of a larger
population, or in this case, as samples of two distinct populations: cameras released before or
during 2003, and cameras released after 2003. These
two groups can be used to represent “old” and “new” cameras respectively. 2003 was chosen as
the cut-off so that the number of data points would be roughly even between the two groups.
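The split described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis; the field names `release_year` and `max_resolution` and the three example records are hypothetical and are not the dataset's actual column names or values.

```python
CUTOFF_YEAR = 2003  # cut-off year stated in the text


def split_by_release_year(cameras, cutoff=CUTOFF_YEAR):
    """Split camera records into (old, new) lists of maximum resolutions.

    "Old" means released before or during the cutoff year;
    "new" means released after it.
    """
    old = [c["max_resolution"] for c in cameras if c["release_year"] <= cutoff]
    new = [c["max_resolution"] for c in cameras if c["release_year"] > cutoff]
    return old, new


# Hypothetical records for illustration only; not taken from the dataset.
example_cameras = [
    {"release_year": 1999, "max_resolution": 1280},
    {"release_year": 2003, "max_resolution": 2048},
    {"release_year": 2006, "max_resolution": 3072},
]
old_res, new_res = split_by_release_year(example_cameras)
```

Note that a camera released in 2003 itself falls into the “old” group, matching the “before or during 2003” wording.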
Camera performance is not an easily quantifiable attribute. I used a camera’s maximum
resolution as a representation of a camera’s performance. While this is certainly not a
comprehensive measure of a camera’s performance, it does help us understand the best that a
camera can do in terms of photo quality.
In order to compare the performance of each group of cameras, I calculated the mean (the
central tendency) and standard deviation (the spread) of the samples. This allowed me to fit a
normal distribution to the data from each group. This is reasonable because the data are
approximately normally distributed, as seen in Figure 1. I then used those distributions to
perform a two-sample independent t-test of the null hypothesis that older cameras, on average,
have a maximum resolution at least as high as that of newer cameras.
Figure 1: Probability mass functions for old and new cameras as a function of maximum resolution.
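As a rough sketch of the descriptive step, the mean and sample standard deviation of each group can be computed and used to parameterize a normal distribution with `NormalDist` from Python's standard library. The sample values below are made up purely for illustration and are not drawn from the dataset, and this is one possible way to do the fit rather than the method used for the report.

```python
from statistics import NormalDist, mean, stdev


def describe(resolutions):
    """Return (mean, sample standard deviation) for a list of resolutions."""
    return mean(resolutions), stdev(resolutions)


# Made-up values for illustration only; not taken from the dataset.
old_sample = [1600, 1280, 2048, 2272, 1792]
new_sample = [2592, 3072, 2816, 3264, 2560]

old_mean, old_sd = describe(old_sample)
new_mean, new_sd = describe(new_sample)

# Fit a normal distribution to each group from its mean and spread.
old_fit = NormalDist(mu=old_mean, sigma=old_sd)
# from_samples estimates the same two parameters directly from the data.
new_fit = NormalDist.from_samples(new_sample)
```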
Results
I found that the two samples had a difference in mean maximum resolution of 1044p.
This difference is the effect between the samples that is to be tested. This effect was
then tested against the previously stated null hypothesis that cameras released before or during
2003 have, on average, a maximum resolution at least as high as that of cameras released after
2003. Upon comparing the normal distributions that had been fit to the sample data, it was
analytically found that the test yielded a p-value of 7.52e-142. The p-value is the probability
of seeing an effect at least this large if the null hypothesis were true. This value is so low
that it can be treated as effectively 0.
Table 1: Descriptive statistics of the maximum resolutions of old cameras and new cameras.
              Mean (p)    Standard Deviation (p)
New Cameras   2915        535
Old Cameras   1871        585
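To give a sense of why the p-value is so small, the sketch below computes a test statistic from the summary values in Table 1. Several things here are assumptions rather than part of the report: it uses Welch's unequal-variance form of the t-test, it approximates the tail probability with a standard normal distribution (reasonable only for large samples), and the group size `N_PER_GROUP = 500` is a placeholder, since the report says only that the groups are roughly even and that the dataset holds over a thousand cameras. It is therefore not expected to reproduce the reported p-value exactly.

```python
import math


def welch_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and degrees of freedom for H1: mean_a > mean_b."""
    var_a = sd_a ** 2 / n_a  # squared standard error of group a's mean
    var_b = sd_b ** 2 / n_b  # squared standard error of group b's mean
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


def one_sided_p_normal_approx(t):
    """Upper-tail probability under a standard normal (large-sample stand-in for Student's t)."""
    return 0.5 * math.erfc(t / math.sqrt(2))


N_PER_GROUP = 500  # ASSUMPTION: placeholder group size, not stated in the report

# Means and standard deviations from Table 1 (new cameras first).
t_stat, dof = welch_t_from_summary(2915, 535, N_PER_GROUP, 1871, 585, N_PER_GROUP)
p_approx = one_sided_p_normal_approx(t_stat)
```

With group sizes in the hundreds, the difference in means is many standard errors wide, which is what drives the tail probability toward zero.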
Discussion
A p-value represents the probability of seeing data at least this extreme if the null
hypothesis were true; it is not the probability that the null hypothesis itself is true. The
p-value found for the hypothesis test was so low that the observed difference would be
extremely unlikely under the null hypothesis, which can therefore be rejected. This means that
the difference in maximum resolution between the two release-year groups is statistically
significant and is very unlikely to be due to chance.
This result is not surprising to anyone who regularly uses even a phone camera.
Comparing the capabilities of an early-2000s phone camera, such as that of the Nokia 7650, to
the latest iPhone makes the effect immediately obvious in a qualitative way. While phone
cameras were not the focus of this analysis, this is a good way to understand the results because
phone cameras are the cameras that the average person is most likely to use on a day-to-day basis.
Even within the context of this dataset, the result of the t-test could have been predicted
by inspecting the distributions of the samples, as shown in Figure 2. A quick way to check if the
effect between two samples could be real is to inspect the amount that the distributions overlap.
Figure 2 shows that the distributions of the two samples don’t have a very large overlap. This
indicates that the two groups likely have a significant difference between them. The t-test
confirms this guess.
Figure 2: Normal distributions of maximum resolution fitted to the data for cameras released before or during 2003 and cameras
released after 2003.
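The visual overlap check can also be put into numbers. As a minimal sketch, assuming the two fitted curves are normal distributions with the means and standard deviations from Table 1, the `overlap` method of `NormalDist` in Python's standard library returns the shared area under the two density curves as a value between 0 (no overlap) and 1 (identical distributions). This is offered as one way to quantify the comparison, not as the procedure used for Figure 2.

```python
from statistics import NormalDist

# Normal distributions parameterized with the values from Table 1.
new_fit = NormalDist(mu=2915, sigma=535)
old_fit = NormalDist(mu=1871, sigma=585)

# Shared area under the two probability density curves (0 to 1).
overlap = old_fit.overlap(new_fit)
print(f"Overlapping area: {overlap:.2f}")
```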
While this analysis may seem very conclusive, it is worth noting the limitations of this
analysis in order to better understand how the results should be interpreted. One such limitation
is that the test only supports the alternative hypothesis that cameras released after 2003 have a
higher maximum resolution than cameras released before or during 2003; it is evidence, not
proof. While this result may plausibly extend beyond the sample, some uncertainty is introduced
by the fact that the dataset only covers 1994 to 2007. This does not invalidate the findings of
this study, but it does limit how far they can be generalized.
Based on this analysis alone, we cannot definitively conclude that cameras have
increased in performance over time. As previously stated, maximum resolution is not a
comprehensive measure of camera performance, and improvement from one time period to
another doesn’t necessarily mean continuous growth. So, while this analysis is a strong
indicator, it cannot be considered a proof of increasing performance because there are more
factors to consider. However, by combining these findings with other research into the
progression of cameras, one can conclude that camera technology has improved as time has gone on.