List of datasets

Here is a list of the datasets mentioned in the main text. You can get these datasets by going
to http://www.physics.upenn.edu/biophys/Datasets . You may be able to obtain a file just
by clicking its link (e.g., Firefox will then offer to save it to your hard drive). If your browser
won’t do that (e.g., it shows you a garbled page), try right-click (Windows) or ctrl-click
(Mac), which should give you a context menu with an option to download the file
instead of attempting to display it.¹ If a link is dead or missing, sorry; I’m working on it.
Once you have the file, and if necessary have moved it to a convenient folder, you must
next load it into Matlab. First navigate to the folder containing the file, for example by
clicking the · · · button located on the upper right of the main Matlab window or using the
cd command in the command window.
Files with names ending in .mat or .dat are Matlab data files, which can be loaded
by using the load command (or just double-click the file in Matlab’s Current Directory
window). After this operation your workspace will contain one or more variables holding the
data.
Files with names ending in .csv or .txt are generic comma-separated-value files.
Matlab can read such files by using the Import Wizard (File>Import data). But in many
cases the .csv file is a duplicate of data also given in a .mat file. If you use Matlab, and a
.mat version exists, it’s easier to use that file instead of the .csv or .txt version.
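If you are not using Matlab, the comma-separated files can be read with almost any tool. Here is a minimal sketch in Python using only the standard csv module. The function name read_numeric_csv and the sample text are made up for illustration (the sample is not data from any file in this list), and the sketch assumes the file is purely numeric with no header row.

```python
import csv
import io


def read_numeric_csv(handle):
    """Return the rows of a comma-separated file as lists of floats.

    Assumes every non-empty row is purely numeric (no header line).
    """
    rows = []
    for row in csv.reader(handle):
        if not row:  # csv.reader yields an empty list for a blank line
            continue
        rows.append([float(cell) for cell in row])
    return rows


# Made-up stand-in for a two-column file. For a real file, pass
# open("yourfile.csv", newline="") instead of the StringIO object.
sample = io.StringIO("0.0,1.5\n1.0,2.5\n\n2.0,4.0\n")
data = read_numeric_csv(sample)
first_column = [r[0] for r in data]
second_column = [r[1] for r in data]
print(first_column, second_column)
```

If a file turns out to have a header line or non-numeric cells, the float conversion will raise an error, which is a useful signal that the file needs a closer look first.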
Files with names ending in .tiff or .png are images, which can be read into Matlab
by using the imread command or Import Wizard.
#1=HIVseries: HIV infection time course. File HIVseries.mat contains variable a with
two columns of data. The first is the time in days since administration of a treatment to an
HIV-positive patient; the second contains the concentration of virus in that patient’s blood
in arbitrary units. Data from Perelson, 2002, Box 1.
#2=population: File population.mat contains two columns of data. The first is the date in years CE; the second is the estimated world population. Data from http://www.vaughns-1-pagers.com/history/world-populationgrowth.htm; see also http://www.census.gov/population/international/data/idb/worldpopinfo.php .
#3=insectAFM: File insectAFM.dat contains data generated using an atomic force microscope. Specifically, it contains a variable insect, which is a 512×512 array of height
values in nanometers that correspond to the microscopic topography of a cicada wing (data
courtesy Andre Brown).
¹ If you are viewing an electronic version of this page, you may be able simply to click the links below, or
right-click (Windows) or ctrl-click (Mac) to copy the link and paste it into Firefox. This may work better in
some PDF viewers (Adobe Reader) than others (Apple Preview).
#4=shotNoise: File shotNoise2008001t.txt: The first column gives the arrival times of
290 photon absorption events in an avalanche photodiode detector. Time is measured in
units of 50 ns. Total duration is 5 s. (Data courtesy John Beausang.)
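Because the shotNoise arrival times are given in units of 50 ns, you will probably want to convert them to seconds before doing anything else. Here is a minimal Python sketch of that conversion, together with the waiting times between successive events. The function names and the three example tick values are made up for illustration; only the 50 ns unit, the 290 events, and the 5 s duration come from the description above.

```python
TICK_SECONDS = 50e-9  # one time unit = 50 ns, as stated in the description


def ticks_to_seconds(ticks):
    """Convert arrival times from 50 ns ticks to seconds."""
    return [t * TICK_SECONDS for t in ticks]


def waiting_times(arrival_times):
    """Return the gaps between successive arrival times."""
    return [later - earlier
            for earlier, later in zip(arrival_times, arrival_times[1:])]


# Made-up tick values, NOT taken from the data file.
example_ticks = [20_000_000, 60_000_000, 100_000_000]
seconds = ticks_to_seconds(example_ticks)
gaps = waiting_times(seconds)

# Mean event rate implied by the description: 290 events in 5 s.
mean_rate = 290 / 5.0  # events per second
print(seconds, gaps, mean_rate)
```

A list of N arrival times yields N − 1 waiting times, so the 290 events in the file would give 289 gaps.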
#5=photodiodeblips: File g112APDtraces.csv contains columns of data. Columns 1–2
correspond to higher illumination; 4–5 are medium illumination; 7–8 are for the lowest
illumination. In each pair of columns the first entry is time in seconds; the second is detector
output in volts at that time. (Data courtesy John Beausang.)
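The column layout of photodiodeblips is easy to get wrong in a language that counts from zero. Here is a minimal Python sketch that maps the 1-based column pairs above (1–2, 4–5, 7–8) to 0-based indices and pulls out the three traces. The labels, the function name split_traces, and the example rows are made up for illustration; the sketch simply ignores whatever the remaining columns hold.

```python
# 1-based column pairs from the description, written as 0-based indices:
# columns 1-2 -> (0, 1), columns 4-5 -> (3, 4), columns 7-8 -> (6, 7).
COLUMN_PAIRS = {"high": (0, 1), "medium": (3, 4), "low": (6, 7)}


def split_traces(rows):
    """Return {label: (times, volts)} for the three illumination levels."""
    traces = {}
    for label, (t_col, v_col) in COLUMN_PAIRS.items():
        times = [row[t_col] for row in rows]
        volts = [row[v_col] for row in rows]
        traces[label] = (times, volts)
    return traces


# Made-up rows with 8 columns, NOT taken from the data file.
# None marks the columns this sketch ignores.
rows = [
    [0.0, 1.0, None, 0.0, 0.5, None, 0.0, 0.1],
    [0.1, 1.2, None, 0.1, 0.6, None, 0.1, 0.2],
]
traces = split_traces(rows)
print(traces["medium"])
```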
#6=colorResponse: File responsecurves.mat contains the data shown in Figure 12.9 (from
Baylor, 1995, Fig. 5).
#7=cluster: File bluedots.mat contains variables paris (a color image) and xy (30
coordinate pairs specifying points).
#8=stocks: File monthlydji.mat contains variable monthlydji with two columns:
First column: Time in days since a starting date.
Second column: Dow Jones Industrial Average.
#9=brownian: File g26perrindata.txt (data from J. Perrin, Les Atomes). The columns
give x, y coordinates of the points in Figure 3.3a.
#10=blitz: Clarke data.
#11=horsekicks:
#12=myosinV: File g42myosinwalk.mat; data from Yildiz et al., 2003, Fig. 6. The file
contains these variables:
yildizHistoRed = stepping histogram for myosins taking only 70 nm steps;
yildizHistoGreen = stepping histogram for myosins taking (70 − x) nm steps alternating
with x nm steps.
In each case, each row consists of the pair ((center of histogram bar in seconds), (observed
frequency)).
redDelta=0.5, greenDelta=1.0 are the respective bin widths in seconds.
Files yildizHistoRed.csv and yildizHistoGreen.csv contain the same information as
the corresponding variables in the .mat file.
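Since each myosinV histogram row gives a bar center and the bin width is given separately, you can recover the bar edges and, if you like, rescale the frequencies so that the bars enclose unit area. Here is a minimal Python sketch. The function names and the three-row histogram are made up for illustration; only the bin width 0.5 s (redDelta) comes from the description above.

```python
def bin_edges(center, width):
    """Return the (left, right) edges of a histogram bar."""
    return (center - width / 2, center + width / 2)


def normalize(histogram, width):
    """Rescale frequencies so that the bars enclose unit total area."""
    total = sum(freq for _, freq in histogram)
    return [(center, freq / (total * width)) for center, freq in histogram]


red_delta = 0.5  # bin width in seconds, from the description

# Made-up (center, frequency) rows, NOT taken from the data file.
made_up_histogram = [(0.25, 2), (0.75, 6), (1.25, 2)]
density = normalize(made_up_histogram, red_delta)
area = sum(d * red_delta for _, d in density)
print(bin_edges(0.75, red_delta), density, area)
```

The rescaled values are estimates of a probability density in units of 1/s, which is the form needed to compare against a predicted distribution of waiting times.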
#13=actinimg: Actin network image. (Data courtesy Andre Brown.)
#14=LDexpt: LDexpt23.mat. Data from Luria & Delbrück, 1943, Table 5, p. 505, experiment
number 23, shown in the text in Figures 4.3 and 4.4. Each culture has a certain number
m of resistant bacteria. The data specify a histogram of m with variable-width bins: Bin
#i includes outcomes for m in the range bins(i) through (bins(i+1)-1). The number of
cultures with m in this range is expcounts(i).
File LDexpt23.csv: Same data with bins in the first column and expcounts in the second
column.
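The variable-width bin convention of LDexpt is worth spelling out. Here is a minimal Python sketch that turns the bin starts and counts into explicit inclusive ranges of m, following the rule that bin i runs from bins(i) through bins(i+1)−1. The function name and the example numbers are made up for illustration. I am also assuming that, if there is no bins(i+1) entry for the last bin, that bin is open-ended; check the actual file before relying on this.

```python
def bin_ranges(bins, expcounts):
    """Return (lowest m, highest m, count) for each bin.

    Bin i covers bins[i] through bins[i + 1] - 1 inclusive. If bins has
    no entry after the last bin start, the upper limit is returned as
    None (ASSUMPTION: such a bin is treated as open-ended).
    """
    ranges = []
    for i, count in enumerate(expcounts):
        lower = bins[i]
        upper = bins[i + 1] - 1 if i + 1 < len(bins) else None
        ranges.append((lower, upper, count))
    return ranges


# Made-up bin starts and counts, NOT taken from the data file.
made_up_bins = [0, 1, 2, 4, 8]
made_up_counts = [5, 3, 2, 1]
ranges = bin_ranges(made_up_bins, made_up_counts)
print(ranges)
```

To compare with a model, sum the model’s predicted probabilities over each inclusive range and multiply by the total number of cultures.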
#15=linearFitPoisson: File linearFitPoisson.mat: xvals = (distance from radioactive source
to detector, in m)^(−2); counts = (counts in detector in a fixed time interval).
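If you want the source-to-detector distances in linearFitPoisson themselves, you can invert the definition of xvals. Here is a minimal Python sketch, reading xvals as the inverse square of the distance in meters; the function name and the example values are made up for illustration.

```python
import math


def distance_from_xval(xval):
    """Recover the distance in meters from xval = distance**(-2)."""
    return 1.0 / math.sqrt(xval)


# Made-up xvals, NOT taken from the data file.
made_up_xvals = [4.0, 1.0, 0.25]
distances = [distance_from_xval(x) for x in made_up_xvals]
print(distances)
```

Keeping the inverse square as the independent variable is convenient because the expected count rate from a small source falls off as the inverse square of the distance, so counts plotted against xvals should lie near a straight line.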
#16=FRETdistance: FRET efficiency as a function of distance; see Figure 11.14.
#17=emersonLewis: Data from Emerson & Lewis, 1942; see Section 11.8.2.2. phycocyanin.txt,
carotenoids.txt, chlorophyll.txt: data from Figure 11.15a. QYield.txt: data from
Figure 11.15b.
Printed August 9, 2012
#18=STA:
#19=novick: Novick/Weiner data. g149novickA.mat is data from Novick & Weiner, 1957,
Fig. 1. First column: Time in hours. The e-folding time was about 3 hours. Second column:
Fraction of maximum beta-galactosidase activity.
g149novickB.mat is data from ibid., Fig. 2. First column: Time in hours. The e-folding time
was about 3 hours. Second column: Fraction of maximum beta-galactosidase activity.
g149novickA.csv and g149novickB.csv are the same data.
#20=myoXstep: File dwelltimetype12.csv ... [Data courtesy Yujie Sun.]
#21=HbMb: Files hemoglobin.mat and myoglobin.mat.
#22=catphoto: File bwCat.tiff: A photograph of Emily. Files gaussFilt.mat and
gaussFilt.csv contain the 45×45 array gauss specifying a Gaussian filter function.
#23=ProbSeeRodCell: File BaylorNunnSchnapf.mat: Data from Baylor et al., 1984, Fig. 8,
shown as the points in Figure 15.3b on page 407. The data give the probability of seeing for
macaque rod cells as a function of the density of photons supplied to the rod outer segment.
The variable log10Nbar contains the logarithm of the density of photon arrivals (units of
photons per µm², applied over 50 µm²). The four columns correspond to four different cells;
the five rows correspond to five different flash intensities.
The variable Psee contains the fraction of 65 trials that elicited a rod response.
File BaylorNunnSchnapf.csv: Same data with the flash intensities in the first four columns
as above, and Psee in the next four columns.
#24=RodSignalHisto: File BaylorLambYau.mat: Data from Baylor et al., 1979b, Fig. 3.
#25=sakitt: File sakittData.mat: Data from Sakitt, 1972, Table 1. The variable
Nsee(i,j,k) gives the number of times that stimulus i (1 ≤ i ≤ 3) elicited rating
j − 1 (0 ≤ j − 1 ≤ 6) from subject k (1 ≤ k ≤ 3). The variable photonsin(i) gives the strength
of stimulus #i in mean number of photons presented to the cornea.
© 2010, 2011, 2012 Philip Nelson