
Biostatistics 1st assignment

Julius P. Fronda
BSMT -2A
1. Types of Data
A. Data with reference to time factor
a) Time-independent data – These are data that can be measured repeatedly, e.g., data in geosciences and astronomy on geological structures, rocks, fixed stars, etc.
b) Time-dependent data – These can be measured only once, e.g., certain geophysical or
cosmological phenomena like volcanic eruptions and solar flares. Likewise, data
pertaining to rare fossils are time-dependent data.
B. Data with reference to location factors
a) Location-independent data – These are independent of the location of objects
measured, e.g., data on pure physics and chemistry.
b) Location-dependent data – These are dependent on the location of objects measured.
Data in earth sciences and astronomy normally belong to this category. Data on rocks
are also location dependent.
C. Data with reference to mode of generation
a) Primary data – Data are primary when obtained by experiment or observation designed for the measurement.
b) Derived (reformatted) data – These data are derived by combining several primary
data with the aid of a theoretical model.
c) Theoretical (predicted) data – These are derived by theoretical calculations. Basic data such as fundamental constants are used in theoretical calculations.
D. Data with reference to nature of quantitative values
a) Determinable data – Data on a quantity, which can be assumed to take a definite value
under a given condition, are known as determinable data. Time-dependent data are
usually determinable data, if the given condition is understood to include the
specification of time.
b) Stochastic data – Data relating to a quantity, which take fluctuating values from one
sample to another, from one measurement to another, under a given condition are
referred to as stochastic. In geosciences, most data are stochastic.
E. Data with reference to terms of expression
a) Quantitative data – These are measures of quantities expressed in terms of well-defined units, converting the magnitude of a quantity into a numerical value. Most data in physical sciences are quantitative data.
b) Semi-quantitative data – These data consist of affirmative or negative answers to posed questions concerning different characteristics of the objects involved.
c) Qualitative data – The data expressed in terms of definitive statements concerning
scientific objects are qualitative in nature. Qualitative data in this sense are almost
equivalent to established knowledge.
F. Data with reference to mode of presentation
a) Numerical data – These data are presented in numerical values, e.g., most quantitative
data fall in this category.
b) Graphic data – Here data are presented in graphic form or as models. In some cases,
graphs are constructed for the sake of helping users grasp a mass of data by visual
perception. Charts and maps also belong to this category.
c) Symbolic data – These are presented in symbolic form, e.g., symbolic presentation of weather data.
G. Data with reference to scale of measurement
Nominal
Nominal scales are used for labeling variables, without any quantitative value. “Nominal” scales
could simply be called “labels.” Here are some examples, below. Notice that all of these scales
are mutually exclusive (no overlap) and none of them have any numerical significance. A good
way to remember all of this is that “nominal” sounds a lot like “name” and nominal scales are kind
of like “names” or labels.
Examples of Nominal Scales
Note: a sub-type of nominal scale with only two categories (e.g. male/female) is called
“dichotomous.” If you are a student, you can use that to impress your teacher.
Note #2: Other sub-types of nominal data are “nominal with order” (like “cold, warm, hot, very
hot”) and nominal without order (like “male/female”).
Ordinal
With ordinal scales, the order of the values is what’s important and significant, but the differences between them are not really known. Take a look at the example below. In each case, we know that a #4 is better than a #3 or #2, but we don’t know, and cannot quantify, how much better it is. For example, is the difference between “OK” and “Unhappy” the same as the difference between “Very Happy” and “Happy”? We can’t say.
Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness,
discomfort, etc.
“Ordinal” is easy to remember because it sounds like “order,” and that’s the key to remember with ordinal scales: it is the order that matters, but that’s all you really get from these.
note: The best way to determine central tendency on a set of ordinal data is to use the mode or
median; a purist will tell you that the mean cannot be defined from an ordinal set.
Example of Ordinal Scales
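To make the note above concrete, here is a minimal Python sketch (the satisfaction labels and responses are invented for illustration, and only the standard statistics module is used). It reports the mode and median of ordinal responses by working on their rank codes rather than on the labels themselves:

import statistics

# Hypothetical ordinal scale: the codes carry order only, not distance.
scale = ["Unhappy", "OK", "Happy", "Very Happy"]
rank = {label: i for i, label in enumerate(scale)}

responses = ["OK", "Happy", "Happy", "Very Happy", "Unhappy", "Happy"]
codes = [rank[r] for r in responses]

# Mode and median are meaningful for ordinal data; a purist would not report the mean.
print("Mode:  ", scale[statistics.mode(codes)])
print("Median:", scale[statistics.median_low(codes)])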
Interval
Interval scales are numeric scales in which we know both the order and the exact differences
between the values. The classic example of an interval scale is Celsius temperature because the
difference between each value is the same. For example, the difference between 60 and 50 degrees
is a measurable 10 degrees, as is the difference between 80 and 70 degrees.
Interval scales are nice because the realm of statistical analysis on these data sets opens up. For example, central tendency can be measured by mode, median, or mean, and standard deviation can also be calculated.
Ratio
Ratio scales are the ultimate nirvana when it comes to data measurement scales because they tell
us about the order, they tell us the exact value between units, AND they also have an absolute
zero–which allows for a wide range of both descriptive and inferential statistics to be applied. At
the risk of repeating myself, everything above about interval data applies to ratio scales, plus ratio
scales have a clear definition of zero. Good examples of ratio variables include height, weight,
and duration.
Ratio scales provide a wealth of possibilities when it comes to statistical analysis. These variables
can be meaningfully added, subtracted, multiplied, divided (ratios). Central tendency can be
measured by mode, median, or mean; measures of dispersion, such as standard deviation and
coefficient of variation can also be calculated from ratio scales.
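As a rough illustration of those calculations, here is a short Python sketch (the weights are made-up example values in kilograms, and only the standard statistics module is assumed):

import statistics

weights_kg = [52.0, 61.5, 58.2, 70.3, 66.1, 58.2]   # hypothetical ratio-scale data

mean = statistics.mean(weights_kg)
median = statistics.median(weights_kg)
mode = statistics.mode(weights_kg)
sd = statistics.stdev(weights_kg)   # sample standard deviation
cv = sd / mean                      # coefficient of variation; valid because zero is absolute

print(f"mean={mean:.2f}  median={median:.2f}  mode={mode}  sd={sd:.2f}  CV={cv:.2%}")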
Summary
In summary, nominal variables are used to “name,” or label a series of values. Ordinal scales
provide good information about the order of choices, such as in a customer satisfaction
survey. Interval scales give us the order of values + the ability to quantify the difference between
each one. Finally, ratio scales give us the ultimate: order, interval values, plus the ability to
calculate ratios, since a “true zero” can be defined.
Summary of data types and scale measures
H. Data with reference to characteristic
a) Quantitative data – When the characteristic of observation is quantified we get
quantitative data. Quantitative data result from the measurement of the magnitude of
the characteristic used.
b) Qualitative data – When the characteristic of observation is a quality or attribute, we
get qualitative data.
2. Nature of Data
Data is the plural of datum, so it is always treated as plural. We can find data in all the situations of the world around us, whether structured or unstructured, continuous or discrete: in weather records, stock market logs, photo albums, music playlists, or our Twitter accounts.
In fact, data can be seen as the essential raw material of any kind of human activity.
Data is a set of values of qualitative or quantitative variables; restated, pieces of data are individual pieces of information. Data is measured, collected, reported, and analyzed, whereupon it can be visualized using graphs or images. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage.
According to the Oxford English Dictionary:
Data are known facts or things used as basis for inference or reckoning.
As shown in the following figure, we can see Data in two distinct ways: Categorical and
Numerical:
Categorical data are values or observations that can be sorted into groups or categories. There are
two types of categorical values, nominal and ordinal. A nominal variable has no intrinsic ordering
to its categories. For example, housing is a categorical variable having two categories (own and
rent). An ordinal variable has an established ordering, for example, age as a variable with three ordered categories (young, adult, and elder).
Numerical data are values or observations that can be measured. There are two kinds of numerical
values, discrete and continuous. Discrete data are values or observations that can be counted and
are distinct and separate, for example, the number of lines in a piece of code. Continuous data are values or
observations that may take on any value within a finite or infinite interval. For example, an
economic time series such as historic gold prices.
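A small Python sketch of this split (all variable names and values are invented for illustration): housing is nominal, age group is ordinal, line counts are discrete, and gold prices are continuous.

from collections import Counter

housing = ["own", "rent", "rent", "own"]               # categorical, nominal (no intrinsic order)
age_group = ["young", "adult", "elder", "adult"]       # categorical, ordinal (ordered categories)
code_lines = [120, 87, 431, 56]                        # numerical, discrete (countable)
gold_price = [1490.25, 1503.10, 1511.80, 1498.40]      # numerical, continuous (any value in an interval)

# Counting category frequencies makes sense for categorical data...
print(Counter(housing))

# ...while arithmetic only makes sense for numerical data.
print(sum(gold_price) / len(gold_price))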
3. Different Sampling Method
Probability Sampling uses randomization to select sample members. You know the
probability of each potential member’s inclusion in the sample. For example, 1/100. However,
it isn’t necessary for the odds to be equal. Some members might have a 1/100 chance of being
chosen, others might have 1/50.
Non-probability sampling uses non-random techniques (i.e. the judgment of the researcher).
You can’t calculate the odds of any particular item, person or thing being included in your
sample.
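As a minimal sketch of unequal inclusion odds (the members and probabilities below are invented, and numpy is assumed to be available):

import numpy as np

rng = np.random.default_rng(0)

members = ["A", "B", "C", "D"]
# Hypothetical, unequal inclusion probabilities; they only need to sum to 1.
probs = [0.50, 0.20, 0.20, 0.10]

# Each draw picks one member according to those odds.
sample = rng.choice(members, size=3, replace=True, p=probs)
print(sample)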
Common Types
The most common techniques you’ll likely meet in elementary statistics or AP statistics
include taking a sample with and without replacement. Specific techniques include:

Bernoulli sampling uses independent Bernoulli trials on population elements. The trials decide whether each element becomes part of the sample. All population elements have an equal chance of being included in each choice of a single sample. The sample sizes in Bernoulli samples follow a binomial distribution.

Poisson sampling (less common): an independent Bernoulli trial, with a possibly different inclusion probability for each element, decides whether each population element makes it into the sample.

Cluster sampling divides the population into groups (clusters); a random sample of clusters is then chosen. It’s used when researchers don’t know the individuals in a population but do know the population subsets or groups.

In systematic sampling, you select sample elements from an ordered frame. A sampling frame is just a list of participants that you want to get a sample from. For example, in the equal-probability method, choose an element from the list and then choose every kth element using the equation k = N/n, where small “n” denotes the sample size and capital “N” equals the size of the population (see the sketch after this list).

Simple random sampling (SRS): select items completely randomly, so that each element has the same probability of being chosen as any other element. Each subset of k elements has the same probability of being chosen as any other subset of k elements.

In stratified sampling, sample each subpopulation independently. First, divide the population into homogeneous (very similar) subgroups before getting the sample; each population member belongs to only one group. Then apply simple random sampling or a systematic method within each group to choose the sample. Stratified randomization, a sub-type of stratified sampling used in clinical trials, first divides patients into strata and then randomizes with permuted block randomization.
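A minimal sketch of three of the designs above (simple random, systematic with k = N/n, and stratified), using only Python's random module; the population and strata are invented for illustration:

import random

random.seed(1)
population = list(range(1, 101))   # N = 100 hypothetical units
n = 10                             # desired sample size

# Simple random sampling: every subset of size n is equally likely.
srs = random.sample(population, n)

# Systematic sampling: k = N / n, random start, then every kth element.
k = len(population) // n
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: sample each homogeneous subgroup independently.
strata = {"low": population[:50], "high": population[50:]}
stratified = [unit for group in strata.values()
              for unit in random.sample(group, n // 2)]

print(srs, systematic, stratified, sep="\n")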
Less Common Types

Acceptance-Rejection Sampling: A way to sample from an unknown distribution using a
similar, more convenient distribution.

Accidental sampling (also known as grab, convenience or opportunity sampling): Draw a
sample from a convenient, readily available population. It doesn’t give a representative
sample for the population but can be useful for pilot testing.

Adaptive sampling (also called response-adaptive designs): adapt your selection criteria as
the experiment progresses, based on preliminary results as they come in.

Bootstrap sample: select a smaller sample from a larger sample with bootstrapping. Bootstrapping is a type of resampling where you draw large numbers of smaller samples of the same size, with replacement, from a single original sample (see the sketch after this list).

The Demon algorithm (physics) samples members of a microcanonical ensemble (used to
represent the possible states of a mechanical system which has an exactly specified total
energy) with a given energy. The “demon” represents a degree of freedom in the system which
stores and provides energy.

Critical Case Samples: With this method, you carefully choose cases to maximize the
information you can get from a handful of samples.

Discrepant case sampling: you choose cases that appear to contradict your findings.

Distance sampling: a widely used technique that estimates the density or abundance of animal populations.

The experience sampling method samples experiences (rather than individuals or members).
In this method, study participants stop at certain times and make notes of their experiences as
they experience them.

Haphazard Sampling: where a researcher chooses items haphazardly, trying to simulate
randomness. However, the result may not be random at all — tainted by selection bias.
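A minimal bootstrap sketch in plain Python (the original sample values are invented); it resamples with replacement many times to approximate the sampling distribution of the mean:

import random
import statistics

random.seed(42)
original = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4]   # hypothetical original sample

# Draw many bootstrap samples of the same size, with replacement,
# and record the statistic of interest (here, the mean) for each one.
boot_means = [statistics.mean(random.choices(original, k=len(original)))
              for _ in range(1000)]

print("bootstrap estimate of the mean:", round(statistics.mean(boot_means), 3))
print("bootstrap standard error:", round(statistics.stdev(boot_means), 3))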
Additional Uncommon Types

Inverse Sample: based on negative binomial sampling. Take samples until a specified
number of successes have happened.

Importance Sampling: a method to model rare events.

The Kish grid: a way to select members of a household for interviews; it uses random number tables for the selections.

Latin hypercube: used to construct computer experiments. It generates samples of plausible
collections of values for parameters in a multidimensional distribution.

Line-intercept sampling: a method where you include an element in a sample from a particular region if a chosen line segment intersects the element.

Use Maximum Variation Samples when you want to include extremes (like rich/poor or
young/old). A related technique: extreme case sampling.

Multistage sampling: one of a variety of cluster sampling techniques where you choose random elements from a cluster (instead of every member in the cluster).

Quota sampling: a way to select survey participants. It’s similar to stratified sampling, but researchers choose members of a group based on judgment. For example, people closest to the researcher might be chosen for ease of access.

Respondent-driven sampling: a chain-referral sampling method where participants recommend other people they know.

A sequential sample doesn’t have a set size; take items one (or a few) at a time until you
have enough for your research. It’s commonly used in ecology.

Snowball samples: where existing study participants recruit future study participants from
people they know.

Square root biased sampling: a way to choose people for additional screenings at airports; a combination of SRS and profiling.
References:

Afzalm M., & Rizwi, F. (2013). Journal of Islamabad Medical & Dental College.
Biostatistics and Data Types, 2(2), 103.

Guy, M. R. (2019, September 3). Types of Data & Measurement Scales: Nominal, Ordinal, Interval and Ratio. Retrieved from https://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/

Cuesta, H. (2013, October). Practical Data Analysis. Retrieved from https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781783280995/1/ch01lvl1sec15/the-nature-of-data

Horse, T. (2019). Sampling in Statistics: Different Sampling Methods, Types & Error. Retrieved from https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/sampling-in-statistics/