DATA SCIENCE
1. Population Data: In Data Science, a population refers to the complete set of individuals or
items from which data is collected or about which inferences are made, such as all people in a
country or all crows in a forest. Population Data Science is a specialized interdisciplinary field that
analyzes large-scale datasets about people to generate population-level insights for positive
societal impact, focusing on ethical, privacy-sensitive methods and public
2. Sample Data: A small subset of data taken from the population data, used to draw conclusions about the whole population.
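A minimal sketch of drawing a sample from a population, using only the Python standard library. The population here (1,000 made-up ages) and the sample size of 50 are assumptions for illustration, not data from the course.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: the ages of 1,000 people (made-up numbers).
population = [random.randint(18, 90) for _ in range(1000)]

# A sample is a smaller subset drawn from the population.
sample = random.sample(population, k=50)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print("population size:", len(population), "sample size:", len(sample))
print("population mean:", round(population_mean, 2))
print("sample mean:", round(sample_mean, 2))
```

The sample mean will usually be close to the population mean but not identical, which is why inferential statistics is needed to reason about the gap.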
3. Statistics: divided into two parts:
1. Descriptive Statistics (analysis on the entire data or population data): Measures of Central Tendency, Measures of Variability, Measures of Shape, Percentiles, Frequency Distribution, Covariance and Correlation.
2. Inferential Statistics (analysis on a small amount of data or sample data): Central Limit Theorem, Hypothesis Testing, z-test, chi-square test.
4. Measures of Central Tendency: Mean, Median, Mode. Making a graph for the mean is shown at (27:00 mins).
Formula for mean from data: np.mean(d["S1: Demographic Pressures"]) # or
d["S1: Demographic Pressures"].mean()
Python range() function: range(start, stop, step). In range(0, 15, 10):
0 is the start value,
15 is the stop value (not included),
10 is the step size.
So, range(0, 15, 10) produces the values 0 and 10.
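A quick check of the range() behaviour described above. The second call with a step of 5 is an extra example added for comparison.

```python
# range(start, stop, step): the stop value is never included.
values = list(range(0, 15, 10))
print(values)  # [0, 10]

# A smaller step gives more values, still stopping before 15.
more_values = list(range(0, 15, 5))
print(more_values)  # [0, 5, 10]
```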
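When finding the median, sort the values first and check whether the count is odd or even: with an odd count the median is the middle value, with an even count it is the average of the two middle values.
For the first 10 values use .head(10).mean(), otherwise it will calculate the mean for all values in the row/column.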
Important: Mean, Median and Mode help in data cleaning in data science (for example, to fill in missing values).
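A small sketch of mean, median and mode using Python's built-in statistics module instead of NumPy/pandas, so it runs without extra installs. The scores list is made up for illustration, and the list slice scores[:3] stands in for what .head(3) does on a pandas column.

```python
import statistics

# Hypothetical data with an odd number of values (7).
scores = [4, 8, 6, 5, 3, 8, 9]

mean_all = statistics.mean(scores)
median_odd = statistics.median(scores)   # sorted: 3 4 5 6 8 8 9 -> middle value 6
mode_value = statistics.mode(scores)     # 8 appears most often

# With an even number of values the median is the average of the two middle values.
scores_even = scores + [10]              # sorted: 3 4 5 6 8 8 9 10
median_even = statistics.median(scores_even)  # (6 + 8) / 2 = 7.0

# Mean of only the first 3 values, similar in spirit to .head(3).mean() in pandas.
mean_first_three = statistics.mean(scores[:3])  # (4 + 8 + 6) / 3 = 6

print(mean_all, median_odd, mode_value, median_even, mean_first_three)
```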
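5. Range = max - min
Mean Absolute Deviation: the average absolute distance of the values from the mean. Less spread means the data is more consistent, so we always prefer the data with less spread (54:00).
When the mean absolute deviation is the same for two datasets, we compare the standard deviation or variance instead. We take the data which has the lowest mean absolute deviation, standard deviation, or variance.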
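A minimal sketch of the spread measures above. The two datasets are made up so that they have the same mean absolute deviation but different variance, and mean_absolute_deviation is a helper written for this example, not a library function. Population variance and standard deviation (pvariance, pstdev) are used here as an assumption. The notes do not say whether the population or sample version is meant.

```python
import statistics

def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    m = statistics.mean(values)
    return sum(abs(v - m) for v in values) / len(values)

# Two hypothetical datasets, both with mean 5.
data_a = [3, 3, 7, 7]
data_b = [1, 5, 5, 9]

for name, data in (("A", data_a), ("B", data_b)):
    data_range = max(data) - min(data)
    mad = mean_absolute_deviation(data)
    variance = statistics.pvariance(data)
    std_dev = statistics.pstdev(data)
    print(name, "range:", data_range, "MAD:", mad,
          "variance:", variance, "std:", round(std_dev, 3))

# Both have MAD = 2.0, so the tie is broken by variance: A (4) is lower than B (8).
```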
6. PERCENTILES AND QUARTILES:
Percentiles are used in statistics to give you a number that describes the value that a given percent of the values are lower than. The 50th percentile is the median.
USED IN THE INTERQUARTILE RANGE (IQR = 75th percentile - 25th percentile) TO FIND OUTLIERS. If there is a big difference between the 75% value and the max (or between the min and the 25% value), an outlier is likely present. On a box plot, the circles drawn beyond the whiskers represent outliers. Basically used to analyze and remove outliers.
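A rough sketch of finding outliers with quartiles and the interquartile range. The data is made up, and the 1.5 x IQR cutoff is a common convention assumed here, not something stated in the notes. Quartile values can differ slightly between libraries because each one interpolates in its own way.

```python
import statistics

# Hypothetical data with one obviously large value.
data = [10, 12, 13, 14, 15, 16, 18, 95]

# quantiles(..., n=4) returns the three cut points: Q1 (25%), Q2 (50%), Q3 (75%).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Assumed convention: anything beyond 1.5 * IQR from the quartiles is an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in data if v < lower_fence or v > upper_fence]
cleaned = [v for v in data if lower_fence <= v <= upper_fence]

print("Q1:", q1, "median:", q2, "Q3:", q3, "IQR:", iqr)
print("outliers:", outliers)
print("cleaned data:", cleaned)
```

Notice the large gap between Q3 and the max (95), which is the same warning sign described above.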
7. SKEWNESS:
Skewness measures the asymmetry of the distribution. A perfectly symmetrical distribution has a skewness value of 0. Positive skewness indicates that the distribution is skewed to the right (i.e., the tail is longer on the right), while negative skewness indicates a left skew (i.e., the tail is longer on the left).
If the skew value is greater than 0 the skew is positive, and if it is less than 0 the skew is negative.
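A small sketch of how the sign of the skewness follows the longer tail. The skewness function below is a hand-written helper using the basic moment formula (third central moment divided by the variance to the power 1.5), and the three datasets are made up. Library functions may apply a sample-size correction, so their numbers can differ a little while the sign stays the same.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (no sample-size correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]          # balanced around 3
right_tail = [1, 2, 2, 3, 10]        # one large value pulls the tail right
left_tail = [-10, -3, -2, -2, -1]    # mirror image, tail on the left

print("symmetric:", skewness(symmetric))
print("right tail:", skewness(right_tail))
print("left tail:", skewness(left_tail))
```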
8. Probability:
Discrete Random Variable: A random variable that takes on a countable number of distinct values (typically whole numbers).
Continuous Random Variable: A random variable that can take on any value within a given range or interval (it can be in decimals).
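A minimal sketch of the difference between the two kinds of random variable, using Python's random module. The dice roll and the height range of 150 to 200 are assumed examples, not taken from the course.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Discrete: a dice roll can only be one of the whole numbers 1 to 6.
dice_rolls = [random.randint(1, 6) for _ in range(5)]

# Continuous: a height can be any decimal value inside the interval.
heights = [random.uniform(150.0, 200.0) for _ in range(5)]

print("dice rolls (discrete):", dice_rolls)
print("heights (continuous):", [round(h, 2) for h in heights])
```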