Analysis of per capita Net State Domestic Product as a function of Unemployment & Literacy

Charvi Rampuria (310), Sukshma Amogha (762)
ABSTRACT: This project examines unemployment and literacy in India and how they affect the net state
domestic product (NSDP), the state counterpart to a country's net domestic product. We use data for the year
2011-2012. Regression analysis, the technique of finding relationships between two or more variables, is used
to determine the relationship between India's NSDP, unemployment rate, and literacy rate. The unemployment
rate and literacy rate are the independent variables and NSDP is the dependent variable. The findings are
presented as simple linear and multiple regression analyses, which show how the unemployment and literacy
rates of the various States and Union Territories of India relate to the net state domestic product.
Unemployment is closely linked to GDP: India's unemployment rate tends to fall as the country's GDP rises.
In this project, we study how unemployment affects the state counterpart of NDP (GDP minus depreciation).
There is also evidence that education or literacy affects GDP, which we analyze in this project.
Keywords: unemployment, NSDP, literacy, GDP
Net state domestic product (NSDP) is the state counterpart to a country's net domestic product (NDP), which
equals the gross domestic product (GDP) minus depreciation on a country's capital goods. This study uses
NSDP per capita for Indian states and union territories.
According to UNESCO, the literacy rate is defined by the percentage of a given age group population that can
read and write. The adult literacy rate corresponds to ages 15 and above, the youth literacy rate to ages 15 to
24, and the elderly to ages 65 and above. It is typically measured according to the ability to comprehend a
short simple statement on everyday life. Generally, literacy also encompasses numeracy, and measurement
may incorporate a simple assessment of arithmetic ability. The literacy rate and the number of literates should
be distinguished from functional literacy, a more comprehensive measure of literacy assessed on a continuum
in which multiple proficiency levels can be determined.
According to the OECD, the unemployed are people of working age who are without work, are available for work,
and have taken specific steps to find work. The uniform application of this definition results in estimates of
unemployment rates that are more internationally comparable than estimates based on national definitions of
unemployment. This indicator is measured as the number of unemployed people as a percentage of the labor
force and it is seasonally adjusted. The labor force is defined as the total number of unemployed people plus
those in employment.
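The definition above amounts to a simple ratio. The following Python sketch illustrates the calculation; the function name and the counts are hypothetical and are not taken from the data used in this study.

```python
def unemployment_rate(unemployed: float, employed: float) -> float:
    """Unemployed people as a percentage of the labor force.

    The labor force is the number of unemployed plus the number employed.
    """
    labor_force = unemployed + employed
    if labor_force <= 0:
        raise ValueError("labor force must be positive")
    return 100.0 * unemployed / labor_force


# Hypothetical counts (in millions), for illustration only
rate = unemployment_rate(unemployed=5.0, employed=95.0)
print(rate)
```

Seasonal adjustment, which the OECD indicator also applies, is not covered by this sketch.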
In the project, we try to analyze the impact of literacy rate and unemployment rate on the per capita net state
domestic product across all States and Union Territories of India. We use regression analysis to understand
the impact of the independent variables, the unemployment rate and the literacy rate, on per capita state product. The
next section presents a literature review of the studies which have been referred to in producing this research.
We conclude with some policy suggestions.
In their paper Impact of Schooling on the Economic Development of Low-Income Nations, Germinal G. Van
and Marcella Taleb Da Costa (2021) indicate that average years of education can raise GDP per capita in
low-income countries. They chose to look into the case of India, where there is no way to assess educational quality
due to a lack of data. They used data from the World Statistics Bank for GDP per capita and data from In Our
World for Mean Years of Schooling. They employed polynomial regression to estimate the model's
parameters. Their findings revealed a positive relationship between educational attainment squared and GDP
per capita, implying that a one-year increase in average educational attainment improves GDP per capita by
about 132.75 dollars. In the context of this study, we could speculate that a country's average years of
schooling are influenced by its GDP per capita. Increases in GDP per capita improve education levels. Their
findings suggest that education is a critical instrument for improving a country's economy. Individuals become
more productive and gain more talents to contribute to the labor market as a result of their education.
Furthermore, education fosters creativity, which leads to innovation, which is another aspect that adds to a
country's growth and economic prosperity.
Altaf Hussain Padder & B. Mathavan (2021), in their Granger Causality Approach: The Relationship between
Unemployment and Economic Growth in India, looked at the relationship between unemployment and real
gross domestic product in India's economy from 1990 to 2020. Their analysis used data such as the gross
domestic product, which is a true indicator of economic growth, and the unemployment rate. The estimated
regression of unemployment on economic growth for India indicated that economic growth explains only
about 6% of the variation in unemployment and that the two are negatively related, with the remaining 94%
explained by other factors. The small value of R-squared showed that the evolution of the unemployment
rate is largely influenced by other factors, which were not part of that study.
According to the study, the government should create more employment opportunities as soon as possible to
absorb the country's swarming population of unemployed workers by modernizing the agriculture sector,
which is the most important sector, providing more than 42 percent of livelihood while contributing only 13
percent to GDP.
Okun's Law is an empirically observed relationship between unemployment and losses in a country's
production (GDP). It can also be used to estimate gross national product (GNP). Further, Okun’s law states
that a country’s gross domestic product (GDP) must grow at about a 4% rate for one year to achieve a 1%
reduction in the rate of unemployment over time. The law has evolved to fit the current economic climate and
employment trends.
The study is based on secondary data. To understand the impact of the unemployment and literacy rates,
econometric analysis has been done using simple regression models for various States and Union Territories
of India. Cross-sectional data for the year 2011-12 has been taken for various States and Union Territories of
India from the Reserve Bank of India data source. The data for the NSDP variable is in INR. The ordinary
least square method under the Classical Linear Regression Model is used for regression analysis where NSDP
is taken as the dependent variable and unemployment and literacy rates as the independent variables.
R-Programming has been used for the analysis.
Regression is a technique used to model and analyze the relationships between variables and, often, how
they jointly contribute to a particular outcome.
There are various types of regressions, but here we will only cover the types of regression that are relevant to
our research purpose.
Linear Regression attempts to model the relationship between two variables by fitting a linear equation to
observed data. One variable is treated as the explanatory variable and the other as the dependent variable.
A linear regression line has an equation of the form
Y = a + bX
where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the
intercept (the value of Y when X = 0).
We have used linear regression in one variable for Model 1 and Model 2 of our study.
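As a minimal sketch of how the line Y = a + bX can be fitted by ordinary least squares, the Python code below computes the slope and intercept from the usual closed-form expressions. The analysis in this study was run in R; the function name and the toy data here are hypothetical and serve only to illustrate the technique.

```python
def fit_simple_ols(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary across observations")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


# Toy data lying exactly on y = 1 + 2x (illustration only)
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
a, b = fit_simple_ols(x, y)
print(a, b)
```

Because the toy points lie exactly on a line, the fit recovers an intercept of 1 and a slope of 2; real cross-sectional data would leave non-zero residuals.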
Multiple Linear Regression seeks to model the relationship between two or more explanatory variables and
a response variable by fitting a linear equation to observed data.
The model for multiple linear regression with p explanatory variables, given n observations, is
$Y_i = B_1 + B_2 X_{1i} + B_3 X_{2i} + \dots + B_{p+1} X_{pi} + u_i$ for i = 1, 2, ..., n.
We have used multiple linear regression for Model 3 of our study.
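For the case of two explanatory variables, the least-squares estimates can be sketched with the normal equations written in deviation form. The Python code below is an illustrative sketch only: the function name and the toy data are hypothetical, and the study itself was run in R.

```python
def fit_two_regressor_ols(x1, x2, y):
    """Return (b1, b2, b3) for the fitted equation y = b1 + b2*x1 + b3*x2."""
    n = len(y)
    if not (len(x1) == len(x2) == n) or n < 3:
        raise ValueError("need at least three observations of equal length")
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products, all in deviations from the means
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("regressors are perfectly collinear")
    b2 = (s1y * s22 - s2y * s12) / det   # partial slope on x1
    b3 = (s2y * s11 - s1y * s12) / det   # partial slope on x2
    b1 = my - b2 * m1 - b3 * m2          # intercept
    return b1, b2, b3


# Toy data generated exactly from y = 1 + 2*x1 - 3*x2 (illustration only)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x1, x2)]
print(fit_two_regressor_ols(x1, x2, y))
```

Each slope is a partial coefficient: it measures the effect of one regressor with the other held fixed, which is why the cross-product term s12 enters both formulas.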
• The regression model is linear in the parameters; it may or may not be linear in the variables. That is, the
regression model is of the following type:
Yi = B1 + B2Xi + ui
• The explanatory variable(s) X is uncorrelated with the disturbance term u.
• Given the value of Xi, the expected, or mean, value of the disturbance term u is zero. That is,
E(u|Xi) = 0
The disturbance term u represents the factors not explicitly introduced in the model. This assumption states
that these factors are not related to Xi and therefore, given the value of Xi, their mean value is zero.
• The variance of each ui is constant, or homoscedastic. That is,
var(ui) = σ^2
• There is no correlation between two error terms. This is the assumption of no autocorrelation.
Algebraically, this assumption can be written as
cov(ui, uj) = 0, i ≠ j
This assumption means that there is no systematic relationship between two error terms, which means the
error terms ui are random.
• The regression model is correctly specified. Alternatively, there is no specification bias or specification error
in the model used in empirical analysis.
The main objectives of this project are:
• To understand the concept of the NSDP and trends across all the Indian states and Union Territories.
• To highlight the nature of the relationship between unemployment, literacy, and NSDP.
• To analyze the literacy and unemployment situation across all states and Union Territories of India to
provide policy recommendations towards achieving higher per capita net state domestic product and a
higher GDP.
Interpretation of Model 1
One dependent and one independent variable
Per capita Net State Domestic Product and Literacy rates
Y: Per Capita Net State Domestic Product
X1:Literacy Rate
Y= B1 + B2X1 + ui (Population regression)
B1: Intercept coefficient
B2: Slope coefficient
ui: random error term
Ŷ=b1 + b2X1, where Ŷ is the estimator of Y and b1 and b2 are OLS estimators of B1 and B2 respectively.
A priori Expectations of Coefficients
Here, the a priori expectation is that b2 is positive, because an increase in literacy rates across
States and Union Territories is expected to increase the net state domestic product, establishing a positive
relationship between literacy rate and NSDP.
H0 : b2 = 0
Ha : b2 > 0
Running regression by OLS method
Dependent variable: NSDP (Y)
Independent variable: Literacy rate (X1)
[Table: OLS output for Model 1, reporting the coefficient and Std. Error for Literacy Rate, the mean and S.D. of the dependent variable, the sum of squared residuals, the residual standard error, multiple and adjusted R-squared, and the F-statistic (1,30).]
Using the result of regression run by OLS method it can be seen that the estimated coefficients are:
b1= 4.5705
b2= 1.5272
Ŷ= 4.5705 + 1.5272X1
Interpretation of coefficients
b1= 4.5705 essentially means that when the Literacy rate is 0, the NSDP would be about 4.5705%. In our linear
model, b1 has no meaningful interpretation, as the Literacy rate can never be zero.
b2= 1.5272 means that, other things remaining the same, an increase in Literacy rates by one unit leads to an
increase in NSDP by 1.5272%. The positive sign of b2 indicates a positive relationship between Literacy rates
and NSDP in India.
R2 (overall goodness of fit measure) of 0.1032 means that 10.32% of total variation in NSDP around its mean
value is explained by Literacy rates.
Significance of the model
H0: b2 = 0
Ha : b2 > 0
Observed p value= 0.073
α - value= 0.05
Since the p-value > α, the coefficient is statistically insignificant at the 5% level and we fail to reject the null hypothesis.
Analysis of Variance
[Table: Sum of Squares and Mean Square for the regression and residual rows]
R2 = 0.944/9.15 = 0.1032
F(1,30) = 0.9441/0.2735 = 3.451 [p-value 0.073]
F-test (test of overall significance)
H0: R2 = 0
Ha: R2 > 0
F-observed: 3.451
F-critical= 4.1708
F observed < F critical, which means R2 is statistically insignificant at the 5% level of significance. Thus, the
null hypothesis cannot be rejected.
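The arithmetic behind R2 and the F-test can be retraced with a short Python sketch using the sums of squares reported above. The function names are hypothetical, and the sample size of 32 is an assumption inferred from the reported degrees of freedom (1,30) with one regressor; it is not stated in the text.

```python
def r_squared(explained_ss, total_ss):
    """Share of total variation explained by the regression."""
    return explained_ss / total_ss


def f_statistic(r2, n_obs, n_regressors):
    """Overall F statistic written in terms of R-squared."""
    df_resid = n_obs - n_regressors - 1
    return (r2 / n_regressors) / ((1.0 - r2) / df_resid)


# Values reported for Model 1; n_obs=32 is an inferred assumption
r2 = r_squared(0.9441, 9.15)
f_obs = f_statistic(r2, n_obs=32, n_regressors=1)

F_CRITICAL = 4.1708                  # 5% critical value quoted in the text
reject_null = f_obs > F_CRITICAL     # decision rule of the F-test
print(round(r2, 4), round(f_obs, 2), reject_null)
```

Under these assumptions the sketch gives an R2 of about 0.1032 and an F of about 3.45, which is below the critical value, so the null hypothesis is not rejected.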
Test for normality of residuals
H0: error is normally distributed
Ha: error is not normally distributed
Test statistic: Chi-square = 41.9 with observed p-value 0.073 (right tail)
Chicrit = 43.7729
Since Chical < Chicrit, we fail to reject the null hypothesis; there is no evidence at the 5% level that the
errors are not normally distributed.
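The decision rule for a normality test can be illustrated with the Jarque-Bera statistic, a commonly used test based on the skewness and kurtosis of the residuals. This is not the chi-square test reported above, and the residuals below are hypothetical; the Python sketch only shows how a test statistic is compared with a critical value.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three residuals")
    mean = sum(residuals) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# 5% critical value of the chi-square distribution with 2 degrees of freedom
CHI2_CRITICAL_2DF_5PCT = 5.991

hypothetical_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]   # illustration only
jb = jarque_bera(hypothetical_residuals)
reject_normality = jb > CHI2_CRITICAL_2DF_5PCT
print(round(jb, 3), reject_normality)
```

A statistic below the critical value means the null hypothesis of normality is not rejected. The chi-square approximation for this statistic is asymptotic, so with a small cross-section it should be read with caution.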
Interpretation of Model 2
One dependent and one independent variable
Per capita Net State Domestic Product and Unemployment rates
Y: Per Capita Net State Domestic Product
X2: Unemployment Rate
Y= B1 + B2X2 + uj (Population regression)
B1: Intercept coefficient
B2: Slope coefficient
uj: random error term
Ŷ=b1 + b2X2, where Ŷ is the estimator of Y and b1 and b2 are OLS estimators of B1 and B2 respectively.
A priori Expectations of Coefficients
Here, the a priori expectation is that b2 is negative, because an increase in unemployment rates
across States and Union Territories is expected to decrease the net state domestic product, establishing a
negative relationship between unemployment rate and NSDP.
H0 : b2 = 0
Ha : b2 < 0
Running regression by OLS method
Dependent variable: NSDP (Y)
Independent variable: Unemployment rate (X2)
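[Table: OLS output for Model 2, reporting the coefficient and Std. Error for Unemployment Rate, the mean and S.D. of the dependent variable, the sum of squared residuals, the residual standard error, multiple and adjusted R-squared, and the F-statistic (1,30); one p-value is reported as < 2e-16 ***.]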
Using the result of regression run by OLS method it can be seen that the estimated coefficients are:
b1= 11.25698
b2= -0.01435
Ŷ= 11.25698 -0.01435X2
Interpretation of coefficients
b1= 11.25698 essentially means that when the Unemployment rate is 0, the NSDP would be about 11.25698%.
In our linear model, b1 has no meaningful interpretation, as unemployment can never be removed altogether in a
developing economy like India.
b2= -0.01435 means that, other things remaining the same, a decrease in Unemployment rates by one unit will
lead to an increase in the NSDP by 0.01435%. The negative sign of b2 indicates a negative relationship between
Unemployment rates and NSDP in India.
R2 (overall goodness of fit measure) of 0.0003794 means that 0.03794% of total variation in NSDP around its
mean value is explained by Unemployment rates.
Significance of the model
H0: b2 = 0
Ha : b2 < 0
Observed p value= 0.9157
α - value= 0.05
Since the p-value > α, the coefficient is statistically insignificant at the 5% level and we fail to reject the null hypothesis.
Analysis of Variance
[Table: Sum of Squares and Mean Square for the regression and residual rows]
R2 = 0.00347/9.15 = 0.000379
F(1,30) = 0.00347/0.30489 = 0.0114 [p-value 0.9157]
F-test (test of overall significance)
H0: R2 = 0
Ha: R2 > 0
F-observed: 0.0114
F-critical= 4.1708
F observed < F critical, which means R2 is statistically insignificant at the 5% level of significance. Thus, the
null hypothesis cannot be rejected.
Test for normality of residuals
H0: error is normally distributed
Ha: error is not normally distributed
Test statistic: Chi-square = 20.03 with observed p-value 0.9157 (right tail)
Chicrit = 43.7729
Since Chical < Chicrit, we fail to reject the null hypothesis; there is no evidence at the 5% level that the
errors are not normally distributed.
Interpretation of Model 3
One dependent and two independent variables
Per capita Net State Domestic Product, Literacy and Unemployment rates
Y: Per Capita Net State Domestic Product
X1: Literacy Rate
X2: Unemployment Rate
Y= B1 + B2X1 + B3X2 + ui (Population regression)
B1: Intercept coefficient
B2: Partial regression coefficient of X1
B3: Partial regression coefficient of X2
ui: random error term
Ŷ=b1 + b2X1 + b3X2, where Ŷ is the estimator of Y and b1, b2, and b3 are OLS estimators of B1, B2, and B3
respectively.
A priori Expectations of Partial Coefficients
b2: Here, the a priori expectation is that b2 is positive, because an increase in literacy rates across
States and Union Territories is expected to increase the net state domestic product, establishing a positive
relationship between literacy rate and NSDP.
b3: Similarly, the a priori expectation is that b3 is negative, because an increase in unemployment
rates across States and Union Territories is expected to decrease the net state domestic product,
establishing a negative relationship between the unemployment rate and NSDP.
Thus, the hypotheses to be tested are
H0: b2 = 0
Ha: b2 > 0
H0: b3 = 0
Ha: b3 < 0
Running regression by OLS method
Dependent variable: NSDP (Y)
Independent variable: Literacy rate (X1)
Independent variable: Unemployment rate (X2)
[Table: OLS output for Model 3, reporting the coefficients and Std. Errors for Literacy Rate and Unemployment Rate, the mean and S.D. of the dependent variable, the sum of squared residuals, the residual standard error, multiple and adjusted R-squared, and the F-statistic (2,29).]
Using the result of regression run by OLS method it can be seen that the estimated coefficients are:
b1= 4.12743
b2= 1.70544
b3= -0.09287
Ŷ= 4.12743 + 1.70544X1 -0.09287X2
Interpretation of coefficients
b1= 4.12743 essentially means that when the Unemployment rate and Literacy rate are 0, the NSDP would be
about 4.12743%. In our linear model, b1 has no meaningful interpretation, as unemployment and literacy rates can
never be zero.
b2= 1.70544 means that, other things remaining the same, an increase in literacy rate by one unit leads
to an increase in the NSDP by 1.70544%. The positive sign of b2 implies a positive relationship between
Literacy rates and NSDP in India.
b3= -0.09287 means that, other things remaining the same, a decrease in the unemployment rate by one unit leads
to an increase in the NSDP by 0.09287%. The negative sign of b3 implies a negative relationship between
Unemployment rates and NSDP in India.
R2 (overall goodness of fit measure) of 0.1177 means that 11.77% of total variation in NSDP around its
mean value is explained by Unemployment and Literacy rates in India.
Significance of the model
b1 is statistically insignificant as its p-value is greater than α (5%), i.e., 0.2688 > 0.05.
H0: b2 = 0
Ha : b2 > 0
Observed p value= 0.0593
α - value= 0.05
Since the p-value > α, the coefficient is statistically insignificant at the 5% level and we fail to reject the null hypothesis.
H0: b3 = 0
Ha: b3 < 0
Observed p value= 0.4955
α - value= 0.05
Since the p-value > α, the coefficient is statistically insignificant at the 5% level and we fail to reject the null hypothesis.
Analysis of Variance
[Table: Sum of Squares and Mean Square for the regression and residual rows]
R2 = 1.077/9.151 = 0.1177
F(2,29) = 0.5385/0.2784 = 1.934 [p-value 0.1628]
F-test (test of overall significance)
H0: R2 = 0
Ha: R2 > 0
F-observed: 1.934
F-critical= 3.33
F observed < F critical, which means R2 is statistically insignificant at the 5% level of significance. Thus, the
null hypothesis cannot be rejected.
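With two regressors, the overall F statistic and the adjusted R-squared both follow from R2 and the degrees of freedom. The Python sketch below applies the standard formulas to the reported R2 of 0.1177; the function names are hypothetical, and the sample size of 32 is an assumption inferred from the reported degrees of freedom (2,29).

```python
def f_from_r2(r2, n_obs, n_regressors):
    """Overall F statistic: (R2 / k) / ((1 - R2) / (n - k - 1))."""
    df_resid = n_obs - n_regressors - 1
    return (r2 / n_regressors) / ((1.0 - r2) / df_resid)


def adjusted_r2(r2, n_obs, n_regressors):
    """R-squared penalised for the number of regressors."""
    df_resid = n_obs - n_regressors - 1
    return 1.0 - (1.0 - r2) * (n_obs - 1) / df_resid


R2_MODEL_3 = 0.1177   # reported in the text
N_OBS = 32            # assumption inferred from F(2,29)

f_obs = f_from_r2(R2_MODEL_3, N_OBS, n_regressors=2)
adj = adjusted_r2(R2_MODEL_3, N_OBS, n_regressors=2)
print(round(f_obs, 3), round(adj, 4))
```

Under these assumptions the F statistic comes out at about 1.934, below the critical value of 3.33, and the adjusted R-squared is about 0.057, noticeably lower than R2 because adding a second regressor uses up a degree of freedom.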
Test for normality of residuals
H0: error is normally distributed
Ha: error is not normally distributed
Test statistic: Chi-square = 36.38 with observed p-value 0.1628 (right tail)
Chicrit = 42.5569
Since Chical < Chicrit, we fail to reject the null hypothesis; there is no evidence at the 5% level that the
errors are not normally distributed.
The estimated NSDP shows a positive relationship with the Literacy rate and a negative relationship with the
Unemployment rate, although neither coefficient is statistically significant at the 5% level. This suggests that, with
an increase in literacy level, educational attainment, and the average years of schooling, NSDP would rise, which
would further increase the GDP of the country. Further, with a decrease in the unemployment rate and an
improvement in the country’s labor force, NSDP can improve, adding more to the GDP. However, factors other
than unemployment and literacy, which are not included in the model, also affect net state domestic product.
Both these aspects should be taken into consideration while building a more productive nation. Some of our
recommendations are as follows:
➢ Providing scholarships, so that talented students from economically disadvantaged backgrounds can access
better education and therefore a brighter future. Scholarships and grants would also incentivize
parents to send their children to school rather than to work.
➢ Mid-day meal schemes, which encourage children to attend school, thereby increasing the enrolment ratio in
primary and secondary education.
➢ Encouraging a more relevant education system, like those in some foreign nations, that prepares students for
the job market rather than focusing on rote learning.
➢ Revamping the teacher education (TE) system. We should focus on revamping the curriculum and
pedagogy to bring in modern and innovative elements and make it more rigorous.
➢ Since India is a labour-abundant country, the study suggests that the government should, as a
matter of urgency, create more employment opportunities to absorb the large population of
unemployed workers in the country.
➢ Various measures such as MGNREGA, the National Policy for Skill Development and Entrepreneurship, the
Startup India Initiative, and the Pradhan Mantri Kaushal Vikas Yojana have been implemented by the
government of India in the past to improve the employment situation. More such initiatives and
measures should be taken up by the government, especially in the post-pandemic period, when many
workers have lost their livelihoods.
No work is free from limitations, and this paper is no exception. The limitations are highlighted below
for better critical appreciation.
➢ The estimated regression of NSDP on unemployment and literacy explains only
11.77 percent of the variation in NSDP, while the remaining 88.23 percent is
due to other factors. For this study, only two factors were taken into account, while various
other factors affect NSDP.
➢ In this study, the CLRM assumptions were not tested, but the residuals were checked for normality.
➢ The estimated coefficients were not statistically significant at the 5% level, so the null hypotheses
could not be rejected.
This study has analysed the relationship between per capita Net State Domestic Product, Unemployment and
Literacy rate using the data from Indian States and Union Territories for the year 2011-12.
The direction of the estimated coefficients suggests that if the state governments take appropriate policy
measures for the expansion of literacy and employment opportunities, NSDP and, in turn, GDP may be
higher. Furthermore, the results of descriptive statistics revealed that the variables are not normally distributed.
Thus, our model has examined two factors that may affect NSDP in different
states and regions of the country. However, several factors cannot easily be measured numerically, or data is
not yet available for them, or working out a mathematical model with them may be difficult, and they could
therefore not be included in this model. Therefore, the model only partially explains why
NSDP is higher in some states and lower in others.