 GDP and Gender Inequalities
A global study of the interaction between gender and GDP

Samantha Sievering and Peter Wilson

INTRODUCTION

Shortly after Muhammad Yunus crafted the modern incarnation of microfinancing (the practice of lending small sums of money at low interest rates through community-based initiatives), statistics began to show that women were significantly more likely to repay the loans. From this rose a new discourse centered on the unique, and presumably powerful, role that women play in socioeconomic development. This project aims to use data-mining techniques to investigate that belief by looking for correlation on the national scale between various indicators of gender equality and per capita Gross Domestic Product (GDP). In looking at the relationship between per capita GDP and gender inequality, we examine a very specific yet important indicator of economic development and the role that gender inequality has on that indicator.

Our results show that, outside of female income, there is a relatively weak mathematical correspondence between per capita GDP and the measures of gender inequality examined. However, of note in our findings was a stronger correlation when all measures of inequality were included than when only a single attribute was used, suggesting that a more holistic view of gender inequality might correlate better with per capita GDP.

DATASET DESCRIPTION

We accessed multiple databases so as to get a variety of indicators of gender inequality. We pulled attributes from three databases. The attributes we used are listed in Table 1.

1) The Gender, Institutions and Development Database 2009, compiled by the Organization for Economic Co-Operation and Development, provided data on the percentage of female professional and technical workers in the workforce and the ratio of estimated female to male earned income under the Political and Economic Status of Women sub-page. Under the Education sub-page of the same database, we pulled the female adult literacy rate.
2) Through the UN data website, we found the Gender Inequality Index database, from which we took the attributes Gender Inequality Index, percent of adult females with at least a secondary education, and the labor force participation rate of females.
3) Also through the UN data website, we found the Per Capita Gross Domestic Product database, which gave us our output attribute of per capita GDP.

TABLE 1: ATTRIBUTES USED

| Attribute Name | Description |
|---|---|
| Country | The name of the country. |
| Professional_Tech | The percentage of professional and technical workers who are female. 2009. |
| Female_to_Male | The ratio of estimated female earned income to male earned income. |
| Inequality | The Gender Inequality Index, a calculated index designed to reflect the level of inequality present in countries with sufficient data available. A score of zero indicates complete equality. |
| Secondary_Ed | The percent of the female population 25 years of age and up with at least a secondary education. |
| Labor_Part | The labor force participation rate of females. |
| Literacy | The percentage of women, 15 and older, who test as literate. |
| GDP | The goods and services produced in 2009 within the given country, averaged among the population of that country. The values are in US$ at current prices. |

DATA PREPARATION

The process of preparing our data for mining and analysis centered on finding a way to join the attributes pertaining to the same country across the multiple sources. Our raw data came in four csv files, each containing a different number of countries, as shown in Table 2. Because of this, we could not simply load all of the data into one relational table. Instead, we had to create an individual table for each dataset file and upload the file into the corresponding table in SQL. After doing this, we were able to join the tables using the LEFT OUTER JOIN command and thus query all tables at once based on the country attribute.
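The join described above can be sketched as follows. This is a minimal illustration only, assuming a small in-memory SQLite database with three of the tables; the country names and values are hypothetical placeholders and are not taken from the study's data.

```python
import sqlite3

# In-memory database with hypothetical rows. The country names and values
# are made up for illustration and do not come from the study's csv files.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE GDP (Country TEXT PRIMARY KEY, GDP REAL);
    CREATE TABLE Literacy (Country TEXT PRIMARY KEY, Literacy REAL);
    CREATE TABLE Inequality (Country TEXT PRIMARY KEY, Inequality REAL);
    INSERT INTO GDP VALUES
        ('Country A', 1000.0), ('Country B', 2000.0), ('Country C', 3000.0);
    INSERT INTO Literacy VALUES ('Country A', 90.0), ('Country C', 70.0);
    INSERT INTO Inequality VALUES ('Country A', 0.3), ('Country B', 0.5);
""")

# GDP is the left-hand table, so every GDP country is kept. Attributes that
# are missing from another table come back as NULL (None in Python).
query = """
    SELECT GDP.Country, GDP.GDP, Literacy.Literacy, Inequality.Inequality
    FROM GDP
    LEFT OUTER JOIN Literacy ON GDP.Country = Literacy.Country
    LEFT OUTER JOIN Inequality ON GDP.Country = Inequality.Country
    ORDER BY GDP.Country;
"""
rows = db.execute(query).fetchall()
for row in rows:
    print(row)
db.close()
```

With a plain JOIN in place of LEFT OUTER JOIN, only countries present in every table would be returned, which is why the choice of join matters when the tables cover different sets of countries.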
Because GDP provided us with the most unique country entries, we used it as our primary table, joining all other tables to GDP on their "country" values. We also created a Python program that allows us to retrieve the information provided by our variables either for all countries in the database or for an individual country. The program also lets the user decide whether to see data for all attributes or for one specific attribute. This could be used to alter the data that is input into Weka for analysis, as we will discuss later.

TABLE 2: VARIATION IN NUMBER OF COUNTRIES REPRESENTED

| Relation name | Attributes contained | Number of countries represented |
|---|---|---|
| FemaleWorkers | Professional_Tech, Female_to_Male | 160 |
| Inequality | Inequality, Secondary_Ed, Labor_Part | 194 |
| Literacy | Literacy | 147 |
| GDP | GDP | 210 |

DATA ANALYSIS

As our attributes were numerical, we used numeric estimation to attempt to predict the per capita GDP. The program we used for data analysis, Weka, contains multiple algorithms that analyze the given data through various mathematical methods to create corresponding prediction models. The algorithms we used are as follows:

● SimpleLinearRegression: Chooses the single attribute that allows for the most accurate prediction and creates a linear function model based on that attribute. It is much like a line of best fit involving only one variable.
● LinearRegression: The more complex form of SimpleLinearRegression, this algorithm creates a linear function model using all of the given attributes. Again, it computes a line of best fit, this time for multiple variables.
○ LinearRegression (M5 Pruning): A type of LinearRegression that uses the M5 method to take out unhelpful attributes. The M5 method determines whether or not each attribute actually helps in creating a more accurate model. It is essentially a midpoint between SimpleLinearRegression, which uses only one attribute (and may underfit), and LinearRegression, which may overfit by using all attributes.
● LeastMedSq: The least median squared algorithm is a type of regression that uses LinearRegression to create a generalizing model. Various models are created from random samples of the data, and the model with the lowest median squared error is selected as the final model.

RESULTS

The results of our data mining are listed in Table 3.
TABLE 3: RESULTS OF DATA MINING

| Model | Correlation Coefficient for Training Data | Correlation Coefficient for Test Data |
|---|---|---|
| SimpleLinearRegression | 0.74 (Inequality) | 0.5968 (Inequality) |
| LinearRegression (M5 Pruning) | 0.7879 (excluded Labor_Part) | 0.7067 (excluded Labor_Part) |
| LinearRegression | 0.79 | 0.6983 |
| LeastMedSq | 0.6233 | 0.6692 |

The LinearRegression model with and without M5 pruning produced nearly identical results, the implication being that the inclusion or exclusion of the Labor_Part attribute was largely unimportant. We will discuss the results of these two LinearRegression models as if they were one.

Of more significance is the relatively higher correlation coefficient of the LinearRegression models compared with the SimpleLinearRegression model. The findings suggest that multiple indicators of gender inequality correlate better with per capita GDP than a single indicator does. This could imply that multiple aspects of gender inequality combine to affect per capita GDP more strongly, speaking to the holistic nature of gender inequality.

However, overshadowing these suggestions are the overall low correlation coefficients. The results suggest that there is relatively low correlation between the mathematical measurements of gender inequality that we used here and per capita GDP as a measure of economic prosperity.

GRAPHIC 1: LABOR PARTICIPATION RATE AND INEQUALITY WITH RELATION TO GDP

Graphic 1 shows the lack of specific correlation. Each country is represented by a dot. The higher the dot is on the chart, the higher the per capita GDP of the country. The x-axis denotes the labor force participation rate of women (chosen because it corresponds least to per capita GDP), with the countries with the lowest participation rates on the left and the highest on the right. Lastly, the size of the dot represents the inequality index (chosen because it corresponds most to per capita GDP).
If there were a perfect correspondence between per capita GDP and these two indicators of gender inequality, we would expect a perfectly slanting line from the bottom left to the upper right-hand corner, with the largest dots on the bottom and the smallest on top. This, obviously, is not the case. What the graphic does offer is two useful observations. First, we see the extent to which the labor force participation rate varies independently of per capita GDP. Second, the graphic shows a very generalized relationship between per capita GDP and the inequality index. Because the index is a conglomeration of other indicators of gender inequality, this suggests that there may be a correspondence between some measures of gender inequality and per capita GDP.

On the surface, these findings suggest that gender inequality affects economic prosperity only marginally. However, to draw that general conclusion from this study would be a vast over-generalization. The measures of gender inequality used here are nowhere close to providing the full picture of gender inequality within a country, much less throughout the world. By the same token, per capita GDP is a very specific measure of economic prosperity, and it is quite possible that gender inequality affects other economic measures while only moderately influencing per capita GDP.

CONCLUSIONS

This report shows that measures of gender inequality can predict per capita GDP with only marginal success, suggesting a low level of correlation between the two. However, due to the extremely limited scope of the data that was mined to create the prediction models, we cannot draw any conclusions about the relationship between gender inequality and economic success beyond the specific factors used. If we had been able to access more complete country-specific information regarding the attributes we focused on, we might have been able to show a correlation.
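To make the single-attribute model and the reported correlation coefficients more concrete, the following is a minimal sketch written by hand rather than with Weka. It assumes that the correlation coefficient reported for a numeric model is the Pearson correlation between predicted and actual values. The function names and all data values are hypothetical and are not drawn from the study.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Hypothetical inequality-index and per capita GDP values (not study data).
train_x = [0.10, 0.25, 0.40, 0.55, 0.70]
train_y = [42000.0, 30000.0, 15000.0, 6000.0, 2000.0]
test_x = [0.15, 0.35, 0.60]
test_y = [35000.0, 12000.0, 7000.0]

# Fit on the training data, then score predictions on both sets.
slope, intercept = fit_line(train_x, train_y)
r_train = pearson([slope * x + intercept for x in train_x], train_y)
r_test = pearson([slope * x + intercept for x in test_x], test_y)
print("training correlation:", round(r_train, 4))
print("test correlation:", round(r_test, 4))
```

A multi-attribute model of the LinearRegression kind would replace `fit_line` with a fit over several input columns, but the scoring step would stay the same.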
APPENDIX

### Created through collaboration by Samantha Sievering and Peter Wilson ###

import sqlite3

# Connects to "globalization.db" through sqlite3.
db = sqlite3.connect('globalization.db')
cursor = db.cursor()

# This allows the user to decide whether they wish to view information regarding
# all countries in the database or only one at a time.
country_option = input("Do you want to see information for all countries?(Yes/No) ")
if country_option == "No":
    # Defines "country_name" as a specific country chosen by the user.
    country_name = input("What country would you like to view information on? ")
else:
    # Defines "country_name" as the SQL wildcard "%", which matches every country
    # and so gathers information on all countries at once.
    country_name = "%"

# Lists the headers from all tables in the database.
cursor = db.execute("select * from femaleworkers, gdp, inequality, literacy")
for column in cursor.description:
    print(column[0])

# Allows the user to decide between viewing all attributes or one at a time.
multiple_attributes = input("Would you like to view all attributes?(Yes/No) ")

if multiple_attributes == "No":
    # Asks the user to pick, from the list of attributes, the one they wish to see.
    # attribute_1 is inserted directly into the SQL text, so it must be one of the
    # column names listed above.
    attribute_1 = input("Aside from 'Country', which attribute do you wish to see? ")
    command = '''SELECT GDP.Country, ''' + attribute_1 + '''
        FROM GDP
        JOIN FemaleWorkers ON GDP.Country = FemaleWorkers.Country
        JOIN Inequality ON GDP.Country = Inequality.Country
        JOIN Literacy ON GDP.Country = Literacy.Country
        WHERE GDP.Country LIKE ?;'''
else:
    # Selects the values for all attributes.
    command = '''SELECT GDP.Country, professional_tech, female_income, female_to_male,
        gdp, inequality, secondary_ed, labor_part, literacy
        FROM GDP
        JOIN FemaleWorkers ON GDP.Country = FemaleWorkers.Country
        JOIN Inequality ON GDP.Country = Inequality.Country
        JOIN Literacy ON GDP.Country = Literacy.Country
        WHERE GDP.Country LIKE ?;'''

# Runs the chosen query and prints one line per matching country.
cursor.execute(command, [country_name])
count = 0
for row in cursor.fetchall():
    print(" ", *row)
    count = count + 1

# Informs the user if there is no data for the country they requested.
if count == 0:
    print("We have no records pertaining to that country.")

# Closes the database.
db.close()