CERAM, February-March-April 2008
Quantitative Methods for Social Sciences
Lionel Nesta
Observatoire Français des Conjonctures Economiques
Lionel.nesta@ofce.sciences-po.fr

Objective of the Course
The objective of the class is to provide students with a set of techniques to analyze quantitative data. It covers the application of quantitative and statistical approaches as developed in the social sciences, and is aimed at future decision makers, policy makers, stakeholders, managers, etc. All classes are computer-based and use the SPSS statistical package. The objective is to reach a level of competence that enables students both to read and understand the work of others and to carry out their own research.
Class Password: stmarec123

Examples
Rise in biotechnology
Should the EU fund fundamental research in biotechnology?
Has biotechnology increased the productivity of firm-level R&D?
Did it increase the speed of discovery in pharmaceutical R&D?
Increasing university-industry collaborations
Does it facilitate innovation by firms?
Does it increase the production of new knowledge by academics?
Does it modify the fundamental/applied nature of research?
Economic (productivity) growth
Does it come mainly from new firms or from improvements in existing firms?
Is market selection operating correctly?
Why do good firms exit the market?
How does the organisation of knowledge affect performance?
How do knowledge stock and specialisation affect productivity?
How do firms enter new technological fields?
Do firms diversify into new technologies/businesses purposively?
Structure of the Class
Class 1: Descriptive Statistics
  Mean, variance, standard deviation
  Data management
Class 2: Statistical Inference
  Distributions
  Comparison of means
Class 3: Relationship Between Variables
  ANOVA, Chi-Square
  Correlation
Class 4: Ordinary Least Squares (OLS)
  Correlation coefficient, simple regression
  Multiple regression
Class 5: Extension to OLS
  Regression diagnostics
  Qualitative explanatory variables
Class 6: Qualitative Dependent Variables
  Linear probability model
  Maximum likelihood (logit, probit)

Class 1
Descriptive Statistics
Types of Data
Descriptive statistics is the branch of statistics that gathers all the techniques used to describe and summarize quantitative and qualitative data.
Quantitative data
Continuous
Measured on a scale (any value within its range)
The size of the number reflects the amount of the variable
Age; wage; sales; height; weight; GDP
Qualitative data
Discrete, categorical
The number reflects the category of the variable
Type of work; gender; nationality

Descriptive Statistics
Any means of summarizing data synthetically is useful: graphs, charts, tables.
Quantitative data
Graphs: scatter plots; line plots; histograms
Central tendency
Dispersion
Qualitative data
Graphs: pie charts; bar charts
Tables: frequency, percentage, cumulative percentage
Cross tables

Central Tendency and Dispersion
A distribution is an ordered set of values showing how many times each value occurs, from the lowest to the highest (or the reverse).
Central tendency: measures the degree to which scores are clustered around the mean of a distribution.
Dispersion: measures the fluctuations around the central tendency.
In other words, measures of central tendency produce stylized facts, whereas measures of dispersion assess how representative a given stylized fact is.

Central Tendency
The mode: the most frequent score in a distribution.
The median: the middle value of all observed values, such that 50% of the observed values are higher and 50% are lower than the median.
The mean: the sum of all values divided by the number of values:
$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} x_i$
The mode, the mean and the median coincide when the distribution is symmetrical and unimodal.
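To make the three measures of central tendency concrete, here is a minimal Python sketch using the standard-library `statistics` module. The patent counts are hypothetical, chosen only for illustration; they are right-skewed, so the mean, median and mode come out different, unlike in a symmetrical unimodal distribution.

```python
from statistics import mean, median, multimode

# Hypothetical patent counts for ten firms (illustrative data, not the course dataset)
patents = [0, 2, 2, 3, 5, 7, 8, 12, 40, 121]

# Mean: sum of all values divided by the number of values, (1/N) * sum(x_i)
x_bar = mean(patents)
# Median: middle value; with an even N it is the average of the two middle values
med = median(patents)
# Mode(s): most frequent value(s); multimode returns a list in case of ties
modes = multimode(patents)

print(f"mean={x_bar}, median={med}, mode={modes}")
```

With these values the mean (20) is pulled up by the single large count, while the median (6) and the mode (2) stay close to the bulk of the data.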
Dispersion
The range: difference between the maximum and minimum values
$R = x_{\max} - x_{\min}$
The variance: average of the squared differences between the data points and the mean (mean quadratic deviation)
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{X})^2$
The standard deviation: square root of the variance; it measures the spread of the data about the mean, in the same units as the data
$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{X})^2}$

Research Productivity in the Bio-pharmaceutical Industry
EU Framework Programme 7
Stylised Facts about Modern Biotech
1. Innovations emerge from uncertain, complex processes involving knowledge and markets: role of networks.
2. Economic value is created in many ways, globally and in geographical agglomerations.
3. Various linkages exist among diverse actors (LDFs, DBFs, universities, venture capital) in innovation processes, but the firm plays a particularly important role.
4. Regulations, social structures and institutions affect ongoing innovation processes as well as their impacts on society: importance of IPR.

SPSS: Statistical Package for the Social Sciences
The SPSS software
Statistical Package for the Social Sciences (1968)
Among the most widely used programs for statistical analysis in the social sciences
Used by market researchers, health researchers, survey companies, government, education researchers, and others
Data management (case selection, file reshaping, creating derived data)
Features of SPSS are accessible via pull-down menus
The pull-down menu interface generates command syntax
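The range, variance and standard deviation from the Dispersion slide can be sketched in Python as follows, again on hypothetical patent counts. The code follows the slide's formulas (dividing by N) and, for comparison, also computes the N - 1 versions from the `statistics` module; statistical packages such as SPSS typically report the N - 1 (sample) variance, which is worth keeping in mind when reading their output.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical patent counts (illustrative data, not the course dataset)
patents = [0, 2, 2, 3, 5, 7, 8, 12, 40, 121]
n = len(patents)
x_bar = sum(patents) / n

# Range: R = x_max - x_min
r = max(patents) - min(patents)

# Variance as on the slide: average squared deviation from the mean, divided by N
var_n = sum((x - x_bar) ** 2 for x in patents) / n
# Standard deviation: square root of the variance, in the same units as the data
sd_n = math.sqrt(var_n)

# The statistics module's population functions give the same values...
assert math.isclose(var_n, pvariance(patents))
assert math.isclose(sd_n, pstdev(patents))
# ...while variance()/stdev() divide by N - 1 (sample versions)
var_n1 = variance(patents)
sd_n1 = stdev(patents)

print(r, var_n, round(sd_n, 2), round(var_n1, 2), round(sd_n1, 2))
```

For these ten values the range is 121 and the N-based variance is 1254, so the standard deviation is about 35.4 patents, expressed in the same unit as the data.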
SPSS: Opening SPSS
SPSS: Importing data
Settings in the "Import text" dialogue box:
(1) No predefined format
(2) Delimited; the first line contains the variable names
(3) One observation per line; import all observations
(4) Tab delimiter only
(6) Finish

SPSS windows
SPSS automatically opens several windows:
The data sheet window: observe, manage, modify and create data
The results window: everything you do is stored there
The syntax window can also be opened
SPSS: Data sheet
SPSS: Results / Journal
SPSS: Saving data
SPSS: working, at last!

Recoding Variables
Changing existing values to new values (biotechnologie → DBF, pharmaceutique → LDF)
Computing New Variables
Taking the logarithm (normalization of continuous variables)
Creating Dummy Variables
pharma = 1 for LDFs, biotech = 1 for DBFs

Computation of Descriptive Statistics
Descriptive Statistics (all firms)
Variable  N    Range         Minimum      Maximum       Mean          Std. dev.      Variance
patent    457  286           0            286           11.92         22.901         524.470
assets    457  35788473.97   4422.18      35792896.15   4358371.54    6086530.85     3.705E+013
rd        457  1917997.980   858.53204    1918856.512   330236.630    405160.516     164155043889
spe       457  2.0235309     -1.1298400   .8936909      -.056808610   .3374751802    .114
pharma    457  1             0            1             .63           .482           .232
biotech   457  1             0            1             .37           .482           .232
Valid N (listwise): 457

Splitting Database
Descriptive Statistics (by type)
Type  Variable  N    Range      Minimum     Maximum    Mean        Std. dev.    Variance
DBF   patent    167  202        0           202        12.11       21.066       443.764
DBF   assets    167  2442619    4422.18     2447041    342934.49   478511.938   2E+011
DBF   rd        167  495443.5   858.53204   496302.1   58116.590   88638.5347   8E+009
DBF   spe       167  1.7544527  -1.12984    .6246127   -.10630582  .343286812   .118
DBF   pharma    167  0          0           0          .00         .000         .000
DBF   biotech   167  0          1           1          1.00        .000         .000
LDF   patent    290  286        0           286        11.81       23.929       572.609
LDF   assets    290  4E+007     218006.47   4E+007     6670709.4   6605972.68   4E+013
LDF   rd        290  1912600    6256.248    1918857    486940.24   432514.940   2E+011
LDF   spe       290  1.6904465  -.7967556   .8936909   -.02830504  .331330781   .110
LDF   pharma    290  0          1           1          1.00        .000         .000
LDF   biotech   290  0          0           0          .00         .000         .000
Valid N (listwise): DBF 167, LDF 290

Assignments
Compute the logarithm of each quantitative variable (patent, assets, rd) and name the results lnpatent, lnassets and lnrd, respectively.
Compute descriptive statistics for both LDFs and DBFs.
Draw conclusions by comparing means.

Logarithm
Normalization
Taking the logarithm is a transformation that often brings a skewed distribution closer to normal.
Elasticities
http://en.wikipedia.org/wiki/Elasticity_(economics)
A change in the log of x is a relative change in x itself:
$\frac{\partial \log x}{\partial x} = \frac{1}{x} \quad\Rightarrow\quad \partial \log x = \frac{\partial x}{x}$
Cobb-Douglas production function
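The assignment steps (recoding the type labels, creating the pharma/biotech dummies, taking logs of patent, assets and rd, then comparing group means after splitting by type) can be mirrored outside SPSS with a short, stdlib-only Python sketch. The firm records are hypothetical, and leaving the log missing when a value is zero or negative is an assumption made here because ln(0) is undefined (patent has a minimum of 0 in the data). The last lines check numerically that a small change in log x is close to the relative change in x.

```python
import math
from statistics import mean, stdev

# Hypothetical firm records (illustrative values, not the course dataset);
# "type" holds the original labels used in the raw data.
firms = [
    {"type": "biotechnologie", "patent": 4, "assets": 120_000.0, "rd": 30_000.0},
    {"type": "biotechnologie", "patent": 0, "assets": 45_000.0, "rd": 9_000.0},
    {"type": "biotechnologie", "patent": 15, "assets": 800_000.0, "rd": 150_000.0},
    {"type": "pharmaceutique", "patent": 10, "assets": 5_000_000.0, "rd": 400_000.0},
    {"type": "pharmaceutique", "patent": 30, "assets": 9_000_000.0, "rd": 900_000.0},
    {"type": "pharmaceutique", "patent": 2, "assets": 2_500_000.0, "rd": 150_000.0},
]

# Recoding: biotechnologie -> DBF, pharmaceutique -> LDF
RECODE = {"biotechnologie": "DBF", "pharmaceutique": "LDF"}


def safe_log(x):
    """Natural log; returns None (missing) when x <= 0 (an assumption, since ln(0) is undefined)."""
    return math.log(x) if x > 0 else None


for f in firms:
    f["type"] = RECODE[f["type"]]
    # Dummy variables: pharma = 1 for LDFs, biotech = 1 for DBFs
    f["pharma"] = int(f["type"] == "LDF")
    f["biotech"] = int(f["type"] == "DBF")
    # Log transforms named as in the assignment: lnpatent, lnassets, lnrd
    for var in ("patent", "assets", "rd"):
        f["ln" + var] = safe_log(f[var])


def describe(rows, var):
    """N, mean and sample standard deviation of var, skipping missing values."""
    values = [r[var] for r in rows if r[var] is not None]
    return len(values), mean(values), stdev(values)


# Split by type, then compare group means
for group in ("DBF", "LDF"):
    rows = [f for f in firms if f["type"] == group]
    for var in ("lnpatent", "lnassets", "lnrd"):
        n, m, sd = describe(rows, var)
        print(f"{group} {var}: N={n} mean={m:.3f} sd={sd:.3f}")

# A small change in log x approximates the relative change in x
x0, x1 = 100.0, 101.0
dlog = math.log(x1) - math.log(x0)
rel = (x1 - x0) / x0
print(f"change in log: {dlog:.5f}, relative change: {rel:.5f}")
```

Plain dictionaries keep the example dependency-free; on the real dataset, a dataframe library with a group-by operation would be the more natural choice.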