CHAPTER 1 Overview of multivariate statistical analysis

1.1 Introduction

In medical research, it is often necessary to include multiple variables simultaneously to fully describe and analyze the phenomena of interest. For example, health status assessment may involve using indicators of physiological, psychological, and social adjustment, whereas disease diagnosis may require the integration of clinical manifestations, imaging examinations, and laboratory tests. In the prediction of cardiovascular events, variables such as body mass index, blood pressure, lipid levels, diabetes mellitus, and smoking status may be considered.

Multivariate statistical analysis consists of a collection of methods that can be used when several measurements are made for each individual or object in one or more samples. The goal of multivariate statistical analysis is to extract important information that is hidden within these complex variable relationships and to identify the essential features of the phenomena being studied.

The need to understand the relationships between many variables makes multivariate analysis an inherently difficult subject. Often, the human mind is overwhelmed by the amount of data. Additionally, more mathematics is required to derive multivariate statistical techniques for making inferences than in a univariate setting. In this textbook, we introduce the basics of multivariate statistical analysis based on algebraic concepts and, to avoid derivations of statistical results that require the calculus of many variables, we make use of illustrative examples and a minimal amount of mathematics. Despite this, basic mathematical sophistication and a desire to think quantitatively will be required. We have attempted to motivate readers’ study of multivariate analysis and provide rudimentary, but important, methods for organizing, summarizing, and displaying multivariate data.
Multivariate statistics originated in the 1920s, and famous statisticians such as J. Wishart, H. Hotelling, R.A. Fisher, and S.N. Roy were pioneers in this field. The specific content of multivariate statistical analysis not only includes the direct extension of methods used in univariate analysis but also covers problems unique to scenarios in which multiple variables are encountered simultaneously. With the development of computer technology, multivariate statistical analysis has been widely used in fields such as geology, meteorology, economics, and medicine.

Applied Multivariate Statistical Analysis in Medicine

In 1975, British statistician M.G. Kendall summarized the issues studied using classic multivariate statistical analysis into the following categories:

1. Data reduction or structural simplification
Data reduction is the process of converting data from a space with high dimensions to a space with low dimensions while retaining important features and valuable information from the original data. The intention is that this will be a simplified and more easily interpretable representation of the phenomenon of interest. Principal component analysis and factor analysis, which we introduce in Chapters 8 and 9, are typical structural simplification methods.

2. Classification and discrimination
Classification and discrimination refer to classifying (or clustering) individuals (or variables) on the basis of measured characteristics. Additionally, rules for classifying objects into well-defined groups may be required. Typical and frequently used methods include cluster analysis and discriminant analysis, which we introduce in Chapters 11 and 12.

3. Investigation of the relationship between variables of interest
Biomedical research often focuses on examining the relationship between variables of interest. This type of research aims to determine whether there is a correlation between variables and whether predictions can be made about one variable based on others. Statistical methods such as regression analysis (Chapters 4-7) and canonical correlation (Chapter 10) are commonly used to address these issues.

4. Statistical inference of multivariate data
The statistical inference of multivariate data is similar to that of univariate analysis, with a focus on estimating and testing hypotheses about the parameters of multivariate populations. This may be performed to validate assumptions or to reinforce prior convictions. We introduce these contents in Chapters 2 and 3.

5. Theoretical basis of multivariate statistical analysis
The theoretical basis of multivariate statistical analysis concerns multidimensional random vectors, chiefly normal random vectors, together with the multivariate statistics defined using these vectors: deriving their distributions, studying their properties, and developing their sampling distribution theory. We also introduce these contents in Chapter 3.

We conclude this brief introduction to multivariate analysis with a quotation from F.H.C. Marriott: “You should keep it in mind whenever you attempt or read about a data analysis. It allows one to maintain a proper perspective and not be overwhelmed by the elegance of some of the theory.”

1.2 Application of multivariate statistical analysis

To further illustrate the application of multivariate statistical analytic techniques in medical research, we provide in this book several examples (or problems) that we have encountered personally. These may help to promote readers’ conceptual understanding of multivariate statistical methods and, in combination with their own practice, deepen their understanding of the topic. We classify these examples on the basis of the objective and content of multivariate statistical analysis.

1. Data reduction or structural simplification
• The assessment of self-care ability is a crucial component in research on the quality of life among older adults and encompasses 12 indicators that range from basic abilities, such as dressing and eating, to more complex abilities, such as shopping and financial management. In practice, a challenge arises in the analysis of such data because it is essential to determine how to simplify the data structure while preserving critical information.
• A study aims to evaluate the health status of residents in urban Wuhan, China. The project encounters the issue of managing a large number of variables, and it becomes challenging to evaluate the health status of the participants comprehensively using a relatively large index system.

2. Classification and discrimination
• An approach involves grouping geographic areas based on demographic, medical, and health service indicators, followed by an assessment of the appropriateness of medical resource allocation.
• For patients with pulmonary nodules, how can malignant tumors be identified using image information such as the size, location, and shape of the nodules, combined with the clinical manifestations of the patients?

3. Investigation of the relationship between variables of interest
• Based on data from a national health survey, researchers aim to examine the correlation between the physical development of adolescents and their lung function status.
• Prognostic factors that influence the outcome of breast cancer surgery are explored, and the extent of the influence of various prognostic factors on the survival time of patients is determined in a follow-up study.

4. Statistical inference of multivariate data
Multivariate inference is particularly useful for curbing the researcher’s natural tendency to read too much into the data. Examples include the following:
• How can the efficacy of a new drug be evaluated against existing drugs in the treatment of patients with AIDS through changes in laboratory indicators such as virology and immunology?
• Systolic blood pressure, total cholesterol, and body mass index are important predictors of cardiovascular disease. How can the distributions of these indicators be compared across various ethnic groups based on sample data, and used to guide the development of prevention and control policy for cardiovascular disease?

It should be noted that real-world research is often multifaceted and complex, and in many cases multiple approaches could be suitable for the same problem. The field of multivariate statistics in medical research is highly practical and useful.
Although we do not intend to overemphasize the mathematical foundation of this field, it is important to recognize the inherent relationship between statistical theory and application. The vast range of research topics within basic medicine, clinical medicine, public health, and preventive medicine provides ample opportunities for the application of multivariate statistics. Additionally, the use of multivariate statistical methods continues to expand into new fields, with basic statistical theory serving as a common theoretical foundation for these methods.

1.3 Structure of multivariate data

Whenever a researcher intends to investigate a phenomenon or validate a certain hypothesis, more than one variable (characteristic) is usually involved and thus a data structure with multivariate data can be formed. We now introduce the preliminary concepts that underlie these first steps of data organization.

Let $x_{ij}$ $(i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, p)$ denote the particular measurement of the $i$th item (object, or observation) of the $j$th variable. Consequently, $n$ measurements for $p$ variables can be displayed as shown in Table 1.1.

Table 1.1 Tabular representation of the multivariate data structure.

| Item | $X_1$ | $X_2$ | $\cdots$ | $X_j$ | $\cdots$ | $X_p$ |
|---|---|---|---|---|---|---|
| 1 | $x_{11}$ | $x_{12}$ | $\cdots$ | $x_{1j}$ | $\cdots$ | $x_{1p}$ |
| 2 | $x_{21}$ | $x_{22}$ | $\cdots$ | $x_{2j}$ | $\cdots$ | $x_{2p}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$ | | $\vdots$ |
| $i$ | $x_{i1}$ | $x_{i2}$ | $\cdots$ | $x_{ij}$ | $\cdots$ | $x_{ip}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$ | | $\vdots$ |
| $n$ | $x_{n1}$ | $x_{n2}$ | $\cdots$ | $x_{nj}$ | $\cdots$ | $x_{np}$ |

Alternatively, we can display these data as a rectangular array, called data matrix $X$, of $n$ rows and $p$ columns:

$$
X =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
= \left( X_1, X_2, \ldots, X_p \right)
=
\begin{bmatrix}
X_{(1)}^{T} \\
X_{(2)}^{T} \\
\vdots \\
X_{(n)}^{T}
\end{bmatrix},
\tag{1.1}
$$

where the superscript “T” denotes “transpose”. Data matrix $X$ may be abbreviated as $(x_{ij})_{n \times p}$.
The $i$th $(i = 1, 2, \ldots, n)$ row of matrix $X$ is $X_{(i)}^{T} = (x_{i1}, x_{i2}, \ldots, x_{ip})$, which denotes the observation of the $i$th item and is called a row vector. Before the observation is actually made, it is a $p$-dimensional random vector. The $j$th $(j = 1, 2, \ldots, p)$ column of matrix $X$ is

$$
X_j =
\begin{bmatrix}
x_{1j} \\ x_{2j} \\ \vdots \\ x_{nj}
\end{bmatrix},
$$

which denotes the $n$ observations of the $j$th variable and is called a column vector. Before the observations are actually made, it is an $n$-dimensional random vector. In multivariate statistical analysis, all the content involved consists of random vectors or random matrices that are composed of multiple random vectors. For details, see Chapter 13.

Representing multivariate data using a data matrix has the following advantages: (1) it may be more convenient for the transformation, processing, and calculation of data; and (2) it is easy to program the data matrix on a computer; hence, the calculation of some statistics can be completed by the program.

Example 1.1: In a national project aimed at understanding the health status and basic physiological parameters of different regions in China, chest circumference (cm) $X_1$, waist circumference (cm) $X_2$, and hip circumference (cm) $X_3$ were measured. Part of the data of 57 junior girls (12 years old) from Jiangsu province is shown in Table 1.2.

Table 1.2 Physiological data of 12-year-old girls in Jiangsu province.

| Individual | $X_1$ | $X_2$ | $X_3$ |
|---|---|---|---|
| 1 | 72.0 | 65.0 | 80.0 |
| 2 | 78.0 | 67.0 | 91.0 |
| 3 | 75.0 | 62.0 | 80.0 |
| 4 | 70.0 | 61.0 | 88.0 |
| 5 | 76.0 | 60.0 | 91.0 |
| 6 | 71.0 | 62.0 | 83.0 |
| 7 | 63.0 | 58.0 | 78.0 |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| 57 | 80.0 | 68.0 | 92.0 |

CAMS Innovation Fund for Medical Sciences (CIFMS) (2020-I2M-2-009).

Three random variables ($X_1$, $X_2$, and $X_3$) are involved in this research. The measurements of these variables for each participant constitute a row vector, which is a random vector with three dimensions $(p = 3)$. When the measurements of the 57 junior girls are complete, the row vectors are

$$
X_{(1)}^{T} = (72.0, 65.0, 80.0),\quad
X_{(2)}^{T} = (78.0, 67.0, 91.0),\quad \ldots,\quad
X_{(57)}^{T} = (80.0, 68.0, 92.0).
$$

Similarly, we can obtain column vectors of the three variables (chest circumference, waist circumference, and hip circumference):

$$
X_1 = \begin{bmatrix} 72.0 \\ 78.0 \\ \vdots \\ 80.0 \end{bmatrix},\quad
X_2 = \begin{bmatrix} 65.0 \\ 67.0 \\ \vdots \\ 68.0 \end{bmatrix},\quad
X_3 = \begin{bmatrix} 80.0 \\ 91.0 \\ \vdots \\ 92.0 \end{bmatrix}.
$$

1.4 Descriptive statistics of multivariate data

Much of the information contained in data can be assessed by calculating certain numerical characteristics known as descriptive statistics. For example, the sample arithmetic mean (or sample mean) in univariate analysis is a descriptive statistic that provides a measure of the central location for a set of data. Additionally, the average of the squares of the distances of all the values from the mean provides a measure of the spread, or variation. In multivariate analysis, we rely most heavily on descriptive statistics that measure location, variation, and linear association between variables. We provide formal definitions of these quantities in Chapter 2. In the present chapter, we introduce commonly used descriptive statistics, such as the sample mean vector, sample covariance matrix, and sample correlation matrix.

1.4.1 Sample mean vector

The sample mean vector plays a central role in the description of the sample data matrix. Let $n$ be the number of items measured on each of the $p$ variables. The mean vector calculated from the sample data is denoted by

$$
\bar{X} =
\begin{bmatrix}
\bar{X}_1 \\ \bar{X}_2 \\ \vdots \\ \bar{X}_p
\end{bmatrix}
= \left( \bar{X}_1, \bar{X}_2, \ldots, \bar{X}_p \right)^{T},
\tag{1.2}
$$

where $\bar{X}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$ $(j = 1, 2, \ldots, p)$.
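As a small illustration of the data matrix and Eq. (1.2), the sketch below stores the rows that are actually printed in Table 1.2 in a NumPy array and averages each column. It is only a sketch under stated assumptions: the use of NumPy and the names `X`, `first_row`, `first_col`, and `x_bar` are choices made here, not part of the text, and because only 8 of the 57 rows are printed, the result is the mean vector of those 8 rows, not of the full sample.

```python
import numpy as np

# Only the eight rows printed in Table 1.2 (individuals 1-7 and 57).
# The other 49 rows are elided in the source, so this is NOT the full sample.
# Columns: X1 chest, X2 waist, X3 hip circumference (cm).
X = np.array([
    [72.0, 65.0, 80.0],
    [78.0, 67.0, 91.0],
    [75.0, 62.0, 80.0],
    [70.0, 61.0, 88.0],
    [76.0, 60.0, 91.0],
    [71.0, 62.0, 83.0],
    [63.0, 58.0, 78.0],
    [80.0, 68.0, 92.0],
])

n, p = X.shape          # n items (rows), p variables (columns)
first_row = X[0]        # row vector X_(1)^T: one girl's three measurements
first_col = X[:, 0]     # column vector X_1: all chest measurements

# Sample mean vector, Eq. (1.2): average each column over the n rows.
x_bar = X.mean(axis=0)

print(n, p)
print(x_bar)
```

Averaging along `axis=0` collapses the rows, so each entry of `x_bar` corresponds to one variable, matching the column-wise definition of $\bar{X}_j$.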
In Example 1.1, there are three variables ($X_1$, $X_2$, and $X_3$); the sample mean of each variable can be calculated as

$$
\begin{aligned}
\bar{X}_1 &= \frac{1}{57} \sum_{i=1}^{57} x_{i1} = \frac{1}{57} (72.0 + 78.0 + \cdots + 80.0) = 76.58, \\
\bar{X}_2 &= \frac{1}{57} \sum_{i=1}^{57} x_{i2} = \frac{1}{57} (65.0 + 67.0 + \cdots + 68.0) = 67.14, \\
\bar{X}_3 &= \frac{1}{57} \sum_{i=1}^{57} x_{i3} = \frac{1}{57} (80.0 + 91.0 + \cdots + 92.0) = 87.51.
\end{aligned}
$$

Thus, the sample mean vector $\bar{X}$ can be obtained as follows:

$$
\bar{X} =
\begin{bmatrix} \bar{X}_1 \\ \bar{X}_2 \\ \bar{X}_3 \end{bmatrix}
=
\begin{bmatrix} 76.58 \\ 67.14 \\ 87.51 \end{bmatrix}
= (76.58, 67.14, 87.51)^{T}.
$$

1.4.2 Sample covariance matrix

The variance-covariance matrix generalizes the notion of variance from one dimension to multiple dimensions. We can use the variance-covariance matrix to depict the degree of dispersion of multiple random variables in the sample and the relationship between any two variables. To improve readers’ understanding of the concept, we write the calculation of the sample variance-covariance matrix in two parts:

$$
s_{jj} = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_{ij} - \bar{x}_j \right)^2 \quad (j = 1, 2, \ldots, p),
\tag{1.3}
$$

$$
s_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_{ij} - \bar{x}_j \right) \left( x_{ik} - \bar{x}_k \right) \quad (j, k = 1, 2, \ldots, p;\ j \neq k),
\tag{1.4}
$$

where $\bar{x}_j$ is the sample mean of the $j$th variable. Eq. (1.3) is the calculation of the variance of each component of the $p$-dimensional random vector. Eq. (1.4) is the covariance between any two variables $X_j$ and $X_k$ in the $p$-dimensional random vector. In fact, Eqs. (1.3) and (1.4) can be expressed in a uniform manner because the variance of $X_j$ can be viewed as the covariance of $X_j$ with itself.
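Eqs. (1.3) and (1.4) can be evaluated together by centering every column and forming all cross-products at once. The following is a minimal sketch, assuming NumPy and a hypothetical helper named `sample_covariance`; it uses only the eight rows printed in Table 1.2, so its output is not the covariance of the full sample of 57 girls.

```python
import numpy as np

# Only the eight rows printed in Table 1.2; the remaining rows are elided
# in the source, so the numbers below describe this subset only.
X = np.array([
    [72.0, 65.0, 80.0],
    [78.0, 67.0, 91.0],
    [75.0, 62.0, 80.0],
    [70.0, 61.0, 88.0],
    [76.0, 60.0, 91.0],
    [71.0, 62.0, 83.0],
    [63.0, 58.0, 78.0],
    [80.0, 68.0, 92.0],
])


def sample_covariance(X):
    """Return the p x p array of s_jk with divisor n - 1 (Eqs. 1.3 and 1.4)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centered = X - X.mean(axis=0)           # x_ij - x_bar_j for every entry
    # Entry (j, k) is sum_i (x_ij - x_bar_j)(x_ik - x_bar_k) / (n - 1);
    # the diagonal (j == k) reproduces the variances of Eq. (1.3).
    return centered.T @ centered / (n - 1)


cov = sample_covariance(X)
print(np.round(cov, 2))
```

NumPy's built-in `np.cov(X, rowvar=False)` uses the same $n - 1$ divisor and should return the same array; the explicit version is shown only to mirror the two equations.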
For convenience, hereafter, we refer to the variance-covariance matrix of samples as the covariance matrix. Thus, for any given $p$-dimensional random vector, the sample covariance matrix is

$$
S =
\begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{21} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & & \vdots \\
s_{p1} & s_{p2} & \cdots & s_{pp}
\end{bmatrix}.
$$

Matrix $S$ is a symmetric matrix (i.e., $S = S^{T}$) that comprises $p$ variances and $\frac{p(p-1)}{2}$ distinct covariances, where the variances lie on the leading diagonal of the matrix.

Example 1.2: Referring to Example 1.1, calculate the sample covariance matrix of the three variables ($X_1$, $X_2$, and $X_3$).

Solution: First, calculate the variance of each component of the three-dimensional random vector:

$$
\begin{aligned}
s_{11} = s_1^2 &= \frac{1}{57-1} \sum_{i=1}^{57} (x_{i1} - \bar{x}_1)^2 \\
&= \frac{1}{56} \left[ (72.0 - 76.58)^2 + (78.0 - 76.58)^2 + \cdots + (80.0 - 76.58)^2 \right] = 67.32, \\
s_{22} = s_2^2 &= \frac{1}{57-1} \sum_{i=1}^{57} (x_{i2} - \bar{x}_2)^2 \\
&= \frac{1}{56} \left[ (65.0 - 67.14)^2 + (67.0 - 67.14)^2 + \cdots + (68.0 - 67.14)^2 \right] = 69.02, \\
s_{33} = s_3^2 &= \frac{1}{57-1} \sum_{i=1}^{57} (x_{i3} - \bar{x}_3)^2 \\
&= \frac{1}{56} \left[ (80.0 - 87.51)^2 + (91.0 - 87.51)^2 + \cdots + (92.0 - 87.51)^2 \right] = 38.47.
\end{aligned}
$$

Then, calculate the covariance between any two variables:

$$
\begin{aligned}
s_{12} &= \frac{1}{57-1} \sum_{i=1}^{57} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) \\
&= \frac{1}{56} \left[ (72.0 - 76.58)(65.0 - 67.14) + \cdots + (80.0 - 76.58)(68.0 - 67.14) \right] = 60.85, \\
s_{13} &= \frac{1}{57-1} \sum_{i=1}^{57} (x_{i1} - \bar{x}_1)(x_{i3} - \bar{x}_3) \\
&= \frac{1}{56} \left[ (72.0 - 76.58)(80.0 - 87.51) + \cdots + (80.0 - 76.58)(92.0 - 87.51) \right] = 47.31, \\
s_{23} &= \frac{1}{57-1} \sum_{i=1}^{57} (x_{i2} - \bar{x}_2)(x_{i3} - \bar{x}_3) \\
&= \frac{1}{56} \left[ (65.0 - 67.14)(80.0 - 87.51) + \cdots + (68.0 - 67.14)(92.0 - 87.51) \right] = 43.27.
\end{aligned}
$$

By symmetry, $s_{21} = s_{12}$, $s_{31} = s_{13}$, and $s_{32} = s_{23}$.
Thus, the sample covariance matrix of the three random variables $X_1$, $X_2$, and $X_3$ can be obtained as follows:

$$
S =
\begin{bmatrix}
67.32 & 60.85 & 47.31 \\
60.85 & 69.02 & 43.27 \\
47.31 & 43.27 & 38.47
\end{bmatrix}.
$$

1.4.3 Sample correlation matrix

Another important descriptive statistic is the sample correlation matrix. The Pearson correlation coefficient between variables $X_j$ and $X_k$ in $p$-dimensional space is defined as

$$
r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}}\sqrt{s_{kk}}}
= \frac{\sum_{i=1}^{n} \left( x_{ij} - \bar{x}_j \right) \left( x_{ik} - \bar{x}_k \right)}
{\sqrt{\sum_{i=1}^{n} \left( x_{ij} - \bar{x}_j \right)^2} \sqrt{\sum_{i=1}^{n} \left( x_{ik} - \bar{x}_k \right)^2}}
\quad (j, k = 1, 2, \ldots, p;\ j \neq k).
\tag{1.5}
$$

The values of the correlation coefficient lie between $-1$ and $+1$, and the magnitude of the absolute value of $r_{jk}$ denotes the strength of the correlation between variables $X_j$ and $X_k$, whereas the sign indicates the direction of the correlation. Based on Eq. (1.5), the sample correlation matrix is defined as

$$
R =
\begin{bmatrix}
1 & r_{12} & \cdots & r_{1p} \\
r_{21} & 1 & \cdots & r_{2p} \\
\vdots & \vdots & & \vdots \\
r_{p1} & r_{p2} & \cdots & 1
\end{bmatrix}.
$$

The reason that we are interested in the correlation coefficient is that it is unit free; that is, it does not vary as the unit of measurement changes. In fact, when each variable is standardized (centered and divided by its standard deviation), the covariance matrix of the standardized variables is equal to the correlation matrix of the original variables; this standardized covariance is called a correlation. In practice, the correlation coefficient is more intuitive than covariance in the measurement of the correlation between variables.

Example 1.3: Referring to Example 1.1, calculate the correlation matrix of the three variables ($X_1$, $X_2$, and $X_3$).

Solution: Based on Eq. (1.5), the correlation matrix of $X$ is

$$
R =
\begin{bmatrix}
1.00 & 0.89 & 0.93 \\
0.89 & 1.00 & 0.84 \\
0.93 & 0.84 & 1.00
\end{bmatrix}.
$$

1.5 Statistical distance

Although they may appear formidable at first, most multivariate techniques are based on the simple concept of distance. Distance quantifies how far two objects are from each other. Because most multivariate methods rely on the measurement of distance between samples or variables, it is necessary to introduce the concept of distance prior to the introduction of a specific multivariate statistical method. A comprehensive discussion about distance is available in Chapter 11.

The Euclidean distance (or straight-line distance) is the most common measure of distance. If we consider point $P(x_1, x_2)$ in two-dimensional space, the Euclidean distance to origin point $O(0, 0)$ is, according to the Pythagorean theorem,

$$
d(O, P) = \sqrt{x_1^2 + x_2^2}.
\tag{1.6}
$$

Generally, as we expand two-dimensional space to $p$-dimensional space, for any given point $P$ in $p$-dimensional space with coordinates $(x_1, x_2, \ldots, x_p)$, the Euclidean distance from $P$ to origin point $O(0, 0, \ldots, 0)$ is

$$
d(O, P) = \sqrt{x_1^2 + x_2^2 + \cdots + x_p^2}.
\tag{1.7}
$$

The Euclidean distance between two arbitrary points $P = (x_1, x_2, \ldots, x_p)$ and $Q = (y_1, y_2, \ldots, y_p)$ is given by

$$
d(P, Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2}.
\tag{1.8}
$$

Although the Euclidean distance is simple and intuitive, it is unsatisfactory for most statistical purposes. This is because each coordinate contributes equally (equal weight) to the calculation of the Euclidean distance. Therefore, the Euclidean distance may fail to capture the change of values of indicators with varying degrees of variation. The purpose now is to develop a statistical distance that accounts for differences in variation and, in due course, the presence of correlation. Because our choice depends on the sample variances and covariances, at this point, we use the term statistical distance to distinguish it from the ordinary Euclidean distance.

To illustrate, suppose we have $n$ pairs of measurements on two variables each having mean zero. Call the variables $x_1$ and $x_2$, and assume that the $x_1$ measurements vary independently of the $x_2$ measurements. In addition, assume that the variability in the $x_1$ measurements is larger than the variability in the $x_2$ measurements. A scatter plot of the data would look something like the one pictured in Fig. 1.1.

Figure 1.1 Schematic scatter plot of points in a plane.

It is easy to see that the number of observation points per unit length (the density) along the $x_1$-axis is much lower than that along the $x_2$-axis, which may be caused by the different units of measurement of $x_1$ and $x_2$ or by the degree of variation itself. A common approach to solve this problem is to divide each coordinate by the sample standard deviation. On division by the standard deviations, we obtain the “standardized” coordinates $x_1^* = x_1 / \sqrt{s_{11}}$ and $x_2^* = x_2 / \sqrt{s_{22}}$. The standardized coordinates ensure the consistency of the measurement scale.
Thus, the statistical distance of point $P(x_1, x_2)$ from origin $O(0, 0)$ can be computed from its standardized coordinates $(x_1^*, x_2^*)$:

$$
d(O, P) = \sqrt{\left( x_1^* \right)^2 + \left( x_2^* \right)^2}
= \sqrt{\left( x_1 / \sqrt{s_{11}} \right)^2 + \left( x_2 / \sqrt{s_{22}} \right)^2}
= \sqrt{\frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}}};
\tag{1.9}
$$

that is, the statistical distance is a weighted Euclidean distance in the original coordinates. The difference between Eqs. (1.9) and (1.6) is that $k_1 = 1/s_{11}$ and $k_2 = 1/s_{22}$, which are the weights for $x_1^2$ and $x_2^2$, respectively, are added to Eq. (1.9). When the two variables have the same variance, that is, $k_1 = k_2$, the statistical distance differs from the Euclidean distance only by a constant factor; that is, if the variability in the $x_1$ direction is the same as that in the $x_2$ direction, and the $x_1$ values vary independently of the $x_2$ values, then the Euclidean distance is appropriate.

Setting the right-hand side of Eq. (1.9) to $c$ $(c > 0)$ and squaring both sides of Eq. (1.9), we obtain

$$
\frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}} = c^2.
\tag{1.10}
$$

Eq. (1.10) indicates that the locus of all points whose squared statistical distance from the origin equals the constant $c^2$ is an ellipse centered on the origin, with the major (long) and minor (short) axes coinciding with the coordinate axes.

The concept of statistical distance can be easily generalized to $p$-dimensional space. Consider an arbitrary point $P = (x_1, x_2, \ldots, x_p)$ and any fixed point $Q = (y_1, y_2, \ldots, y_p)$, and assume that the coordinate variables vary independently of one another. Let $s_{11}, s_{22}, \ldots, s_{pp}$ be sample variances constructed from $n$ measurements. Then the statistical distance from $P$ to $Q$ is

$$
d(P, Q) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \frac{(x_2 - y_2)^2}{s_{22}} + \cdots + \frac{(x_p - y_p)^2}{s_{pp}}}.
\tag{1.11}
$$

Eq. (1.11) has a similar geometric implication to Eq. (1.9); that is, all points whose squared statistical distance from fixed point $Q$ is a constant lie on a hyperellipsoid whose center is $Q$. Additionally, each principal axis is parallel to the corresponding coordinate axis. Eq. (1.11) also indicates that when $s_{11} = s_{22} = \cdots = s_{pp} = 1$, the principal axes of the hyperellipsoid all have the same length, the hyperellipsoid becomes a hypersphere, and the statistical distance reduces to the Euclidean distance.

The distance in Eq. (1.11) still does not include most of the important cases we encounter because of the assumption of independent coordinates. As shown in Fig. 1.2, the spread of the points indicates that variables $x_1$ and $x_2$ are related to each other. In fact, the coordinates of the pairs $(x_1, x_2)$ exhibit a tendency to be large or small together, and the sample correlation coefficient is positive. Moreover, the variability in the $x_2$ direction is larger than that in the $x_1$ direction.

Figure 1.2 Schematic diagram of positively correlated data and a rotated coordinate system.

Fig. 1.2 shows that, with the distribution of the points unchanged, rotating the original coordinate system counterclockwise through an angle $\theta$ yields new coordinates $\tilde{x}_1$ and $\tilde{x}_2$ that vary independently of each other. Thus, we define the statistical distance from point $P(\tilde{x}_1, \tilde{x}_2)$ to origin $O(0, 0)$ as

$$
d(O, P) = \sqrt{\frac{\tilde{x}_1^2}{\tilde{s}_{11}} + \frac{\tilde{x}_2^2}{\tilde{s}_{22}}},
\tag{1.12}
$$

where $\tilde{s}_{11}$ and $\tilde{s}_{22}$ denote the sample variances computed using the $\tilde{x}_1$ and $\tilde{x}_2$ measurements, respectively. The original coordinates $(x_1, x_2)$ and the coordinates $(\tilde{x}_1, \tilde{x}_2)$ after the rotation have the following relationship:

$$
\begin{aligned}
\tilde{x}_1 &= x_1 \cos\theta + x_2 \sin\theta, \\
\tilde{x}_2 &= -x_1 \sin\theta + x_2 \cos\theta.
\end{aligned}
\tag{1.13}
$$

Eq. (1.13) is substituted into Eq. (1.12). Then, after a simple calculation, the distance from point $P(\tilde{x}_1, \tilde{x}_2)$ to origin $O(0, 0)$ can be calculated using $(x_1, x_2)$, that is, the original coordinates of $P$:

$$
d(O, P) = \sqrt{a_{11} x_1^2 + 2 a_{12} x_1 x_2 + a_{22} x_2^2}.
\tag{1.14}
$$

The coefficients $a_{11}$, $a_{12}$, and $a_{22}$ in Eq. (1.14) are determined by the angle $\theta$ and the sample variances $\tilde{s}_{11}$ and $\tilde{s}_{22}$. The difference between Eqs. (1.14) and (1.12) lies in the presence of the cross-product term $2 a_{12} x_1 x_2$ necessitated by the nonzero correlation $r_{12}$.
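The step from Eq. (1.12) to Eq. (1.14) can be checked numerically. Expanding the squares after substituting Eq. (1.13) gives one way to write the coefficients (worked out here for illustration, not quoted from the text): $a_{11} = \cos^2\theta/\tilde{s}_{11} + \sin^2\theta/\tilde{s}_{22}$, $a_{12} = \cos\theta \sin\theta \,(1/\tilde{s}_{11} - 1/\tilde{s}_{22})$, and $a_{22} = \sin^2\theta/\tilde{s}_{11} + \cos^2\theta/\tilde{s}_{22}$. The sketch below compares the two routes; the function names and all numeric values are hypothetical.

```python
import math


def distance_rotated(x1, x2, theta, s11_t, s22_t):
    """Eq. (1.12), evaluated after rotating (x1, x2) with Eq. (1.13)."""
    xt1 = x1 * math.cos(theta) + x2 * math.sin(theta)
    xt2 = -x1 * math.sin(theta) + x2 * math.cos(theta)
    return math.sqrt(xt1 ** 2 / s11_t + xt2 ** 2 / s22_t)


def distance_quadratic(x1, x2, theta, s11_t, s22_t):
    """Eq. (1.14), with coefficients from expanding the substitution."""
    c, s = math.cos(theta), math.sin(theta)
    a11 = c ** 2 / s11_t + s ** 2 / s22_t
    a12 = c * s * (1.0 / s11_t - 1.0 / s22_t)
    a22 = s ** 2 / s11_t + c ** 2 / s22_t
    return math.sqrt(a11 * x1 ** 2 + 2 * a12 * x1 * x2 + a22 * x2 ** 2)


# Hypothetical values chosen only for illustration.
theta = math.radians(30)      # rotation angle
s11_t, s22_t = 4.0, 1.0       # variances along the rotated axes
x1, x2 = 2.0, 1.0             # point P in the original coordinates

d_rot = distance_rotated(x1, x2, theta, s11_t, s22_t)
d_quad = distance_quadratic(x1, x2, theta, s11_t, s22_t)
print(d_rot, d_quad)          # the two routes should agree
```

Note that $a_{12}$ vanishes when $\theta = 0$ or when $\tilde{s}_{11} = \tilde{s}_{22}$, in which case Eq. (1.14) has no cross-product term.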
Generally, under the condition that variables are correlated with each other, the statistical distance between point $P(x_1, x_2)$ and any fixed point $Q(y_1, y_2)$ is expressed as

$$
d(P, Q) = \sqrt{a_{11} (x_1 - y_1)^2 + 2 a_{12} (x_1 - y_1)(x_2 - y_2) + a_{22} (x_2 - y_2)^2}.
\tag{1.15}
$$

Additionally, the coordinates of all the points $P(x_1, x_2)$ whose distance to $Q(y_1, y_2)$ is constant $c$ satisfy

$$
a_{11} (x_1 - y_1)^2 + 2 a_{12} (x_1 - y_1)(x_2 - y_2) + a_{22} (x_2 - y_2)^2 = c^2.
\tag{1.16}
$$

Eq. (1.16) is the equation of an ellipse centered at $Q$. The graph of such an equation is displayed in Fig. 1.3. The major and minor axes are indicated. They are parallel to the $\tilde{x}_1$ and $\tilde{x}_2$ axes.

Figure 1.3 Ellipse of points at a constant distance from fixed point.

Eqs. (1.15) and (1.16) can be directly generalized to $p$-dimensional space for the calculation of the distance between two points. For a $p$-dimensional random vector, the Mahalanobis distance, proposed by P. C. Mahalanobis in 1936, for two observed points $P$ and $Q$ ($x_{(i)}$ and $x_{(j)}$) is defined as

$$
d(P, Q) = \sqrt{\left( x_{(i)} - x_{(j)} \right)^{T} S^{-1} \left( x_{(i)} - x_{(j)} \right)},
\tag{1.17}
$$

where $S^{-1}$ denotes the inverse matrix of the sample covariance matrix. The Mahalanobis distance is also called the generalized distance, which is a more general form of the statistical distance.
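To make Eq. (1.17) concrete, the sketch below evaluates it for the first two observation vectors of Example 1.1, using the sample covariance matrix reported in Example 1.2 (rounded to two decimals, so the result is approximate). NumPy and the helper name `mahalanobis` are assumptions made for this illustration; the linear system is solved rather than inverting $S$ explicitly, which is a common numerical choice and not something the text prescribes.

```python
import numpy as np

# Sample covariance matrix reported in Example 1.2 (rounded to two decimals).
S = np.array([
    [67.32, 60.85, 47.31],
    [60.85, 69.02, 43.27],
    [47.31, 43.27, 38.47],
])


def mahalanobis(x_i, x_j, S):
    """Mahalanobis distance of Eq. (1.17) between two observation vectors."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    # Solve S z = diff instead of forming the inverse of S explicitly.
    z = np.linalg.solve(S, diff)
    return float(np.sqrt(diff @ z))


x_1 = [72.0, 65.0, 80.0]   # X_(1) from Example 1.1
x_2 = [78.0, 67.0, 91.0]   # X_(2) from Example 1.1

d_mahal = mahalanobis(x_1, x_2, S)
# With the identity matrix in place of S, Eq. (1.17) is the Euclidean distance.
d_eucl = mahalanobis(x_1, x_2, np.eye(3))
print(d_mahal, d_eucl)
```

Comparing `d_mahal` with `d_eucl` shows how much the variances and covariances in $S$ rescale the raw straight-line distance between the two girls' measurements.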
The statistical distance is a basic element of statistical description and statistical inference. The main difference between the statistical distance and the Euclidean distance is that the statistical distance weights each coordinate by the reciprocal of the standard deviation of the corresponding variable, in the manner of a weighted Euclidean distance. It therefore takes into account both the variability of the observed values and the relationships between the observed variables, and it is unaffected by the units in which each variable is measured. In the following chapters, we use the concept of the statistical distance repeatedly so that readers can understand it further through the study of statistical principles and the analysis of case studies.

1.6 Statistical software

The advancement of multivariate statistical analysis has been propelled by rapid progress in computer technology. Modern computing facilities make it possible to analyze large datasets, which helps extend multivariate techniques to emerging domains such as image analysis and improves the efficiency of data analysis, particularly in fields such as medicine. Given the substantial number of variables involved and the growing complexity of the calculations, the application of multivariate statistics in medical research would be severely limited without specialized statistical software. Statistical software is a pivotal data analysis tool and plays an indispensable role in carrying out the more intricate multivariate statistical methods.

At present, many software packages are available, each with a wide range of capabilities for conducting multivariate analysis.
These options include, but are not limited to, well-known software such as Statistical Product and Service Solutions (SPSS), Statistical Analysis System (SAS), Stata, and R. Each of these packages has distinct functionalities and characteristics that cater to different analytical needs.

SAS is well known for its adaptability and statistical power. It is strong in data handling and offers efficient utilities for data manipulation. SAS also provides sophisticated statistical modeling methods, which makes it a preferred option in sectors such as healthcare, finance, and research.

Stata is highly regarded for its user-friendly interface and comprehensive statistical capabilities. It accommodates a diverse array of data types and provides an extensive library of statistical and graphical functions. Researchers value Stata for its ability to manage large datasets and for its robust regression analysis tools.

SPSS is widely recognized for its intuitive interface and user-friendly approach to statistical analysis. It is a preferred tool for beginners and social scientists. SPSS is strong in data visualization, which makes it easier for researchers to present and interpret results. It also integrates with other data analysis software, which enhances its versatility.

R is an open-source programming language and environment for statistical computing and graphics. Its strengths are its extensibility and the large number of community-contributed packages available for specialized analyses.
R is favored by data scientists and statisticians because of its flexibility, which allows users to write custom scripts and implement cutting-edge statistical techniques.

The statistical software that researchers use depends on the particular analysis required, the amount of data, the expertise of the user, and personal preferences. SAS, Stata, SPSS, and R each have strengths; hence, they are all useful for data analysts and researchers. In this book, we focus on SAS, but the concepts presented can be applied to other software. This allows readers to adapt and switch between programs as needed in their research.

1.7 Problems

1. What is multivariate analysis? Are the variables under study generally correlated or independent?
2. Among the five main topics of multivariate statistical analysis introduced in this chapter, which methods can be regarded as direct extensions of univariate analysis, and which methods are more characteristic of multivariate analysis? Please explain, with examples.
3. What is a random vector? Does the concept of random vectors exist in univariate statistical analysis? Please provide your explanation.
4. What is the statistical distance? Have we been exposed to this concept in the study of univariate statistics? Please provide examples of its role in statistical inference.
5. Please explain the role that multivariate analysis plays in medical research, combining the content of this chapter with your own experience.
6. To explore the relationship between body weight (kg) X1 and forced vital capacity (FVC) (L) X2 of adults, 30 males under 40 years old were randomly sampled, and their body weight and FVC were measured. The data are shown in Table 1.3.
(a) Create scatter plots and marginal scatter plots (plotting only one variable each time) of the data and interpret these graphs.
(b) Assess the sign of the sample covariance based on the scatter plot.
(c) Calculate the sample mean vector, sample covariance matrix, and sample correlation matrix.

Table 1.3 Body weight and FVC data of 30 adult males.

Individual   X1 (kg)   X2 (L)     Individual   X1 (kg)   X2 (L)
1            69.8      4.13       16           60.5      4.48
2            85.5      4.44       17           90.4      4.69
3            74.8      4.02       18           80.2      5.01
4            52.3      4.21       19           80.2      5.01
5            67.4      3.83       20           51.7      4.49
6            61.8      4.74       21           71.1      4.78
7            49.2      4.26       22           71.1      4.78
8            56.9      4.32       23           57.5      5.11
9            59.1      4.42       24           55.3      4.15
10           48.9      4.27       25           50.7      3.93
11           48.9      4.27       26           85.8      4.92
12           60.3      4.18       27           77.9      5.23
13           76.7      4.61       28           68.5      4.53
14           66.9      4.44       29           67.1      4.14
15           53.1      3.83       30           77.9      4.57

National Survey on Health Status and Basic Physiological Parameters (2022).

Bibliography

Acock, C. (2023). A gentle introduction to Stata (Revised 6th ed.). Stata Press.
Adachi, K. (2016). Matrix-based introduction to multivariate data analysis. Springer.
Cotton, R. (2013). Learning R: A step-by-step function guide to data analysis (1st ed.). O'Reilly Media.
Elliott, A. C., & Woodward, W. A. (2023). SAS essentials: Mastering SAS for data analytics (2nd ed.). Wiley.
Johnson, R., & Wichern, D. (2018). Applied multivariate statistical analysis (6th ed.). Pearson.
Kendall, M. G. (1975). Multivariate analysis. Griffin.
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences of India, 2(1), 49-55.
Marriott, F. H. C. (1974). The interpretation of multiple observations. Academic Press.
Pituch, J. A., & Stevens, J. P. (2016). Applied multivariate statistics for the social sciences: Analyses with SAS and IBM's SPSS (6th ed.). Routledge.
Rencher, A. C., & Christensen, W. F. (2012). Methods of multivariate analysis (3rd ed.). Wiley.
Salcedo, J., McCormick, K., Peck, J., & Wheeler, A. (2017). SPSS statistics for data analysis and visualization (1st ed.). Wiley.