Invited lecture to the Conference on Computers in Sport, Israel, Netanya, Wingate Institute, 1992

SYSTEMS THEORY APPROACH TO USING STATISTICS IN SPORT PERFORMANCE ANALYSIS

Petr Blahuš
Faculty of Physical Education and Sport
Charles University, Prague

1. Introduction

The systems theory paradigm is considered here as a general methodological approach to using statistical methods for the analysis of sport performance. There are two main motivations for this:

(i) Everyday experience shows that in many cases statistical methods are applied without a clear methodological conception. The systems theory approach helps to formulate the problem to be solved more clearly, typically as a problem of an input-output or training-performance statistical relationship, or as a network of statistical relationships.

(ii) On the other hand, more practical benefits appear when one takes into account the huge number of different statistical methods and statistical software packages. Here the systems theory approach helps to choose a proper design for a future investigation, or to select appropriate statistical methods for the analysis of given sport training data with respect to the desired aim.

The systems theory conception stresses that a researcher or analyst working jointly with the coach should formulate the problem to be solved in terms of the systems theory paradigm of input, output and further types of system variables.

2. An athlete - statistical unit or dynamic system?

The very first assumption for any application of statistics is an explicit definition of the mutual correspondence between the following pairs of notions:

observed data --- values of a variable
athlete's attribute --- statistical random variable

This seemingly obvious correspondence very often relies just on intuition, or it is taken for granted.
This fact may become a source of blunders, for instance if the data are represented by fallible pseudo-categories that are not actually disjoint and exhaustive in the sense of random events. A researcher wishing to use a statistical method who avoids mistakes of this kind usually goes on to deal with purely statistical problems such as normality, linearity etc. But we find such an approach too formalistic, and we assume that some more general questions of scientific inquiry are to be answered first.

From the viewpoint of applications, the expression "random variable x takes on a value" means that the value is somehow connected with a statistical unit E. In our case the unit, sometimes also called the "unit of observation", is typically an investigated athlete. The idea of joint observation of several different variables x, y, z, ... that take on their values over a sample of athletes seems to follow the old Aristotelian paradigm of attributes connected to entities; see Fig. 1 (a). It is typical of this conception that attributes are considered as "labels" that are indifferent with respect to the possible dynamic behavior of the corresponding entity E.

-------------------------insert Fig. 1 around here -------------------------

Applied to performance analysis, the above-mentioned considerations mean that from now on

(1) an athlete should no longer be regarded as a "statistical unit" with observed "attributes" but as a system with possibly dynamic behavior;

(2) its investigated attributes - like performance, amount of training load, level of abilities etc. - cannot be taken as indifferent qualities or quantities but have to be distinguished into several types: at least input variables and output variables, and possibly further types of system variables such as internal state variables, depending on the kind of problem to be solved.

3. Problems in monitoring performance as system tasks

A typical problem is the input-output relationship, i.e.
the training-performance relationship. In the simplest case we can look at the problem as a direct relationship, i.e. a relationship which is not transmitted through variables inside the system. If, moreover, the relationship is assumed to be linear, we can describe this statistical problem in terms of a multivariable linear system by a multiple regression equation, as is the case in Fig. 2.

-------------------------insert Fig. 2 around here -------------------------

In modeling or representing a practical problem in the systems theory way it is necessary to be aware of what kind of variability is inherent in the data used for estimating the relationships and the parameters of an equation such as the regression equation in Fig. 2. It makes a fundamental difference whether, say, the input-output relationship is studied on the basis of the intraindividual variability of one athlete, in a diachronic investigation over time, or whether the same relationship is investigated interindividually (among different athletes) and synchronically (in one time "slice"). The latter kind of investigation is known as correlational research. It is more frequent and more convenient, but it tells us less about monitoring and planning individual training. The reason is that this type of relationship is too far from the ideal of the desired cause-effect influence.

Of course there are other fruitful kinds of problems and relationships besides input-output analysis. For instance, let us consider a sport discipline where output performance cannot be directly measured or observed at sufficiently short intervals to evaluate the effects of training for short-term feedback control. Such a case can be found e.g. in gymnastics, with its two or three competitions a year.
Or in some situations we do not even have the performance defined, as in the case of individual player performance in ice hockey, basketball and other team games. (A counterexample is the long jump: it can be measured directly, and at least once a week, so that changes in performance can be followed.) In such sport disciplines, where the performance is "difficult to measure", we have to solve the problem of constructing auxiliary indirect indicators of output performance. That is, we have several indirect criteria of performance and we are in a position to solve tasks such as the following: to reduce the number of indirect output indicators, to find the most relevant outputs, etc.

But some complicated problems arise if we take into account the more realistic situation, namely that the system has an internal structure which intermediates the connection between input and output, i.e. between training and performance.

4. The internal structure and latent variables

From the classical systems theory point of view the intermediating structure is represented by internal state variables s, as pictured in Fig. 3. The figure shows at least two possibilities:

(a) the input x influences the internal state s, which in turn influences the output y;

(b) the past state s (at time t-1) interacts with the past input to yield a new present state, which produces the present output in interaction with the present input.

-------------------------insert Fig. 3 around here -------------------------

From the viewpoint of sport training the internal state s could be interpreted as a "general state of preparedness" of an athlete or as a "state of development of motor abilities" influencing the athlete's performance. This internal state of the athlete interacts with the last training load to yield a new value of performance. If the training-state-performance paradigm works as a linear dynamic system, then its control can be described by the linear equations indicated in Fig. 3.
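
As a hedged illustration of case (b), the linear dynamic system can be sketched in a few lines of Python. The equations of Fig. 3 are not reproduced in this text, so the form s(t) = a*s(t-1) + b*x(t-1), y(t) = c*s(t) + d*x(t) and the coefficient values a, b, c, d are assumptions chosen only for illustration; the symbols x, s and y follow the notation used above.

```python
def simulate(loads, a=0.8, b=0.5, c=1.0, d=0.1, s0=0.0):
    """Simulate an assumed linear training-state-performance system.

    s[t] = a * s[t-1] + b * x[t-1]   (past state and past input give the new state)
    y[t] = c * s[t]   + d * x[t]     (present state and present input give the output)

    The coefficients a, b, c, d and the initial state s0 are hypothetical.
    """
    states, outputs = [], []
    s_prev, x_prev = s0, 0.0  # no training load is assumed before the first period
    for x in loads:
        s = a * s_prev + b * x_prev
        y = c * s + d * x
        states.append(s)
        outputs.append(y)
        s_prev, x_prev = s, x
    return states, outputs


# Two periods of unit training load followed by one period of rest.
states, outputs = simulate([1.0, 1.0, 0.0])
```

In this sketch the output keeps rising in the rest period because the state still carries the effect of the preceding loads, which is the kind of behavior that a purely direct input-output regression cannot represent.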
For the purpose of statistical analysis of training data the system can be described by a set of regression equations. In these equations some of the variables - those hidden inside the system - have the character of directly unobservable, or latent, variables. The latent variables have to be estimated by a diagnostic procedure. The procedure includes a process of statistical modeling in terms of latent variable models such as factor analysis and others. For instance, the case in Fig. 3 (a) can be modeled by the interbattery factor analysis model: the "state of preparedness", including the level of motor abilities, can be estimated by the latent factor scores of the so-called interbattery factors. Those are the factors which intermediate the connection of the input variables in battery one with the output variables in battery two, i.e. the connection between the input variables, or training load indexes, and the output variables, or performance criteria.

5. Statistical models for latent internal structure

Several sophisticated models are available for theories and hypotheses about more complicated internal structures intermediating the connection between the input training and the output performance. One of them is the quite well-known LISREL model presented in Fig. 4. The model (cf. Jöreskog and Sörbom 1988) can be seen as a straightforward generalization of the system of Fig. 2. Just imagine that the data x on input training load are contaminated by errors and observable only indirectly through auxiliary indicators. Thus, we are in a position to estimate the "true" input through the so-called measurement model II. The same holds for the "true" output, which is observable only through indirect auxiliary performance criteria y and their measurement model I. But what we are interested in is the relationship between true performance and true training.

-------------------------insert Fig. 4 around here -------------------------

In some even more complicated models we can proceed further, as in the case of Fig. 5. The "true" input-output relationship can again be intermediated by an internal state (a), and the internal state can have its own higher-level internal structure of latent variables connected to each other by chains of internal relationships (b).

-------------------------insert Fig. 5 around here -------------------------

The idea of variables that influence sport performance by acting in chains resembling "causal chains" is more realistic than the older approach, which used multiple regression to combine "causal variables" without any hierarchy of cause-effect sequences. This seems to be the very reason why former correlational studies, hunting for significant correlations of various variables with performance, were able to add so little to the knowledge about monitoring training and performance.

The above-mentioned chains of influence on sport performance can be modeled by different statistical tools from the family of path analytic models with latent variables. These can also be formulated in terms of LISREL, or in the very helpful RAM model (McArdle's so-called Reticular Action Model, cf. McArdle & McDonald 1984), jointly with the very general COSAN model (McDonald 1978).

6. Monitoring of training as a system control process

Another question is the process of monitoring performance. From the point of view of systems control theory we can accept the idea that the above-mentioned conception of a monitored athlete actually deals with a controlled system. The controlled system represents a particular subsystem within a complete system of control. Within the system of control it is the controlling subsystem which selects appropriate stimuli that function as controlling inputs to the controlled system.
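
The division into a controlled system and a controlling subsystem can be illustrated by a minimal Python sketch. Everything specific in it is an assumption made for illustration only: the linear update of the athlete's state, the identification of observed performance with that state, the simple rule that raises or lowers the load in proportion to the gap between target and observed performance, and all numerical values. It is not a model proposed in this lecture.

```python
def controlled_system(state, load, a=0.8, b=0.5):
    """Athlete as the controlled system (hypothetical linear rule).

    Returns the new internal state and the observed performance;
    here performance is simply assumed to equal the new state.
    """
    new_state = a * state + b * load
    performance = new_state
    return new_state, performance


def controlling_subsystem(target, performance, load, gain=0.5):
    """Select the next training load from the observed output.

    The load is adjusted in proportion to the remaining gap to the
    target (an assumed rule) and is never allowed to become negative.
    """
    return max(0.0, load + gain * (target - performance))


def run(target=2.0, weeks=30):
    """Alternate controlled system and controlling subsystem week by week."""
    state, load = 0.0, 1.0  # hypothetical starting state and first load
    history = []
    for _ in range(weeks):
        state, performance = controlled_system(state, load)
        history.append((load, state, performance))  # stored training data
        load = controlling_subsystem(target, performance, load)
    return history


history = run()
```

Each tuple in `history` holds the applied load, the resulting state and the observed performance for one week, so the stored record corresponds to information about input, internal state and output.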
Appropriate controlling inputs can be applied only on the basis of knowledge of three streams, or three blocks, of information, namely about the input, the output and the internal state. The information then passes through the processes of storing, processing and evaluation, as illustrated in Fig. 6.

-------------------------insert Fig. 6 around here -------------------------

-------------------------insert Fig. 7 around here -------------------------

The parallel applied situation of monitoring sport training can be seen in Fig. 7. There the monitoring of an athlete is modeled as in the preceding Fig. 6. The athlete represents the controlled system with its internal structure, and the control process is realized through activities such as training data collection, storing, processing and evaluation. Finally, it leads to the creation of a modified training plan and to new training load stimuli applied to the athlete's workout as controlling inputs.

References

Blahuš, P. (1974). De la causalité dans l'analyse de correlation et factorielle des aptitudes. Scientia Paedagogica Experimentalis, 11, 1, 24-33.
Blahuš, P. (1982). Methodological problems of latent variable models. Acta Universitatis Carolinae G., 23/1, 25-37.
Blahuš, P., Hruby, J., Kvapil, J., & Paichl, J. (1988). Systems theory approach to using statistics in social sciences. Praha: Charles University Press.
Galtung, J. (1968). Diachronic correlation, process analysis, and causal analysis. UNESCO Seminar on Developmental Sociology, Rio de Janeiro, July 1968.
Jöreskog, K.G., & Sörbom, D. (1988). LISREL 7: A guide to the program and applications. Chicago: SPSS, Inc.
Klir, G.J. (1972). Trends in general systems theory. New York: Wiley.
McArdle, J.J., & McDonald, R.P. (1984). Some algebraic properties of the RAM logic for structural equation model specification. Brit. J. math. statist. Psychol. 37, 234-251.
McDonald, R.P. (1978). A simple comprehensive model for the analysis of covariance structures. Brit. J. math. statist. Psychol. 31, 59-72.
McDonald, R.P. (1985). Factor analysis and related methods. Hillsdale, NJ: Lawrence Erlbaum Ass.
Mesarovic, M.D., & Takahara, T. (1975). General systems theory: mathematical foundations. New York: McGraw.
Morrison, D.F. (1967). Multivariate statistical methods. New York: McGraw.

TITLES TO FIGURES:

Fig. 1 An athlete as an object of statistical investigation
Fig. 2 Input-output, training-performance relationship described by linear regression equation
Fig. 3 Internal state intermediating the training-performance relationship
Fig. 4 LISREL model description of internal structure
Fig. 5 Further possible conceptions of the internal structures: (a) "true" training and "true" performance, (b) path analytic structure
Fig. 6 General control system
Fig. 7 Control system applied to monitoring of training performance