Biconstituent phenomenon of information and cognitive data analysis S.V. Smirnov* Institute for the Control of Complex Systems, Russian Academy of Sciences, Sadovaya St. 61, Samara, 443020, Russia Annotation In this paper the principle of biconstituent information is considered as the unity of its "properties" and its "carrier" (for example, in optics knowledge about the reference and reflected rays is necessary for information extraction). In data analysis it is proposed to consider the measurement procedure as the "carrier" and the measured value as a "property". The existence of the risk of obtaining an incorrect result of processing the measurement data is stated because of the knowledge incompleteness of complicated in general case information mechanism. A generalized form of the object-property table is proposed to reflect the realities of the empirical data accumulation. Presence of dependencies between the results of the measurement procedures has been researched as one of the fundamental risk factors. The necessity of introducing the so-called "properties existence constraints" into the context of data analysis problems is justified as illustrated by examples from cognitive data analysis. Key words: data analysis; information; the principle of biconstituent information; measuring procedure; table "objects-properties"; properties existence constraints; cognitive analysis 1. Introduction In cognitive activity metaphors are very useful tools for verbalization, for fixing in the language (and consciousness) of empirical experience and for hypotheses about objects of cognition. At first the intuitive observation of the similarity to the well-known concept, often expressed in speech just in the form of hidden comparisons, is able to start the formation of new scientific paradigms [1] and later - to empower the human mind in constructing specific methodologies [2] both within individual directions and in an interdisciplinary field. At the same time the opposite is also true when the aspect of the object of cognition supported by an expressive metaphor can to some extent "eclipse" its other significant aspects, hinder the emergence of adequate and effective theories and technologies. In thesaurus of data analysis (DA) metaphors are initially important such as the search (search, exploration), the retrieval (retrieval, extraction), extraction of minerals (mining), etc. It seems that the allusions induced by these references (like the actual etymology of the Russian word "dannyye", which is the usual translation of the English word "data"), lead the researcher away from the general scientific biconstituent concept (this low-use, but certainly lapidary term is derived in [3]) of the phenomenon of information, the meaning of obtaining which is the essence of DA. The proposed message aims to attract (most likely not for the first time, but with new ideas and on the basis of cognitive analysis) the attention of researchers to the frequently missed course, the line of thought in the development of the problematics of DA. 2. Information and the measurement aspect in data analysis It is generally accepted that two cognitive abilities of human consciousness play a fundamental role in comprehending the subject of reality: the distinction in the surrounding world of isolated objects and the discovery of connections between them. In this case, the immanent quality of any object - its "essential determinateness that is inseparable from being" [4] - can only be manifested by individual aspects-properties (attributes) and only as a result of interaction with the object. The content of the concept of "biconstituent" of the information phenomenon is associated precisely with the immutability of such interaction for obtaining information about the object, i.e. with the mandatory implementation of a certain action, which in the context of interest is known as the "measurement procedure". One component of this phenomenon concentrates knowledge about the measurement procedure; the other reflects the actual results of the measurement. For example, an image in optics is not only the fixed parameters of the front of a light wave reflected from an object. An adequate interpretation of the raster image by the subject, when the data represents the coordinates, amplitudes and frequencies of the points of the front, is possible only with knowledge of what light the object was illuminated with. In the case of a hologram that retains information on the amplitude and phase of the front, interpretation is impossible without knowledge of the characteristics of the reference coherent illumination of the object. Thus, in DA, "data" is reasonable to understand not only the measurement results fixed in the generally accepted "objectproperty" protocol and its extensive variations [5, 6], but also one or another set of a priori knowledge associated with measurement procedures, and also with the conditions for their implementation. * Corresponding author. Tel./fax: +7-846-333-2770. E-mail: [email protected] 3. Presentation of knowledge about the measurement procedures 3.1. Generalized table "objects-properties" The standard protocol for presenting empirical data to DA is the objects-properties table (OPT): (G*, M, V, I), (1) where G* = {gi}i = 1,…, r, r = G*k 1 – the set of observable objects: G* G, where G is the hypothetically conceivable set of objects of the subject domain (SD) being researched; M = {mj}j = 1,…, s, s = M 1 – the set of measured properties of the objects; V is the set of property values; I is the ternary relation between G*, M and V (I G*MV), defined for all pairs in G*M. The fundamental classification of DA problems is repelled by the traditional concept of "data" and indicates the tasks of two conjugate directions: the discovery of regular links between OPT elements and the use of the observed regularities for the prediction of some OPT elements by known values of its other elements [6]. Recognizing the constructiveness of this classification and the inviolability of its basis - OPT - it is appropriate, nevertheless, to raise the issue of reflecting in the current structural framework the data presentation of the "dimension aspect" discussed above in DA. At the same time, one should focus not on the creation of a universal methodology for implementing in OPT the possibility of describing the immense variety of methods and conditions for carrying out measurements, but on reflecting the most general realities of accumulating empirical information. Answering this call, it is natural to rely on the fact that the "measurement aspect" in the OPT is nominally presented: the set of columns of the table corresponds bijectively to the set of measurement procedures used in the formation of OPT. Further morphological analysis of OPT makes it easy to find ways to reflect the following fundamental factors of accumulation of empirical information: execution of, as a rule, multiple independent measurements (series of measurements) of the property mj M of the object gi G*; differentiation of confidence in the results of different series of measurements of the same object (for example, due to the difference in external conditions when performing a series of measurements); using several different procedures (congruent information sources) to measure the same property mj; differentiation of confidence in different measurement procedures. The object-property protocol (1) in this case replaces the tuple [7, 8] (G*, M, Sе, Pr, A), describing the generalized OPT, where (see Figure 1): mj M Pr(j) pr(j)1 pt(j)1 pr(j)2 pt(j)2 … Se(i) gi G* se(i)1 se(i)2 se(i)3 se(i)4 se(i)5 st(i)1 st(i)2 st(i)3 st(i)4 st(i)5 … pr(j)p(j) pt(j)p(j) … se(i)k st(i)k se(i)k+1 st(i)k+1 … se(i)q(i) st(i)q(i) A Figure 1. Structure of the generalized object-property table based on a two-dimensional matrix (for the given series of measurements, the dark cells of the matrix correspond to the results of V, and the light ones correspond to the result of NM). Se = ri=1 Se(i) is the set of all series of measurements performed during probing of the SD, |Se| = ri=1|Se(i)| = m, and Se(i) = {se(i)k}k=1,…, q(i), q(i) 1, i = 1,…, r - the set of series of measurements to which the object is subjected gi G*and every series se(i)k is characterized by a degree of confidence in its results st(i)k; Pr = sj=1 Pr(j) is the set of all measurement procedures used for probing the SD, |Pr| = sj=1|Pr(j)| = n, and Pr(j) = {pr(j)k}k=1,…, p(j), p(j) 1, j = 1,…, s - the set of congruent procedures for measuring the property mj M, and every procedure pr(j)k is characterized by a degree of confidence in its results pt(j)k; A = (aij)i=1,…, m; j=1,…, n is the matrix of measurement series results Sе properties M of objects from the sample G*, performed with the measurement procedures Pr, aij V NM, NM (not measured) – constant indicating that in the i-th series the measurement was not performed by the j-th procedure (the introduction of this formal result among other things is very useful for preserving the two-dimensional nature of the generalized OPT). 3.2. Properties existence constraints Of course, the generalization of OPT, performed at a high level of abstraction, is not able to reflect the full range of a priori information about the measurement procedures and the conditions for their implementation, which is necessary for performing effective data analysis. To describe and use this knowledge, a variety of methods are currently used, and their generalization is an actual scientific task. At the same time, a number of useful ideas can be indicated here today, and, in particular, the proposal to fix in the problem of data analysis the limitations of the existence of properties in the objects of the investigated SD. The term "property existence constraints" was proposed in [9], where a priori knowledge of measurement procedures (in [9] these were text analysis procedures) were converted into a limited number of necessary binary relations on the set of measurable properties. In principle, this approach is known and used in pattern recognition [10]. To the authors [9] he provided with an effective solution to the problem of conceptual analysis of the body of texts and later was applied and developed in other works on cognitive analysis [8, 11]. 4. Examples of involving knowledge of measurement procedures in cognitive data analysis 4.1. Ontological data analysis In the ontological data analysis [7, 8, 12] empirical evidence of the kind «the object gi possesses the property mj» is processed. The generalized OPT fully reflects the conditions for obtaining similar semantic judgments about the investigated SD, however, when data consolidation, it becomes necessary to use multi-valued logics, for example, vector ones [13, 14]. Another necessity is the construction of a special model for limiting the existence of properties, since the emergence of such limitations in cognitive analysis problems is organically associated with the application of a fundamental procedure for conceptual scaling of properties [8, 15]. 4.2. Formation of collective cognitive maps Cognitive maps [16] are a widely used in practice and actively developed set of models, methods and computer tools for formalizing expert knowledge in managing weakly structured situations (see, for example, [17-19]). However, the stage of constructing a cognitive map of the dynamic situation, which, as a rule, involves a large group of experts, is poorly supported by formal models and methods. It seems that the approaches outlined above can be applied to link the understanding of different experts in the analyzed problem situation [20]. The simplest cognitive map - the sign graph - can be described by a tuple (F, W), where F = (fi)i=1,…, n is the set of vertexfactors of the situation, W = (wij)i=1,…, n; j=1,…, n is the adjacency matrix of the graph: wij {+1, 0, -1}, wij 0 indicates the existence of an edge (i, j), in the graph fixing the influence of the i-th factor on the j-th: «positive» for wij = +1 and «negative» for wij = -1, wii = 0 (0 indicates that there is no influence). The result of the collective work of k experts on the formation of such a cognitive map can be expressed as follows: F = i=1,…, k Fi, W = i=1,…, k Wi. This means taking into account in the final model all the factors allocated by each expert (in the first approximation, by the set-theoretical combination of these factors Fi, i = 1,…, k), and also the formation of some composition of descriptions of the interfactor influences Wi, for the search of which the proposed methods of DA. As objects of the domain of interest to us we select the ordered pairs of factors (fi, fj) F, i j. The actual semantic judgments in this SD will be independent expert opinions on the «positive» («+») and «negative» («-») effects of the i-th factor on the j-th factor. The data received from the experts are structured in the form of a generalized OPT (see Table 1). In it, the constant X indicates the presence, and None indicates the absence of a property (fi, fj) for a property, the NM constant indicates that the expert did not evaluate the relevant influence (for example, because his assessment of the situation lacks one or both considered in the line of the GOPT influence factor), the Failure constant fixes the case of doubt and the expert's refusal from a certain assessment of the influence of the i-th factor on the j-th factor. It is not difficult to see that the examination data presented in Table 1, generally speaking, are incomplete and contradictory, and just as in ontological analysis of data for their combination it is expedient to apply multi-valued logic. Properties existence constraints (i.e., interfactor influences) for the considered SD should be sought in the fundamental definition of the sign digraph [16, 17]. Here it is possible to note the following: in the ordered pair of factors (fi, fj), i j the factor fi affects the factor fj either «positively» or «negatively» (i.e., the properties of the «+» and «-» ordered pairs of factors are incompatible); the edges of the sign digraph describe the aggregate of direct influences of one factor of the situation on another, and the indirect effects are calculated from the known "product rule", but there is no restriction that the factor fi can influence the factor fj, i j, either directly or only indirectly; in general, there is no prohibition on the existence of loops in the sign digraph, although such a restriction can be easily satisfied by excluding pairs of factors of the form (fi, fi) from the analysis of influences - see Table 1. Table 1. Evidence from the impact of the factors of the situation Expert 1 «+» «-» (f1, f2) X None (f1, f3) Failure X Failure Failure … (fn, fn-1) … … Expert k «+» «-» NM NM NM NM None None 5. Conclusion Data analysis should be performed with the help of a priori knowledge of the procedures for measuring the properties of objects in the probed domain and the conditions for performing these measurements. The metaphorical terminology of data analysis can interfere with accounting for this situation, which directly results from the "biconstituent" of the information phenomenon. Part of the reflection of the realities of data accumulation for analysis can be achieved by a certain generalization of the standard form of representation of empirical material - the objects-properties table. However, in spite of the fundamental nature of the factors considered by such a generalization, this, in the general case, will not be enough. The methods of introducing the a priori knowledge associated with measurement procedures into the context of data analysis problems can be very different, which is the essence of specific analysis techniques. At the same time, it is expedient to select the most general ideas and approaches, among which, undoubtedly, we should include the model of the limitations of the existence of properties. Acknowledgments The article is prepared on the basis of the research materials within the framework of the subsidized state assignment to the Institute for the Control of Complex Systems, Russian Academy of Sciences for 2017 in accordance with the Programs of Fundamental Scientific Research of the Presidium of the Russian Academy of Sciences: Program III.4, the subprogram "Integrated Management Systems", the project "Models and methods for the formation of cognitive maps in conditions Incompleteness of information about the problem situation". Literature [1] Kuhn T. The Structure of Scientific Revolutions: 50th Anniversary Edition. The University of Chicago Press, 2012. [2] Novikov AM, Novikov DA. Methodology [In Russian]. Moscow: SINTEG, 2007. [3] Vikhnin AG, Sakipov NZ. Storm of the fourth mega-project: who will be the new Bill Gates? System analysis and choice of strategy [In Russian]. Moscow: “Dialog-MIFI” Publisher, 2008. [4] Zhilin DM. Theory of systems: the experience of building a course. 4th ed., Rev. [In Russian]. Moscow: Publishing house LCI, 2007. [5] Barsegyan AA, Kupriyanov MS, Holod II, Tess MD, Elizarov SI. Data and Process Analysis [In Russian]. – St. Petersburg: BHV-Petersburg, 2009. [6] Zagoruyko NG. Applied methods of data and knowledge analysis [In Russian]. Novosibirsk: Sobolev Institute of Mathematics, SB RAS, 1999. [7] Semenova VA, Smirnov SV. Intelligent analysis of incomplete data to building formal ontologies. Proc. of Int. conf. Information Technology and Nanotechnology ITNT-2016 (May 17-19, 2016, Samara, Russia). - CEUR Workshop Proceedings, 2016; 1638: 796-805. DOI: 10.18287/1613-0073-20161638-796-805. [8] Samoilov DE., Semenova VA, Smirnov SV. Incomplete data analysis of for building formal ontologies [In Russian]. Ontology of designing, 2016; 6(3): 317-339. DOI: 10.18287/2223-9537-2016-6-3-317-339. [9] Lammari N, Metais E. Building and maintaining ontologies: a set of algorithms. Data & Knowledge Engineering, 2004; 48(2): 155-176. [10] Zakrevsky AD. Logic of recognition [In Russian]. Minsk: “Science and Technology” Publisher, 1988. [11] Pronina VA, Shipilina LB. Using the relationships between attributes to build domain ontology [In Russian]. Control Science, 2009; 1: 27-32. [12] Zubtcov RO, Semyonova VA, Smirnov SV. Algorithms and Software for Ontological Data Analysis [In Russian]. Proc. of VI Int. conf. Open Semantic Technologies for Intelligent Systems OSTIS-2016 (February 18-20, 2016, Minsk, Belarus). Editor-in-Chief: V.V. Golenkov. - Minsk: Publishing house of the Belarusian State University of Informatics and Radioelectronics, 2016; 83-88. [13] Arshinskii LV. Substantial and formal deductions in logics with vector semantics. Automation and Remote Control, 2007; 68(1): 139-148. [14] Arshinskii LV. The application of vector formalism in logic and logical-mathematical modelling [In Russian]. Ontology of designing, 2016; 6(4): 436-451. DOI: 10.18287/2223-9537-2016-6-4-436-451. [15] Samoilov DE, Smirnov SV. Data formation and processing in formal concept analysis: subjective aspects. Proc. of Int. conf. Information Technology and Nanotechnology ITNT-2016 (May 17-19, 2016, Samara, Russia). - CEUR Workshop Proceedings, 2016; 1638: 806-812. DOI: 10.18287/1613-0073-20161638-806-812. [16] Axelrod R. The Structure of Decision: Cognitive Maps of Political Elites. Princeton University Press, 1976. [17] Roberts FS. Discrete Mathematical Models, with Applications to Social, Biological, and Environmental Problems. Prentice-Hall, Englewood Cliffs, 1976. [18] Kuznetsov OP, Kulinich AA, Markovsky AV Analysis of dependencies in managing weakly structured situations based on cognitive maps [In Russian]. The human factor in management. Eds.: N.A. Abramova, K.S. Ginsberg, D.A. Novikov. Moscow: Publishing house KomKniga, 2006; 313-344. [19] Kulinich AA. Computer systems for modeling cognitive maps: approaches and methods [In Russian]. Control Science, 2010; 3: 2-16. [20] Smirnov SV. Some models and methods for the collaborative development of cognitive maps [In Russian]. Proc. of 6th Int. conf. Information technologies and systems ITiS-2017 (March 1-5, 2017, Bannoe, Russia). Eds.: Yu.S. Popkov, A.V. Mel’nikov. - Chelyabinsk: Publishing house of the Chelyabinsk State University, 2017; 281-283.