Clustered Numerical Data Analysis Using Markov Lie Monoid Based Networks
APS 2016 - Salt Lake City, Utah
Joseph E. Johnson, PhD
University of South Carolina
jjohnson@sc.edu
April 16, 2016

ABSTRACT:
Numerical values are incomplete without their units of measurement, accuracy level, and exact defining meaning (metadata descriptors). But these components are scattered in the titles to rows, columns, and footnotes in non-standard ways. While humans can read such data sets, computers generally cannot. Data must be preprocessed by humans prior to electronic reading. We developed and implemented (Python cloud environment) a proposed standardization that merged these four components and developed a structure that supports automated data exchange and a new type of automated analytics for clustering.

IMAGINE:
If every number came with its units, uncertainty, and exact meaning all attached as a “metanumber” object:
• Instantly readable by both humans and computers,
• And there was a simple unique name (IP path) for every number, a universal standard for all numerical data,
• Which could be used as a name in expressions where
• Dimensional and error analysis is automatically done,
• Supporting automated data exchange, Big Data, and AI.

AND ALSO:
• An algorithm continuously scraping all numerical data from the web and converting it to standardized MetaNumber tables, and then
• With a novel process each table is converted to two networks
• From which the dominant clusters are extracted and ordered
• Using a novel agnostic cluster identification algorithm.
• Then with still another algorithm these clusters are linked,
• One among entities (rows) and one among properties (columns),
• Into a single supernet spanning all numeric information.
A single network spanning our entire numerical universe!

METANUMBER - POWERFUL YET EASY
> 6.3*ft + 4.83*m - 37*inch            Mix as needed
>> 5.810+/-0.032*m                     Result is metric by default
> 6.3*ft + 4.83*m - 37*inch ! ft       Gives result in ft
>> 19.06+/-0.11*(ft)                   Uncertainty from sig. dig.
> 43*yard*72e6*inch ! acre             No decimal implies exact
>> 17768.523967*(acre)                 An ‘exact’ result
> 5.3*[e_gold_density]                 Use [table_row_column]
>> (1.022+/-0.019)e+05*m_3*kg          Input standard data
> 18*[my_482]                          Inputs the result of the user's line 482

SOME HARDER PROBLEMS
Compute the gravitational attraction between a 188 lb man and a 632.3 kg golf cart that is 7.93 yards away:
> g*188*lb*632.3*kg/(7.93*yard)**2     g = G
>> (6.844+/-0.017)e-08*m*kg*s_2        Note the uncertainty
If a BMW can accelerate from 0 to 70 mi/hr in 1.3 sec, then how much acceleration is this in “g’s”:
> (70*mile/hour)/(1.3*s) ! ag          where ag is the acceleration of gravity
>> 2.45+/-0.19*(ag)

UNIT AND CONSTANT NAMES
Unit and constant names are:
• Lower case, alphanumeric (with internal ‘_’).
• NO fonts, symbols, upper case, or plurals.
• All prefixes are separate words (mega, billion, ...).
• All of the above are joined as mathematical variables to the value with * and / operations:
    3.51*mega*kg/m3
    8.5e5*thousand*dozen*m

SIX BASIC SI UNITS & SIX OPTIONAL UNITS
 1. m = meter            length
 2. s = sec = second     time
 3. kg = kilogram        mass
 4. a = amp = ampere     electrical current
 5. k = kelvin           temperature
 6. cd = candela         luminosity
 7. b = bit              information (Shannon, 1/0, T/F)
 8. op = flop            double precision floating point operation
 9. p = person           living human (as in per capita)
10. d = usd = dollar     value
11. bn                   baryon number
12. ln                   lepton number

NUMERICAL ACCURACY (UNCERTAINTY)
If a value contains a decimal point then it is converted to an ‘uncertain number’ (ufloat):
> 2.34 * 7.632e2
>> 1785.9+/-7.6
If there is no decimal point (or it is removed using scientific notation) then the value is treated as exact:
> 234e-2 * 7632e-1
>> 1785.888

METADATA & MEANING-1
Information is accessed on the default server as:
• [table_row_column…]
• [e_gold_density]
• [mass_higgs]
Data on any other server is denoted as:
[IP Path to server _ dir __ table_row_column…]
These unique names can be used to recall the value for calculations.
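The significant-digit rule and the mixed-unit examples above can be sketched in a few lines of plain Python. This is only an illustration, not the MetaNumber implementation: the helper names (`parse_literal`, `length`, `add`, `negate`, `multiply`) and the small `TO_METRE` table are hypothetical. The rule that a decimal literal carries an uncertainty of one unit in its last written digit is an assumption inferred from the example outputs on the slides, as is the quadrature (independent-error) propagation.

```python
import math

# Exact conversion factors to metres; a tiny illustrative unit table.
TO_METRE = {"m": 1.0, "ft": 0.3048, "inch": 0.0254, "yard": 0.9144}


def parse_literal(text):
    """Return (value, sigma) for a numeric literal.

    Assumed rule: a literal whose mantissa has a decimal point carries an
    uncertainty of one unit in its last written digit; otherwise it is exact.
    """
    value = float(text)
    mantissa, _, exponent = text.lower().partition("e")
    if "." not in mantissa:
        return value, 0.0
    decimals = len(mantissa.split(".")[1])
    return value, 10.0 ** (int(exponent or 0) - decimals)


def length(text, unit):
    """Convert a length literal to metres, scaling its uncertainty too."""
    value, sigma = parse_literal(text)
    factor = TO_METRE[unit]
    return value * factor, sigma * factor


def add(*terms):
    """Sum independent quantities; uncertainties combine in quadrature."""
    total = sum(v for v, _ in terms)
    sigma = math.sqrt(sum(s * s for _, s in terms))
    return total, sigma


def negate(q):
    """Flip the sign of the value; the uncertainty is unchanged."""
    return -q[0], q[1]


def multiply(a, b):
    """Product of independent nonzero quantities; relative errors combine in quadrature."""
    value = a[0] * b[0]
    relative = math.hypot(a[1] / a[0], b[1] / b[0])
    return value, abs(value) * relative


# 6.3*ft + 4.83*m - 37*inch, reported in metres and then in feet.
total = add(length("6.3", "ft"), length("4.83", "m"), negate(length("37", "inch")))
print("%.3f+/-%.3f m" % total)    # 5.810+/-0.032 m
feet = (total[0] / TO_METRE["ft"], total[1] / TO_METRE["ft"])
print("%.2f+/-%.2f ft" % feet)    # 19.06+/-0.11 ft

# 2.34 * 7.632e2 (uncertain) versus 234e-2 * 7632e-1 (exact).
product = multiply(parse_literal("2.34"), parse_literal("7.632e2"))
print("%.1f+/-%.1f" % product)    # 1785.9+/-7.6
exact = multiply(parse_literal("234e-2"), parse_literal("7632e-1"))
```

Under these assumptions the sketch reproduces the values shown on the slides, which suggests the inferred rule is at least consistent with them. A full system would also track dimensions so that, for example, adding a length to a mass is rejected.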
FEATURES 1:
• Automatically converts all units and performs all dimensional analysis.
• Metric (SI) units are the default for results, but output in any valid units is optional.
• Computes all numerical accuracies from the number of significant digits.
• Documents the meaning of all values with associated metadata and tags.
• Unlimited metadata can be attached to a value without any operating overhead.
• Archives all values with a unique name for every single number, given by its internet path.
• All past results are archived permanently for the user's reuse.
• The unique name can be used in all processing, including computations.
• Each retrieved MN can be easily read by both humans and computers.
• Web based numerical data is continuously scanned and standardized.

FEATURES 2:
• It manages fully automated data exchange with API calls from the customer's software.
• Supports both Big Data and new levels of artificial intelligence (AI), such as:
    • Conversion of each standardized table into a mathematical network, with our analytics supporting:
    • A novel cluster analysis identifying hidden information in correlated sets of nodes from MN tables.
    • The dominant clusters can be linked into a supernet of all numerical information.
• The six basic metric units are extended with six new information and socioeconomic processing units.
• The MN system can be effective within a corporation or agency, providing a competitive advantage.
• The revolutionary analytics can provide multiple new insights into hidden information structures.
• The MetaNumber design and development is complete and operational.
• MN revolutionizes the standardization, processing, linking, and analysis of all numeric information.
• Our system is unique in the standardization and computational execution of these attributes.

MN TABLES -> NETWORKS
An MN table $T_{ij}$ representing things (rows) with properties (columns) suggests that some ‘things’ are more alike than others.
Rewrite $T_{ij}$ as ratios to remove dimensionality.
Construct a network: $C'_{jk} = \exp\left[-\sum_i (T_{ij} - T_{ik})^2\right]$
Define the diagonal: $C_{jj} = -\sum_{i \neq j} C'_{ij}$
We have proved that $C_{ij}$ is in the Lie algebra that generates all continuous Markov transformations:
$M(a) = \exp(aC)$ is a Markov transformation.
The eigenvectors of $M$ identify the clusters in $C$
and can be labeled with the eigenvalue corresponding to each cluster.

THE NUMERICAL UNIVERSE EXPRESSED IN A SINGLE SUPER-NETWORK
Clusters abstract the essential information in each table.
These clusters can be linked to other clusters using the eigenvalue weights of table indices (nodes).
This results in a single network that joins all numerical information in …

THANK YOU
Joseph E. Johnson, PhD
Distinguished Professor Emeritus
Department of Physics and Astronomy
University of South Carolina, Columbia SC, 29208
Room 405 Physical Sciences Center
Email: jjohnson@sc.edu
www.asg.sc.edu and www.metanumber.com
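As a supplement to the MN TABLES -> NETWORKS slide, the table-to-network step can be sketched in plain Python. This is a hedged illustration, not the author's code. It assumes the slide's formula reads as an exponential of the negative summed squared differences between two columns. The function names, the toy table, and the scaled Taylor series used for the matrix exponential are all hypothetical choices. The last step, a two-way split by the sign of the slowest non-uniform eigenvector found by power iteration, is a simple stand-in for the cluster extraction, which the slides do not spell out.

```python
import math


def generator_from_table(table):
    """Build C from a dimensionless table (rows = things, columns = properties).

    Off-diagonal: C[j][k] = exp(-sum_i (T[i][j] - T[i][k])**2)  (assumed reading).
    Diagonal: C[j][j] = -(sum of the other entries in column j), so columns sum to 0.
    """
    n_rows, n = len(table), len(table[0])
    c = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if j != k:
                d2 = sum((table[i][j] - table[i][k]) ** 2 for i in range(n_rows))
                c[j][k] = math.exp(-d2)
    for j in range(n):
        c[j][j] = -sum(c[i][j] for i in range(n) if i != j)
    return c


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_matrix(c, a=1.0, squarings=10, terms=12):
    """Approximate M(a) = exp(a C) by a Taylor series with scaling and squaring."""
    n = len(c)
    scale = a / 2 ** squarings
    x = [[scale * c[i][j] for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms + 1):
        term = [[v / m for v in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def second_mode(m, iterations=200):
    """Power iteration for the slowest non-uniform eigenvector of a symmetric M."""
    n = len(m)
    v = [float(j) for j in range(n)]
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the uniform (eigenvalue 1) mode
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


# Toy table: columns 0 and 1 behave alike, as do columns 2 and 3.
T = [[1.0, 1.1, 3.0, 3.2],
     [0.9, 1.0, 2.8, 3.0],
     [1.0, 0.9, 3.1, 3.0]]

C = generator_from_table(T)
M = markov_matrix(C, a=1.0)
column_sums = [sum(M[i][j] for i in range(len(M))) for j in range(len(M))]
labels = [x > 0 for x in second_mode(M)]  # the sign pattern splits the nodes in two
print(column_sums)
print(labels)
```

Each column of `M` should sum to 1 with non-negative entries, which is the Markov property the slide refers to. The sign split should place columns 0 and 1 in one group and columns 2 and 3 in the other. Applying the same construction to the transposed table would give the second network, among rows rather than columns.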