Clustered Numerical Data Analysis Using Markov Lie Monoid Based Networks

APS 2016, Salt Lake City, Utah
Joseph E. Johnson, PhD
University of South Carolina
jjohnson@sc.edu
April 16, 2016
ABSTRACT:

Numerical values are incomplete without their
• units of measurement,
• accuracy level, and
• exact defining meaning (metadata descriptors).
But these components are scattered among row titles, column titles, and footnotes in non-standard ways.
While humans can read such data sets, computers generally cannot.
Data must be preprocessed by humans prior to electronic reading.
We developed and implemented (in a Python cloud environment)
• a proposed standardization that merges these four components (value, units, accuracy, and meaning),
• a structure that supports automated data exchange, and
• a new type of automated analytics for clustering.
IMAGINE:

If every number came with its units, uncertainty, and exact meaning all attached as a “metanumber” object:
• instantly readable by both humans and computers,
• a universal standard for all numerical data.
And there was a simple unique name (IP path) for every number,
• which could be used as a name in expressions where
• dimensional & error analysis is done automatically,
supporting automated data exchange, Big Data & AI.
AND ALSO:


An algorithm continuously scraping all numerical data from the web and converting it to standardized MetaNumber tables, and then:
With a novel process each table is converted to two networks,
• one among entities (rows) and one among properties (columns),
From which the dominant clusters are extracted and ordered
• using a novel agnostic cluster identification algorithm.
Then with still another algorithm these clusters are linked
• into a single supernet spanning all numeric information.
A single network spanning our entire numerical universe!
METANUMBER - POWERFUL YET EASY





> 6.3*ft + 4.83*m -37*inch            Mix units as needed
>> 5.810+/-0.032*m                    Result is metric by default
> 6.3*ft + 4.83*m -37*inch ! ft       Gives the result in ft
>> 19.06+/-0.11*(ft)                  Uncertainty from significant digits
> 43*yard*72e6*inch !acre             No decimal point implies exact
>> 17768.523967*(acre)                An ‘exact’ result
> 5.3*[e_gold_density]                Use [table_row_column]
>> (1.022+/-0.019)e+05*m_3*kg         Inputs standard data
> 18*[my_482]                         Inputs the result of the user's line 482
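The unit handling in the mixed-unit expressions above can be illustrated with a minimal Python sketch. This is not the MetaNumber implementation: the `Quantity` class, its `to` method, and the restriction to three base dimensions (m, kg, s) are assumptions made for illustration, and uncertainties are left out. Each unit name is an ordinary variable holding an SI magnitude plus dimension exponents, so `*` and `/` carry the dimensional analysis and `+` rejects mismatched dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Illustrative value-with-dimensions object (hypothetical, not the MN class)."""
    value: float   # magnitude expressed in SI base units
    dims: tuple    # exponents of (m, kg, s)

    def __mul__(self, other):
        other = _as_quantity(other)
        return Quantity(self.value * other.value,
                        tuple(a + b for a, b in zip(self.dims, other.dims)))

    __rmul__ = __mul__  # lets a plain number appear on the left: 6.3*ft

    def __truediv__(self, other):
        other = _as_quantity(other)
        return Quantity(self.value / other.value,
                        tuple(a - b for a, b in zip(self.dims, other.dims)))

    def __add__(self, other):
        if self.dims != other.dims:
            raise TypeError("cannot add quantities with different dimensions")
        return Quantity(self.value + other.value, self.dims)

    def __sub__(self, other):
        if self.dims != other.dims:
            raise TypeError("cannot subtract quantities with different dimensions")
        return Quantity(self.value - other.value, self.dims)

    def to(self, unit):
        """Return the magnitude expressed in `unit` (dimensions must match)."""
        if self.dims != unit.dims:
            raise TypeError("target unit has different dimensions")
        return self.value / unit.value


def _as_quantity(x):
    # Plain numbers are treated as dimensionless quantities.
    return x if isinstance(x, Quantity) else Quantity(float(x), (0, 0, 0))


# Unit names as variables; conversion factors are the exact international definitions.
m = Quantity(1.0, (1, 0, 0))
s = Quantity(1.0, (0, 0, 1))
ft = 0.3048 * m
inch = 0.0254 * m

length = 6.3 * ft + 4.83 * m - 37 * inch
print(round(length.value, 3))   # metres by default, about 5.810
print(round(length.to(ft), 2))  # the same length in feet, about 19.06
```

Storing every value in SI internally is one simple way to make metric output the default while still allowing a requested output unit.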
SOME HARDER PROBLEMS

Compute the gravitational attraction between a 188 lb man and a 632.3 kg golf cart that is 7.93 yards away:
> g*188*lb*632.3*kg/(7.93*yard)**2        Here g denotes the gravitational constant G
>> (6.844+/-0.017)e-08*m*kg*s_2           Note the uncertainty

If a BMW can accelerate from 0 to 70 mi/hr in 1.3 s, how large is this acceleration in “g’s”?
> (70*mile/hour)/(1.3*s) ! ag             where ag is the acceleration of gravity
>> 2.45+/-0.19*(ag)
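Both central values can be cross-checked with plain SI arithmetic. The sketch below is only a check, not the MetaNumber engine: the constant names are illustrative, G is taken as the CODATA 2018 value, the conversion factors are the exact international definitions, and the uncertainties are not reproduced.

```python
G = 6.67430e-11    # gravitational constant, m^3 kg^-1 s^-2 (CODATA 2018)
LB = 0.45359237    # kilograms per pound (exact)
YARD = 0.9144      # metres per yard (exact)
MILE = 1609.344    # metres per mile (exact)
HOUR = 3600.0      # seconds per hour
G_N = 9.80665      # standard acceleration of gravity, m/s^2

# Newtonian attraction F = G * m1 * m2 / r^2, everything converted to SI first.
force = G * (188 * LB) * 632.3 / (7.93 * YARD) ** 2

# Average acceleration from 0 to 70 mi/hr in 1.3 s, expressed in units of g.
accel_in_g = (70 * MILE / HOUR) / 1.3 / G_N

print(f"{force:.3e} N")       # 6.844e-08 N
print(f"{accel_in_g:.2f} g")  # 2.45 g
```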
UNIT AND CONSTANT NAMES

Unit and constant names are:
• Lower case, alphanumeric (with internal ‘_’).
• NO fonts, symbols, upper case, or plurals.
• All prefixes are separate words (mega, billion, ...).
All of the above are joined as mathematical variables to the value with * and / operations:
• 3.51*mega*kg/m3
• 8.5e5*thousand*dozen*m
SIX BASIC SI UNITS & SIX OPTIONAL UNITS
 1. m = meter           length
 2. s = sec = second    time
 3. kg = kilogram       mass
 4. a = amp = ampere    electrical current
 5. k = kelvin          temperature
 6. cd = candela        luminosity
 7. b = bit             information (Shannon, 1/0, T/F)
 8. op = flop           double precision floating point operation
 9. p = person          living human (as in per capita)
10. d = usd = dollar    value
11. bn                  baryon number
12. ln                  lepton number
NUMERICAL ACCURACY (UNCERTAINTY)


If a value contains a decimal point then it is converted to an ‘uncertain number’ (ufloat):
> 2.34 * 7.632e2
>> 1785.9+/-7.6
If there is no decimal point (or it is removed using scientific notation) then the value is treated as exact:
> 234e-2 * 7632e-1
>> 1785.888
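The rule above can be sketched in a few lines of standard-library Python. Two assumptions are made here that the slide does not state explicitly: a literal with a decimal point carries an uncertainty of one unit in its last written digit, and products propagate uncertainty to first order in quadrature. The helper names `parse_value` and `multiply` are hypothetical. Under these assumptions the sketch reproduces the figures shown.

```python
import math


def parse_value(text):
    """Return (value, sigma) for a numeric literal.

    Assumption: a decimal point means +/- one unit in the last written digit;
    a literal without a decimal point is exact (sigma = 0.0).
    """
    mantissa, _, exponent = text.lower().partition("e")
    power = int(exponent) if exponent else 0
    value = float(text)
    if "." not in mantissa:
        return value, 0.0
    decimals = len(mantissa.split(".")[1])
    return value, 10.0 ** (power - decimals)


def multiply(a, b):
    """Multiply two (value, sigma) pairs with first-order error propagation."""
    (x, sx), (y, sy) = a, b
    return x * y, math.hypot(y * sx, x * sy)


value, sigma = multiply(parse_value("2.34"), parse_value("7.632e2"))
print(f"{value:.1f}+/-{sigma:.1f}")   # 1785.9+/-7.6

exact, zero = multiply(parse_value("234e-2"), parse_value("7632e-1"))
print(round(exact, 3), zero)          # 1785.888 0.0
```

The literal has to be inspected as text, because the number of written digits is lost once it has been converted to a float.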
METADATA & MEANING-1
Information is accessed on the default server as:
• [table_row_column…]
• [e_gold_density]
• [mass_higgs]
Data on any other server is denoted as:
• [IP Path to server _ dir __ table_row_column…]
These unique names can be used to recall the value for calculations.
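How a bracketed name might be turned into a value before an expression is evaluated can be sketched as follows. Everything here is hypothetical: the in-memory `TABLES` dictionary stands in for the server-side tables, the three-part table_row_column split covers only the simplest name form, and the gold density is an approximate handbook figure rather than the table's actual entry.

```python
import re

# Hypothetical stand-in for the server-side tables: table -> row -> column.
TABLES = {
    "e": {"gold": {"density": 19300.0}},  # kg/m^3, approximate illustrative value
}

# A bracketed name made of lower-case letters, digits, and underscores.
NAME = re.compile(r"\[([a-z0-9_]+)\]")


def lookup(name):
    """Resolve a table_row_column name against the illustrative tables."""
    table, row, column = name.split("_", 2)
    return TABLES[table][row][column]


def substitute(expression):
    """Replace every [name] in the expression with its stored value."""
    return NAME.sub(lambda match: repr(lookup(match.group(1))), expression)


print(substitute("5.3*[e_gold_density]"))   # 5.3*19300.0
```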
FEATURES 1:

Automatically converts all units and performs all dimensional analysis

Metric (SI) units are the default for results but output in any valid units is optional

Computes all numerical accuracies from the number of significant digits

Documents the meaning of all values with associated metadata and tags

Unlimited metadata can be attached to a value without any operating overhead

Archives all values with a unique name for every single number given by its internet path

All past results are archived permanently for the user's reuse

The unique name can be used in all processing including computations

Each retrieved MN can be easily read by both humans and computers.

Web based numerical data is continuously scanned and standardized.
FEATURES 2:
It manages fully automated data exchange with API calls from the customer's software.
Supports both Big Data and new levels of artificial intelligence (AI), such as:
Conversion of each standardized table into a mathematical network, with our analytics supporting
a novel cluster analysis identifying hidden information in correlated sets of nodes from MN tables.
The dominant clusters can be linked into a supernet of all numerical information.
The six basic metric units are extended with six new information & socioeconomic processing units.
The MN system can be effective within a corporation or agency, providing a competitive advantage.
The revolutionary analytics can provide multiple new insights into hidden information structures.
The MetaNumber design and development are complete and operational.
MN revolutionizes the standardization, processing, linking, and analysis of all numeric information.
Our system is unique in the standardization and computational execution of these attributes.
MN TABLES -> NETWORKS

An MN table $T_{ij}$ representing things (rows $i$) with properties (columns $j$) suggests that some ‘things’ are more alike than others.
• Rewrite $T_{ij}$ as ratios to remove dimensionality.
• Construct a network $C'_{jk} = \exp\left(-\sum_i (T_{ij} - T_{ik})^2\right)$ for $j \neq k$.
• Define the diagonal $C_{jj} = -\sum_{i \neq j} C'_{ij}$, with $C_{jk} = C'_{jk}$ off the diagonal, so that every column of $C$ sums to zero.
We have proved that $C_{ij}$ is in the Lie algebra that generates all continuous Markov transformations:
• $M(a) = \exp(a C)$ is a Markov transformation.
• The eigenvectors of $M$ identify the clusters in $C$
• and can be labeled with the eigenvalue corresponding to each cluster.
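The construction on this slide can be sketched in standard-library Python. This is not the authors' code: it reads the network formula as the exponential of minus the summed squared column differences, the sample table `T` is invented and already dimensionless, and the helper names are hypothetical. The matrix exponential is approximated by a truncated Taylor series, and the eigenvector step is omitted because it would normally use a linear algebra library. The sketch only confirms that M(a) has non-negative entries and columns that sum to one.

```python
import math


def property_network(T):
    """Build C among the columns of T, with zero column sums."""
    n_rows, n_cols = len(T), len(T[0])
    C = [[0.0] * n_cols for _ in range(n_cols)]
    for j in range(n_cols):
        for k in range(n_cols):
            if j != k:
                # Off-diagonal weight: exp(-sum_i (T_ij - T_ik)^2)
                d2 = sum((T[i][j] - T[i][k]) ** 2 for i in range(n_rows))
                C[j][k] = math.exp(-d2)
    for j in range(n_cols):
        # Diagonal: minus the sum of the other entries in column j
        C[j][j] = -sum(C[i][j] for i in range(n_cols) if i != j)
    return C


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_matrix(C, a=1.0, terms=40):
    """Approximate M(a) = exp(a C) with a truncated Taylor series."""
    n = len(C)
    M = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in M]
    for order in range(1, terms + 1):
        # term becomes (a C)^order / order!
        term = [[a * x / order for x in row] for row in mat_mul(term, C)]
        M = [[M[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return M


# Invented table: 4 entities (rows) by 3 properties (columns), as ratios.
T = [[1.0, 1.1, 0.2],
     [0.9, 1.0, 0.1],
     [1.2, 1.1, 0.3],
     [1.0, 0.9, 0.2]]

C = property_network(T)
M = markov_matrix(C, a=0.5)
column_sums = [sum(M[i][j] for i in range(3)) for j in range(3)]
print([round(total, 9) for total in column_sums])   # each column sums to 1
```

In this sample the first two columns are nearly identical, so their off-diagonal weight is much larger than either column's weight with the third.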
THE NUMERICAL UNIVERSE EXPRESSED IN A SINGLE SUPER-NETWORK

Clusters abstract the essential information in each table.
These clusters can be linked to other clusters using the eigenvalue weights of table indices (nodes).
This results in a single network that joins all numerical information in …
THANK YOU

Joseph E. Johnson, PhD

Distinguished Professor Emeritus
Department of Physics and Astronomy
University of South Carolina, Columbia SC, 29208
Room 405 Physical Sciences Center

Email: jjohnson@sc.edu

www.asg.sc.edu and
www.metanumber.com



