NeuralNetworkChoice - Software Engineering @ RIT

Team Braintrust
2/28/2011
Neural Network Rationale
Objective:
As defined in the Software Requirements Specifications document, an algorithm that the team chooses
to employ for this project must provide the following functionality:

Locate an End-User Device
(R.1)
The Braintrust system will pass visible Wi-Fi device data to the algorithm, and the
algorithm must return the location of the end-user device in question.

Healing and Self Learning
(R.2)
New data points are continuously added to the system's database during operation, and
physical Wi-Fi devices may be added or removed from the system's operational scope.
The algorithm will ideally be able to adapt, heal, and learn from these changes in data
and visible devices.
The algorithm must also be able to meet the following performance requirements:

Device location to within 5-10 meters of accuracy is desired
(PR.1)
The algorithm should be able to use data available to the system to determine a location
to this level of desired accuracy.

Accuracy of end-user device location will take priority over computation time required by
the algorithm
(PR.2)
The algorithm should ideally have a way to measure accuracy or error in order to be
used to fulfill this requirement.
This document will first explain exactly what neural networks are, and will then examine how
one can be used as an effective algorithm for fulfilling each requirement addressed above.
Neural Networks – In Depth Look
A 'Neural Network', technically an 'Artificial Neural Network', is a mathematical model that roughly
mimics how neurons and synapses interact to solve problems within a biological neural network (a
brain). Through a series of supervised learning exercises, a network uses regression (function
approximation) to “learn” the underlying formulas and relations that describe a mathematical system,
and adjusts itself to accurately model those functions.
A network consists of neuron and synapse objects that form an interconnected assembly in which
neurons are linked to other neurons through synapses, with each connection requiring one synapse. The
functionality of the machine itself is stored in the strengths, or weights, associated with each synapse;
these weights control how strongly the output from one neuron affects the other neurons it is
connected to.
One of the most common and useful neural network models is called a 'Multilayer Perceptron.' In a
multilayer perceptron, neurons are arranged in a series of layers, where each neuron has inputs from
neurons in the layer below it, and each neuron outputs to neurons in the layer above it. Each input or
output operation is done by passing a signal through a synapse connection to the appropriate neuron.
Input is presented to the system at the lowermost layer, and the output from the highest layer is
considered the output of the system.
Figure 1. A Multilayer Perceptron
This diagram shows the key components of a multilayer perceptron. In this example, the network has
three neuron layers: an input layer, a hidden layer, and an output layer. Each layer is composed of three
neurons. Each neuron is connected, through a synapse, to every neuron in the next higher layer, and has
inputs coming into it from each neuron in the preceding layer. The number of neurons in each layer can
differ from the number in other layers, and a neuron can be connected to any or all of the neurons in the
layers below and above it. The network in this example is a “fully connected” network, as each
neuron is connected to every neuron in the layer below it and every neuron in the layer above it; this is
a common setup.
A network can have zero, one, or any number of hidden layers. Hidden layers add additional regressive
abilities to a network. How many hidden layers to use, or how many neurons to place in each layer, is
not an easily computable value; finding the best network structure for the problem the network needs to
solve is often a matter of trial and error. This problem will be addressed later.
Input is presented to the system at the input layer. Input neurons take an input signal, usually scaled to
a value between 0 and 1. If the input to a neuron is strong enough, the neuron will “fire” and send
its own signal to the neurons its outputs are connected to. If the input is not strong enough, the neuron
will not fire, or will fire weakly. How strongly the signal is received at the next neuron down the line is
determined by how strongly the neuron fired, and by how much the signal was either strengthened or
weakened by the weight associated with the synapse that connects the two neurons. This modification
is multiplicative: (Input Strength Received = Output Strength Sent * Weight of Connection).
At the end of the system is a layer of output neurons. Depending on the strength of the signals coming
into them, the output neurons fire their own outputs which are read as the output, or “answer”, from the
network as a whole.
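To make the forward computation concrete, below is a minimal C# sketch of a single neuron: it multiplies each incoming signal by its synapse weight, sums the results, and passes the sum through a sigmoid activation function. The sigmoid and the sample numbers are illustrative assumptions for this sketch, not a prescription for the team's implementation.

using System;

class NeuronSketch
{
    // Minimal sketch of one neuron's forward computation: each incoming signal
    // is multiplied by its synapse weight, the weighted signals are summed, and
    // the sum is squashed into the range (0, 1) by a sigmoid activation.
    // (The sigmoid is a common choice, assumed here for illustration.)
    static double Fire(double[] inputs, double[] weights)
    {
        double sum = 0.0;
        for (int i = 0; i < inputs.Length; i++)
        {
            // Input Strength Received = Output Strength Sent * Weight of Connection
            sum += inputs[i] * weights[i];
        }
        return 1.0 / (1.0 + Math.Exp(-sum));   // how strongly this neuron fires
    }

    static void Main()
    {
        double[] inputs  = { 0.9, 0.1, 0.4 };   // outputs sent by three neurons in the layer below
        double[] weights = { 0.8, -0.5, 0.3 };  // strengths of the connecting synapses
        Console.WriteLine($"Neuron output: {Fire(inputs, weights):F3}");
    }
}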
If the desired output for a particular input is known, that desired output can be compared with the actual
output from the system, and through a series of error-reduction algorithms, the weights associated with
the synapses in the network can be adjusted so that at the next presentation of that specific input, the
network will produce less error in its output. This is called supervised learning. A new network
is typically subjected to many thousands of input patterns and adjusted to have the least amount of
error, on average, over the entire training set presented to it.
An Example:
Consider a three-layer network (Figure 1) in which the input and hidden layers have two neurons each
and the output layer has one neuron. The goal of this exercise is to teach the network the 'AND'
operation. One would first create a training set of inputs that can be presented and the output that is
desired for each of those inputs. In this case, the training set would look like:
Input    Desired Output
0, 0     0
0, 1     0
1, 0     0
1, 1     1
This training set would be presented to the network, and an average error over the training set
calculated for each presentation, or iteration, until the level of error was deemed acceptable. In order to
reduce error, after each specific input, or pattern, is presented to the network, the network's weights are
adjusted appropriately. The technical details behind weight adjustment are outside the scope of this
document and encompass the bulk of the functionality of a neural network.
After each pattern is presented, the difference between the actual and desired output can be measured in
terms of error. The error over an entire iteration of a training set can also be computed, and it is
commonly reported as “Root Mean Square Error” (RMSE).
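The C# sketch below shows how the RMSE for one iteration over the AND training set might be computed. The “actual” output values are made-up placeholders, since the weight-adjustment algorithm that would actually produce them is out of the scope of this document.

using System;

class RmseSketch
{
    static void Main()
    {
        // The AND training set from the example above: two inputs, one desired output.
        double[][] inputs  = { new[] { 0.0, 0.0 }, new[] { 0.0, 1.0 }, new[] { 1.0, 0.0 }, new[] { 1.0, 1.0 } };
        double[]   desired = { 0.0, 0.0, 0.0, 1.0 };

        // Placeholder outputs from a partially trained network; the weight-adjustment
        // step that would actually produce these values is not shown here.
        double[] actual = { 0.10, 0.15, 0.20, 0.70 };

        // Root Mean Square Error over one full iteration of the training set.
        double sumSquaredError = 0.0;
        for (int p = 0; p < desired.Length; p++)
        {
            double error = desired[p] - actual[p];
            sumSquaredError += error * error;
            Console.WriteLine($"Pattern ({inputs[p][0]}, {inputs[p][1]}): desired {desired[p]}, actual {actual[p]}");
        }
        double rmse = Math.Sqrt(sumSquaredError / desired.Length);
        Console.WriteLine($"RMSE for this iteration: {rmse:F4}");
    }
}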
Neural Networks – Uses and Benefits
Neural networks excel at and are commonly used to solve the following problems:

Classification Tasks
“[Neural Networks] are used in problems that can be couched in terms of classification,
or forecasting” (Gurney 5). A classification task takes a given set of input values
and assigns that input to a specific category. Forecasting is predicting which output
should be associated with a given input. The Braintrust system needs to classify given
input Wi-Fi data to a specific location, and to forecast which location devices are at.
(Satisfies Requirement R.1)
(Bishop: Classification and Regression)

Modeling complex relations
Complex relations can be difficult to model with mathematical formulas, if the formulas
can even be arrived at in the first place. Neural networks can not only model arbitrarily
complicated formulas, but can arrive at those formulas through their training processes,
without the trainer needing any knowledge of the formulas. The exact relations
between Wi-Fi input signals and the location of an end-user device are most likely
very complicated. (Satisfies Requirement R.1)
“Provided we have a sufficiently large number of [hidden layers], we can approximate
any reasonable function to arbitrary accuracy.”
(Bishop: Model complexity)

Modeling non-linear statistical data
Neural networks can approximate functions of any order and dimension. No matter how
complicated the data being modeled is, as long as the data is linearly separable over
some number of dimensions, the network only requires enough hidden layers to
model those dimensions. The relations a network needs to analyze for this project
undoubtedly span many dimensions. Whether or not the data is linearly separable is
a potential issue that will be addressed later. (Satisfies Requirement R.1)
“The role of neural networks […] is to provide general parametrized non-linear
mappings between a set of input variables and a set of output variables.”
(Bishop: Multivariate non-linear functions)

Tasks that require an algorithm that can 'generalize'
Generalizing is the act of making intelligent “guesses” about the correct output for a
specific input based only on what has been learned in the past, even if the network has
never seen that input before. Neural networks are very good at this. If an end-user device
is at a physical location for which very little data has been collected, the network can still
generalize and come up with a very reasonable answer for where the device is.
(Satisfies Requirement R.2)
(Bishop: Learning and Generalization)

Tasks that require an algorithm that is flexible and can change and adapt
Neural networks are able to “learn”. Not only can they learn, they can keep learning as
often as needed. Since no specific mathematical algorithm is being relied on, a network
can even be completely retaught if it isn't performing as needed. Networks can be
continuously trained in the background so that the system always has a network that fits
the available data with as little error as possible. Networks can adapt to changes in the data
by learning the new relations inherent in the data. (Satisfies Requirement R.2)
(Bishop: Learning and Generalization), (Gurney: Multilayer Nets and Backpropagation)
Neural Networks with regards to Team Braintrust
There are a number of reasons why neural networks were a logical choice for the team to use as an
algorithm (besides being an excellent solution to the problem).

Team Experience
Two of the members of the team, Jeff and Joe, have prior experience using neural
networks. Joe has done extensive implementation of neural networks for past projects.
This means that the team has a more reliable idea of how applicable the algorithm can
be, and the algorithm can be implemented more rapidly than it could be by a team with
no prior experience with neural networks.

Language Compatibility
The team has chosen C# as the programming language for the project. A somewhat low-level,
performance-oriented language such as this is needed to implement a neural network algorithm.
Training sessions can involve millions of computations as thousands of patterns are repeatedly
presented to the network, which can be a very intensive activity. A language with enough power,
such as C, C++, C#, or Java, must be used.

Resources
For any additional resources the team needs, not only is the RIT library full of
information, but one team member also owns a number of books on neural networks,
which can provide the formulas and general help needed when designing, implementing,
and testing neural networks.
Proposed Neural Network Structure
There are three main areas that need to be addressed when designing a neural network to solve a
particular problem: the inputs to the network, the output from the network, and the general structure of
the network itself.
Network Inputs:
What needs to be decided for the network's inputs is what data will be presented to the network and
how that data will be formatted.
One idea that hasn't been touched upon yet is feature extraction. A user with prior
knowledge of a problem domain can pre-process some of the raw input data and combine it into a new,
useful value that can also be presented to the network as input. For a network meant to recognize
images, for example, along with the pixel data for the image, other inputs presented could be the
total grayscale value of the image, the average pixel density of the image, and other such values. The use
of relevant features such as these can provide the network with additional relations to learn from.
For the team's neural network, the minimum input that must be presented is which Wi-Fi
devices the end-user device sees. This in itself proves to be a challenging task since a
neural network has a fixed number of input neurons, and the number of Wi-Fi access points an end-user
device can see at any given time varies. There are two possible approaches to solving this dilemma:
1. Run all Wi-Fi devices seen through some hashing algorithm which returns a set of data of a
fixed length
2. Have input neurons for each device that it is possible to see, and just pass in zero data to
input neurons that represent devices that aren't currently seen by the end-user device.
There are advantages and disadvantages to each solution.
With the first solution, a single network can be used to describe our system no matter which Wi-Fi
devices are seen or not seen, satisfying R.2. The second solution would not be able to cope when new
Wi-Fi devices are seen that the network doesn't have an input neuron for, seemingly at odds with R.2.
With the first solution, however, there is no simple hashing function that the author can think of that
still retains the relations inherent in the physical system that we want a neural network to learn from,
whereas the second solution always presents to the network the exact Wi-Fi data the end-user device
can see (except for devices we have no input nodes for).
The solution proposed as a compromise between the two is to use a neural network with an
input neuron for each Wi-Fi device that we're aware of, and to silently ignore any devices that are seen
but that we have no input nodes for, until eventually a new network can be swapped in that is trained
over the new, more up-to-date list of Wi-Fi devices. This would require creating a subsystem that can
generate multiple neural networks of different architectures: one resembling that of the neural network
currently in use, and one using the new architecture defined by the updated list of Wi-Fi devices. The
subsystem would train both, evaluate the average error/accuracy of each, and, depending on what
criteria are set, swap in the new network in real time to be used by the system from that point on.
This solution would completely satisfy the project's self-learning and healing requirements, since new
networks that are trained over the most recent system data can be swapped in whenever the current
network isn't performing up to the defined accuracy standard any longer.
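As a rough illustration of that subsystem's swap decision, the C# sketch below compares the measured error of the current and candidate networks against a required accuracy level. The method and parameter names (ShouldSwap, requiredRmse, and so on) and the sample error values are hypothetical and exist only for this example.

using System;

class NetworkSwapSketch
{
    // Hypothetical swap criterion: replace the in-use network when it no longer
    // meets the required accuracy and the freshly trained candidate does better.
    static bool ShouldSwap(double currentRmse, double candidateRmse, double requiredRmse)
    {
        return currentRmse > requiredRmse && candidateRmse < currentRmse;
    }

    static void Main()
    {
        // Example error levels; real values would come from the subsystem's
        // accuracy/error measuring tools.
        bool swap = ShouldSwap(currentRmse: 0.35, candidateRmse: 0.12, requiredRmse: 0.20);
        Console.WriteLine(swap ? "Swap in the newly trained network" : "Keep the current network");
    }
}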
In addition, there are some features that can be defined that could also potentially be passed in as input. It
is up to testing to determine how useful these features truly are. Some such features are:

Average signal strength

Some sort of unique signature that represents a ratio between all of the signal strengths seen
Sample Input Scenario:
Suppose that the list of all Wi-Fi access points that the network knows about is {A, B, C, D},
and that we have one extra feature that we want to use as input. The network would then have
five input neurons: one for each access point, and one for the feature.
Suppose that an end-user device saw the following signals at a given time:
Device    Strength (RSSI)
A         50
B         80
C         20
For presentation to the network, each RSSI value would have to be scaled to a value between 0 and 1
(note: not all wireless card chipsets use the same integer ranges for RSSI values!). Each resulting, or
processed, value would then be passed as input to the neuron that represents that device. The neuron for
device D, which was not seen, would be passed a value of 0, effectively preventing it from firing. The
neuron for the feature would also be passed a processed value in the range [0, 1] computed from the
given Wi-Fi data.
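The following C# sketch walks through this sample scenario: it builds the five-element input vector from the known device list {A, B, C, D}, scales each seen RSSI value, passes 0 for the unseen device D, and appends an average-signal-strength feature. The assumed 0-100 RSSI range is only for illustration, given the chipset differences noted above.

using System;
using System.Collections.Generic;
using System.Linq;

class InputVectorSketch
{
    static void Main()
    {
        // Every Wi-Fi device the network knows about gets its own input neuron, in a fixed order.
        string[] knownDevices = { "A", "B", "C", "D" };

        // RSSI readings the end-user device reported at this moment.
        var seen = new Dictionary<string, double> { ["A"] = 50, ["B"] = 80, ["C"] = 20 };

        // Assumed RSSI range of 0-100 for scaling; real chipsets differ, so the
        // deployed system would need a per-chipset scaling step.
        const double maxRssi = 100.0;

        double[] inputs = new double[knownDevices.Length + 1];   // +1 for the feature neuron
        for (int i = 0; i < knownDevices.Length; i++)
            inputs[i] = seen.TryGetValue(knownDevices[i], out double rssi) ? rssi / maxRssi : 0.0;

        // One extracted feature: average scaled signal strength of the devices actually seen.
        inputs[knownDevices.Length] = seen.Values.Average() / maxRssi;

        Console.WriteLine(string.Join(", ", inputs.Select(v => v.ToString("F2"))));
        // Prints: 0.50, 0.80, 0.20, 0.00, 0.50
    }
}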
Network Outputs:
The solution that the network needs to provide is the location that the end-user device is in. A natural
way to map a network's output to a location is to have one output neuron for each location a device
could possibly be in. The output neuron with the highest/strongest output value would be considered
the “answer” the network decided on: the location it is most confident the end-user device
is in.
Such a setup also provides us with another way to track error/confidence (PR.2). If the network were
100% sure where an end-user device is, then the output neuron corresponding to that location would
have an output value of 1.0, the maximum possible, while all others would have an output value of 0.0,
the minimum possible. This would be the ideal case. We can measure confidence in the answer in two
different ways, each of which would have to be tested to find which is a more accurate descriptor of
confidence:

Simply ignore all outputs other than the strongest, and assume the confidence in that
output is directly related to its strength. If it had an output of 0.8, we could say the network is
(0.8 * 100) = 80% sure the end-user device is in that location.

Using some algorithm yet to be decided on, rank the strongest output relative to the
strengths of the other outputs. For example, a strongest output of 0.8 doesn't seem very
confident if the outputs of all other neurons are 0.7, but it seems very confident if the outputs of
all other neurons are 0.05.
The first measure is a very simple solution that can be implemented at first for testing purposes, but an
implementation of the second would need to be completed at some point for a more accurate
measure of output confidence.
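A small C# sketch of both confidence measures is given below. The location names, output values, and the ratio-based ranking used for the second measure are illustrative assumptions; the actual ranking algorithm is still to be decided.

using System;
using System.Linq;

class OutputConfidenceSketch
{
    static void Main()
    {
        // One output neuron per known location; names and values are illustrative only.
        string[] locations = { "Room 101", "Room 102", "Hallway", "Lab" };
        double[] outputs   = { 0.05, 0.80, 0.10, 0.05 };

        // The strongest output is taken as the network's answer.
        int winner = Array.IndexOf(outputs, outputs.Max());

        // Measure 1: confidence is simply the winning neuron's output strength.
        double simpleConfidence = outputs[winner];

        // Measure 2 (one possible ranking, not yet decided on by the team): the winner's
        // share of the total output, so strong competing outputs drag the confidence down.
        double relativeConfidence = outputs[winner] / outputs.Sum();

        Console.WriteLine($"Location: {locations[winner]}");
        Console.WriteLine($"Simple confidence:   {simpleConfidence:P0}");
        Console.WriteLine($"Relative confidence: {relativeConfidence:P0}");
    }
}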
Using one output neuron for every possible location would be a very useful setup if the number of
possible locations is a fixed number and known beforehand. This would be the case when a building
has been scanned, for example, and every room, area, or “location” has been scanned and entered into
the database. The output structure of the neural network generated would never have to change in
format or in size. However, there has been some talk about special situations in which a user is outdoors,
and the GPS location they are at can itself become a specific location. This functionality needs to
be reviewed again. If the set of possible locations can change, then, like the input neuron structure, the
network would have to be retrained when new locations have been entered into the system.
This would be a very rare occurrence in most situations the end system would be deployed for.
General Neural Network Architecture:
As stated before, figuring out the best general architecture of a neural network to solve a particular
problem is mostly a trial-and-error process. That being said, there are very standard variations that
perform well in most circumstances, and these will be tried first. The most common architecture is a
three- or four-layer, fully connected multilayer perceptron with one or two hidden layers and between
three and ten neurons in each hidden layer. Such a setup will be the first to be tested against the Wi-Fi
data that has been collected.
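The C# sketch below enumerates that standard family of candidate architectures (one or two hidden layers, three to ten neurons each) as simple layer-size lists that the training subsystem could iterate over. The input and output sizes are taken from the earlier examples and are assumptions only.

using System;
using System.Collections.Generic;

class ArchitectureCandidatesSketch
{
    static void Main()
    {
        // Fixed ends of the network; 5 inputs and 4 output locations are just the
        // sizes used in the earlier sample scenarios, not final system values.
        const int inputNeurons = 5;
        const int outputNeurons = 4;

        // Enumerate the "standard" architectures described in the text:
        // one or two hidden layers, each with between three and ten neurons.
        var candidates = new List<int[]>();
        for (int h1 = 3; h1 <= 10; h1++)
        {
            candidates.Add(new[] { inputNeurons, h1, outputNeurons });
            for (int h2 = 3; h2 <= 10; h2++)
                candidates.Add(new[] { inputNeurons, h1, h2, outputNeurons });
        }

        Console.WriteLine($"{candidates.Count} candidate architectures to train and compare, e.g.:");
        Console.WriteLine(string.Join("-", candidates[0]));   // 5-3-4
        Console.WriteLine(string.Join("-", candidates[1]));   // 5-3-3-4
    }
}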
The subsystem mentioned earlier for training new networks should have accuracy- and error-measuring
tools built in, so that it will be easy to generate new, slightly different architectures and compare their
ability to solve the problem against other architectures. Using this subsystem, the team can generate
networks of any arbitrary architecture, evaluate them against others, and decide on the best architecture
for the system as a whole.
If there is time, the team could even implement an evolutionary algorithm that spawns new neural
networks to be trained by the subsystem and then, depending on which perform best over the dataset,
continues to evolve them until the best fit for our particular problem is found. This can be added after
the rest of the system is implemented, so if implementation time is available, it could be an additional
system tacked on to improve the networks generated.
With the input and output neuron architecture defined in the preceding sections, the only things left to
define are the number of hidden layers and the number of neurons in each hidden layer. If an
evolutionary algorithm system were implemented at some point, it could be allowed to experiment with
networks that are not fully connected, but in general, fully connected networks seem to perform the best
in most situations and are what the team will implement and try.
Team Data Trial Runs
At this point, the data the team had collected was going to be run through some existing, free neural
network implementations, and the results analyzed. However, the data collected so far was exported to
the wrong file format and has been rendered unusable. Recollecting sample data is now a high-priority
task for the team; once the new data is analyzed, this section will be updated.
This section will also (hopefully) demonstrate the potential linear separability of Wi-Fi data, which is
the only potential problem from the preceding sections that hasn't been addressed yet.
Conclusion
To be completed after experimental data has been analyzed.
References
Bishop, Christopher M. Neural Networks for Pattern Recognition. Oxford: Clarendon, 1995. Print.
Gurney, Kevin. An Introduction to Neural Networks. London: UCL, 1997. Print.