NeuralNetworkChoice - Software Engineering @ RIT

Team Braintrust
2/28/2011
Neural Network Rationale
Objective:
As defined in the Software Requirements Specifications document, an algorithm that the team chooses
to employ for this project must provide the following functionality:

Locate an End-User Device
(R.1)
The Braintrust system will pass visible Wi-Fi device data to the algorithm, and the
algorithm must return the location of the end-user device in question.

Healing and Self Learning
(R.2)
New data points are continuously added to the system's database during operation, and
physical Wi-Fi devices may be added or removed from the system's operational scope.
The algorithm will ideally be able to adapt, heal, and learn from these changes in data
and visible devices.
The algorithm must also be able to meet the following performance requirements:

Device location to within 5-10 meters of accuracy is desired
(PR.1)
The algorithm should be able to use data available to the system to determine a location
to this level of desired accuracy.

Accuracy of end-user device location will take priority over computation time required by
the algorithm
(PR.2)
The algorithm should ideally have a way to measure accuracy or error in order to be
used to fulfill this requirement.
This document will first explain exactly what neural networks are, and will then examine how
one can be used as an effective algorithm for fulfilling each requirement addressed above.
Neural Networks – In Depth Look
A 'Neural Network', technically an 'Artificial Neural Network', is a mathematical model that roughly
mimics how neurons and synapses interact to solve problems within a biological neural network (a
brain). Through a series of supervised learning exercises, a network uses regression (function
approximation) to “learn” the underlying formulas and relations that describe a mathematical system,
and adjusts itself to accurately model those functions.
A network consists of neuron and synapse objects that form an interconnected assembly in which
neurons are linked to other neurons through synapses, with each connection requiring one synapse. The
functionality of the machine itself is stored in the strengths, or weights, associated with each synapse;
these weights control how strongly the output from one neuron affects the other neurons it is
connected to.
One of the most common and useful neural network models is called a 'Multilayer Perceptron.' In a
multilayer perceptron, neurons are arranged in a series of layers, where each neuron has inputs from
neurons in the layer below it, and each neuron outputs to neurons in the layer above it. Each input or
output operation is done by passing a signal through a synapse connection to the appropriate neuron.
Input is presented to the system at the lowermost layer, and the output from the highest layer is
considered the output of the system.
Figure 1. A Multilayer Perceptron
This diagram shows the key components of a multilayer perceptron. In this example, the network has
three neuron layers: an input layer, a hidden layer, and an output layer. Each layer is composed of three
neurons. Each neuron is connected, through a synapse, to every neuron in the next higher layer, and has
inputs coming into it from each neuron in the preceding layer. The number of neurons in each layer can
differ from the number in other layers, and a neuron can be connected to any or all of the neurons in the
layers below and above it. The network in this example is a “fully connected” network, as each
neuron is connected to every neuron in the layer below it and every neuron in the layer above it; this is
a common setup.
A network can have zero, one, or any number of hidden layers. Hidden layers add additional regressive
abilities to a network. How many hidden layers to use, or how many neurons to place in each layer, is
not an easily computable value; finding the best network structure for the problem the network needs to
solve is often a matter of trial and error. This problem will be addressed later.
Input is presented to the system at the input layer. Input neurons take an input signal, usually scaled to
a value between 0 and 1. If the input to a neuron is strong enough, the neuron will “fire” and send
its own signal to the neurons its outputs are connected to. If the input is not strong enough, the neuron
will not fire, or will fire weakly. How strongly the signal is received at the next neuron down the line is
determined by how strongly the neuron fired, and by how much the signal was either strengthened or
weakened by the weight associated with the synapse that connects the two neurons. This modification
is multiplicative: (Input Strength Received = Output Strength Sent * Weight of Connection).
At the end of the system is a layer of output neurons. Depending on the strength of the signals coming
into them, the output neurons fire their own outputs which are read as the output, or “answer”, from the
network as a whole.
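To make the forward computation concrete, below is a minimal C# sketch of a single neuron: it multiplies each incoming signal by its synapse weight, sums the results, and passes the sum through a sigmoid activation function. The sigmoid and the sample numbers are illustrative assumptions for this sketch, not a prescription for the team's implementation.

using System;

class NeuronSketch
{
    // Minimal sketch of one neuron's forward computation: each incoming signal
    // is multiplied by its synapse weight, the weighted signals are summed, and
    // the sum is squashed into the range (0, 1) by a sigmoid activation.
    // (The sigmoid is a common choice, assumed here for illustration.)
    static double Fire(double[] inputs, double[] weights)
    {
        double sum = 0.0;
        for (int i = 0; i < inputs.Length; i++)
        {
            // Input Strength Received = Output Strength Sent * Weight of Connection
            sum += inputs[i] * weights[i];
        }
        return 1.0 / (1.0 + Math.Exp(-sum));   // how strongly this neuron fires
    }

    static void Main()
    {
        double[] inputs  = { 0.9, 0.1, 0.4 };   // outputs sent by three neurons in the layer below
        double[] weights = { 0.8, -0.5, 0.3 };  // strengths of the connecting synapses
        Console.WriteLine($"Neuron output: {Fire(inputs, weights):F3}");
    }
}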
If the desired output for a particular input is known, that desired output can be compared with the actual
output from the system, and through a series of error-reduction algorithms, the weights associated with
the synapses in the network can be adjusted so that at the next presentation of that specific input, the
network will produce less error in its output. This is called supervised learning. A new network
is typically subjected to many thousands of input patterns and adjusted to have the least amount of
error, on average, over the entire training set presented to it.
An Example:
Consider a three-layer network (Figure 1) in which the input and hidden layers have two neurons each
and the output layer has one neuron. The goal of this exercise is to teach the network the 'AND'
operation. One would first create a training set of inputs that can be presented and the output that is
desired for each of those inputs. In this case, the training set would look like:
Input    Desired Output
0, 0     0
0, 1     0
1, 0     0
1, 1     1
This training set would be presented to the network, and an average error over the training set
calculated for each presentation, or iteration, until the level of error was deemed acceptable. In order to
reduce error, after each specific input, or pattern, is presented to the network, the network's weights are
adjusted appropriately. The technical details behind weight adjustment are outside the scope of this
document and encompass the bulk of the functionality of a neural network.
After each pattern is presented, the difference between the actual and desired output can be measured in
terms of error. The error over an entire iteration of a training set can also be computed, and it is
commonly reported as “Root Mean Square Error” (RMSE).
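The C# sketch below shows how the RMSE for one iteration over the AND training set might be computed. The “actual” output values are made-up placeholders, since the weight-adjustment algorithm that would actually produce them is out of the scope of this document.

using System;

class RmseSketch
{
    static void Main()
    {
        // The AND training set from the example above: two inputs, one desired output.
        double[][] inputs  = { new[] { 0.0, 0.0 }, new[] { 0.0, 1.0 }, new[] { 1.0, 0.0 }, new[] { 1.0, 1.0 } };
        double[]   desired = { 0.0, 0.0, 0.0, 1.0 };

        // Placeholder outputs from a partially trained network; the weight-adjustment
        // step that would actually produce these values is not shown here.
        double[] actual = { 0.10, 0.15, 0.20, 0.70 };

        // Root Mean Square Error over one full iteration of the training set.
        double sumSquaredError = 0.0;
        for (int p = 0; p < desired.Length; p++)
        {
            double error = desired[p] - actual[p];
            sumSquaredError += error * error;
            Console.WriteLine($"Pattern ({inputs[p][0]}, {inputs[p][1]}): desired {desired[p]}, actual {actual[p]}");
        }
        double rmse = Math.Sqrt(sumSquaredError / desired.Length);
        Console.WriteLine($"RMSE for this iteration: {rmse:F4}");
    }
}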
Neural Networks – Uses and Benefits
Neural networks excel at and are commonly used to solve the following problems:

Classification Tasks
“[Neural Networks] are used in problems that can be couched in terms of classification,
or forecasting” (Gurney 5). A classification task takes a given set of input values
and assigns that input to a specific category. Forecasting is predicting which output
should be associated with a given input. The Braintrust system needs to classify given
input Wi-Fi data to a specific location, and to forecast which location devices are at.
(Satisfies Requirement R.1)
(Bishop: Classification and Regression)

Modeling complex relations
Complex relations can be difficult to model with mathematical formulas, if the formulas
can even be arrived at in the first place. Neural networks can not only model arbitrarily
complicated formulas, but can arrive at those formulas through their training processes,
without the trainer needing any knowledge of the formulas. The exact relations
between Wi-Fi input signals and the location of an end-user device are most likely
very complicated. (Satisfies Requirement R.1)
“Provided we have a sufficiently large number of [hidden layers], we can approximate
any reasonable function to arbitrary accuracy.”
(Bishop: Model complexity)

Modeling non-linear statistical data
Neural networks can approximate functions of any order and dimension. No matter how
complicated the data being modeled is, as long as the data is linearly separable over
some number of dimensions, the network only requires enough hidden layers to
model those dimensions. The relations a network needs to analyze for this project
undoubtedly span many dimensions. Whether or not the data is linearly separable is
a potential issue that will be addressed later. (Satisfies Requirement R.1)
“The role of neural networks […] is to provide general parametrized non-linear
mappings between a set of input variables and a set of output variables.”
(Bishop: Multivariate non-linear functions)

Tasks that require an algorithm that can 'generalize'
Generalizing is the act of making intelligent “guesses” about the correct output for a
specific input based only on what has been learned in the past, even if the network has
never seen that input before. Neural networks are very good at this. If an end-user device
is at a physical location for which very little data has been collected, the network can still
generalize and come up with a very reasonable answer for where the device is.
(Satisfies Requirement R.2)
(Bishop: Learning and Generalization)

Tasks that require an algorithm that is flexible and can change and adapt
Neural networks are able to “learn”. Not only can they learn, they can keep learning as
often as needed. Since no specific mathematical algorithm is being relied on, a network
can even be completely retaught if it isn't performing as needed. Networks can be
continuously trained in the background so that the system always has a network that fits
the available data with as little error as possible. Networks can adapt to changes in the data
by learning the new relations inherent in the data. (Satisfies Requirement R.2)
(Bishop: Learning and Generalization), (Gurney: Multilayer Nets and Backpropagation)
Neural Networks with regards to Team Braintrust
There are a number of reasons why neural networks were a logical choice for the team to use as an
algorithm (besides being an excellent solution to the problem).

Team Experience
Two of the members of the team, Jeff and Joe, have prior experience using neural
networks. Joe has done extensive implementation of neural networks for past projects.
This means that the team has a more reliable idea of how applicable the algorithm can
be, and the algorithm can be implemented more rapidly than it could be by a team with
no prior experience with neural networks.

Language Compatibility
The team has chosen C# as the programming language for the project. A somewhat low-level,
performance-oriented language such as this is needed to implement a neural network algorithm.
Training sessions can involve millions of computations as thousands of patterns are repeatedly
presented to the network, which can be a very intensive activity. A language with enough power,
such as C, C++, C#, or Java, must be used.

Resources
For any additional resources the team needs, not only is the RIT library full of
information, but one team member also owns a number of books on neural networks,
which can provide the formulas and general help needed when designing, implementing,
and testing neural networks.
Proposed Neural Network Structure
There are three main areas that need to be addressed when designing a neural network to solve a
particular problem: the inputs to the network, the output from the network, and the general structure of
the network itself.
Network Inputs:
What needs to be decided for the network's inputs is what data will be presented to the network and
how that data will be formatted.
One idea that hasn't been touched upon yet is feature extraction. A user with prior
knowledge of a problem domain can pre-process some of the raw input data and combine it into a new,
useful value that can also be presented to the network as input. For a network meant to recognize
images, for example, along with the pixel data for the image, other inputs presented could be the
total grayscale value of the image, the average pixel density of the image, and other such values. The use
of relevant features such as these can provide the network with additional relations to learn from.
For the team's neural network, the minimum input that must be presented is which Wi-Fi
devices the end-user device sees. This in itself proves to be a challenging task since a
neural network has a fixed number of input neurons, and the number of Wi-Fi access points an end-user
device can see at any given time varies. There are two possible approaches to solving this dilemma:
1. Run all Wi-Fi devices seen through some hashing algorithm which returns a set of data of a
fixed length
2. Have input neurons for each device that it is possible to see, and just pass in zero data to
input neurons that represent devices that aren't currently seen by the end-user device.
There are advantages and disadvantages to each solution.
With the first solution, a single network can be used to describe our system no matter which Wi-Fi
devices are seen or not seen, satisfying R.2. The second solution would not be able to cope when new
Wi-Fi devices are seen that the network doesn't have an input neuron for, seemingly at odds with R.2.
With the first solution, however, there is no simple hashing function that the author can think of that
still retains the relations inherent in the physical system that we want a neural network to learn from,
whereas the second solution always presents to the network the exact Wi-Fi data the end-user device
can see (except for devices we have no input nodes for).
The solution proposed as a compromise between the two is to use a neural network with an
input neuron for each Wi-Fi device that we're aware of, and to silently ignore any devices that are seen
but that we have no input nodes for, until eventually a new network can be swapped in that is trained
over the new, more up-to-date list of Wi-Fi devices. This would require creating a subsystem that can
generate multiple neural networks of different architectures: one resembling that of the neural network
currently in use, and one using the new architecture defined by the updated list of Wi-Fi devices. The
subsystem would train both, evaluate the average error/accuracy of each, and, depending on what
criteria are set, swap in the new network in real time to be used by the system from that point on.
This solution would completely satisfy the project's self-learning and healing requirements, since new
networks that are trained over the most recent system data can be swapped in whenever the current
network isn't performing up to the defined accuracy standard any longer.
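As a rough illustration of that subsystem's swap decision, the C# sketch below compares the measured error of the current and candidate networks against a required accuracy level. The method and parameter names (ShouldSwap, requiredRmse, and so on) and the sample error values are hypothetical and exist only for this example.

using System;

class NetworkSwapSketch
{
    // Hypothetical swap criterion: replace the in-use network when it no longer
    // meets the required accuracy and the freshly trained candidate does better.
    static bool ShouldSwap(double currentRmse, double candidateRmse, double requiredRmse)
    {
        return currentRmse > requiredRmse && candidateRmse < currentRmse;
    }

    static void Main()
    {
        // Example error levels; real values would come from the subsystem's
        // accuracy/error measuring tools.
        bool swap = ShouldSwap(currentRmse: 0.35, candidateRmse: 0.12, requiredRmse: 0.20);
        Console.WriteLine(swap ? "Swap in the newly trained network" : "Keep the current network");
    }
}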
In addition, there are some features that can be defined that could also potentially be passed in as input. It
is up to testing to determine how useful these features truly are. Some such features are:

Average signal strength

Some sort of unique signature that represents a ratio between all of the signal strengths seen
Sample Input Scenario:
Suppose that the list of all Wi-Fi access points that the network knows about is {A, B, C, D},
and that we have one extra feature that we want to use as input. The network would then have
five input neurons: one for each access point, and one for the feature.
Suppose that an end-user device saw the following signals at a given time:
Device    Strength (RSSI)
A         50
B         80
C         20
For presentation to the network, each RSSI value would have to be scaled to a value between 0 and 1
(note: not all wireless card chipsets use the same integer ranges for RSSI values!). Each resulting, or
processed, value would then be passed as input to the neuron that represents that device. The neuron for
device D, which was not seen, would be passed a value of 0, effectively preventing it from firing. The
neuron for the feature would also be passed a processed value in the range [0, 1] computed from the
given Wi-Fi data.
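The following C# sketch walks through this sample scenario: it builds the five-element input vector from the known device list {A, B, C, D}, scales each seen RSSI value, passes 0 for the unseen device D, and appends an average-signal-strength feature. The assumed 0-100 RSSI range is only for illustration, given the chipset differences noted above.

using System;
using System.Collections.Generic;
using System.Linq;

class InputVectorSketch
{
    static void Main()
    {
        // Every Wi-Fi device the network knows about gets its own input neuron, in a fixed order.
        string[] knownDevices = { "A", "B", "C", "D" };

        // RSSI readings the end-user device reported at this moment.
        var seen = new Dictionary<string, double> { ["A"] = 50, ["B"] = 80, ["C"] = 20 };

        // Assumed RSSI range of 0-100 for scaling; real chipsets differ, so the
        // deployed system would need a per-chipset scaling step.
        const double maxRssi = 100.0;

        double[] inputs = new double[knownDevices.Length + 1];   // +1 for the feature neuron
        for (int i = 0; i < knownDevices.Length; i++)
            inputs[i] = seen.TryGetValue(knownDevices[i], out double rssi) ? rssi / maxRssi : 0.0;

        // One extracted feature: average scaled signal strength of the devices actually seen.
        inputs[knownDevices.Length] = seen.Values.Average() / maxRssi;

        Console.WriteLine(string.Join(", ", inputs.Select(v => v.ToString("F2"))));
        // Prints: 0.50, 0.80, 0.20, 0.00, 0.50
    }
}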
Network Outputs:
The solution that the network needs to provide is the location that the end-user device is in. A natural
way to map a network's output to a location is to have one output neuron for each location a device
could possibly be in. The output neuron with the highest/strongest output value would be considered
the “answer” the network decided on: the location it is most confident the end-user device
is in.
Such a setup also provides us with another way to track error/confidence (PR.2). If the network were
100% sure where an end-user device is, then the output neuron corresponding to that location would
have an output value of 1.0, the maximum possible, while all others would have an output value of 0.0,
the minimum possible. This would be the ideal case. We can measure confidence in the answer in two
different ways, each of which would have to be tested to find which is a more accurate descriptor of
confidence:

Simply ignore all outputs other than the strongest, and assume the confidence in that
output is directly related to its strength. If it had an output of 0.8, we could say the network is
(0.8 * 100) = 80% sure the end-user device is in that location.

Using some algorithm yet to be decided on, rank the strongest output relative to the
strengths of the other outputs. For example, a strongest output of 0.8 doesn't seem very
confident if the outputs of all other neurons are 0.7, but it seems very confident if the outputs of
all other neurons are 0.05.
The first measure is a very simple solution that can be implemented at first for testing purposes, but an
implementation of the second would need to be completed at some point for a more accurate
measure of output confidence.
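A small C# sketch of both confidence measures is given below. The location names, output values, and the ratio-based ranking used for the second measure are illustrative assumptions; the actual ranking algorithm is still to be decided.

using System;
using System.Linq;

class OutputConfidenceSketch
{
    static void Main()
    {
        // One output neuron per known location; names and values are illustrative only.
        string[] locations = { "Room 101", "Room 102", "Hallway", "Lab" };
        double[] outputs   = { 0.05, 0.80, 0.10, 0.05 };

        // The strongest output is taken as the network's answer.
        int winner = Array.IndexOf(outputs, outputs.Max());

        // Measure 1: confidence is simply the winning neuron's output strength.
        double simpleConfidence = outputs[winner];

        // Measure 2 (one possible ranking, not yet decided on by the team): the winner's
        // share of the total output, so strong competing outputs drag the confidence down.
        double relativeConfidence = outputs[winner] / outputs.Sum();

        Console.WriteLine($"Location: {locations[winner]}");
        Console.WriteLine($"Simple confidence:   {simpleConfidence:P0}");
        Console.WriteLine($"Relative confidence: {relativeConfidence:P0}");
    }
}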
Using one output neuron for every possible location would be a very useful setup if the number of
possible locations is a fixed number and known beforehand. This would be the case when a building
has been scanned, for example, and every room, area, or “location” has been scanned and entered into
the database. The output structure of the neural network generated would never have to change in
format or in size. However, there has been some talk about special situations in which a user is outdoors,
and the GPS location they are at can itself become a specific location. This functionality needs to
be reviewed again. If the set of possible locations can change, then, like the input neuron structure, the
network would have to be retrained when new locations have been entered into the system.
This would be a very rare occurrence in most situations the end system would be deployed for.
General Neural Network Architecture:
As stated before, figuring out the best general architecture of a neural network to solve a particular
problem is mostly a trial-and-error process. That being said, there are very standard variations that
perform well in most circumstances, and these will be tried first. The most common architecture is a
three- or four-layer, fully connected multilayer perceptron with one or two hidden layers and between
three and ten neurons in each hidden layer. Such a setup will be the first to be tested against the Wi-Fi
data that has been collected.
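The C# sketch below enumerates that standard family of candidate architectures (one or two hidden layers, three to ten neurons each) as simple layer-size lists that the training subsystem could iterate over. The input and output sizes are taken from the earlier examples and are assumptions only.

using System;
using System.Collections.Generic;

class ArchitectureCandidatesSketch
{
    static void Main()
    {
        // Fixed ends of the network; 5 inputs and 4 output locations are just the
        // sizes used in the earlier sample scenarios, not final system values.
        const int inputNeurons = 5;
        const int outputNeurons = 4;

        // Enumerate the "standard" architectures described in the text:
        // one or two hidden layers, each with between three and ten neurons.
        var candidates = new List<int[]>();
        for (int h1 = 3; h1 <= 10; h1++)
        {
            candidates.Add(new[] { inputNeurons, h1, outputNeurons });
            for (int h2 = 3; h2 <= 10; h2++)
                candidates.Add(new[] { inputNeurons, h1, h2, outputNeurons });
        }

        Console.WriteLine($"{candidates.Count} candidate architectures to train and compare, e.g.:");
        Console.WriteLine(string.Join("-", candidates[0]));   // 5-3-4
        Console.WriteLine(string.Join("-", candidates[1]));   // 5-3-3-4
    }
}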
The subsystem mentioned earlier for training new networks should have accuracy- and error-measuring
tools built in, so that it will be easy to generate new, slightly different architectures and compare their
ability to solve the problem against other architectures. Using this subsystem, the team can generate
networks of any arbitrary architecture, evaluate them against others, and decide on the best architecture
for the system as a whole.
If there is time, the team could even implement an evolutionary algorithm that spawns new neural
networks to be trained by the subsystem and then, depending on which perform best over the dataset,
continues to evolve them until the best fit for our particular problem is found. This can be added after
the rest of the system is implemented, so if implementation time is available, it could be an additional
system tacked on to improve the networks generated.
With the input and output neuron architecture defined in the preceding sections, the only things left to
define are the number of hidden layers and the number of neurons in each hidden layer. If an
evolutionary algorithm system were implemented at some point, it could be allowed to experiment with
networks that are not fully connected, but in general, fully connected networks seem to perform the best
in most situations and are what the team will implement and try.
Team Data Trial Runs
At this point, the data the team had collected was going to be run through some existing, free neural
network implementations, and the results analyzed. However, the data collected so far was exported to
the wrong file format and has been rendered unusable. Recollecting sample data is now a high-priority
task for the team; once the new data is analyzed, this section will be updated.
This section will also (hopefully) demonstrate the potential linear separability of Wi-Fi data, which is
the only potential problem from the preceding sections that hasn't been addressed yet.
Conclusion
To be completed after experimental data has been analyzed.
References
Bishop, Christopher M. Neural Networks for Pattern Recognition. Oxford: Clarendon, 1995. Print.
Gurney, Kevin. An Introduction to Neural Networks. London: UCL, 1997. Print.