EEGStore: Building a System for Large Scale Analysis of EEGs

CS294 Final Project, 16 December 2013

Orianna DeMasi
Department of Computer Science, UC Berkeley
odemasi@eecs.berkeley.edu

Jordan Kellerstrass
Department of Computer Science, UC Berkeley
kellerstrass@berkeley.edu

Abstract: Some neuropsychiatric disorders develop in childhood and can affect individuals for their entire lives. Emerging research shows that EEG signals can be used for early diagnosis, which would enable early intervention and possibly reduce the severity of these conditions. However, the computing backend to collect, process, and do machine learning on EEG data for a large scale study does not currently exist. Without such a system, medical research in this area cannot progress. In this paper we present EEGStore, our prototype of the data collection end of this system, and OpenEmo, a novel device for data collection, and we discuss the challenges in implementing the entire system in a scalable way.

I. INTRODUCTION

Neuropsychiatric disorders developed in childhood can continue through an individual's life, severely affecting that individual's happiness, productivity, and ability to support themselves. Some of these conditions can be diagnosed early, and research is showing the potential for early interventions to radically affect the prognosis of those individuals. For example, in sub-Saharan Africa there is a high correlation between children contracting cerebral malaria and later being diagnosed with autism [1]. If the development of autism could be better predicted, additional treatment of the malaria and early interventions could be taken to reduce the severity of the autism. Additional research shows that the diagnosis and severity of autism may be predicted by certain EEG signal patterns [2]. This research is promising, but it has so far been demonstrated only on a small research population.
It is not yet possible to put EEG monitoring into practice because of the lack of a mobile device to monitor patients and of the compute infrastructure to analyze the data. Further studies using EEG data within populations or for other disorders are not yet possible because there is no computing infrastructure to handle incoming data on a large number of patients. Until an appropriate mobile device has been developed for collecting EEG data and sufficient infrastructure has been developed for handling and processing the data, the use of EEG for early diagnosis in community health clinics will not be possible. Further, research into the extent to which EEG data can be used for diagnosis and treatment of a myriad of disorders will be stunted and confined to small scale case studies.

We would like to develop a low power, low cost, mobile device that can be used in small clinics by community health workers to routinely collect EEG data on pediatric patients in Kenya. To support this hardware, we need a data system pipeline for collecting and storing data on individuals, cleaning the data, gathering data from various clinics into a population, and finally applying published techniques for mining the EEG signals and extracting diagnostic information. Eventually, a general system will need to be developed that allows for the development of novel data mining techniques targeting a wider spectrum of conditions.

This paper introduces a novel hardware device, OpenEmo, and a prototype of the general data system that OpenEmo will be part of. OpenEmo can collect EEG data from a commodity EEG headset and transmit the data to a mobile phone or tablet. The prototype of the data system surrounding OpenEmo is called EEGStore and is a proof of concept for the scalability, cost, and general viability of a full system. This paper makes a case for the full system and shows that such a system is not only possible but able to satisfy the needs of the problem setting.
The rest of this paper proceeds as follows. Section II describes the environment that EEGStore is being designed for and the resulting needs that must be addressed. Section III describes the target system that EEGStore will be modeled after. Section IV describes the prototype EEGStore system that we implemented as well as the OpenEmo device for data collection. Section V evaluates our prototype system and discusses its viability as a larger system in the environment for which it is intended. Section VI reviews previous research that this project is based on. Sections VII and VIII discuss the lessons that we learned during this project and the future work that we see as the most pressing. Finally, we conclude the paper in Section IX.

II. PROBLEM SETTING

This project addresses improving EEG data collection and processing for community health clinics in emerging regions. In many emerging regions, there are not enough doctors to meet the basic medical needs of the population, let alone specialists for more complex mental health issues. In Ghana, for example, there are fewer than 10 psychiatrists and 2 neurologists for a population of 24 million people [3]. Most of these few specialists are located in the cities and must prioritize emergencies. Meanwhile, patients with neuropsychological disorders of varying severity will likely go unseen. As such, care and monitoring are left to individuals and their families [3]. Early detection and intervention could save people from a lifetime of disability.

Fig. 1. Diagram of data flow in ideal EEGStore system. Data is collected on mobile phones in a field environment and shared with local clinics and remote research institutions via a web service. Various levels of encryption protect the data that gets transferred.

To address the scarcity of medical professionals, many countries have adopted a highly effective model of community health care. Trusted community members are trained to be community health workers (CHWs).
They visit people in their homes to provide basic care and connect them with a health clinic or hospital when necessary. Our goal is to build a mobile system that enables CHWs in resource-poor areas to monitor and diagnose neuropsychological disorders, such as autism and epilepsy, which may be identifiable via electroencephalography (EEG) recording even before behavioral signs are evident.

More specifically, we will target deploying a study of our system in Kenya, where this community healthcare model is in practice. Figure 2 gives an overview of the size of the CHW population in Kenya and the potential size of the population that our system must serve. In Kenya, CHWs make home visits to patients, going to different homes to administer care or common checkups. Often these visits include small villages and other locations that are not particularly near to local clinics. These villages could require a long drive over rough roads. Within villages, visits may occur outside of any permanent structure and will be attended by an entire family. Often medical equipment is too delicate to take into these outdoor and potentially sandy settings. In addition to there being no permanent enclosure, there is frequently no electricity, and thus these visits must be fully mobile in that all equipment provides its own power.

Fig. 2. This table summarizes the size of the population in Kenya that we will initially target [4]. It gives an idea of the scale of data that will be collected. Eventually EEGStore will target an international population.

The typical work day varies dramatically for healthcare providers around the world, but the assumptions we made for the use case of EEGStore are inspired by the setting that we expect in Kenya. These assumptions are as follows:

• The EEG device should be robust, which we define as not easily broken during field work or travel.
• Our Android app should collect and process raw EEG data from any EEG device with a Bluetooth dongle, because these devices may be donated or built from open source hardware such as OpenEEG.
• Energy use should be minimal, as there is no guarantee of electricity between patient home visits.
• Whether being used in a clinic or on a home visit, the added time needed to collect an EEG recording should be a maximum of a few minutes.

We have tried to make these assumptions as general as possible so that EEGStore will be applicable to many regions outside of Kenya and able to scale to a larger population. In the next section we consider the aspects that a system must have to be amenable to the conditions described above. We set forth a set of goals that an ideal system must meet to suit this problem setting.

III. SYSTEM DESIGN

The ideal system to address some of the needs described above would make EEG technology accessible to resource poor areas. To do this, the system would have to be low cost, energy efficient, mobile, and easily accessible for development and integration into other systems.

To address these needs, we envision an end to end data collection and processing system. Figure 1 shows an overview of data flow through the desired system. The system begins with any commodity wireless EEG headset. Data is collected from the headset via Bluetooth onto a mobile phone or tablet. Data is sent from the mobile device over the internet to local health clinic servers and to remote research centers via a scalable web service. Sending data to multiple locations allows for various levels of processing for different purposes, such as treatment versus research. It also enables data sharing for specialized cohort studies and supports the different levels of data security that will be needed to share with different institutions.

In addition to the system meeting the above needs, we would like it to be very modular.
We believe that modularity will add to the robustness of the system by allowing people to utilize whatever resources they have access to. For example, we want our system to be independent of a given headset or phone. Wireless EEG headset devices are continually improving and becoming more easily accessible, and are even used to control some off-the-shelf video games [5]. With modularity properly implemented, our system can be made compatible with any headset and thus can evolve with the rapidly developing EEG technology that may become available in the near future.

To summarize, our system goals are the following:

• A scalable pipeline for collecting and processing EEG data for entire patient populations.
• A low cost implementation of the system end to end so that it is widely accessible.
• An easy interface so that the system is easy to use.
• The equipment associated with EEGStore must be robust to difficult conditions.
• All equipment must be self powered and have enough battery life to last an entire day of home visits.
• EEGStore must be compatible with an Android phone, as these are more prevalent than laptops.

IV. IMPLEMENTATION

As a prototype for a larger system, we developed EEGStore. EEGStore is a system that would make EEG data more available for medical treatment and research, and it opens the possibility for a myriad of new research opportunities. A diagram of the implemented system is shown in Figure 3.

Fig. 3. Data flow diagram for prototype of EEGStore. Data is collected via OpenEmo onto a mobile phone. It is then transferred, via Dropbox, to servers.

A. General framework

One component of EEGStore is the device that collects the EEG signal from patients. This device is mobile and wireless, and it can be taken into remote field locations for use with rural or hard to access patients. The next stage of EEGStore is to pass the data to the mobile phone or tablet. For this stage we developed a novel device, OpenEmo, which is a universal EEG adapter and is described in more depth below. From OpenEmo, the data is passed through a mobile phone (where some processing and encryption occurs) into the cloud.

B. OpenEmo

As a component of EEGStore, we developed the novel OpenEmo adapter. OpenEmo can be seen in Figure 4. As shown in Figure 3, OpenEmo is a critical part of the current EEGStore system. OpenEmo works as a translator between the EEG headset and the phone. OpenEmo is a physical device that reads data from Bluetooth dongle supported EEG devices, decrypts it from the headset's proprietary encryption, and sends it via Bluetooth 3.0+HS to a mobile phone or tablet.

Fig. 4. OpenEmo is a universal adapter for connecting dongle supported EEG headsets with mobile phones and tablets.

Fig. 5. The OpenEmo Android app is easy to use. With minimal training, EEG scans can be collected.

The importance of OpenEmo is that many wireless headsets are forcibly paired with companion dongles, which need to be plugged into a computer. Laptops and other computers are not common in resource poor areas, so it would be difficult to use any of these dongle supported devices in field studies or community health clinics. We envision OpenEmo as a universal adapter that connects any dongle supported headset to a phone or tablet. However, the current prototype of OpenEmo is compatible only with the popular Emotiv EPOC headset [6] and Android mobile devices.

We chose the EPOC as the first headset that OpenEmo is compatible with because Emotiv is in contact with autism researchers and there is the potential that they might donate headsets to community clinics or a research lab [7]. The headsets communicate over Bluetooth to a dongle, which cannot be plugged into the phone due to hardware constraints. Our app is currently written for Android, as this platform is more prevalent in the remote and developing regions that our system targets.

The OpenEmo adapter is prototyped with a Raspberry Pi [8] and Bluetooth 3.0+HS. It has three LEDs to indicate its status.
The Raspberry Pi runs a Linux based system. The code base that decrypts the incoming EPOC signal is written in Python and builds on previous work on extracting data from the Emotiv EPOC [9], [10].

C. Network prototype

While the collected data is transmitted over Bluetooth, the data processing and sharing is done over HTTP. Our prototype of this section of the system uses Dropbox [11]. A very preliminary implementation was done with direct HTTP calls to an Apache server running PHP on a laptop. While this worked, the connection was brittle and unable to recover from connections lost midway through transmission. After these preliminary tests, we chose to use Dropbox as the HTTP section prototype due to its reputation for fault tolerance and error checking. Dropbox has low latency and is robust to multiple users accessing the servers at a time. It is also backed up on multiple remote servers, which decreases the chance of data loss from individual machine failure. Dropbox also does not limit the number of users who share a single folder and is low cost. It has also proven to be scalable and has many protocols to check whether data has been fully and successfully transferred. Finally, Dropbox was an extremely easy and quick way to prototype a reliable form of data transmission over HTTP.

We utilized Dropbox by associating an account with each phone or tablet. This corresponds to each CHW having an individual account. Each research institution and clinic would also have a Dropbox account. At the end of a day of home visits or in-clinic checkups, we envision that each CHW will upload their collected data to their Dropbox account. As their folder will be shared only with the appropriate institutions, we can control the data flow. This allows local clinics to receive data on their patients and research institutions to receive data on all the patients across the clinics. Further, this only requires paid Dropbox accounts for the research institutions, as we project that free accounts will be large enough for each CHW (see Section V for more details).

D. Server prototype

The server side of EEGStore is prototyped with a MacBook laptop running Python. Preliminary functions are written to move data from Dropbox into a file system and do minor processing of the data, such as visualizations.

V. EVALUATION

There is no other system comparable to EEGStore; none has the end to end capability that EEGStore has. As a result, it is not clear how to evaluate EEGStore relative to another system. We cannot say that EEGStore is doing better than another system, as there is none that targets the same situation that we target. To show the benefit of EEGStore, we instead consider how well it meets the goals of the system that were set forth in Section III. The elements of the system that we evaluate are how well EEGStore does on

• energy consumption
• scalability
• time to collect samples
• system cost.

A. Energy Considerations

In order for EEGStore to be acceptable for deployment in resource poor areas, the entire data collection portion of the system must be fully mobile and thus have a self contained energy source. Further, there is no guarantee of electricity for recharging between patient home visits or even on a daily basis. To last for prolonged periods away from energy sources, the mobile phone, OpenEmo adapter, and headset must be energy efficient. In this section, we discuss the energy consumption during data collection of the Emotiv EPOC (the headset EEGStore is prototyped with), the mobile phone, and OpenEmo. We demonstrate that all three components have reasonable charging cycles and are suitable for our use case.

1) Energy Breakdown of OpenEmo Adapter: The OpenEmo adapter is prototyped with a Raspberry Pi single-board computer, which is powered via a micro USB port. To investigate OpenEmo's power consumption in more detail, we used the Smartronix USB power meter [12] to evaluate its consumption during each step of data collection and a timer to record the duration of each phase. We measure the consumption during each phase of sampling, as it was not possible for us to measure the energy drain of simultaneous activities separately. It is likely that the Bluetooth connection from the OpenEmo adapter to the Android app uses some energy in the background simply to check whether or not it needs to be sending or receiving something [13], but this is averaged into the measured approximate draw during each phase.

The results of this analysis are summarized in Figure 6. Sampling data from the EEG device was expected, and shown, to be the largest energy cost. This is because it takes the most time and requires the most communication. The sampling time was set to 2 minutes, as this is the duration of a sample in an EEG autism study [2]. The startup time for OpenEmo was 59 seconds. It is possible that this time would be reduced in future generations of OpenEmo and that the sampling time would be increased for different studies. The decryption of data from the headset dongle occurs during the sampling phase, so that computational cost is included in that measurement. "Sending bluetooth" is the time that it takes to send a 2 minute data file over Bluetooth to the phone. This phase takes less than a second because the file size is on the order of 800KB for the 2 minutes of data collected with the Emotiv EPOC; see Figure 2. Even with a more sophisticated headset, one with more channels and a higher sampling rate, the data file should remain under 8MB and still transmit in under a second.

Fig. 6. Energy consumption of the various stages of OpenEmo sampling. Unsurprisingly, the most energy is consumed during the longest phase, data collection. In general the energy consumption is low, which indicates that OpenEmo could successfully be used in the field for prolonged periods of time.

A takeaway from Figure 6 is that there is not a strong need to optimize energy use by replacing Bluetooth 3.0+HS with Bluetooth Low Energy. Because the data files are relatively small, transmitting a data file from the OpenEmo adapter to the mobile phone takes so little time that it is not energy intensive relative to the other phases.

Our conclusion is that, overall, OpenEmo's energy consumption is low. The consumption is low enough that OpenEmo is not the component of the end to end EEGStore system that limits energy robustness. This is explained further in the next section.

2) EEG Data Samples per Component per Charge: To calculate how many days our system will be able to run in the field on battery power, we assume that a community health worker may visit up to 15 patients per day and thus take 15 samples in 1 visit-day. This number corresponds to 3 families of 5 people each and is potentially an overestimate. We assume that the amount of time needed to collect a patient's EEG data is 3 minutes, including a 2 minute recording (based on the time needed for a prior study [2]) and about a minute for startup and shutdown. The summary of calculations based on these assumptions is shown in Figure 7. The purpose of these calculations is simply to compare the number of samples each component will be able to collect before needing to be recharged. We analyze the battery life assuming zero idle time, and the estimates therefore have high error margins. Our measurements and basic calculations give us enough information to be confident that the OpenEmo adapter and mobile EEG device are not limiting factors for implementing the system in an energy constrained environment.

The OpenEmo adapter comes with a 4,400 mAh (5V at 1A) battery.
We calculated that the process of starting up the Raspberry Pi, collecting EEG data from the headset, sending it in a file over Bluetooth to the Android phone, and shutting down the Raspberry Pi takes about 50 mAh (1A is drawn from the battery for about 3 minutes). Therefore, under perfectly efficient conditions, 88 samples, or about 5.9 visit-days of 15 patients per day, could be taken before the OpenEmo needs to be recharged.

The Emotiv EPOC has a built-in lithium ion battery that Emotiv claims will run for 12 hours of continuous use [6]. Twelve hours divided by 3 minutes is 240 samples before needing to be recharged, again assuming near perfect efficiency, which means the device is not left idle for long periods of time.

The Android devices we analyzed were the Nexus 4 and Nexus 7. We chose these devices because they run Android, which is more prevalent in developing regions, and because Google has shown interest in donating phones and tablets. As a result, these two devices are representative of devices that would be used by health workers for more than patient data collection. Community health workers enjoy having phones as a means of accessing educational material and keeping in touch with other health workers and clinics in their area. At the same time, the phone runs multiple processes simultaneously. Even if it were possible to calculate the exact energy cost of the OpenEmo application processes, it is not realistic to assume that the application runs in isolation. For these reasons, the error on this analysis is high, and in practice the phone would likely need to be charged significantly more often than the other components, whose estimates assume ideal efficiency. Another major determinant is screen size and brightness. That said, the following is how we roughly determined the perfect case charging frequency for the Nexus 4 and Nexus 7.
We used an app called Battery Doctor (Battery Saver)1 to confirm that a full Nexus 7 battery could power 1,220 minutes in idle mode, which is equal to about 600 minutes of Bluetooth communication or reading. For one EEG sample collected with OpenEmo, the app spends about 65% of its time using Bluetooth and the other 45% of its time open for other tasks, which is closest in energy drain to Battery Doctor's interpretation of reading a backlit screen with some processing of input from the touch screen. During the 65% of the time in which the app is both using Bluetooth and being read, the battery drains about twice as fast as when reading alone, which in turn drains the battery about twice as fast as the idle state. The 1,220 minutes of idle time divided into 65% and 45% portions is equal to 793 minutes and 549 minutes respectively. Dividing 793 minutes by 4, because the battery drains 4 times as fast as in the idle state while the app is communicating over Bluetooth and open for the user to input information, gives 198 minutes. Dividing 549 minutes by 2, because the battery drains twice as fast as in the idle state for the remaining 45% of the time, gives 274 minutes. Together these give about 472 minutes of runtime, so for our 3 minute scenario the OpenEmo app could run on repeat for 157 samples. The same calculations repeated for the Nexus 4 phone result in 76 samples. Figure 7 assumes that 75% would be the maximum efficiency, although this may even be too high.

Based on these calculations, we conclude that the phone or tablet, not the OpenEmo adapter, would be the limiting factor for battery life in the field. The OpenEmo adapter and Android device are powered via USB and would therefore be easy to charge with backup external battery packs, which are also increasingly inexpensive.

B. Data Scale

The initial deployment of EEGStore targets Kenya. The authors of a previous study [2] have expressed interest in expanding their study to this region, but they have also expressed interest in eventually scaling up to international proportions.
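The per-charge estimates of Section V-A.2 can be reproduced with a short script. This is a minimal sketch, not the authors' code: the function names are hypothetical, and the inputs are simply the figures quoted in the text (a 4,400 mAh adapter battery drawing 1A, a 12 hour EPOC runtime, and 1,220 idle minutes for the Nexus 7). The 65% and 45% time shares are used exactly as printed, even though they sum to 110%, so that the result matches the reported 157 samples.

```python
# Assumptions taken from the text: 15 samples per visit-day, 3 minutes per sample.
SAMPLES_PER_VISIT_DAY = 15
MINUTES_PER_SAMPLE = 3.0


def adapter_samples(capacity_mah=4400.0, draw_ma=1000.0):
    """Samples per charge for the OpenEmo adapter (1 A drawn for 3 minutes)."""
    mah_per_sample = draw_ma * MINUTES_PER_SAMPLE / 60.0  # 50 mAh per sample
    return capacity_mah / mah_per_sample


def headset_samples(runtime_hours=12.0):
    """Samples per charge for the EPOC, using Emotiv's claimed 12 hour runtime."""
    return runtime_hours * 60.0 / MINUTES_PER_SAMPLE


def tablet_samples(idle_minutes=1220.0, bluetooth_share=0.65, other_share=0.45,
                   bluetooth_drain=4.0, other_drain=2.0):
    """Samples per charge for the Nexus 7, following the arithmetic in the text.

    The shares are used as printed in the text (they sum to 1.10).
    Drain factors are multiples of the idle drain rate.
    """
    runtime_minutes = (idle_minutes * bluetooth_share / bluetooth_drain
                       + idle_minutes * other_share / other_drain)
    return runtime_minutes / MINUTES_PER_SAMPLE


def visit_days(samples):
    """Convert a number of samples into visit-days of 15 patients each."""
    return samples / SAMPLES_PER_VISIT_DAY


if __name__ == "__main__":
    for name, samples in [("OpenEmo adapter", adapter_samples()),
                          ("Emotiv EPOC", headset_samples()),
                          ("Nexus 7", tablet_samples())]:
        print(f"{name}: {int(samples)} samples, {visit_days(samples):.1f} visit-days")
```

All three estimates assume zero idle time, so they are upper bounds; the efficiency factor applied in Figure 7 is not modeled here.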
As such, we first consider the size of data that we could expect to see for our initial Kenya deployment and whether EEGStore will support this scale. Figure 2 gives an indication of the size of the population that EEGStore needs to accommodate. It is estimated that there are 15,000 CHWs in Kenya and that each visits a population of 50-500 patients [4]. At this scale, we would like to accommodate the data for, at most, 7,500,000 patients.

Fig. 7. The number of EEG samples that each of the mobile devices in the system can take before needing to be recharged. A visit-day is assumed to be 15 visits per day and is representative of how many visits would be made in a typical setting. Note that OpenEmo is not the limiting factor when a mobile phone is used.

The EEG files we collect are relatively small. They are simple time series recorded in basic text files. The size of these files for the 14 channel EPOC sampling at 128Hz for 2 minutes is about 860KB. We estimate that if a more sophisticated 64 channel device that samples at up to 250Hz were used with OpenEmo instead, then the EEG file size could be as much as 7.7MB. Thus, each CHW will collect at most 3.85GB of data from their patient population (500 patients at 7.7MB each). As long as data is transferred to local and remote servers in a timely fashion, this size of data could fit into a free 2GB Dropbox account. It does imply that a local clinic collecting the more fine grained studies would have to invest in a 50GB Dropbox account for $99 per year. However, as a CHW will visit at most 15 patients a day, they will collect at most 115.5MB of data per day. As long as data is harvested daily from the clinic's Dropbox folder and moved to more permanent storage, free Dropbox accounts are feasible both for the local servers and for the phones.

C. Time

Due to its mobility and the small size of the data collected, EEGStore is extremely time efficient. The transfer of data between OpenEmo and the phone takes under a second.
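The data scale figures of Section V-B can be checked with a few lines of arithmetic. This is a sketch under one stated assumption: each recorded value occupies about 4 bytes in the text file, a figure inferred from the reported 860KB EPOC file rather than stated in the paper. The function and constant names are hypothetical.

```python
def eeg_file_mb(channels, sample_rate_hz, seconds=120, bytes_per_value=4):
    """Approximate size in MB of one recording stored as a text time series.

    bytes_per_value=4 is an assumption chosen to match the 860KB EPOC file.
    """
    return channels * sample_rate_hz * seconds * bytes_per_value / 1e6


# Population figures quoted in the text [4].
CHWS_IN_KENYA = 15_000
MAX_PATIENTS_PER_CHW = 500
VISITS_PER_DAY = 15

epoc_mb = eeg_file_mb(14, 128)    # 14 channel EPOC at 128Hz for 2 minutes
dense_mb = eeg_file_mb(64, 250)   # hypothetical 64 channel headset at 250Hz

max_patients = CHWS_IN_KENYA * MAX_PATIENTS_PER_CHW
per_chw_gb = MAX_PATIENTS_PER_CHW * dense_mb / 1000   # one file per patient
per_day_mb = VISITS_PER_DAY * dense_mb                # one visit-day of files

if __name__ == "__main__":
    print(f"EPOC file: {epoc_mb:.2f} MB, dense file: {dense_mb:.2f} MB")
    print(f"Patients: {max_patients:,}")
    print(f"Per CHW: {per_chw_gb:.2f} GB, per day: {per_day_mb:.1f} MB")
```

Under this assumption the larger file comes to 7.68MB, which the text rounds to 7.7MB; the per-CHW and per-day totals in the text (3.85GB and 115.5MB) use the rounded value and so differ slightly from the script's output.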
The time to upload a single file to Dropbox is also under a second on our high speed internet connections. The time to upload files to Dropbox, and thus transfer them from the phones to remote research servers, will be longer on the slower internet connections that we expect to encounter in clinics. However, due to the small size of the files, this should still not be a limiting cost. It is also less critical, as uploading can be done when the phone is plugged into a more stable power source. The longest component of the system will most likely be the data sampling phase, which is entirely study dependent.

1 The Battery Doctor Android app had an average rating of 5 out of 5 stars from 183,584 users at the time of our download. An explanation of how remaining battery life for each task is calculated is not published.

D. Cost

The EEGStore system is one of many puzzle pieces that could help save people from a lifetime of disability by enabling early intervention. Neuropsychological disorders are a painful human experience as well as expensive for healthcare systems. The regions of the world that stand to gain the most from the ability to monitor and diagnose mental health are resource poor regions lacking an adequate number of specialists. For these reasons, we suggest that a few hundred dollars is a reasonable budget for all necessary technological components. See Figure 8 for more details. The OpenEmo adapter was prototyped for less than $75 and could be produced more cheaply in bulk. Smartphone and EEG device companies would likely be willing to donate some number of devices because of the potential scale if initial deployments are successful.

Fig. 8. Cost of each component of the mobile section of EEGStore indicates that OpenEmo is an affordable component. Note that both headsets and phones will potentially be donated by Emotiv and Google, which is why we chose these specific devices.

VI. RELATED WORK

While there is no system quite like EEGStore or device like OpenEmo, there is a great deal of closely related previous work. In this section we highlight some of the most pertinent projects related to data systems for information and communication technologies for development (ICTD), mobile medical devices, and open source EEG.

A. ICTD Data Systems

Open Data Kit (ODK) is an open source data collection system that enables users to quickly generate surveys and begin collecting data [14]. It consists of a web application for designing forms, an accompanying Android app called ODK Collect to use the forms, and a cloud storage component. EEGStore also has an Android application and cloud component. Similarly to EEGStore, ODK is optimized for use in challenging environments, including intermittent internet connectivity, which is accomplished by delaying data synchronization until the device is connected or until the user chooses. ODK was designed for data collection in developing regions. ODK is designed to be general, whereas EEGStore is highly specialized for collecting EEG data. ODK is in the process of implementing support for sensor data collection, but this feature has not yet been released for production. A beta version of ODK Sensor was released on 23 November 2013. Another difference is that a user of ODK has the option to deploy Aggregate, the server side component, anywhere they like or not at all. To best serve the field of medical research, it is preferable that if a user is willing to share their deidentified EEG data for research purposes, it should be sent to one common international database, according to William Bosl [7]. ODK may have been an alternative tool for prototyping EEGStore, but it alone would not be an alternative to the system itself.

TierStore is a filesystem that implements a variety of features for the sake of effectiveness in developing regions with challenging network environments [15]. Some of the functionality is similar to Dropbox. EEGStore is prototyped with Dropbox because it provides some of these same helpful features, especially delay and fault tolerance.

OpenMRS is the world's leading open source medical record system platform [16]. Ideally, it would be on the receiving end of secure patient data collected through the EEGStore system. The mission of OpenMRS is to improve health care delivery in resource-constrained environments by coordinating a global community that creates a robust, scalable, user-driven, open source medical record system platform. EEGStore also aims to be a robust, scalable, open source system for the betterment of global health. Ideally, the EEGStore system will contribute data to an OpenMRS table structure so as to include EEG data in OpenMRS medical records.

Another notable project in the data area of ICTD is Sensing Atmosphere, where researchers collected geographically distributed air quality data by equipping citizens' basic phones with inexpensive air quality sensors [17]. A common way to get ICTD wrong is by excluding local knowledge. Like Sensing Atmosphere, which is a manifestation of "participatory urbanism", EEGStore focuses on enabling people living in developing regions to make a significant contribution to larger global issues. By contributing data, participants of EEGStore will contribute to the study of mental health. We have been told that, because they are contributing to a larger goal, people will be even more excited about participating in EEGStore [7].

B. Mobile Medical Devices

Low cost mobile medical devices designed for use in emerging regions are a growing area of research. Two of many examples that we found to be related to EEGStore and OpenEmo are the Berkeley Tricorder [18] and PartoPen [19].

The Berkeley Tricorder is a wireless health monitoring device that stands in contrast to typically large and expensive technology for measuring vital signs [18]. This device may be worn on a part of the body and sends sensor data over Bluetooth to an iOS application.
The Berkeley Tricorder incorporates multiple sensing options, although EEG is not one of them. The device is meant for personal self monitoring more than for use by a health worker.

The PartoPen is a technology that makes it easier for minimally trained birth attendants to use simple partograph data to monitor the birth process and avoid obstructed labor, which is a major preventable cause of death in developing countries [19]. EEGStore also aims to make data available for health worker decision support. The PartoPen reminds health workers when to revisit patients, which may be a beneficial feature to add to the OpenEmo Android app.

C. Open source EEG

There is growing interest in the power of EEG signals and the information that can be extracted from them. The open source project OpenEEG consolidates some of the efforts to make EEG signals accessible to more than just elite medical experts [20]. The OpenEEG project focuses on hardware and on instructions for how individuals can build their own EEG devices. It also includes links to other software projects for mining EEGs.

Many open source projects are also emerging for mining and analyzing EEG signals. These projects include EEGLAB [21] and FieldTrip [22], which are open source EEG MATLAB toolboxes. PyEEG is a similar toolbox for Python [23].

VII. LESSONS LEARNED

An international database for collecting EEG samples from a variety of devices has significant value for medical researchers. If there were a simple approach to implementing this end to end system, it would already have been done. We find it useful to document our major hurdles here.

Although we were initially deterred from building EEGStore on ODK because its support for sensor data was still under development, in hindsight we could have used more ready-made open source components than we did.

One important question is how to verify that the data coming from the relatively inexpensive EEG device is valid and not noise.
More involved solutions to this issue are discussed in Section VIII. For the OpenEmo and EEGStore prototypes we are able to compare plots of data collected when the Emotiv EPOC is not on a person’s head (noise) to plots of data collected when the device is situated properly on a person’s head. The difference is clear, but we do not claim that this is good enough for use in the medical field. Most commodity EEG devices generate a time series of the sensor data along with a signal strength value. One solution is to accept data into a recording only if the signal strength is above the threshold at which the EEG device manufacturer promises that the data is of a certain quality. Deeper investigation of the methods used to verify signal strength would be necessary to determine whether this simple solution is good enough. Further testing could reveal how often the signal strength is acceptable. We chose to focus more on the flow of data than on the quality of the data, but both are vital components.

Finally, knowing that energy was one of our constraining factors, we attempted to build a simple Android application using Bluetooth Low Energy (BLE) instead of Bluetooth 3.0. This was challenging for us as beginning Android developers because BLE on Android is recent and is not yet widely supported or documented. Adequate documentation was lacking for the necessary low-level programming in a variety of dependencies. This is not ideal for any open source project. After analyzing our energy calculations, we realized that this was a premature optimization. BLE uses significantly less energy than Bluetooth 3.0, but only during an insignificant fraction of operating time, and it significantly increased the difficulty of development.

VIII. FUTURE WORK

A. Server development

We would like to further develop our server setting. While we currently have data processing and a full data flow established, we need to establish a database system and integrate incoming data into it.
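As a rough illustration of what integrating incoming recordings into a database might involve, the following minimal sketch uses Python’s built-in sqlite3 module. The table names, column names, and helper functions are hypothetical and are not part of the EEGStore prototype; SQLite is only a stand-in for whatever database system is eventually chosen.

```python
import sqlite3

# Hypothetical schema (an assumption for illustration only).
# "recording" holds one row of metadata per recording session, keyed by a
# deidentified patient code; "sample" holds the raw readings, one row per
# (sample index, channel), linked back to its recording.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recording (
    id           INTEGER PRIMARY KEY,
    patient_code TEXT NOT NULL,
    headset      TEXT NOT NULL,
    started_at   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS sample (
    recording_id INTEGER NOT NULL REFERENCES recording(id),
    seq          INTEGER NOT NULL,
    channel      TEXT NOT NULL,
    value        REAL NOT NULL,
    PRIMARY KEY (recording_id, seq, channel)
);
"""


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_recording(conn, patient_code, headset, started_at, rows):
    """Insert one recording and its samples in a single transaction.

    rows is an iterable of (seq, channel, value) tuples.
    Returns the id of the new recording.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO recording (patient_code, headset, started_at) "
            "VALUES (?, ?, ?)",
            (patient_code, headset, started_at),
        )
        rec_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO sample (recording_id, seq, channel, value) "
            "VALUES (?, ?, ?, ?)",
            [(rec_id, seq, channel, value) for seq, channel, value in rows],
        )
    return rec_id
```

Storing one row per channel reading keeps the schema independent of how many channels a particular headset provides, at the cost of more rows per recording.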
We need more efficient storage mechanisms and to better separate survey data from raw EEG data, while not losing the connections between data components.

B. Data processing and analytics

We would like to implement further data processing on incoming data. This is important not only for future medically diagnostic machine learning algorithms, but also for assessing data quality in a timely manner. If data quality is established quickly enough, then poorly collected samples can quickly be retaken. We believe that more data processing must occur both on OpenEmo and on the server. Further processing on OpenEmo is of particular importance so that onsite assessments of data quality can be made. Such processing can incorporate discrete Fourier transforms to check that characteristic frequencies appear at the appropriate sensors. Signal strength can also be assessed from sensor voltage readings.

C. OpenEmo compatibility with more headsets

While OpenEmo was prototyped with the Emotiv EPOC headset, we would like to make it compatible with more headsets. This extension will also inform how we formulate the server side support, as different headsets have different numbers of channels and thus collect different data. How to adapt the Android application to account for which sensors are being used is not trivial. Adjusting the system and database storage to account for a variety of headsets could also be an interesting problem.

D. OpenEmo refinement

We would like to improve the functionality and appearance of OpenEmo. This includes making a smaller enclosure, a more stable Bluetooth connection, and greater app functionality. One of the most immediate actions would be to add a Dropbox Drop-in for sending the data directly from the OpenEmo app to the remote Dropbox servers. This functionality is currently under development at Dropbox for Android. We will incorporate it as soon as it is available.

We would also like to revisit the hardware decisions for OpenEmo.
The current ARM processor is rather powerful, and it may be possible to change to a different ARM processor architecture to decrease power consumption. One trade-off that we will have to consider is the ease of development. The Raspberry Pi is well established, and continuing to use it would enable external developers to contribute modules or drivers to adapt new headsets to our system. Basing OpenEmo on a different architecture could hinder its accessibility to outside contributors.

E. Data encryption and security

A very interesting problem that we did not have time to consider was that of data security. We would like to study the cost of and need for various levels of data security between devices and levels in the system. This will probably be of critical importance when we consider moving the data between the phone and remote servers. How to move the data in a secure, HIPAA compliant, and energy efficient way is a very interesting issue that we would like to address. Most likely encryption will be needed at different levels, but how to manage the keys and where the encryption should occur, e.g. on the OpenEmo or on the phone, is an issue that we would like to delve into.

F. Replacement of cloud storage

While our initial prototype uses Dropbox to transfer data between the mobile phone and servers, we would like to replace this with a more secure method for cloud storage and data transfer. We need a HIPAA compliant alternative that still offers all the benefits of Dropbox: fault tolerance, backup, versioning, etc. Alternative cloud storage systems exist, such as Google Drive and Box [24], but it is not clear whether they will be sufficient for our needs or whether we will need to build our own transfer system.

G. Data quality

We would like to make more use of analytics for data quality during the visit. Metrics such as signal strength are often available with EEG headsets, including the Emotiv EPOC that we prototyped for.
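A minimal sketch of how a headset-reported signal strength metric could be used to gate samples is given below. The Sample structure, the assumption that quality is normalised to the range 0 to 1, and the threshold value are all hypothetical; real headsets report quality in their own formats, and an appropriate threshold would have to come from the manufacturer or from further testing.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Sample:
    timestamp: float     # seconds since the start of the recording
    values: List[float]  # one reading per channel
    quality: float       # headset-reported signal strength, assumed in [0, 1]


def filter_by_quality(samples: Iterable[Sample], threshold: float) -> List[Sample]:
    """Keep only samples whose reported quality meets the threshold."""
    return [s for s in samples if s.quality >= threshold]


def acceptable_fraction(samples: Iterable[Sample], threshold: float) -> float:
    """Fraction of samples that meet the threshold (0.0 for an empty input)."""
    samples = list(samples)
    if not samples:
        return 0.0
    return len(filter_by_quality(samples, threshold)) / len(samples)
```

A low acceptable fraction computed during the visit could prompt the health worker to reseat the headset and retake the recording.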
Signal strength data has not been included in our app and has not been used to censor data that is not of appropriate quality. We would like to look further into developing high quality and efficient metrics for assessing signal quality, and then into how to include this information on the server for data processing.

H. Integrate with other systems

A data collection system is only as good as the functionality that it can connect to. We would like to connect EEGStore with other existing systems to provide further data analysis and integration into medical records. A few such systems are EEGLAB [21], FieldTrip [22], PyEEG [23], and OpenMRS [16].

Both EEGLAB and FieldTrip are well established, but they expect data from headsets that have more channels than many of the commodity headsets that we are looking to utilize. As a result, we would like to find a way to use the functionality of these two toolboxes with the variety of headsets that our system will be compatible with. Similar challenges will arise with the PyEEG toolbox.

We would also like to integrate our EEG readings into OpenMRS, which is quickly being adopted as the only option for electronic medical health records in developing regions. Ideally, EEGStore would be compatible with OpenMRS so that EEG could become a more regular diagnostic tool.

IX. CONCLUSIONS

We have created a prototype of the EEGStore system. This system collects, stores, and processes raw EEG data from mobile headsets. We have also developed the OpenEmo adapter to collect data from dongle-supported EEG headsets, decrypt it, and send the raw data to an Android phone or tablet. The hope is that the end to end system will help make EEG technology and data available for improved healthcare. While there is still much development needed in our system, we have presented some analysis that shows our prototype could be viable for the situation it targets and that further work is very promising.

REFERENCES

[1] R. Idro, A. Kakooza-Mwesige, S. Balyejjussa, G. Mirembe, C. Mugasha, J. Tugumisirize, and J. Byarugaba, “Severe neurological sequelae and behaviour problems after cerebral malaria in Ugandan children,” BMC Research Notes, vol. 3, no. 1, p. 104, 2010.
[2] W. Bosl, A. Tierney, H. Tager-Flusberg, and C. Nelson, “EEG complexity as a biomarker for autism spectrum disorder risk,” BMC Medicine, vol. 9, no. 1, p. 18, 2011.
[3] “Personal correspondence with Dr. Julius Awakame, founder & president, the West African Health Informatics Fellowship Program, staff grade psychiatrist, Cygnet Hospital Wyke, UK. julius[at]wahifp.org,” http://www.wahifp.org, November 2013.
[4] “The Global Fund to Fight AIDS, Tuberculosis and Malaria: Kenya malaria proposal round 10,” (accessed July 17, 2011). [Online]. Available: http://www.theglobalfund.org/grantdocuments/KENR10-ML PROPOSAL 0 en
[5] “NeuroSky: Brainwave sensors for everyone,” http://neurosky.com/, accessed: 15 Dec. 2013.
[6] “Emotiv EPOC,” http://www.emotiv.com/epoc/, accessed: 15 Dec. 2013.
[7] “Personal correspondence with William Bosl PhD, director and assistant professor of health informatics, University of San Francisco. wjbosl[at]usfca.edu,” November 2013.
[8] “Raspberry Pi,” http://www.raspberrypi.org, accessed: 15 Dec. 2013.
[9] “python-emotiv GitHub repository,” https://github.com/ozancaglayan/pythonemotiv, accessed: 15 Dec. 2013.
[10] “Emokit GitHub repository,” https://github.com/openyou/emokit, accessed: 15 Dec. 2013.
[11] “Dropbox.com,” http://www.dropbox.com, accessed: 15 Dec. 2013.
[12] “Smartronix USB power meter,” http://www.cyberguys.com/productdetails/?productid=76178, accessed: 15 Dec. 2013.
[13] R. Heydon, Bluetooth Low Energy: The Developer’s Handbook, 2012.
[14] “Open Data Kit,” https://opendatakit.org, accessed: 15 Dec. 2013.
[15] M. J. Demmer, B. Du, and E. A. Brewer, “TierStore: A distributed filesystem for challenged networks in developing regions,” in FAST, vol. 8, 2008, pp. 1–14.
[16] “OpenMRS,” http://openmrs.org/, accessed: 15 Dec. 2013.
[17] E. Paulos, R. Honicky, and E. Goodman, “Sensing atmosphere,” Human-Computer Interaction Institute, p. 203, 2007.
[18] R. Naima and J. F. Canny, “The Berkeley Tricorder: wireless health monitoring,” in Wireless Health 2010. ACM, 2010, pp. 212–213.
[19] H. Underwood, “PartoPen: Enhancing the partograph with digital pen technology,” in CHI’12 Extended Abstracts on Human Factors in Computing Systems. ACM, 2012, pp. 1393–1398.
[20] “OpenEEG - EEG for the rest of us,” http://openeeg.sourceforge.net/doc/, accessed: 15 Dec. 2013.
[21] A. Delorme and S. Makeig, “EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis,” Journal of Neuroscience Methods, vol. 134, no. 1, pp. 9–21, 2004.
[22] R. Oostenveld, P. Fries, E. Maris, and J.-M. Schoffelen, “FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data,” Computational Intelligence and Neuroscience, vol. 2011, p. 1, 2011.
[23] F. S. Bao, X. Liu, and C. Zhang, “PyEEG: an open source Python module for EEG/MEG feature extraction,” Computational Intelligence and Neuroscience, vol. 2011, 2011.
[24] “Box.com,” https://support.box.com/hc/en-us, accessed: 15 Dec. 2013.