Big Data Becomes Personal: Knowledge into Meaning: Papers from the AAAI Spring Symposium

Personal Life Repository as a Distributed PDS and Its Dissemination Strategy for Healthcare Services

Kôiti Hasida
Social ICT Research Center, Graduate School of Information Science and Technology, the University of Tokyo
hasida.koiti@i.u-tokyo.ac.jp

Abstract
As a distributed PDS (personal data store), PLR (personal life repository) allows individual users to fully control their own data and drastically reduces the cost and risk that service providers incur in storing personal data. Because it uses cloud storage services, which are basically free, for sharing data, PLR is also easy to deploy and inexpensive to operate, and hence easy to disseminate. PLR can spread through collaboration with existing service providers.

Personal Life Repository
A PDS (personal data store) is an IT tool that allows individual users to accumulate and maintain their own data and to share the data socially while explicitly specifying which data to share with whom. As shown in Figure 1, PLR (personal life repository; Hasida, 2012; 2013) is a kind of distributed PDS.

Figure 2: Present Architecture of PLR

PLR encrypts personal data in such a way that each PLR user and those whom she explicitly permits to access her data can decrypt the data, but nobody else, including the storage service providers, can. The fundamental functionalities of a PDS, namely data storage and data sharing, are provided by existing cloud storage services, basically for free. One could alternatively use home servers or PaaS- or IaaS-type cloud services such as Google App Engine and Amazon EC2, but these offer rather limited free storage, and an additional service is then necessary to provide common IDs for authenticating access to shared data. Existing ID-provision services such as those of Facebook and Twitter may be employed for this purpose.
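The encryption requirement stated above (the owner and the users she permits can decrypt, while the storage provider cannot) is given only as a property; no concrete scheme is specified. One common way to obtain that property is envelope encryption: each record is encrypted under a fresh data key, and that data key is wrapped separately for every permitted user, so the provider stores only ciphertext. The Python sketch below is illustrative and rests on assumptions that are not part of PLR as described: the names `seal`, `unseal`, `share`, `read`, and `StoredRecord` are hypothetical; each permitted user is assumed to already hold a symmetric key-encryption key (KEK) known to the owner, whereas a real system would more likely wrap the data key with the recipient's public key; and the cipher is a toy SHA-256 counter-mode keystream with an HMAC tag, standing in for a vetted authenticated cipher such as AES-GCM.

```python
# Hypothetical sketch of envelope encryption for PDS-style sharing.
# Names, the pre-shared KEK assumption, and the toy cipher are NOT from PLR.
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field
from typing import Dict, Tuple

NONCE_LEN = 16
TAG_LEN = 32


def _subkeys(key: bytes) -> Tuple[bytes, bytes]:
    # Derive separate encryption and MAC keys from one key.
    return (hashlib.sha256(b"enc" + key).digest(),
            hashlib.sha256(b"mac" + key).digest())


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY counter-mode keystream: SHA-256(key || nonce || counter).
    # Not a vetted cipher; a real deployment would use e.g. AES-GCM.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt-then-MAC; returns nonce || ciphertext || tag."""
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ct = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    """Verify the tag, then decrypt; raises ValueError on a wrong key."""
    enc_key, mac_key = _subkeys(key)
    nonce, ct, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered data")
    stream = _keystream(enc_key, nonce, len(ct))
    return bytes(c ^ s for c, s in zip(ct, stream))


@dataclass
class StoredRecord:
    # Everything the cloud storage provider holds: no plaintext, no raw keys.
    ciphertext: bytes
    wrapped_keys: Dict[str, bytes] = field(default_factory=dict)


def share(plaintext: bytes, recipient_keks: Dict[str, bytes]) -> StoredRecord:
    """Encrypt under a fresh data key and wrap that key once per permitted user."""
    data_key = secrets.token_bytes(32)
    record = StoredRecord(ciphertext=seal(data_key, plaintext))
    for user, kek in recipient_keks.items():
        record.wrapped_keys[user] = seal(kek, data_key)
    return record


def read(record: StoredRecord, user: str, kek: bytes) -> bytes:
    """Unwrap this user's copy of the data key, then decrypt the record."""
    if user not in record.wrapped_keys:
        raise PermissionError(f"{user} has not been granted access")
    data_key = unseal(kek, record.wrapped_keys[user])
    return unseal(data_key, record.ciphertext)


if __name__ == "__main__":
    owner_kek = secrets.token_bytes(32)   # the PLR user herself
    doctor_kek = secrets.token_bytes(32)  # someone she explicitly permits
    rec = share(b"BP 128/82", {"owner": owner_kek, "doctor": doctor_kek})
    assert read(rec, "doctor", doctor_kek) == b"BP 128/82"
    assert b"BP 128/82" not in rec.ciphertext  # provider sees only ciphertext
```

In this design, granting access to another user means adding one more wrapped key without re-encrypting the record. Revoking access from a user who may already have unwrapped the data key would require re-encrypting the record under a new data key and re-wrapping it for the remaining users.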
The current PLR implementation, however, does not rely on a separate ID-provision service; it instead utilizes cloud storage services that integrate data storage and sharing. This is more convenient, particularly for spreading PLR all over the world, including areas where those popular ID-provision services are unavailable.

Figure 1: Functionalities of PLR

PLR is currently designed to make full use of existing commodities. In particular, as shown in Figure 2, it utilizes personal cloud storage services such as Google Drive and Dropbox (or networks of cloud storage services, such as Respect Network) for sharing data among PLR users.

Dissemination Strategy
Typical expected usages of a PDS are in healthcare services, where the PDS serves as a PHR (personal health record). PHR has been successful only through top-down dissemination in some European countries, while private-sector PHRs, such as Google Health, Microsoft HealthVault, and My Hospital Everywhere (in Japan), are far from popular. This is probably because medical records have been accumulated and stored mainly at hospitals, and the hospitals have not directly benefited from sharing the data.

There are some dissemination strategies for PLR-based healthcare that bring profits to all stakeholders, including hospitals, and that are effective at least in Japan. First, most elderly nursing homes in Japan have not yet been computerized. Introducing PLR into nursing homes will not only reduce the workload of caregivers but also create a new profitable service of sharing care records with the residents' family members. As many nursing homes are affiliated with hospitals, sharing data with those hospitals is beneficial to the nursing homes and hence to the hospitals. Second, some EHR (electronic health record) vendors may well be interested in providing hospitals with EHR functionality for data coordination with PLR, which will benefit both the vendors and the hospitals. The vendors and third parties will then be able to provide commercial health-management services to the patients of those hospitals. That will reduce the number and duration of hospitalizations, and hence the hospitals' expenditure. Either way, healthcare stakeholders will be able to share data as shown in Figure 4.

Figure 4: Healthcare Data Coordination

Data Aggregation
To analyze big personal data efficiently, one must physically aggregate personal data from many people's PLRs. PLR itself supports this aggregation, but larger-scale servers would be used to aggregate huge amounts of data. Figure 3 shows privacy-preserving data aggregation, where the data broker collects personal data from many PLR users, lets the analyst manipulate the data, and gives him the analysis result without disclosing the raw data to him. The data broker can also conceal the analysis result if it contains any information that could violate somebody's privacy.

Figure 3: Aggregation of Distributed Big Data

This scheme of data aggregation and utilization both protects privacy and allows individuals to easily keep track of and control how their personal data are used. Note that it also promotes the utilization of personal data, because PLR makes it very easy for the data broker to ask individuals to disclose their data.

Final Remarks
PLR, OpenPDS (de Montjoye et al., 2012), and other PDSs should be interoperable. As they spread, PaaS-type cloud services that support data storage and sharing are expected to appear and to facilitate the implementation and operation of distributed PDSs.

References
Kôiti Hasida. 2012. Personal Life Repository as Foundation of B2C Services: Big Data without Big Brother. The 21st Annual Frontiers in Service Conference, University of Maryland.
Kôiti Hasida. 2013. Personal Life Repository for Consumer-Initiative Services. The 22nd Annual Frontiers in Service Conference, 08-102, National Taiwan University.
Yves-Alexandre de Montjoye, Samuel S. Wang, Alex (Sandy) Pentland. 2012. On the Trusted Use of Large-Scale Personal Data. Bulletin of the Technical Committee on Data Engineering, 35(4), 5-8.