Current Issues
Digital economy and structural change

Big data
The untamed force

May 5, 2014

Authors
Thomas F. Dapp, +49 69 910-31752, thomas-frank.dapp@db.com
Veronika Heine

Editor
Lars Slomka

Deutsche Bank AG
Deutsche Bank Research
Frankfurt am Main, Germany
E-mail: marketing.dbr@db.com
Fax: +49 69 910-31877
www.dbresearch.com

DB Research Management
Ralf Hoffmann

Big data is increasingly becoming a factor in production, market competitiveness and, therefore, growth. Cutting-edge analysis technologies are making inroads into all areas of life and changing our day-to-day existence. Sensor technology, biometric identification and the general trend towards a convergence of information and communication technologies are driving the big data movement.

Huge challenges must be overcome if the benefits are to be leveraged effectively. Matters of concern alongside increasing volumes of data, varying data structures and real-time processing include data security, data privacy policies that are in urgent need of reform and the rising quality expectations of the stakeholders. There is a widespread lack of suitable strategies to respond to the digital revolution.

Data has an economic value. The appetite of many stakeholders for data, including personal data, is growing. Internet users themselves could become the traded commodity, without them ever being aware of this or experiencing any financial benefit.

The opportunities for business, academia and politics are manifold: they range from increases in productivity, reduced costs and optimised customer communications to better disease forecasting and scientific modelling. In addition, more and more public authorities are offering anonymised data sets as a means of generating new products, services and business models.

The risks are not to be underestimated: many people complain not only of a growing feeling of information overload but also of an increasing loss of data sovereignty.
Any number of stakeholders are compiling, storing and analysing data, including personal data, for any number of different reasons. If only some of them misuse this data, trust in digital channels could be broken, or at least significantly reduced. This would deal a fatal blow to innovation and growth, and not only in the still relatively young field of research into information technology.

The big data movement cannot be stopped. It is now a question of putting in place the necessary regulatory framework to allow these state-of-the-art methods and the technology that underpins them to properly flourish. With their success in the fields of sensor technology and biometrics as well as data protection, Germany and the European Union now have the opportunity to play a leading international role in the development of secure IT infrastructures.

Big data – the untamed force

1. Big data – A next logical stage of evolution for the internet
2. Stakeholders, data characteristics, sources and drivers
3. Sensors and biometrics take over the mass market
4. On the role of digital ecosystems in the big data world
5. The economic value of data
6. Big data and data privacy
7. (Big) data in practice: There is no ideal solution
8. The limits of big data
9. Conclusion
Reference list
Further reading

1. Big data – A next logical stage of evolution for the internet

“Technology is neither good nor bad; nor is it neutral. […] Technology’s interaction with the social ecology is such that technological developments frequently have environmental, social and human consequences that go far beyond the immediate purposes of the technical devices and practices themselves.” [Melvin Kranzberg, 1986]

Figure 1: Web searches for the term ‘big data’. Worldwide interest, 2004-present, smoothed. Source: Google Trends (www.google.de/trends/)

A new way of thinking about data

Figure 2: ICT revenue growth. Germany, 2010=100, adjusted for working days and seasons, 2003-2013. Series: Information and communication; Information services; Data processing, hosting etc., web portals. Source: German Federal Statistical Office

Could there possibly be a correlation between people buying books and shoes from the online retailer Amazon at a certain time in the evening? Are credit card companies able to predict which of their customers may be heading for a marriage crisis, solely on the basis of individual transactions? Might it be possible to use GPS tracking of mobile devices in combination with near-realtime analysis of microblogging posts to predict and prevent or at least mitigate an impending epidemic? In the future, could an internet-enabled e-toothbrush send information about your oral hygiene straight to your dentist so that they can see if you need to make an appointment?

These examples may seem somewhat outlandish at first, but the controversial and increasingly widespread discussion about the mega topic of ‘big data’ puts them in a completely new light. Using sensors, a multitude of data sets and specific algorithms, automatic predictions could soon be made about particular behavioural tendencies (and not just online) on the basis of simple correlations. Hidden in the ocean of data could be any number of unforeseen, surprising and potentially valuable correlations that we can only begin to imagine today.

The way in which people think about data and data analysis will gradually change as well, in addition to the technological possibilities. Thanks to the latest internet technologies, the potential for harnessing all that can be measured and analysed using solid data, intelligent sensors and filtering has never been as promising and lucrative as today, at the dawn of the digital era.

We are in the midst of the digital revolution. The importance of web-based technologies is growing in all sectors of the economy. Almost all commercial transactions, and many private ones too, are now carried out digitally, which brings down the cost. Digitisation is changing society and people’s day-to-day economic lives, and stakeholders from business, academia and politics are responding to this – albeit at different speeds. The ever-growing economic benefit behind this digital development is derived from individual data points, [1] which are increasingly being queried in real time.
Big data is the new, hotly debated topic that follows a series of logical stages in the development of the internet, such as individualisation, the relocation of data to the cloud and the rapidly growing demand for digital mobility. It bridges the gap to what has evolved before. In principle, the idea is to combine different volumes of data with new data sets and to identify any patterns in this aggregated data using intelligent software, with the ultimate aim of drawing the right (and wherever possible, lucrative) conclusions from the findings. Once they have been compiled, primary data sets can be analysed any number of times for different purposes and for different stakeholders. The data functions as a driver of innovation, creativity and out-of-the-box thinking, and in an ideal world results in new business ideas, products or services. However, it can and will lead to widespread concerns and fears, because data sovereignty, i.e. the degree of control that people have over their own information, can quickly be compromised, as many examples show.

At the end of the big data process, it is important to draw the right conclusions from the combination of technology and methodology. This poses another challenge for the decision-makers, as this is where two worlds collide: the bits, bytes and algorithms on the one side and us, the people, on the other, with our instincts, experience and intuition. What counts more when it comes to making a decision? The results drawn from the statistics or a person’s long-standing experience or gut instinct? The potential that computer-based algorithms have for extracting useful information from the combined data sets is undoubtedly still in its infancy and there is plenty of room for improvement. The hope remains, however, that despite the technological progress and the barely imaginable pace of development in this field, a harmonious relationship between man and machine will emerge. Leaving aside the fact that data privacy regulations in this still young international field of research have so far proved inadequate, surely only very few people would like to live in a world in which decision-making processes are left solely to programmed algorithms, even to those that have the capacity to learn. On the other hand, of course, information and communication technologies are able to increase efficiency, reduce costs and minimise human error – quite a trade-off.

[1] Identifying patterns or useful information from large batches of data is known as data mining.

Figure 3: Employment in the service sector (ICT). Germany, 2010=100, 2003-2013. Series: Information and communication; Information services; Data processing, hosting etc., web portals. Source: German Federal Statistical Office

This far-from-trivial challenge is a growing concern for stakeholders from business, academia, politics and society. The commercial importance of real-time data, such as financial and trading transaction, sensor/measurement, health and social network data, is growing all the time, as is the impact this has on business processes in a globally connected economy and on people’s personal lives and social environment. More and more data is being generated by different stakeholders that can (or could) be filtered, analysed and used for different purposes several times over – and this is a trend that is growing exponentially.

The desire of man to quantify the world

Figure 4: Global market potential of big data. EUR bn: 2011: 3.4; 2012: 4.6; 2013: 6.3; 2014: 8.8; 2015: 12.0; 2016: 15.7. Sources: Bitkom, Experton Group

Although we are only at the start of the big data revolution, the technology and techniques at our disposal mean we are heading towards a situation in the long term in which people’s actions, activities and moods can be made more and more measurable. This means turning data into a form that can be read by a machine so that it can be combined, perhaps with other data sets, and analysed. Thanks to the possibilities presented by sensor technology and biometric identification, we are getting closer to being able to quantify the world and the individuals that inhabit it – an aspiration that has existed for as long as human thought. It also brings us closer to the visions of humanoid robotics and adaptive artificial intelligence, as has been shown in countless science fiction films and now could become reality for the mass market too.

As with every new and (r)evolutionary movement, there are opportunities as well as risks. Both should be subject to a broad and transparent discussion at this early stage of development. From a German and European perspective, it is important for reasons of competitiveness to create a favourable environment now that will stimulate the potential of the big data movement. Meanwhile, the scope for legally and morally questionable practices involving the misuse of data must be limited by legislation on data protection that applies throughout Europe and ideally further afield.

Which steps are taking us closer to big data?

The following chapter introduces the different stakeholders and explains the basic characteristics of the data. There will also be a focus on the sources and drivers of the growing flood of data as well as a number of examples that illustrate these. Chapter three puts the spotlight on sensors and biometrics. Both technologies are becoming increasingly suitable for the mass market and hold huge and lucrative potential for growth.
The convergence of information and communication technologies, as a cross-sectoral trend, is generating additional impetus and scope for experimentation and for new business models, innovative products and services.

The role of the digital ecosystem in the big data discussion is explored in chapter four. Certain large internet companies are expanding their core competencies with the aid of modern technologies and techniques in the field of data analysis. This increases not only the spectrum of their products and services but also the degree to which their customers are ‘locked-in’ and the costs of switching to another provider.

Data has a commercial value. All kinds of data sets can be used any number of times by different stakeholders for different purposes; they can be monetised, but they can also be abused or misused. It is well within the realms of possibility to create digital profiles of people that are uncannily (and even frighteningly) realistic – and for the most part without obtaining their consent. The excessive data-gathering activities of individual stakeholders, a subject that can be hard for the layman to understand, are discussed in chapter five.

Chapter six explores the data protection policies that are inadequate for the big data movement but are fundamental to its future, and proposes several solutions. Chapter seven gives a brief insight into various big data projects in the realms of business, academia and public administration. The limits of the big data movement, most notably the increasing complexity caused by merging different data sets, are the subject of chapter eight. Chapter nine closes this study with a brief summary and a look ahead to where the journey might take us in the coming years.

2. Stakeholders, data characteristics, sources and drivers

“I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?” [Hal Varian, Google’s Chief Economist on the subject of statistics and data, January 2009]

Figure 5: Market potential of big data in Germany. EUR m, 2011-2016. Series: Services; Software; Hardware. Sources: Bitkom, Experton Group

Big data is more than just IT: Stakeholders in all kinds of sectors

Many decision-makers have recognised that big data is no longer purely the preserve of IT. Big data is instead becoming a movement that brings together cutting-edge internet technologies and analysis techniques in order for large, extendable and above all differently structured data sets to be captured, stored and analysed. This gives big data a broad, international dimension with different knowledge-based outcomes and expectations with regard to increasing growth and efficiency. But above all, big data provides scope for experimentation, innovation and creativity, offers a wealth of potential new data combinations and is therefore ideal for discovering unexpected correlations. It could be used to create new business models, products and services and to drive innovation.

Figures 6 and 7: DE and SE with the highest success rates for ICT* patent applications. %, average percentage success rate of patent applications. Countries shown: Germany, USA, South Korea, Japan, UK, China, Sweden. * EPO definition: audio-visual technology, telecommunications, digital communication, basic communications processes, computer. Source: EPO

Because of Germany’s scarcity of natural resources and the strength of its research-intensive products and services, the big data movement clearly has the potential to become a stable comparative advantage for the German economy. Big data can become a factor in production and competitiveness that will open up new possibilities for value creation and ease the path for (r)evolutionary movements similar to that of the mass use of the internet. Above all, providers of new, high-performance data analysis tools are expecting long-term increases in efficiency and, in particular, lucrative revenue opportunities, for example through the collection and analysis of detailed customer-specific information: in short, big data ultimately means big business.

In addition to the IT sector, stakeholders from the worlds of business, academia and public administration are increasingly being confronted with big data. Mining the mountains of data that are produced by various different sources has obvious economic benefits but is also being used more and more to address social and humanitarian (e.g. medical) problems, as an example in chapter seven illustrates. However, the success of these efforts hinges entirely on the modern technologies and developments being underpinned by a mandatory legal framework. This needs to meet the requirements of the data protection policies on a European (and, ideally, international) level and to put information privacy at the forefront of discussion.
Figure 8: Areas of business that stand to benefit the most. % of respondents (n=254), 2012, multiple answers permitted. Categories: Accounting and reporting; Optimising IT infrastructures; Financial planning/budgeting; Price optimisation; Customer profitability; Optimisation of customer behaviour; Monitoring & optimisation of IT traffic; Sales analysis/management; Better utilisation of machines; Management/optimisation of campaigns; Plant management/capacity utilisation; Higher employee utilisation; Fraud detection/legal regulations. Source: IDC

Figure 9: Corporate potential of big data. % of respondents (n=254), 2012, multiple answers permitted. Categories: Cost containment; Better information management; Better business management; More detailed information; Better basis for decision making; Optimisation of existing business cases; Identification of new business potential; Improvement in compliance aspects; Faster way to get information; More precise financial reporting; Data governance. Source: IDC

The characteristics of data: Complex, heterogeneous but of huge value

The defining properties of big data, known as the three Vs, are introduced in the following section using a number of examples and illustrations:

– Volume
The first criterion refers to the volumes of data that occur in research and business and in the private and public spheres. The relentless progress of digitisation means that virtually all aspects of modern life are involved. In the private area alone, people store and archive an enormous volume of data that runs into the three-digit gigabyte range. Included in this are digital photos, documents, spreadsheets, e-mails, music files and film downloads. Companies, research institutes and public authorities also store, archive and analyse masses of data, but this is done on a far greater scale.
More bits and bytes than grains of sand on the beach

The storage capacity of mobile devices (smartphones, tablets, e-readers) or laptops is usually measured in gigabytes, a quantity of data we can still just about get our head around. [2] However, businesses, research institutes and even intelligence agencies that generate, manage and analyse vast sets of data on a daily basis are now using terms such as terabyte, petabyte, exabyte and zettabyte. Last year, 1.8 zettabytes of data were produced worldwide. One zettabyte is equal to one sextillion bytes, written as a 1 followed by 21 zeroes (1,000,000,000,000,000,000,000).

Figure 10: Units of measurement for data at a glance

1 Byte           = 8 Bits     = 1
1 Kilobyte (kB)  = 10^3 Byte  = 1,000
1 Megabyte (MB)  = 10^6 Byte  = 1,000,000
1 Gigabyte (GB)  = 10^9 Byte  = 1,000,000,000
1 Terabyte (TB)  = 10^12 Byte = 1,000,000,000,000
1 Petabyte (PB)  = 10^15 Byte = 1,000,000,000,000,000
1 Exabyte (EB)   = 10^18 Byte = 1,000,000,000,000,000,000
1 Zettabyte (ZB) = 10^21 Byte = 1,000,000,000,000,000,000,000
1 Yottabyte (YB) = 10^24 Byte = 1,000,000,000,000,000,000,000,000

Source: author’s illustration

Figure 11: Data traffic on mobile, web-enabled devices. Petabytes per month, 2011-2019. Series: Mobile PCs/routers/tablets; Smartphones. Source: Ericsson 2013

“Experts have estimated that mankind generated around five billion gigabytes of data from the beginning of recorded history to the year 2003. […] In 2011 the same volume of data – 4.4 exabytes – was amassed every 48 hours.” [3] The trend has continued: with more powerful computers, cheaper storage media and smarter algorithms, this volume of data was generated every ten minutes in 2013. So if you believe the experts, the volume is doubling every two years. The ‘digital haystack’ is getting bigger and bigger, making it more and more difficult to look for the needles.
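To make the units in Figure 10 easier to work with, the following is a minimal Python sketch of a converter between the decimal byte units listed there. The helper name `convert` and the dictionary `UNIT_EXPONENTS` are illustrative assumptions and are not part of the study; the sketch uses only the powers of ten given in the figure.

```python
# Decimal byte units as listed in Figure 10: each unit is a power of ten in bytes.
# UNIT_EXPONENTS and convert() are hypothetical names chosen for this illustration.
UNIT_EXPONENTS = {
    "B": 0, "kB": 3, "MB": 6, "GB": 9, "TB": 12,
    "PB": 15, "EB": 18, "ZB": 21, "YB": 24,
}


def convert(value, from_unit, to_unit):
    """Convert a quantity between two decimal byte units."""
    shift = UNIT_EXPONENTS[from_unit] - UNIT_EXPONENTS[to_unit]
    return value * 10 ** shift


# The 1.8 zettabytes cited in the text, expressed in gigabytes.
print(convert(1.8, "ZB", "GB"))

# The 4.4 exabytes from the quotation, expressed in petabytes.
print(convert(4.4, "EB", "PB"))
```

Working in exponents rather than hard-coded factors keeps the table and the arithmetic in one place, so adding a unit only means adding one dictionary entry.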
This justifiably raises the question of what the different stakeholders are actually looking for in this ever-growing mass of data. It is particularly interesting to discover who exactly has access to which data sets, what analysis techniques they are using and for what purpose they are doing this. The challenge often lies in the incredible rate at which data is produced, as companies would ideally like to analyse this in real time. This brings us on to the second characteristic:

– Velocity
Whereas data in the analogue era was generated, captured and analysed at manageable frequencies, the data flows in today’s digital era are being produced around the clock and all over the world, in part because of the multitude of interconnected sensors and the merging of information and communication technologies (ICT). In order to meet the requirements of time-critical business and decision-making processes, it is becoming increasingly important to integrate (or analyse) data flows from both internal and external sources in real time. These are exactly the areas in which many decision-makers are hoping to leverage huge gains in competitiveness and efficiency. Science, too, is benefiting from real-time data, for example, when identifying pathogens in time for preventive steps to be taken before a disease becomes too widespread.

[2] Two gigabytes of storage space can accommodate around 360 high-quality MP3s, one 90-minute HD film and approximately 150 photos at five-megapixel resolution.
[3] Heuer, S. (2013). Kleine Daten, große Wirkung. Big Data einfach auf den Punkt gebracht. (Small data, big impact. Big data in a nutshell.) Landesanstalt für Medien Nordrhein-Westfalen (LfM). Düsseldorf.

– Variety
The final V concerns the variety of different data types and structures, which as far as possible must be processed in a standardised manner and ‘harmonised’ so that they can be properly combined. A basic distinction can be made between structured, semi-structured and unstructured data.

Whereas previously we mostly processed structured information in relational databases, newer data sets, e.g. from social networks, are increasingly structure-free, making conventional databases far less suitable for extracting value. Master customer data is an example of a structured format and includes gender, date of birth and address. Images, videos and audio files are types of unstructured data. E-mails, on the other hand, are categorised as semi-structured because the header information contains structured data, i.e. sender, recipient and subject. The bulk of the e-mail is unstructured, however, as the body of the message has no predefined structure. Experts estimate that nowadays only 15% of data is structured, with around 85% regarded as unstructured. [4]

The current challenge for IT network architects lies in making heterogeneous data formats from different data sources compatible with each other so that they can be stitched together for analysis. Integrating this data, however, is complex. Information ultimately has to be converted into machine-readable data and it can originate from a bewildering array of sources, including images, videos, MP3 files, streaming services, microblogging media, blogs, forums, search engines, click protocols, e-mails, internet telephony, digital correspondence and/or sensors. Converting audio, video and image files into standardised machine-readable data is a particular challenge. The underlying algorithms have to be able to understand and transcribe the language(s), recognise faces and corporate logos and even identify digital content that is protected by copyright. [5]

In the discussion initiated by Tim Berners-Lee [6] about using the semantic web [7] (Web 3.0) to develop the Internet of Things (ubiquitous computing), the call is put out for data to be tagged with semantic information so that it can be read by other machines and be properly classified. Examples: does a mention of the word ‘Golf’ refer to the sport or the model of car, or could it even be a misspelling of ‘gulf’ as in Gulf of Mexico? How would the e-toothbrush use data from its sensors to convert evidence of plaque and tartar build-ups into a machine-readable language for analysis at your dental practice?

Subjective comments made in written, spoken or video content are particularly interesting in this context. PR companies, election campaign agencies and marketing departments all have an interest in recording people’s sentiments and opinions on particular subjects in near real time. In order for these often emotional responses to products, brands or events to be made machine-readable, intelligent analysis tools and algorithms are needed to identify the pertinent messages. Because, technologically speaking, this is a huge challenge and because further research is required, this will undoubtedly become the supreme discipline in big data.

Box 12: Big data and the US election campaign
Several state-of-the-art big data analysis tools were used in Barack Obama’s re-election campaign. They provided answers to the questions of how to widen the impact of the campaign and how to efficiently distinguish people who might potentially vote Democratic from those who were already ‘lost’. Obama’s team used socio-demographic data such as consumer behaviour (information from loyalty programmes, shopping behaviour, online click patterns, etc.) and data from social networks (e.g. Facebook profiles). All this data was correlated with the electoral rolls in order to identify any patterns in people who were not intending to go to the polls but who, if they did, would vote for Obama. With the aid of various forecasting instruments (predictive analytics) up to 100 variables were fed into the system in order to predict voting behaviour. Data privacy experts were critical, arguing that many of these practices fell into a regulatory grey area and that individuals’ information privacy rights were being infringed. However, because the data privacy laws in the USA are not as strict as in Germany or Europe, Obama’s campaigners were able to combine and analyse a wide range of data sets relating to US voters.
Source: http://www.predictiveanalyticsworld.com/patimes/ericsiegel-explains-how-the-obama-2012-campaign-usedpredictive-analytics-to-influence-voters-2

Figure 13: Worldwide smartphone subscriptions. Bn: 2010: 0.5; 2011: 0.9; 2012: 1.3; 2013: 1.9; 2014: 2.6; 2015: 3.3; 2016: 3.9; 2017: 4.5; 2018: 5.1; 2019: 5.6. Source: Ericsson Mobility Report 2013

[4] TNS Infratest GmbH – Technology business segment: Quo Vadis Big Data – Herausforderungen – Erfahrungen – Lösungsansätze. (What next for big data – challenges – experiences – approaches.) 2012.
[5] Cf.: Heuer, S. (2013). Kleine Daten, große Wirkung. Big Data einfach auf den Punkt gebracht. (Small data, big impact. Big data in a nutshell.) Page 12.
[6] Inventor of the world wide web.
[7] Whereas the internet offers a means for data to be linked, the semantic web provides a way of linking information at the level of meaning. Data in a semantic web is structured in a format that enables computers to process information based on the meaning of its content. A semantic web would also allow computers to extract knowledge from the mass of global information and, more importantly, to generate new knowledge.
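The e-mail example used above to illustrate semi-structured data can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method described in the study: the sample message, the addresses and the helper name `split_email` are invented for the example, and the parsing relies on Python's standard `email` package.

```python
from email import message_from_string

# Invented sample message: a header block followed by a free-text body.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly figures

Hi Bob, the numbers look better than expected. Shall we talk tomorrow?
"""


def split_email(raw):
    """Separate the structured header fields from the unstructured body."""
    msg = message_from_string(raw)
    # Header fields have fixed names, so they map cleanly onto table columns.
    structured = {
        "sender": msg["From"],
        "recipient": msg["To"],
        "subject": msg["Subject"],
    }
    # The body is free text with no predefined structure.
    unstructured = msg.get_payload()
    return {"structured": structured, "unstructured": unstructured}


parts = split_email(RAW_EMAIL)
print(parts["structured"])
print(parts["unstructured"])
```

The header fields could be stored directly in a relational table, whereas the body would need further analysis, for example text mining, before it yields machine-readable information.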
Current Issues Big data – the untamed force Experts are now talking about a further V in addition to the usual three characteristics: 8 — Veracity (accuracy, reliability of data) Mainly internal sources 14 % of respondents (n per data point=557-867 out of 1,144), 2012 Transactions 88 Log data 73 Event data 59 E-mails 57 Social media 43 Sensors 42 External data feeds 42 RFID scans or POS data 41 Free-form text 41 Geospatial data 40 Audio data 38 Still images/videos As is so often the case with young technologies, the analysis of the potential of big data is still in its infancy. Several questions remain unanswered. It is possible, of course, to mine the data to identify risks, which can be categorised using theoretical decision-making models and different probabilities. But events 9 may also occur that could never have been predicted, known as black swans. The need to identify uncertainties and to realistically factor them into planning is a further characteristic of data and an ever-present challenge for which not even big data has a silver bullet. Sources/drivers of big data: The digital revolution across all sectors The wealth of potential data sources is seemingly inexhaustible. They can be divided into three rough groups based on their origin. Often the data is 10 generated by a combination of these three groups: 34 0 15 30 45 Data and findings extracted from subjective comments and opinions are, by nature, difficult to predict. It is the same for weather data, seismology measurements and macroeconomic indicators. However, this kind of forecasting data is of critical importance to the complex modelling of business scenarios. But despite the relevance of this data, the uncertainty – analytically speaking – cannot be eliminated through just any adjustment methods. 60 75 90 Sources: IBM Institute for Business Value, Saïd Business School Oxford 11 — Machine-generated data: e.g. sensor or log data , language/audio/video, click statistics, data services. 
— User-generated data: e.g. social networking, correspondence, publications/patents, images, free text, forms, logs, open data/web content, language/audio/video; — Business data: e.g. master/case data, CRM Smartphone data traffic 15 Petabytes per month 12,000 1,263 10,000 1,585 8,000 6,000 4,934 4,000 2,000 1,803 839 0 11 12 13 14 15 16 17 Latin America North America Asia-Pacific Central and Eastern Europe Western Europe Source: Ericsson 2013 18 19 8 9 11 12 13 | May 5, 2014 data and transaction data. The dominant drivers undoubtedly include the ubiquitous digitisation (digital revolution) of infrastructures in the fields of energy, health, transport, education and public administration as well as the increasing personal and work-related use of mobile, internet-enabled devices. It is now possible, for example, to monitor users’ internet surfing behaviour around the clock and click by click with the help of tracking software or to determine a person’s exact location or 13 patterns of movement using RFID , infrared or wireless technologies such as near field communication (NFC). Film and music files are continuously being downloaded and uploaded or streamed in real time. At the same time, millions of users are interacting on social networks and generating digital or viral content that is also disseminated in real time. All this is leading to an exploding volume of data that is potentially valuable but also places huge demands on the infrastructure and architecture of the IT systems. 10 9 12 Schroeck, M. et al. (2012). Analytics: Big Data in der Praxis. Wie innovative Unternehmen ihre Datenbestände effektiv nutzen. (Analytics: practical uses of big data. How innovative companies are drawing benefit from their data.) Page 4 et seq. Reference to ‘The Black Swan: The Impact of the Highly Improbable’ by Nassim Nicholas Taleb (2007). 
The basic idea is that unprecedented events are by their very nature impossible to predict, whereas risk management deals with situations and events where probability can be calculated.
[10] Cf. Fraunhofer IAIS (2012). Big Data – Vorsprung durch Wissen. (Big Data – Getting ahead through knowledge.) Page 20.
[11] A log file or log is an automatically updated file that contains a record of events or processes generated on a computer system.
[12] CRM = Customer Relationship Management.
[13] Radio frequency identification (RFID) enables objects, animals and people to be automatically identified and tracked and provides a highly efficient means of gathering data.

Figure 16: Global drivers for big data. Survey of German companies > 500 employees, multiple answers permitted [n=100], %, 2012.
Mobile internet use 59; Cloud computing 53; IP-based communication 47; Social media 44; Video streaming/media distribution 35; Digitisation of business models 34; Collaboration (file sharing, etc.) 31; Machine to machine (sensors, logistics, …); Online gaming/entertainment.

It appears to be the case, however, that social networks do not generate the biggest data sets. For example, the sum of all messages posted on a microblogging platform (e.g. tweets) about specific subjects is not nearly as extensive as the data sets produced by industry and/or academia. Examples of really large data sets include geophysical measurements taken on oil platforms or from seismographs and weather stations, and complex calculations of scenarios in the natural sciences or engineering disciplines. The globalised financial sector also produces a huge amount of analysable data. Every second, its digital trading systems process millions of transactions that can be (and are) analysed in real time.

As the main driver of the big data movement, the digital revolution offers enormous scope for experimentation and lucrative applications, particularly for the key technologies of biometrics and sensors [14].
These technologies act almost like a turbocharger for big data, because sensors in particular can be incorporated into a growing number of everyday (mass-market) objects and machines. People are getting closer to achieving their goal of making everything around them measurable. We will be taking a closer look at this dynamic driver in the following chapter.

Figure 16 (continued): Machine to machine (sensors, logistics, …) 24; Online gaming/entertainment 13; Other 1. Source: A study by the Experton Group AG for BT GmbH & Co. oHG

3. Sensors and biometrics take over the mass market

Nowadays, every smartphone/tablet or other mobile, internet-enabled device contains a host of different sensors, which goes a long way to explaining why sensors and biometric identification technology are taking over the mass market. Thanks to these devices, the use of sensors and the tracking of individual body metrics are becoming part of daily life. Faster and more stable mobile-phone networks are enabling huge numbers of people to be permanently connected to the internet through their mobile devices, which gives additional impetus to the connectivity between ‘man and machine’ and between ‘machine and machine’. The inbuilt sensors give internet services lots of scope to develop new functions, often in the form of apps, and are used millions of times a day.

Temperature, distance, length and depth, light, time, speed, body metrics and weight are just some of the things that can be measured. Nowadays, no field of modern technology involved in measuring, monitoring, automation or control engineering can get by without sensors. This development is being further driven by cutting-edge microelectronics and the miniaturisation of sensors, making them ever smaller, cheaper and more powerful.
Mobile devices offer a wealth of sensors …

Mobile devices contain everything from motion sensors, light sensors, altitude meters and digital compasses to fingerprint sensors, gait sensors, voice recognition functions and proximity sensors. The last of these, for example, ensure that your device’s touchscreen is automatically deactivated when the device is held against your ear.

… and increasingly also biometric identification technology

Biometric identification technology is another growing area of experimentation in the mobile devices sector. Using a motion sensor or microphone, it’s possible to measure how you walk, run, speak or drive. Special algorithms are then used to create gait, voice or driving profiles that are stored on the particular mobile device, enabling, for example, smartphones to identify their owners from the way in which they walk, talk or drive. [15]

Sensor technology and biometrics therefore play a vital role in gathering information from our environment and surroundings, from medical processes and procedures, from robotics and automotive technology and from home appliances and office equipment. The constantly growing flood of data that results from this ‘smart connection’ of objects will go beyond what many people ever thought possible and bring with it new possibilities for connectedness. The volumes of data generated in this way can only be processed using specialised hardware and software. Both for industry and academia, these technologies offer lots of scope for experimentation, research and development, coupled with excellent growth prospects.

[14] Sensor technology refers to the development and application of sensors for measuring and monitoring changes in environmental, biological or technological systems.
[15] This technology, which identifies people by the unique way in which they move, would allow smartphone owners to authenticate themselves online relatively easily for a wide range of transactions or activities without having to enter passwords or codes. See also Dapp, T. (2012). Homo biometricus: Biometric recognition systems and mobile internet services. Page 7 et seq.

Figure 17: Worldwide mobile device subscriptions. Million, 2011 to 2019, by device type (mobile PCs/routers/tablets, smartphones, trad. mobile phones). Source: Ericsson 2013

All the individual data points are placing new and above all growing demands on existing IT network architectures, whose job it is to process the rising volumes of data quickly, flexibly and at a low cost. A lucrative future is in store for those decision-makers who not only collect and store this data but also convert it into machine-readable structures, identify patterns and draw the right conclusions.

In order to achieve the desired increases in efficiency, however, all data sources must first be a) brought into a machine-readable format and b) collated and combined. Only in this way can the various types of data be correlated with, say, locational data from smartphones or RFID technologies, transaction data from online trading or profile data from social networks. It will then be possible to filter out unexpected, hidden and surprising causal connections that will create value for the respective stakeholders.

The convergence of information and communication technologies as a basis for unexpected innovations

In the future, it will be increasingly common for everyday objects to be connected via the internet so that they can communicate with each other and offer an even higher level of convenience than before. These technologies are being incorporated as flexibly as possible into people’s environments and daily lives, and, as a cutting-edge interface between man and machine, are opening up the possibility for new (digital) business models.
The merging of information and communication technologies goes hand in hand with the discussion surrounding the ‘Internet of Things’. The vision behind this term is of the internet as a bridge between the real and the virtual world, and as a means of making all kinds of everyday objects from people’s work and home lives ‘smart’ (smart home, smart city, smart glass, smart grid, smart car, smart everything). In addition to smartphones and tablets, this could include objects ranging from home electronics, vehicles, traffic lights, parking meters, sports equipment and clothing to entire buildings, delivery containers and industrial machines (known as ‘industry 4.0’).

Intelligent objects are programmable, equipped with sensors and able to store and communicate information autonomously and often wirelessly using the web. This connectedness can be used to trigger a range of actions or to have one object control another. These web-enabled objects could also function as physical access points to various internet services, which would meet the increasing desire of many people for digital mobility. This relatively young area of research holds a great deal of potential that can be unlocked in all kinds of unexpected ways, as the following examples show:

A washing machine that can communicate with garments

In the home, for example, you might soon be able to use your smartphone to remotely control web-enabled blinds, networked radiators or smart air-conditioning systems. There will be washing machines that wait until electricity is at its cheapest before starting their wash cycle. Inbuilt thermostats will be able to make micro-adjustments to the temperature of the water to get the best out of the detergent’s enzymes. Intelligent garments will be able to tell the washing machine that they should be washed only at 40 degrees and on a gentle cycle and that they are blue.
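The washing machine that waits for the cheapest electricity can be illustrated with a small scheduling sketch. It assumes, purely for illustration, that the appliance receives a list of hourly tariffs and knows how many hours its cycle takes; the function name `cheapest_start_hour`, the tariff values and the cycle length are hypothetical and do not describe any real appliance or tariff.

```python
from typing import Sequence


def cheapest_start_hour(prices: Sequence[int], cycle_hours: int) -> int:
    """Return the start hour whose window of `cycle_hours` consecutive
    hourly prices has the lowest total (earliest hour wins on a tie)."""
    if not 0 < cycle_hours <= len(prices):
        raise ValueError("cycle_hours must be between 1 and len(prices)")
    last_start = len(prices) - cycle_hours
    # Compare the total price of every possible consecutive window.
    return min(
        range(last_start + 1),
        key=lambda start: sum(prices[start:start + cycle_hours]),
    )


# Hypothetical tariffs in cents per kWh for the next eight hours.
tariffs_cents = [30, 28, 25, 12, 10, 11, 27, 31]
print(cheapest_start_hour(tariffs_cents, 2))  # prints 4
```

A real appliance would also have to respect a latest finishing time and the power drawn in each phase of the cycle; the sketch only shows the basic idea of shifting the start to the cheapest window.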
Greater independence in later years

Sensors and cognitive assistance systems could soon be helping older people to live independently and for as long as possible in their own homes, by offering active support in the areas of continuing mobility, physical aid, communication, medical monitoring and medical treatment. [16] It’s possible, for example, to install motion sensors into the floor that automatically trigger an alarm when a person falls. Round-the-clock monitoring systems could keep track of data that is relevant to an older person’s state of health, such as blood pressure, heart rate, blood oxygen levels and stress levels. Miniature sensors attached to a bathroom mirror could be used to remind people suffering from dementia to take their medication or brush their teeth.

With the aid of sensors, it might be possible to measure if a person driving a car breaks out in a sweat or shows other symptoms that might indicate the onset of a heart attack or other incapacitating condition. Technologies could then be developed that respond to these patterns, for example, by slowing down the vehicle, safely bringing it to a stop on the side of the road (using autopilot) and automatically alerting the emergency services.

Potential for connectedness in industry

There is a similar potential for connectedness in the production chains and machinery of a range of industries. In the automotive sector, for example, companies are working on ‘networked driving’, in which the vehicle’s electronic systems are hooked up to the internet. The mobility requirements of car owners are changing. People are starting to expect their vehicles to offer the same level of comfort and functionality as their web-enabled mobile devices. Soon it won’t just be the quality, safety and reliable performance of the vehicle that counts; people will want their digital lives to continue when they get behind the wheel.
At the forefront of this development are several trends, including the vision of automated driving, which looks set to become a reality in the medium term, and the long-term move towards more fuel-efficient, low-emission vehicles. As the industrial and digital worlds converge, the market will be taken over by new business models that focus on high-tech and primarily personalised services for the driver. These include cloud-based voice recognition, real-time sharing of traffic flow data and proactive, part-automated driving that uses data from online sources and navigation systems to automatically apply the brakes, accelerate and even steer.

Everyday objects with their own web address

As more and more everyday objects get hooked up to the internet for remote contact and control, we will need to look beyond the unilateral communication capabilities of RFID if we are to fully exploit the convergence opportunities. This is why the objects will be given their own IP address so that they can communicate with other smart objects and network nodes. Because of the vast number of IP addresses that this would require, providers will have to use the newer 128-bit IPv6 addresses [17] that are gradually replacing the IPv4 versions. However, this also means that further efforts will have to be made to improve infrastructure and expand networks.

Warnings from data privacy experts

Despite all the potential benefits, data privacy experts are very critical in their assessment of the ‘Internet of Things’, because in many cases it may be possible for sensors to send data to various interested parties for analysis without the individual’s consent. Even today, the right of individuals to determine what happens to their information is already being violated.

[16] Cf.: Beck, S., M. Grzegorzek et al. (2013). Mit Robotern gegen den Pflegenotstand. (Using robots to alleviate the care crisis.)
[17] http://en.wikipedia.org/wiki/IPv6.

4. On the role of digital ecosystems in the big data world

Big data is discussed and interpreted in different ways depending on how it is used. It does scant justice to the term if we reduce it to commercial interests or the opportunities presented by social networking applications in digital ecosystems such as Amazon, Twitter, Facebook, Apple and Google.

The positive side to big data

After all, big data also has potential to solve or at least mitigate some of the problems that face society, including in the areas of public health, climate change and international cybercrime. It could also herald a sea change in scientific methods. Whereas researchers have up to now been using preconceived models, such as sample analyses, to produce their findings, they might soon also be able to draw on data-driven observations in real time that use a far larger total sample at no great additional cost.

The IT security and financial sectors also stand to benefit: new ways are emerging of bringing credit card fraudsters to justice [18], for example, and of better assessing risks and compliance; manufacturing companies will be able to run their production areas with greater efficiency; logistics chains will be optimised and sensitive infrastructures made more secure. Findings from medical research will be combined with findings from other fields of research as a means of revealing any hidden correlations and/or treating rare diseases. In addition, costs will be continuously brought down, the basis for making decisions will be optimised and it will be possible to more precisely analyse consumer behaviour and customer satisfaction and/or to create products and services targeted at specific types of people.

Figure 18: Broad interpretation of what big data means. % of respondents* (n=1,144), 2012.
A greater scope of information 18; New kinds of data and analysis 16; Real-time information.

The dark side to big data

Big data also has a dark side.
In the big data debate, people often bring up Orwellian ‘Big Brother’ methods as imagined in Nineteen Eighty-Four [19], i.e. the excessive use of surveillance. In German we talk of the gläserner Mensch, the ‘see-through person’. Unlike in Orwell’s world, however, the dominant fear in the big data discussion is not that of a dictatorial system. Indeed people will initially participate voluntarily, while in the background their personal information is shared, traded and commoditised by businesses, scientific institutions, NGOs and intelligence agencies. This may sound far-fetched, but there is an understandable concern that citizens and consumers are increasingly becoming a plaything for various stakeholders or are unwittingly becoming such ‘see-through people’. In the light of recent revelations about the activities of the intelligence services, it would seem to some observers as if Orwell’s novel is closer to reality than ever before.

Figure 18 (continued): Real-time information 15; Data influx from new technologies 13; Modern media 13; Large volumes of data 10; The latest buzzword 8; Data from social media 7. Sources: IBM Institute for Business Value, Saïd Business School Oxford

The objectives pursued by digital ecosystems are more wide-ranging than many assume

On the one hand we have the many useful products and services that help people perform everyday tasks and routine actions, including the aforementioned examples of sensors benefiting the care sector. On the other hand, however, digital ecosystems are accused of engaging in business practices that do not exclusively serve the primary benefit of the product or service for the customer. Once data has been collected, it can be monetised or used again and again for sometimes dubious purposes – without the person concerned ever being informed.

[18] Using a multitude of data relevant to credit card crime, the patterns of behaviour common to credit card fraudsters are filtered out and compared with current cases. The aim is to flag up credit card transactions that have a high likelihood of being fraudulent.
[19] George Orwell’s novel (1949) is often cited in discussions about misuse of state surveillance measures.

Figure: Digital ecosystems are giving us ‘smart everything’. Graphic: Oliver Ullmann, Deutsche Bank Research. Source: Dapp, T. (2014). Big Data – The untamed force. Deutsche Bank Research. Frankfurt am Main.

A host of monetisation strategies

If, for example, a company like Google began to offer web-enabled glasses or to conduct research into the development of self-driving vehicles, then it’s possible that there might be more to this than meets the eye. Both products could enable virtually limitless access to personal and sometimes intimate data. A single company would gain insights into people’s day-to-day lives, their activities, tendencies and identities. Web-enabled glasses could reveal, for example, what people are looking at, for how long and how often their gaze falls on objects, advertising and other people. This would give the company the opportunity to use big data analysis tools in real time as a means of identifying meaningful patterns from the aggregated data, which could then be monetised or used to send people personalised advertising messages. The desire shown by the major internet companies to reuse this additional information again and again for different purposes is growing, while the social consequences are, as yet, largely unexplored.

Automated driving: once a vision, now a reality

The owners of self-driving cars may also unwittingly reveal more about themselves than they intend. [20] Throughout the journey and even before and afterwards, a whole host of information could be measured and gathered for subsequent analysis. After a while, of course, patterns will be observed that will provide answers to the following questions: where do people go when they are driving? At what times of day and night do they use their vehicles? Where do they work and where do they like to go in their free time? How long do they spend in the car on average, what music do they listen to, what temperature do they like the interior to be at, who are they calling or otherwise communicating with, and what digital content are they consuming during the journey? Are they keeping to the speed limit, are they perhaps eating, drinking or smoking during the journey?

A company could potentially monetise every activity described here. It could earn this money either by offering tailored products and services from a single source or by selling the personal data on to stakeholders from a range of sectors. Most importantly of all, it would make it possible to track a person wherever they go, thereby bridging the crucial gap between the virtual and the real world.

Revenue does not necessarily have to flow from the end consumer of a particular product or service directly to the provider. Instead, this provider can maximise its income by making the data that it collects during the course of the transaction available to third parties. For start-ups or niche providers it will be possible to latch onto ecosystems with high market shares in order to offer complementary products and services that make the original offering even more attractive for the customer. Takeovers by digital ecosystems are also conceivable, of course. [21] Individual companies would expand their market position and there would be less competition.

[20] Morozov, E. (2013). Big Data. Warum man das Silicon Valley hassen darf. (Why you’re allowed to hate Silicon Valley.) Frankfurter Allgemeine Zeitung, 10 November 2013.
Silicon Valley as a global driver of innovation

The cross-sector onward march of …

The digital footprints of internet users are becoming clearer, more precise and easier to track. On platforms provided by popular digital ecosystems such as Google, Twitter, Facebook, LinkedIn, Amazon and Apple, people think nothing of revealing personal and often intimate information such as their relationship status, hobbies, travel habits, consumer habits, favourite music and film genres and photographs, and they’ll ‘like’ virtually anything that their fingers or cursors come across. Any short message that is followed or created on microblogging services indicates an individual preference. It is relatively easy to create a profile of an individual’s political and social views from a list of all their tweets. On Facebook or similar social networks it is relatively easy to find out where people spend their time and what their interests, hobbies and even sexual preferences are simply by looking at what they post and whom they are friends with.

… the online ecosystems

These days, it’s virtually impossible to avoid certain digital ecosystems [22], regardless of whether you’re using the internet privately or for work. Practical and popular online services function as one-stop-shops for all possible aspects of life and work and include search engines, operating systems for various end devices, e-mail services, navigation, cloud, streaming and wiki services and proprietary marketplaces for digital goods (Apple Store, Playstore, etc.). People’s search queries can be collated and analysed to create profiles even if they are not signed in to the search engine or are not using an e-mail account. The idea is to digitally organise as much information (e.g. personal data) as possible. However, the data is much more useful if it can be assigned to a particular account.
[21] See, for example http://www.spiegel.de/wirtschaft/unternehmen/google-kauft-nest-labs-fuer-3-2milliarden-dollar-a-943362.html.
[22] Cf.: Bahr, F. et al. (2012). Schönes neues Internet? Chancen und Risiken für Innovation in digitalen Ökosystemen. (Great new internet? Opportunities and risks for innovation in digital ecosystems.) Page 3 et seq.

Should you follow the advice to be economical with the data you divulge, you’ll find this is only a partial solution – mainly because even ‘non-participation’ in the digital world leaves behind a trail that can enable conclusions to be drawn about individual people and their behaviour (e.g. by means of de-anonymisation). Anonymising personal data before providing it to third parties does not go far enough to protect privacy, because even individual searches (combined with additional data) can be used to precisely identify the originator of the data – even if their user name or IP address has been deleted, as was shown during the AOL search data leak. [23]

The web as a playground for fenced-in digital ecosystems

Figure 19: Every minute on the internet …
100 hours of video material are uploaded to YouTube; 695,000,000 Google searches are made; 3,300,000 posts are shared on Facebook; 347,000 tweets are sent using Twitter; 48,000 Apple apps are downloaded; 38,200 photos are shared on Instagram.
Sources: YouTube, Google, Facebook, US Securities and Exchange Commission, Apple, Instagram

The lock-in effect

Google and other internet companies such as Apple, Facebook and Amazon dazzle consumers by offering a whole host of modern, social and often free web-based services. These ecosystems have literally hundreds of millions of loyal customers and enough liquid funds to continuously drive innovation. They heighten their appeal by ensuring a high availability of additional online services and apps within their fenced-in playgrounds.
New, bolt-on services appear at regular intervals thanks to strategic alliances with third-party or niche providers or through takeovers. The innovation rate of these technology-driven internet companies is remarkable and is without parallel in both Germany and Europe. It is also evident that the major online platforms are increasingly extending their feelers beyond their core area of business (which is in any case growing), and in doing so are shaking up existing markets and putting established companies under pressure. Amazon, for example, is investing in its own mobile payment systems as well as its own TV productions and cloud services for premium customers [24], while Google is investing heavily in home appliances and robotics and has even set up a new division aimed at developing humanoid robots. [25]

Digital ecosystems are and remain popular and accepted platforms. People are often ‘locked-in’ to their services, and switching can prove relatively expensive. These walled-garden strategies are giving the digital ecosystems increasing scope to monetise the individual attention and personal data of their users or to use this data for other purposes. [26]

The ambivalent behaviour of the internet user

Despite all the justified criticism of digital ecosystems, the reality remains that many internet users have (so far) been willing to give up some degree of control over their personal data to the operators of online platforms in order to benefit from their one-stop-shop services. The pull of free, convenient services is obviously so great that huge numbers of people are willing to divulge personal and intimate data without a second thought, are allowing this to be passed on to third parties, are accepting personalised advertising messages and are allowing biometric scans of their voice and face to be taken and quietly stored.
Manipulation and lack of transparency

The users of such platforms have little control over the security of the system being offered, over potential access to their personal data, and over the security, use and deletion of their data. Despite this, the vast majority of people relinquish control of their data to the major internet platforms and assume that the operators will provide the necessary security and will protect against misuse of data. There is a distinct ambivalence to the way in which people behave on digital channels. Critics see this as confirming the suspicion that it is the internet users themselves who are becoming the traded commodity.

[23] http://www.nytimes.com/2006/08/09/technology/09aol.html?_r=0.
[24] Example: Amazon enters the market for mobile payment solutions: http://www.businessinsider.com/report-amazon-has-bought-square-competitor-gopago-2013-12.
[25] http://www.zeit.de/digital/2013-12/google-roboterfirma and http://www.zeit.de/wirtschaft/unternehmen/2013-12/google-roboter-android-erfinder-andy-rubin.
[26] Cf.: Dapp, T. (2013). The future of (mobile) payments: New (online) players competing with banks. Page 21 et seq.

Figure 20: What do people use their mobile devices for?
54% surf the internet; 37% spend time on social networks; 17% shop online; 21% share documents, videos, etc.; 26% watch videos and listen to music; 16% make mobile payments.
n=4,500 from nine selected countries (DK, FR, DE, IT, NL, PL, RU, SE, GB). Source: Norton, 2012

The imbalance in data sovereignty

For many companies this means first and foremost a lucrative line of business. Most consumers will continue to use the wide variety of personalised services despite possible concerns about data protection. Some will gradually come to the realisation that they are being manipulated and made ‘transparent’, or will even feel robbed of their data sovereignty.
It remains to be seen whether, in the medium and long term, people will continue to unquestioningly pay the relatively high price of these individualised services (a comprehensive, personal, digital profile) or whether they will try to curb their demand for services and products or switch to alternatives. Because of the current situation regarding competition in the market, there is a justifiable question as to whether, in the medium term, serious alternatives to certain online services will or even could become established on the market and whether the speed at which modern technologies are adopted will decline because of these practices.

Data protection experts are unanimous in their concern about individual internet companies sharing personal data that is often private and intimate in nature. People are beginning to question whether the fenced-in playgrounds of the digital ecosystems are controlling innovation, communication and information flows and undermining data protection regulations and user autonomy. Moreover, data protection experts are noting that there is a lack of transparency about the ways in which user activities are being monetised and misused. [27]

5. The economic value of data

We have a ‘digital twin’

“Many do not realise, or simply do not want to know that they are complicit in the creation of the virtual twin to their real life self – their alter ego who reveals, or could reveal, both their strengths and weaknesses, who could disclose their failures or deficiencies, or who could even divulge sensitive information about illnesses. Who makes the individual more transparent, readily analysed and easily manipulated by agencies, politics, commerce and the labour market” [German Federal President Joachim Gauck, 3 October 2013, Stuttgart]

The combination of different data sets promises new insights

Individual data sets held by individual companies and viewed in isolation are only of limited use when it comes to analysing big data.
Only once several data points, some from different sources, are merged does it become possible to extract certain patterns. Very few companies offer such a diverse range of products and services that they can accumulate a sufficiently broad variety of customer information under one roof. [28] If this information contains enough personal data, it can be used to create comprehensive and relatively detailed profiles of people.

The majority of the various data being stored, archived and analysed originates from digital advertising, information, transaction and other web channels. It is relatively easy to collect information about the respective internet provider, IP or e-mail address and the search engine used. With every visit to the Amazon platform, for example, this information is stored and analysed, in combination with each person’s historical click behaviour and consumption habits. This is how Amazon can offer its customers personalised purchase suggestions every time they log in, even from different IP addresses, and this is also how personalisation on the internet works in general.

[27] See chapter 6 (Walled-garden strategies) of Dapp, T. (2013). The future of (mobile) payments. Page 21 et seq.
[28] Google is very much the exception here, as the company is able to combine its own data, thanks to its reach and its extraordinarily high data volumes.

The growing data collections held by the stakeholders make it relatively easy to link the real and the virtual world, making even physical addresses and telephone numbers of individuals fairly simple to identify. On top of that it is possible to make further deductions, e.g. about rents or house prices in a particular residential area, about employers and therefore about salary (at least within the general salary ranges for the industry or sector) and about disposable income.
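How merging several sources turns scattered data points into a profile can be sketched in a few lines. The example assumes, purely for illustration, that every source identifies a person by e-mail address; the function name `link_records`, the field names and the sample records are hypothetical and do not describe any real company’s data.

```python
from collections import defaultdict


def link_records(*sources):
    """Merge records from several sources into one profile per person,
    using the normalised e-mail address as the shared key."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            key = record["email"].strip().lower()
            for field, value in record.items():
                if field != "email":
                    # Keep the first value seen for each field.
                    profiles[key].setdefault(field, value)
    return dict(profiles)


# Hypothetical records held by three unrelated providers.
shop_orders = [
    {"email": "X@example.org", "purchases": "technical literature"},
    {"email": "v@example.org", "purchases": "sports goods"},
]
social_network = [{"email": "x@example.org ", "city": "Frankfurt"}]
payments = [{"email": "x@example.org", "payment_method": "credit card"}]

profiles = link_records(shop_orders, social_network, payments)
print(profiles["x@example.org"])
```

Each source on its own says little; only the join on a common identifier produces a profile, which is why identifiers that recur across services (e-mail address, telephone number, device ID) are so valuable for this kind of analysis.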
What would happen if personal data was also linked to other information from social networks, data on consumption habits, location data (GPS tracking), data on relationship status and income? The resulting data correlations would in turn allow new, relevant conclusions to be made about personal preferences and characteristics.

How digital footprints become marketable and sought-after profiles

The high price of personal data

Imagine the following scenario: due to some data error, or thanks to a whistleblower, you happen to come into possession of a data set about yourself. Your profile contains quite detailed information originating from a company specialising in data collection or a ‘friendly’ intelligence service. In your profile you read the following about yourself:

Big Brother is watching you!

“X is male, age 43, an engineer with a PhD, married to V, female, age 37; X lives with person V in a detached house in Frankfurt and has a cat; X subscribes to ‘Ideal Home’ magazine and also has a digital subscription to the magazine ‘The Engineer’, which he downloads directly to his brand A mobile device; X regularly buys technical literature, clothes and electronics from online retailer A; X is interested in various environmental issues and supports a number of international NGOs by standing order; X interacts with a number of social networks several times a day on mobile and stationary devices, and maintains an above-average amount of contact with the following persons: H (male, age 41), S (female, age 29), and K (female, age 35); X consumes video-on-demand services and surfs the internet for one to two hours a day on average from 10 pm onwards; X holds several bank accounts with the German banks D and P, as well as one with Swiss bank U; X prefers digital, web-based payment methods; X holds two credit cards from providers M and A, credit card M is also used by person V (wife), transactions for the current month for credit card A: Paris travel guide EUR 9.95, flights
to Paris for two people EUR 379.00, hotel in Paris EUR 440.00, restaurant in Paris EUR 90.00, jewellery in Paris EUR 399.00, transactions for the current month for joint credit card M: no transactions; X regularly buys sports goods for outdoor activities; X is increasingly soliciting e-mail offers from various car dealers (i.e. is in the process of buying/leasing/renting a vehicle); X is comparing prices of electricity providers (may be considering switching); X always books an annual winter holiday in Austria online and hires his ski equipment online in advance from two providers (otherwise tends towards long-haul trips with person V and city breaks, the last with person S (X is very probably having an affair with person S)) [29]; X obtains prescription medication several times a quarter (X suffers from allergies and high blood pressure), X regularly visits different forums and participates in discussions about rare medical conditions (X uses the anonymous avatars ‘snoopy’ or ‘curious_1970’ for this). X holds car insurance, legal insurance and home contents insurance from company A; in 2012 X obtained legal services from company A after being charged with a traffic offence; the case is ongoing. Additional real-time information: person X is at this very moment located at Infidelity Street 7. This is the residence of person S.”

[Figure 21: Fear of data theft. Survey, in %, 2012 and 2013; countries: DE, AT, CH, SE, UK, US. DE (n=571); AT (n=586); CH (n=476); SE (n=346); UK (n=435); US (n=409). Multiple answers permitted. Source: eGovernment MONITOR 2013]

Given the range of digital footprints that we leave on the internet nowadays (knowingly and unknowingly), and the data analysis tools required, this type of personal profile is definitely realistic as well as technologically feasible.
This enables stakeholders to create cross-context timelines and digital profiles about locations, social relationships, consumption habits and media usage, health, income, employment, etc. [30] [31] Each individual piece of information in this fictional profile has an economic value, with profitable connection and monetisation points for a range of stakeholders from a business (e.g. insurance companies, manufacturers of consumer goods, various service providers), academic (e.g. neuroscience, sociology, behavioural science) or political background (e.g. tax authorities, public administrators, intelligence services).

On the internet, people become the subject of mass surveillance, so-called ‘transparent individuals’, by virtue of their digital footprints, without necessarily realising it at many stages of their online activities (e.g. when (de)activating their GPS signal). The sum total of this information is aggregated into individual digital profiles that have a price, based on supply and demand.

[29] At this point I refer you back to the previously mentioned question of whether, based on individual transactions, credit card providers can predict which of their customers may be about to face a marital crisis. The answer is: ‘Yes, they can’.

[Figure 22: What is personal data? % of respondents, EU-27, 2011 (n=26,574): Financial information 75; Health-related data 74; ID number 73; Fingerprint 64; Address 57; Mobile phone no. 53; Photos 48; Name 46; Employment history 30; Identity of friends 30; Opinions/attitudes 27; Nationality 26; Hobbies, etc. 25; Websites visited 25. Source: European Commission]

What might be the monetary value of a single customer profile, which is essentially an immaterial value, and most importantly, how could it be determined? This can be illustrated by a simple calculation: the market capitalisation of Facebook, for example, is currently approx. EUR 70 bn. [32]
Facebook puts the number of its users at more than 1.2 billion. [33] Based on a simple division, that would make the monetary value of the average Facebook user profile around EUR 58. Making the further assumption that only about two thirds of accounts are actually active, the value increases to around EUR 88. This calculated value could provide the basis for negotiations about the price of individual user profiles with investors or other interested parties. Whether this approximation is actually suitable for calculating a return on investment goes beyond the scope of this study, but it does indicate that the motivation behind big data and the trading of data is primarily based on monetary interests.

[30] Cf.: Weichert, T. (2013). Big Data: Das neue Versprechen der Allwissenheit. (Big Data: The new promise of omniscience.)
[31] The terms ‘personalising’, ‘scoring’, ‘profiling’ and ‘tracking’ are used for this.

Personal data becomes a virtual currency on the web

A data-driven life

People are becoming more and more immersed in a data-driven life. Routine, everyday actions are linked to the latest web-based technologies, making life easier, e.g. through the use of apps (internet services). Every click on an online portal, every voice command to a mobile device and each use of GPS saves time and reduces search costs. These services increase efficiency and convenience in everyday life, and demand for them is undiminished. But these internet services are not really free; in fact they come at a comparatively hefty price. Although it does not cost any money to access many of the popular web-based services, the user is paying by (usually voluntarily) providing individual, personal digital data, a fact that many choose to ignore at the moment they make that decision. It is conceivable that future business models may even offer a monetary incentive to the people who are actually supplying the data, i.e. the internet users themselves.
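The per-profile valuation used in this chapter (market capitalisation divided by the number of users) can be reproduced in a few lines of Python. The two input figures are the rounded values quoted in the text; the function name and the way the two-thirds assumption is passed in are illustrative choices, not part of the study.

```python
def value_per_profile(market_cap_eur, user_count):
    """Market capitalisation divided by the number of user profiles."""
    return market_cap_eur / user_count


MARKET_CAP_EUR = 70e9  # approx. EUR 70 bn, as quoted in the text
USERS = 1.2e9          # "more than 1.2 billion" users, as quoted in the text

# Case 1: every registered account counts as a profile.
all_accounts = value_per_profile(MARKET_CAP_EUR, USERS)

# Case 2: assume only about two thirds of accounts are active.
active_only = value_per_profile(MARKET_CAP_EUR, USERS * 2 / 3)

print(f"All accounts:      EUR {all_accounts:.1f} per profile")
print(f"Two thirds active: EUR {active_only:.1f} per profile")
```

The first case gives a little over EUR 58 and the second EUR 87.50, which the text rounds to EUR 88. The approach simply spreads the whole market value evenly over all profiles, so it says nothing about how much any individual profile is worth.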
[32] As at 24 January 2014, according to http://www.onvista.de/aktien/Facebook-AktieUS30303M1027.
[33] http://allfacebook.de/zahlen_fakten/infografik-10-unglaubliche-zahlen-zur-facebook-nutzung-10jahre-facebook.

[Figure 23: Unprotected data: global problem. %, estimated proportion of all data requiring protection: India 56; China 52; US 42; Western Europe 41; Worldwide 47. Source: IDC Digital Universe Study]

The utilisation of personal data in particular attracts many interested parties keen to employ it to a variety of different ends, often with a hope of increasing revenues. Such personal data can be linked directly to an individual. It may be, for example, details of bank accounts or credit ratings, credit card information, information about medical diagnoses and health, data on consumption habits and media usage, data from video surveillance or biometric recognition systems, usage data from applications (apps) or social networks, sensor data from mobile devices, location data and e-mails, chat or other internet communication data.

Many people are not (yet) aware of the potential danger posed by a number of different stakeholders and algorithms with a range of interests and objectives collecting and storing digital information in order to analyse it at a later date and potentially monetise it. On the internet, people are being targeted by personalised advertising (e.g. advertising banners) in different places and largely unwittingly, and they are tracked online and spied on with the help of small text files (‘cookies’) [34]. Once data has been stored it may resurface even years later, and in the worst-case scenario cause permanent damage to people’s careers or private relationships. After all, it is possible to manipulate profiles in order to deliberately cause damage to people or discriminate against them, e.g.
when they are trying to take out a particular type of insurance, during an application process or through the denial of further services. As long as there are no solutions that allow people to make informed decisions, to refuse to provide private, personal data or to have it deleted, the development of this particular aspect of big data with regard to data protection seems far from encouraging.

Personal data is highly desirable

Who controls the much-touted algorithms?

These days, big data is being talked up by many stakeholders from politics, business and academia, for different reasons. Given all the lucrative opportunities for growth offered by big data and by the wide range of modern web-based technologies, the inherent risks and problems are often conveniently ignored. Expectations are high, especially that the new, much-touted algorithms will reduce complexity or produce predictive analyses. But in reality, questions concerning underlying interests, power structures, ethics and morality, monitoring and rights and responsibilities often end up taking a back seat.

The business model of digital ecosystems is based to a large extent on the ability to monetise data. It does not really make any difference who is accessing this personal data, whether it is Google, Facebook or another internet service provider, the police, tax authorities or statistics offices, health insurers, insurance companies, banks, or even – as in the recent controversy – intelligence services. In principle, the use of and access to non-anonymised data by anyone is subject to the provisions of data protection legislation.

6. Big data and data privacy

“IT security is becoming one of the fundamental prerequisites for the preservation of civil rights and liberties. The social opportunities and economic potential offered by digitisation must not be endangered.
[…] In addition, the subjects of IT security and the defence against industrial espionage should play a special role.” That is what is stated in the recent coalition pact between Germany’s CDU, CSU and SPD parties entitled ‘Deutschlands Zukunft gestalten’ (Shaping Germany’s future). [35]

[34] http://en.wikipedia.org/wiki/HTTP_cookie.
[35] https://www.cdu.de/sites/default/files/media/dokumente/koalitionsvertrag.pdf, P. 139.

[Figure 24: Main problems with big data. % of respondents (n=82*), 2012: Data protection and security 49; Budgeting/setting priorities 45; Technical challenges of data management 38; Expertise 36; Lack of awareness of big data applications and technologies 35. *SMEs and large companies from various sectors. Source: Fraunhofer IAIS]

[Figure 25: “I try to divulge as little personal information as possible on the internet”. % of respondents]

An ambitious goal for the current federal government. However, there has not been any concrete action since the controversial revelations about surveillance back in June 2013. In principle, there is data protection legislation at national and European level. Germany has the Grundrecht auf informationelle Selbstbestimmung (fundamental right to self-determination in regard to information) and also the Grundrecht auf Gewährleistung der Vertraulichkeit und Integrität informationstechnischer Systeme (fundamental right to the guarantee of the confidentiality and integrity of information technology systems). These are meant to ensure that personal data is not stored, passed on or analysed outside of its original context and without the permission of the person it relates to. The Federal Constitutional Court established the first fundamental right to data protection in its 1983 population census decision.
More commonly known as the right to information privacy, it states that in principle individuals have the right to decide who holds what information about them, when, and under which circumstances, i.e. who receives what data for which purpose. The provision of personal data should normally take place with the informed consent of the person concerned. It is also subject to the purpose limitation principle, which stipulates that personal data may only be used for the purpose for which it was originally collected. Any infringement of this fundamental right, in particular by government agencies, is only permissible in the public interest and must comply with the principle of proportionality. Furthermore, the concept of data economy for information systems requires that as little data as possible should be collected and processed.

“The constitutional jurisdiction also demands technical, organisational and procedural provisions for the protection of the right to information privacy. At the core of the realisation of informational self-determination are the rights of the affected person, i.e. the right to access any data stored about oneself and the right – where necessary – to correction, blocking and deletion. The implementation of the information privacy regulations requires independent supervisory bodies.” (Official collection of decisions of the German Constitutional Court (BVerfGE) 65, 1 et seq.) [36]

[Figure 25, survey countries: DE, USA, BR, CN, IN, KR. DE: n=1,213, USA: n=1,218, BR: n=1,215, CN: n=1,207, IN: n=1,211, KR: n=1,214. Source: Münchner Kreis 2013, www.zukunft-ikt.de]

[Figure 26: “Companies should be transparent about how they are using personal data and what they are using it for”. % of respondents; survey countries: DE, USA, BR, CN, IN, KR]

No suitable basis: A directive from 1995

The fundamental right to the protection of personal data was incorporated into the EU Charter of Fundamental Rights under article 8 in 2009.
Under EU law, the right to data privacy must be observed not only by the EU authorities but also by the member states. But reforms are inevitable, as data protection legislation is not being consistently implemented and applied by all member states, and the rather outdated Data Protection Directive (from 1995) cannot keep pace with the speed at which web-based technologies are being adopted or the evolution of the internet in general.

In January 2012 the EU Commission submitted a proposal for a general data protection regulation for the EU. It is intended to replace the 1995 Data Protection Directive and has been under discussion by the Council of the European Union and in the European Parliament ever since. [37] The proposed change envisages that in future the scope of the basic regulation will no longer be exclusively linked to the geographical location where the responsible party is based and where the processing of the data takes place, but also to the question of whether the personal data of people in the EU is affected. This so-called ‘market location principle’ would ensure that large US companies that store data, such as digital ecosystems, would no longer be able to claim that they are not subject to European law. [38]

[Figure 26, sample sizes: DE: n=1,213, USA: n=1,218, BR: n=1,215, CN: n=1,207, IN: n=1,211, KR: n=1,214. Source: Münchner Kreis 2013, www.zukunft-ikt.de]

[36] Cf.: Weichert, T. (2013). Big Data: Das neue Versprechen der Allwissenheit. (Big Data: The new promise of omniscience.)
[37] For details on this see: Schaar, P. (2013). Big Brother und Big Data – Was heißt eigentlich Datenschutz auf Amerikanisch? (Big Brother and Big Data – How do you say ‘data protection’ in American?). Page 5.

Culture clash

It seems that the US has a different attitude towards issues of data protection legislation.
But since there are now millions of European citizens using social networks and online stores run by American internet companies on a daily basis, this is increasingly leading to a clash of data protection cultures. This provides plenty of potential for conflict. In view of recent revelations about breaches of data protection law it is imperative that the structural and legislative failures and deficiencies in German, European and also international data protection law are remedied as soon as possible. That is the only way to protect privacy and increase people’s trust in the digital world. Greater trust in digital infrastructures would provide an ideal catalyst for innovation and growth, giving a boost to the development of the internet.

Proposed solutions need to apply at international level

Anonymising and aggregating personal data

Some of the big data analysis practices violate fundamental concepts of data protection law, in particular those types of data analysis which relate to specific individuals. This conflict could be defused if the data records were anonymised. [39] The possibility of identifying individuals would have to be ruled out for the period that the data was stored and processed. However, the general danger of re-identification would remain. The more individual characteristics are included in a data record, the greater the risk that with the right additional information it would be possible to draw conclusions about the individual concerned. Another anonymisation measure is data aggregation, i.e. the combination of individual data records into group data sets. Data can be deliberately ‘blurred’ so that analysis will not return exact figures but rather approximate values. That would significantly impede the retrieval of individual information. [40]
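The two measures described above, blurring individual values and aggregating records into groups, can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample records are made up, and the choice of ten-year age bands, three-digit postcode prefixes, a minimum group size of two and rounding to the nearest EUR 100 are arbitrary parameters, not a recognised standard for legally sufficient anonymisation.

```python
from collections import Counter

# Made-up individual records: (age, postcode, monthly income in EUR).
records = [
    (43, "60311", 5200),
    (37, "60311", 4100),
    (41, "60313", 4800),
    (29, "60313", 3100),
    (35, "60316", 3900),
    (44, "60316", 5600),
]


def generalise(record):
    """Blur a record: ten-year age band and three-digit postcode prefix."""
    age, postcode, income = record
    lower = age // 10 * 10
    return f"{lower}-{lower + 9}", postcode[:3], income


def aggregate(rows, min_group_size=2):
    """Combine blurred records into groups, report rounded averages only
    and suppress groups that are too small to hide an individual."""
    groups = {}
    for band, area, income in map(generalise, rows):
        groups.setdefault((band, area), []).append(income)
    return {
        key: {"count": len(v), "avg_income": round(sum(v) / len(v), -2)}
        for key, v in groups.items()
        if len(v) >= min_group_size
    }


def unique_share(rows):
    """Share of rows whose attribute combination occurs exactly once."""
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)


exact = [(age, postcode) for age, postcode, _ in records]
blurred = [generalise(r)[:2] for r in records]
print("unique before blurring:", unique_share(exact))
print("unique after blurring: ", unique_share(blurred))
print(aggregate(records))
```

In this toy data set every exact combination of age and postcode is unique, whereas after blurring only one record in six still stands alone, and that record is suppressed in the aggregated output. The residual risk of re-identification mentioned in the text remains: anyone holding additional information about the people behind a group may still be able to narrow it down.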
Demand for greater transparency

In addition, measures to ensure transparency in the use and analysis of data records might counteract the lack of trust among internet users and minimise data protection violations. This transparency should cover every step in the analysis process, i.e. data collection, the amalgamation with other data sets, the analysis itself and the subsequent use of the results. If the different stakeholders made their analysis practices more transparent, either through a regulative framework or self-imposed measures, people would be able to make an informed decision about whether they want to give consent for their data to be passed on and/or analysed. This would mitigate the current ‘black box’ character of big data.

International algorithm agreements offer standardisation and certification

Trade between different countries is normally covered by internationally negotiated agreements. Their aim is to promote world trade and the global economy. Similar to the way in which these trade agreements regulate the international movement of goods, an alliance of countries could ratify international algorithm agreements that would harmonise the trade in personal data and make it subject to certification, for example by external algorithm experts. [41] This could standardise analysis methods and make them transparent. Such algorithm agreements would cover not only the relationship between citizen and state (with regard to the practices of intelligence services) but also the digital ecosystems that operate at international level.

[38] Cf.: Schaar, P. (2013). Page 6.
[39] Cf.: Weichert, T. (2013). Big Data: Das neue Versprechen der Allwissenheit. (Big Data: The new promise of omniscience.)
[40] Cf.: Weichert, T. (2013). Big Data – eine Herausforderung für den Datenschutz. (Big Data – a challenge for data protection.)
[41] Cf.: Mayer-Schönberger, V. and Cukier, K. (2013).
Based on trust

The change in data volumes has led to a change in the nature of data processing, posing an enormous challenge for the protection of privacy. At the moment, we can only guess at the extent and potential of big data. This makes it difficult to cover all eventualities with legal provisions. The potential will develop fully when the people involved feel able to trust web-based technologies, i.e. their privacy is not violated and their fundamental rights are respected and safeguarded.

Users’ trust in digital channels has already been dented

Since the documents from former intelligence analyst Edward Snowden were published in June 2013, the discussion about big data has almost always included the surveillance practices of intelligence agencies. Critics believe that anything which is technically feasible in order to obtain personal data is in fact being done, irrespective of any restrictions imposed by (national) data protection laws.

“If you’re looking for the needle in the haystack, you have to have the entire haystack first” [Deputy Attorney General James Cole of the US Department of Justice]

National data protection regulations can be bypassed

The problem is that data protection legislation that applies in Germany can be bypassed using technology. For example, the bulk of German telecommunication traffic is routed through foreign countries (primarily the US), because domestic server capacities appear to be overwhelmed by the sheer volume of data. There are reports that data is being accessed, stored and processed, without officially violating German data protection laws. [42] To what extent large parts of digital communications were deliberately diverted via foreign servers, and whether German telecommunication companies are even able to resist the demands of intelligence services, is likely to remain a well-kept secret, like so many details of the surveillance scandal.
Surveillance practices could endanger innovation

Economic damage should not be underestimated

Recent controversial surveillance practices could have far-reaching economic consequences that should not be underestimated. The economic danger lies primarily in the fact that in the medium to long term people might adapt their media usage and consumption behaviour, and adopt web-based technologies more slowly, because of diminishing trust and increasing insecurity. That could lead to a slowdown in the development of web-based technologies, in particular for digital ecosystems, but also for many niche providers and start-ups in the ICT sector, who are already increasingly having to promise their customers secure IT infrastructures and operating systems and to plug potential security gaps.

Secure IT architecture is increasingly important

Even if many people do not feel directly threatened by what they have learned about the interception practices of the intelligence services, the constant media coverage could lead to a certain amount of underlying insecurity. The more people feel like they are under constant surveillance on the internet, the more this has an effect on individual development, freedom, creativity and ultimately also on the innovation and competitiveness of an entire economy. 100% data security is, of course, an illusion, and will remain so. But in future the security of IT infrastructures will become more important for users. Companies that are able to offer reliably secure internet services and technologies should be able to benefit from this.

[42] http://www.heise.de/security/meldung/Raetselhafte-Entfuehrungen-im-Internet-2053503.html.

7.
(Big) data in practice: There is no ideal solution

The stakeholders would do well to recognise the abundance of information and data streams at their disposal as a potential growth area, and learn to channel it effectively. To this end they require an adaptable digitisation strategy. For example, new IT systems that are able to more quickly detect trends and produce forecasts could be integrated into existing IT architectures, e.g. to provide analyses and forecasts relating to the behavioural patterns of customers, business partners and competitors. The ability to design more efficient processes and structures, optimise IT systems, achieve potential synergies and thereby reduce costs are also benefits that should not be overlooked.

[Figure 27: Big data analysis tools. % of respondents (n per data point=508-870 out of 1,144), 2012: Query and reporting 91; Data mining 77; Data visualisation 71; Predictive modelling 67; Optimisation 65; Simulation 56; Natural language text 52; Geospatial analytics 43; Streaming analytics 35; Video analytics 26; Voice analytics 25. Sources: IBM Institute for Business Value, Saïd Business School Oxford]

[Figure 28: Which platform components are being used? % of respondents (n per data point=297-351 out of 1,144), 2012: Information integration 65, followed by Scalable storage infrastructure, High-capacity data warehouse, Security and governance, Scripting and development tools, Column-oriented databases and Complex event processing]

Faced with the potential uses and above all the commercialisation opportunities provided by the explosion of data and the information derived from this, stakeholders need to decide whether it is worth their while to extract and analyse the data that is – in principle – available to them, and whether they should be investing in big data projects generally. The answer is a definite ‘yes’.
Just as providers are now no longer able to ignore discussions and reviews of their products and services by their (potential) customers on social networks or forums [43], they are also well advised to utilise existing internal and external data sets. Otherwise companies stand to lose valuable competitive advantages for lack of a relevant informational edge.

In order to remain competitive, the stakeholders need to learn to combine their growing data pool with the latest management methods as a means of developing suitable business strategies and appropriately adapted products and services. Even if some protagonists may initially be unable to achieve a big breakthrough, many business processes across a range of industries have potential for greater efficiency that can be leveraged in small incremental steps. For financial institutions that could mean, for example, that they pay less attention to financial products (e.g. derivatives) and other new, virtual products, but rather concentrate on the banking services that are being created around the financial sector (such as web-based consultancy, information services, forums). In future, the service portfolio of banks is likely to feature more options based on filtered customer information.

[Figure 28, continued, % of respondents: Scalable storage infrastructure 64; High-capacity data warehouse 59; Security and governance 58; Scripting and development tools 54; Column-oriented databases 51; Complex event processing 45; Workload optimisation 45; Analytic accelerators 44; Hadoop/MapReduce 42; NoSQL* engines 42; Stream computing 38. *NoSQL = Non-relational databases, suitable for data-intensive applications such as the indexing of large document sets. Sources: IBM Institute for Business Value, Saïd Business School Oxford]

Potential for German SMEs to catch up

According to a survey of innovation potential carried out by the Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), respondents stated that they saw cross-sectoral and long-term potential for the individualisation of services (e.g. daily health checks, individual entertainment programmes on demand). [44]
They also wanted more intelligent products (e.g. self-learning and self-regulating houses, self-driving vehicles). At the same time, 95% of survey participants voiced their desire for support in the form of best practice examples or training courses, in order to reduce existing knowledge gaps in the area of big data.

At the moment, the great potential offered by the wealth of data available from various sources is being largely ignored by many companies, particularly small and medium-sized ones. That is bound to change in future. When it does, German SMEs will join the ranks of companies facing the enormous challenges of capturing, storing and processing the accruing volumes of data in order to become more efficient and profitable. However, unlike large internet companies, many mid-sized businesses lack the necessary digital strategies and expertise that would allow them to integrate modern data mining technologies into their internal and external value creation processes. Those who do not respond quickly enough to this development with appropriate and above all flexible software solutions and extraction tools might miss out on valuable competitive advantages and could even lose market share.

[43] Cf.: Dapp, T. (2011). The digital society. Page 7 et seq.
[44] Cf. Fraunhofer IAIS (2012). Big Data – Vorsprung durch Wissen. (Big Data – Getting ahead through knowledge.)

But the implementation of the latest big data tools is not without its pitfalls and requires additional resources. Irrespective of the different stakeholders involved, the decision-makers should be able to give general answers to the following questions at the start of the big data process:

Figure 29: The four phases of big data adoption (share of respondents*)
1. Educate: Focused on knowledge gathering and market observations (24%)
2. Explore: Developing strategy and roadmap based on business needs and challenges (47%)
3.
Engage: Piloting big data initiatives to validate value and requirements (22%)
4. Execute: Deployed two or more big data initiatives, and continuing to apply advanced analytics (6%)
* Percent of all respondents, 2012 (n=1,061, percentage does not equal 100% due to rounding). Sources: IBM Institute for Business Value, Saïd Business School Oxford

- What are our objectives? For example, increased efficiency and/or revenues, optimisation of IT network and system architecture, promoting open data processes (e.g. implementing external knowledge bases, preparing and publishing data), creating customer profiles, optimising scenario and/or model calculations
- Which data is relevant for the analysis?
- Which data is already available; which additional data could be aggregated, either internally or externally?
- Which analysis technologies and techniques are required?
- Which additional resources are required and who has the authority to decide on them? Expertise (internal/external technical and management skills), internal/external hardware/software (analysis tools, data warehousing, storage capacity), time, financial resources (internal/external financing), etc.

To highlight the potential and the wide scope of possible applications for big data, the following sections give a brief overview of some successful big data projects from business, academia and the public sector:

A big data project in the health sector

Up to now, health authorities have had to rely on the number of reported cases when analysing flu epidemics. But since many people tend not to go to the doctor at the first sign of illness, and the official notification process is fairly slow, there is usually a considerable time lag of one to two weeks [45] before authorities can react to outbreaks, which is rather a long time when it comes to public health. [46]

Big data provides information advantages

A big data project from Google may be able to help.
Since the end of 2008 the online tool Google Flu Trends has been available free of charge to users in over 25 countries.[47] As a data-based early-warning system for flu epidemics, it can be used by health authorities in addition to traditional methods for monitoring flu outbreaks, enabling them to take timely, possibly even preventive action.[48] Google researchers Jeremy Ginsberg and Matthew Mohebbi developed an algorithm that examines the 50 million most common daily search queries for terms relating to flu. Search volume analyses (also conducted with Google tools) indicate a spike in the number of certain search queries at the start of flu epidemics. Data mining reveals a pattern that shows particular seasonal volatilities. The subsequent comparison of the figures for actual cases, provided by the relevant health authorities, with the predictions based on electronic search query figures shows an almost identical picture. Since there are relatively large regional differences in the spread of influenza, Google uses information about incidents of the disease from regional providers. Overall this gives a fairly realistic picture without a time lag – as opposed to the traditional flu monitoring methods that tend to have inbuilt delays. On this basis it is possible to produce exact estimates for the current flu situation in different regions.

[45] Ginsberg, Jeremy et al. (2009). Detecting influenza epidemics using search engine query data.
[46] Cf.: Mayer-Schönberger, V. and Cukier, K. (2013). Big Data. Die Revolution, die unser Leben verändern wird. (Big Data. The revolution that will change our life). P. 7 et seq.
[47] http://www.google.org/flutrends/intl/de/about/how.html.
[48] This model is currently restricted to the analysis of flu epidemics and dengue fever – there are no plans for predicting other epidemics in the near future.
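The comparison of search volumes with reported cases described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weekly figures are invented, the names `query_share` and `reported_cases` are hypothetical, and a plain least-squares line stands in for whatever model Google actually uses.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Invented weekly data for one region (assumption, not real figures):
# share of flu-related search queries in per mille, and cases reported
# later by the health authority for the same weeks.
query_share = [0.8, 1.1, 1.9, 3.2, 4.5, 3.9, 2.4, 1.3]
reported_cases = [410, 560, 980, 1650, 2300, 2010, 1230, 690]

r = pearson(query_share, reported_cases)
slope, intercept = fit_line(query_share, reported_cases)

# Estimate the current week from today's query share, before the
# official case numbers for that week are available.
nowcast = slope * 2.8 + intercept

print(f"correlation between query share and cases: {r:.3f}")
print(f"estimated cases at a query share of 2.8: {nowcast:.0f}")
```

The sketch only shows why a strong historical correlation makes an early estimate possible. It says nothing about why people search, which is exactly the weakness that surfaced in the 2013 overestimate discussed below.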
This information advantage buys health authorities precious time by allowing them to recognise flu epidemics sooner and possibly even limit their extent.

Chart 30: Frequency of flu cases in the United States. Historical estimates; series shown: submitted data and Google Flu Trends estimates. Sources: U.S. Centers for Disease Control; Google Flu Trends (http://www.google.org/flutrends)

The Google analysis model is reviewed every year and, if necessary, adapted to changes in the electronic search behaviour for health information, e.g. with regard to terminology.[49] The Google predictions for the start of 2013 overestimated the actual illness rates (see chart). Google announced that it was adapting its algorithm to the new search habits. Scientists suspect that media warnings of an impending flu epidemic at the start of 2013 may have led people to ‘google’ flu-related terms on a purely preventive basis, without suffering any of the actual symptoms. That would have distorted the predictions of the Google tool – at least for the US.

Suspicions remain

Instant flu predictions allow the health sector to be managed more effectively. Google’s analysis tool is an example of the potential for innovation offered by big data, along with a possible economic benefit. But we must not forget that to make these predictions it is necessary to store and evaluate personal data. Google assures us that individual users cannot be identified as the analysis of the search queries is anonymised. Only the big picture is of interest, as that is the only way to arrive at reliable conclusions. But is Google complying with data protection legislation? Although the analysis is based on the amalgamation of billions of different search queries from individual users it still needs to use geographical information based on IP addresses in order to be able to make regional predictions.[50] And data based on IP addresses can easily be deanonymised. In the end, some suspicion always remains.
Suspicion that such tools are also an elegant way of justifying the permanent storage and potential monetisation of personal data.

[49] In 2009 the original model had to be adapted early on to enable the best possible analysis of the development of the non-seasonal swine flu pandemic. See: Cook, Samantha et al. (2011). Assessing Google Flu Trends Performance in the United States during the 2009 Influenza Virus A (H1N1) Pandemic.
[50] Internet Protocol address: every device on a network is assigned an address made up of a series of numbers; this allows the accurate addressing and delivery of data packets.

A big data project in the marketing sector

“What scares me about this is that you know more about my customers after three months than I know after 30 years.” [Reaction of Lord MacLaurin, former CEO of Tesco, to the initial results of loyalty card trials]

Chart 31: Big data objectives: Customer focus
% of respondents, weighted and aggregated, n=1,067, 2012
Customer-centric outcomes: 49
Operational optimisation: 18
Risk/financial management: 15
New business model: 14
Employee collaboration: 4
Sources: IBM Institute for Business Value, Saïd Business School Oxford

Targeting customers with a personal touch

Large (US) retail chains in particular are now commonly using data mining for marketing purposes. Until recently, the relationship between large retailers and individual customers was fairly anonymous. Now ‘one-to-one marketing’ – based on the model of the village store but on a much larger scale – is intended to transform this into a very personal bond between retailer and customer.[51] Due to his detailed knowledge of his clientele, the proprietor of the village store knew exactly which products to recommend to which customers at which particular time in order to make a sale. Supermarket chains are just beginning to employ the concept of this somewhat analogue ‘database’ on a massive scale.
To this end they are continuously storing, updating and analysing customer-specific data. This data originates mainly from credit card sales and purchases from their online shops – but loyalty cards (customer or bonus cards) also provide information about people’s preferences. Traditional market research with its personal surveys is increasingly being replaced by data-based real-time analysis of shopping habits. Finding out which group of customers is buying which product thus requires good programmers and data analysts, as well as adaptive algorithms. With accurate analysis results it is possible to offer the customer the right product at the right time, in the form of vouchers, personalised advertising brochures, promotional e-mails, etc. From the retailers’ perspective, big data applications primarily have the potential to become a lucrative marketing tool.

The US retail chain Target set out to examine the shopping habits of female customers who signed up for their baby registries.[52] They found the following shopping patterns: from the start of the second trimester onwards pregnant women increasingly bought unscented lotions. A few weeks later they began to buy special food supplements. The changes in their shopping behaviour over the course of the pregnancy showed such consistent patterns that the retailer was even able to predict due dates with relative accuracy. Based on these findings, Target put together product lists that constituted an early warning system for customer pregnancies. If the shopping habits of a female customer suddenly correspond to these ‘pregnancy patterns’, supermarket chains now tend to target them with specific advertising or vouchers.

This personal targeting has become so sophisticated and accurate in its predictions that it can even trigger family crises. In the US a father accused Target of trying to encourage his teenage daughter to become pregnant by targeting her with specific advertising for baby products.
As it turned out, the retailer had some inside information. Its analysis of the daughter’s shopping habits matched the pattern displayed by many women in their first months of pregnancy. The daughter really was pregnant.

Over the course of their involvement with big data, Walmart, the world’s largest retail chain, have analysed a range of data sets relating to the shopping habits of their customers and discovered some remarkable and, as it turns out, profitable correlations. For example, the snack food Pop-Tarts[53] are bought particularly often if the weather centre has issued a hurricane warning.[54] Walmart also discovered that beer and Pampers are often bought together in the evenings.[55] As a result Walmart has started to use big data marketing tools to optimise its shelf space. Branches now position nappies and beer on adjacent shelves, and when a hurricane approaches pallets of Pop-Tarts are put next to the checkouts.

[51] Bloching, B., et al. (2012). Data Unser. Wie Kundendaten die Wirtschaft revolutionieren. (The Store’s Prayer. How customer data is revolutionising the economy.)
[52] Cf.: Mayer-Schönberger, V. and Cukier, K. (2013).
[53] http://de.wikipedia.org/wiki/Pop-Tart.
[54] http://www.nytimes.com/2004/11/14/business/yourmoney/14wal.html?_r=0.

A big data project in academia

“Torture the data long enough, and it will confess to anything.” [Ronald Coase, economist]

The FuturICT Knowledge Accelerator and Crisis-Relief System[56] is a European research project in the field of ICT. This large-scale project, which is supported by a number of international research institutes, was initiated by Dirk Helbing, a professor at ETH Zurich.

The philosophy of open innovation adds another dimension

Its vision is the development of a knowledge platform based on a ‘super computer’. It is intended to integrate all global data streams into a model of reality that is as accurate as it can possibly be.
Computer simulations and data mining are used to minimise or plug any gaps in knowledge of a technical, commercial, social or ecological nature. One particular aim is to uncover interdependencies between very diverse areas. Virtual simulations of future scenarios help to manage these interdependencies and highlight preventive measures that can be taken to mitigate or even avoid looming crises in world events, e.g. financial crises. To make this possible, FuturICT is to be based on three pillars: a planetary nervous system, a living earth simulator and a global participatory platform.

The objective of the planetary nervous system is the visualisation of current world events, based primarily on data derived from a global network of sensors. This model of the global society is based on reality mining, i.e. the real-time mining of data streams produced in all spheres of life and work. The greatest challenge will be the constant adaptation to new circumstances, because, as the Greek philosopher Heraclitus once aptly said: “The only constant is change”, i.e. people and their environment are subject to a permanent process of transformation.

The second pillar, the living earth simulator, is intended to simulate possible future scenarios, based on the data stored in the planetary nervous system and the participatory platform.[57] This political ‘wind tunnel’ is meant to provide some insight into how the global system would behave as different parameters change. This is where the ‘what happens when’ scenarios are played out. Each scenario is assigned a probability rating which provides the basis for the development of an early warning system. Factors that could (potentially) destabilise the entire system are, of course, of particular interest. The simulation is based on experimentation with different models and their combination (=pluralistic modelling), embedded in an open-source software environment.
This is intended to give a wide variety of researchers the opportunity to directly feed in newly developed analysis tools.

The idea of the interactive infrastructure carries through into the open participation platforms: these are based on the concept of ‘prosumers’, i.e. consumers who also produce. ‘Prosumers’ are able to view data streams, simulations and applications that are made available on the platform and also to feed information of their own choice into the system. In this complex system, all users, whether they’re individuals, companies or organisations, retain control over their own data and have the opportunity to see the ideas entered by others. Through this interactive participation, the FuturICT model is given a kind of ‘open innovation’ component that heavily involves external knowledge owners and thus ensures an ongoing process of quality control. The core concepts of FuturICT should therefore be openness, transparency and participation.

[55] http://www.forbes.com/forbes/1998/0406/6107128a.html.
[56] www.futurict.eu.
[57] Geiselberger, H., Moorstedt T. (Hrsg.) (2013). Big Data: Das neue Versprechen der Allwissenheit. (Big Data: The new promise of omniscience.) Suhrkamp Verlag. Berlin.

The financing hurdle

When and to what extent FuturICT will be implemented, however, is uncertain after it failed to secure EUR 1 bn worth of funding from the 7th EU Framework Programme for Research and Innovation ‘Future Emerging Technologies’ (FET).[58] Despite the potential benefits to society as a whole, there are doubts about the feasibility of FuturICT. A complex global system to which a multitude of users have permanent and simultaneous access requires vast server capacities. Given the current technological possibilities, it is questionable whether a project of this magnitude would be feasible or financially viable – despite the advances in server capacity.

Hayek’s criticisms more relevant than ever
Another hurdle is the absence of a credible theory on human behaviour on which a forecasting model could be based. Critics also argue that the world is more complex and chaotic than any model or combination of models could ever be. What we don’t know is becoming greater than what we do know, and the compulsion to find a higher degree of coherence than is currently possible quickly leads people to prematurely generalise and simplify complex correlations. In Friedrich August von Hayek’s prophetic book ‘The Road to Serfdom’, social planners, known as the ‘social engineers’ of mechanism, are attacked for having been taken in by the constructivist delusion that a society can be designed on a drawing board.[59] It also remains doubtful whether we would even be able to understand the recommendations of the living earth simulator. After all, implausible suggestions that are open to interpretation can hardly be implemented as policies.

Open (big) data projects in public administration

Many big data projects are solely driven by the monetisation strategies of particular interests. However, the various data applications, regardless of their data volumes, can also provide a valuable economic benefit to society. Unlike the big data applications that are usually talked about, ‘open data’[60] (or ‘open government data’) does not serve primarily monetary purposes, but it does also offer an often underestimated wealth of data that different stakeholders can use multiple times for any number of reasons. Open data is digital, anonymised information that is made publicly available with almost no restrictions.

Data-driven innovation

Open data is usually focused on non-textual material, such as maps, satellite images, geospatial data and environmental data, rather than documents and records. This movement is driven by the growing demand of citizens for transparency, interaction and collaboration. Another aspect that could present an economic benefit is data-driven innovation.
After all, these ‘open’ data sets can be used freely by journalists, researchers, companies and members of the general public for all kinds of different purposes, both private and commercial, and, most importantly, they can be re-used again and again. This freely available data, in combination with web-based technologies, creates an economic benefit in the form of new business models and innovative products and services. Journalists, for example, are using various open data sets to augment their investigative reports, thereby driving forward the field of ‘data journalism’.[61]

Establishing standards in open data

Open data is there to be re-used and recycled. To support this, the Open Knowledge Foundation (OKF)[62] in Germany organises regular ‘hackathons’[63] for a wide variety of people including journalists, open-data experts and software developers. They use these events to share ideas and tips about the mining of existing, freely available data and the design of new web-based services in the form of apps.

[58] http://cordis.europa.eu/fp7/ict/programme/fet_en.html.
[59] Hayek, F. A. von (2004). Der Weg zur Knechtschaft (The Road to Serfdom). 4th edition. Mohr Siebeck Verlag. Tübingen.
[60] http://opendatahandbook.org/de/why-open-data/index.html.
[61] http://datajournalismhandbook.org/.

Chart 32: Use of e-government services via mobile devices. % of survey respondents, 2013; countries: DE, AT, CH, SE, UK, US; responses: Yes / No, but plan to / No / Don't know. DE (n=600); AT (n=713); CH (n=702); SE (n=725); UK (n=713); USA (n=576). Source: eGovernment MONITOR 2013

Although open data can actually cover all types of data, the term is often used as a synonym for open government data. The gradual liberation of government data is changing the relationship between citizen and state. This data that is being made freely available is adding a new dimension to democracy; the previously opaque workings of authorities are becoming more transparent.
The US and the UK in particular (data.gov; data.gov.uk) as well as certain Nordic countries are playing a leading role in the publication of government data. Various open data projects have already been successfully implemented in these places and are being used by citizens.[64] Several years ago, a list of standards was developed that is today known as the Ten Principles for Opening Up Government Information. These are there to help governments and administrative bodies make their data records available to the general public. In mid-2013 the G8 member states[65] signed the Open Data Charter, which committed them to five principles[66] that would see them open up their data by 2015.

An example of open data is the website that gives the UK’s taxpayers a breakdown of what their money is being spent on.[67] This option also exists in Germany: the website offenerhaushalt.de, another OKF project, provides in-depth information about government spending at a national level and includes details about the size of budget available to each department. Since 2005, when Germany’s Freedom of Information Act[68] was passed, every citizen has had the right to demand information about such matters. The data on the website is presented in a way that makes it quick and easy for people to access, analyse and process this information.

In the UK, however, it is not only information on how the government spends tax money that is available online. Open data at a local level is being provided to the general public on a similar scale on the website OpenlyLocal.[69] Nearly one quarter of the UK’s local authorities have made their data publicly available here, and more are being added every week. As well as breakdowns of spending, the data published on the site includes other municipal information such as population statistics and local financial data.
An organisation based in London (MySociety) has launched a website called ‘Fix my Street’ that allows people to report potholes, broken traffic lights and street lights directly to the relevant authorities. Thanks to the accompanying app, smartphones can be used to send photos of specific potholes to the relevant authorities, who can commission repairs using the GPS data. The person who reported the fault is given regular updates about the progress of the repair. This concept has since been imitated in a number of European countries. These kinds of platforms can benefit both the municipal authority and the residents – however, there are not always sufficient public funds to repair the fault immediately.

[62] The OKF, for example, has defined a set of ten data sets that national governments should, as a minimum, publish in the form of open data and it rates countries on this basis in its Open Data Census. The most recent census, which now covers almost 80 countries, was conducted for the G8 Summit. See http://census.okfn.org/.
[63] See http://energyhack.de/ and http://jugendhackt.de/.
[64] For an example of this, see the Open Data Institute (http://theido.org).
[65] Germany, France, UK, Italy, Japan, USA, Canada and Russia.
[66] Open Data by Default, Quality and Quantity, Usable by All, Releasing Data for Improved Governance, Releasing Data for Innovation.
[67] http://wheredoesmymoneygo.org/.
[68] The Informationsfreiheitsgesetz gives all citizens the unconditional right to access administrative information from federal authorities. Their wish to do so does not have to be based on a legal, commercial or other interest.
[69] http://openlylocal.com/.

Chart 33: The ten principles for open data
1. Completeness: All raw information from a public data set should be made publicly available in its entirety – ideally along with formulas and explanations for how the derived data was calculated. Public data is data that is not subject to valid privacy, security or privilege limitations.
2. Primacy: Data should be collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
3. Timeliness: Data should be made available as quickly as necessary to preserve the value of the data – ideally as soon as it has been collected and collated.
4. Accessibility: Data should be available to the widest range of users for the widest range of purposes. It should be made as easy as possible for people to access the information they require.
5. Machine readability: To make the data suitable for automated processing, it should be structured in a common data format.
6. Non-discrimination: The data should be available to everyone at any time and with no requirements for identification (registration or other membership requirements).
7. Use of commonly owned standards: Data should be made available in standardised, freely available formats over which no legal entity has exclusive control.
8. Licensing: Data should not be subject to any limitations of use such as copyrights, patents, trademarks or trade secret regulations. Reasonable privacy, security and privilege restrictions are allowed, however.
9. Permanence: Public data should be made permanently available in online archives.
10. Low usage costs

In Germany there is still no central database in which all municipal administrative data is collected. Individual cities, such as Cologne[70] and Frankfurt, are now offering these kinds of tools, however. The website ‘Frankfurt gestalten – Bürger machen Stadt’[71] (another OKF project) is designed by the people of Frankfurt for the people of Frankfurt, providing information about the city free of charge. In addition, people can use the OpenStreetMap wiki to search for existing data sets relating to local politics and other matters, e.g. police reports and roadworks notices.
There are also various forums where citizens can discuss local developments.

The projects described in this chapter demonstrate that the focus of open data – as an alternative use of data in the big data discussion – is much more on benefiting society than on earning money. This raises the question as to how much the big data discussion is able to give additional momentum to open data and to the creation of value. The opening up of previously inaccessible data is not only relevant for the public sector, it also undoubtedly has potential for other areas. In education (Open Research Data), for example, we can observe a liberation of data in how departments are making whole series of lectures including the accompanying handouts publicly available on YouTube or on the university websites (e.g. the Massachusetts Institute of Technology’s MIT OpenCourseWare[72]).

8. The limits of big data

Big Data is not a panacea. Besides the need for data protection legislation to catch up, there are other aspects that highlight the weaknesses and limitations of the latest big data analysis methods. Innovative technologies and processes in the area of big data do open up new ways in which stakeholders can, for example, augment existing data models and scenarios with real-time data in order to obtain more meaningful results and ideally forecast trends. However, big data is not able to perform miracles, because the laws of statistics don’t simply cease to apply due to a massive volume of new data (correlations). Just because data-driven analyses can suddenly be augmented with new, previously unimaginable quantifiable data sets, this does not make the results any more objective than in the pre-big data era. Despite the increasing ability to quantify human behaviour, and even given all the new, machine-readable buying preferences and expressions of human emotion, the analysis does not necessarily produce any reliable facts. This is particularly true of data originating from social networks.[73]
No matter how quantifiable and dispassionate the analysis methods, the evaluation of the results from this scientific data analysis still involves interpretation by human beings (with their subjective views), who are fallible.

Sources for chart 33 (The ten principles for open data): http://sunlightfoundation.com/blog/2011/07/14/vivek-kundras10-principles-for-improving-federal-transparency/; http://wiki.okfn.de/10-Prinzipien-fuer-offene-Daten

Increasing complexity in the amalgamation of data sets

Furthermore, there are other methodological problems, such as the unreliability, susceptibility to error and incompleteness of large data sets. These problems are amplified when data from different sources is aggregated (data delineation). When different data sets are merged the complexity of the underlying data pool increases with each correlation. Amalgamation doesn’t just mean a simple addition of new data.

[70] http://www.offenedaten-koeln.de/.
[71] http://www.frankfurt-gestalten.de/.
[72] http://ocw.mit.edu/index.htm.
[73] Cf.: Boyd, D., Crawford, K. (2013). Critical questions for Big Data. Provocations for a cultural, technological and scholarly phenomenon.

Increasing complexity of data combination

For example, when ten details from environmental sensors are linked to ten details relating to traffic volume this does not result in 20 new data records, but is instead multiplied to produce 100 new pieces of information, which can all be interpreted in a variety of new ways. This increases the complexity of the newly created data set and the requirements made of it enormously, with the inherent danger of trying to detect patterns in the underlying analysis that do not exist. This could result in the wrong actions being taken.

Cause and causality

Big data does not offer causality, only correlation

Although data is getting bigger all the time, the maxim of ‘quality over quantity’ still applies. Big data does not equate to completeness.
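The multiplication effect in the sensor example above, and the danger of seeing patterns that do not exist, can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the series are pure random noise, the names `env_*` and `traffic_*` are hypothetical, and the threshold of 0.44 is roughly the 5% significance cut-off for 20 observations.

```python
import itertools
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(0)    # fixed seed so the run is repeatable
N_OBS = 20        # observations per series (assumed)
THRESHOLD = 0.44  # approx. 5% two-sided cut-off for 20 observations

# Ten "environmental" and ten "traffic" series, all independent noise,
# so any correlation found between them is a coincidence.
env = {f"env_{i}": [random.gauss(0, 1) for _ in range(N_OBS)]
       for i in range(10)}
traffic = {f"traffic_{i}": [random.gauss(0, 1) for _ in range(N_OBS)]
           for i in range(10)}

# Linking every environmental detail to every traffic detail gives
# 10 * 10 pairings, not 10 + 10.
pairs = list(itertools.product(env, traffic))
correlations = {(a, b): pearson(env[a], traffic[b]) for a, b in pairs}
strong = {p: r for p, r in correlations.items() if abs(r) > THRESHOLD}

print(f"{len(pairs)} pairings examined")
print(f"{len(strong)} look 'strong' although every series is pure noise")
```

With a 5% cut-off, about five of the 100 pairings would be expected to cross the threshold by chance alone. The exact count depends on the random draw, and each such pairing is a candidate for a pattern that does not exist.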
The available information and data arrays can be examined for correlations using existing analysis tools. But these connections found by algorithms are not always logical. Even with today’s advanced technology it is not possible to establish or examine the causality behind the connections found; this task still falls to the human mind, using its experience and intuition. It is down to human beings to determine which questions and answers can be derived from the data. Algorithms are simply an amalgamation tool and determine which connections the data exhibits, but not why it exhibits them. Data analysis remains the basis for decision-making, but the power to make decisions should not be transferred to algorithms. Although big data is based on correlations it is unsuitable for assessing their causality.[74]

What about human intuition?

The new technologies and methods might tempt people to stop trying to find causal connections by using or developing models, and rather let the data speak for itself, without any further interpretation. In the long term that could change the way people think, as they come to rely too much on big data without questioning causes and connections. Thought patterns may potentially be led along certain paths, with a negative impact on human intuition and creativity. But both of these are crucial elements in the process of innovation.

The threat of information misuse

Should data algorithms be allowed to automatically initiate recommended actions without further interpretation? The potential preventive use of data evaluation should not be allowed to become common practice. In an extreme scenario, the police might, for example, increase their surveillance of someone simply because his current ‘digital footprint’ (e.g. tweets, search queries, credit card and online purchases) fits the typical behaviour pattern of known bank robbers as determined by algorithms, without that person ever committing any criminal acts or even contemplating them.
At first sight the idea of being able to predict crimes using big data before they are committed does, of course, seem very tempting. But it is in fact a misuse of information if wholesale and unwarranted profiling leads to people getting caught in an electronically created net and put under general suspicion. That sort of behaviour constitutes a general danger to a free and democratic system and was consequently the subject of debate even before the recent surveillance revelations.

Questionable data

Data from social networks does not provide universal insights into human behaviour

Some stakeholders, including academics, like to use data from social networks to try and detect new patterns based on people’s moods and feelings in order to explain or even predict media events, political protests or other social movements. But even the analysis of such large data sets does not allow us to draw general conclusions about human behaviour. For example, even a large proportion of the Facebook community is still only a specific subset. Depending on the analysis and the message derived, the users of social networks may well differ from other people. A variety of characteristics, such as age, gender, level of education, internet affinity and the willingness to post personal pictures and messages play an important role. It is impossible to be certain that the officially stated numbers of accounts on social networks do not include fake accounts, or that the number of users and accounts match. Nor can we rule out the possibility that individual social networks censor problematic user content, thereby undermining the value of these messages as a data source from the outset.

[74] Cf.: Mayer-Schönberger, V. and Cukier, K. (2013).
It is imperative to keep these questions of methodology at the back of one’s mind when deciding which questions or what analysis are valid, and consequently which interpretations based on the analysis are reliable.[75] In spite of, or even because of, the big data movement it is still true to say that smart data can provide equally good results.

Ethics and morality

What exactly do we mean by ‘open access’?

A comprehensive discussion about ethics and morality within the context of big data would go beyond the scope of this study. But ethical and moral considerations do, of course, play an important role when it comes to the processing of personal data from different sources and contexts, sometimes repeatedly and by different stakeholders. Many questions in the big data discussion remain unanswered, e.g. with regard to access, purpose, control, power structures and veracity. What, for example, is the status of data that is marked as ‘public’ on the various social networks, and therefore not just open to friends but in effect to all internet users? Should interested parties be allowed to use this data for analysis purposes or is there some sort of ‘volunteer protection’ which means that the informed consent of each individual user is required? The fact that some digital content is publicly accessible does not automatically mean that anyone can do whatever they like with it. The internet is not a legal vacuum.

9. Conclusion

Big data is unstoppable

Massive data volumes are now a fact of life. Big data is as impossible to stop as globalisation. With the advent of big data, IT security has become a crucial part of the overall value creation process. Where stakeholders used to struggle with limited data, they are now struggling with the limitations of analysis, and the question of which additional valuable information they might be able to coax from the glut of data at their disposal.
Although decision-makers are well aware that big data is a subject of great strategic relevance, offering lucrative opportunities for growth in the medium to long term, they often lack suitable digitisation strategies, trained staff to face the new challenges and the necessary management skills. Inadequate or outdated data protection regulations, coupled with a rather rigid silo mentality, mean that inhibitions about actively experimenting with big data still remain.
New types of jobs created
Big data will alter the structure of existing economic sectors, as new competitive constellations are created. Many established business models are going to be overthrown by the digital revolution, with firms facing increasing competition from companies in other sectors that specialise in products and services relating to web-based information and communication technologies or data analysis. There will also be new types of jobs, such as data analysis specialist and algorithm expert.
75 Cf.: Boyd, D., Crawford, K. (2013).
In particular, start-ups and niche providers who specialise in the evaluation and analysis of various data, including publicly available data, and turn it into products and services, will become much more prevalent. This will increase the pressure on established business models. An application for web-enabled devices that evaluates people’s individual media usage and reading behaviour can provide a much better indication of what people like to read, in a much shorter time frame, than any publishing company with years of experience in customer service was ever able to.
Letting data speak
Painstakingly acquired expertise within established business sectors will in future be challenged much more quickly, as the business models of the technology companies entering the market are based on the premise that intelligent algorithms will provide more accurate forecasts in a much shorter time and at a much lower cost than individual experts who have spent years building up their knowledge. This will rapidly lead to increased competition and a changed outlook in the markets. Established business models will need to undergo an at times painful transformation process. In addition, the skills requirements in the labour market will alter, i.e. due to the increased growth caused by the digital revolution, employers in business, academia and politics will demand better qualifications from prospective employees. With growing demand for big data methods there will be lucrative opportunities for career changers with a background in statistics, mathematics, information technology, data analysis, artificial intelligence or robotics, as they have sought-after transferable skills.
Many unanswered questions
For all the promise and positive effects associated with big data, some central questions remain unanswered. What influence are the new technologies and analysis methods going to have on our everyday lives? Many sectors are already going through painful adjustment processes. Which other industries will be affected, and which digitisation strategies could they use to counteract the competitive disadvantages they face? How will the stakeholders and the citizens deal with the imbalance in data sovereignty? What role will Germany play in an increasingly digital, globally networked world? With its knowledge-intensive economy and lack of natural resources, a country like Germany is ideally placed to play a leading international role in cutting-edge internet technologies, especially in the area of IT security.
The digital world needs rules
The commercial and technological foundations for the debate about big data are already in place. Now the time has come to discuss essential data protection aspects based on fundamental rights and freedoms, as well as the potential threat of competition distortion from the digital ecosystems. The latter aspect is particularly important, because the corrective market consolidation in the digital world of the large internet corporations is happening much faster than in, for example, the automotive sector. Decision-makers from business, politics and academia, as well as civil rights groups, must now play an active part in shaping the necessary legislative framework. This applies particularly to the question of how to achieve a balance between data protection legislation on purpose limitation and the big data methods currently being practised.
The many different interests must be discussed now
It is only a matter of time before we reach the ‘Internet of Things’; sooner or later humanoid robots will be providing increased efficiency and support in many households. In future, people are going to be interacting with their computers (artificial intelligence) in order to optimise their day-to-day lives. The burning question in this regard is not when this is going to happen, but rather how future generations will handle modern technologies, what general parameters they are going to agree on, what boundaries they will set, and whether people will be able to trust that their rights are going to be enforceable.
The wide spectrum of different hopes and much-vaunted potential, but also the well-justified concerns, show just how multi-faceted the developments and applications of big data are. After all is said and done, big data can be used for a vast range of different commercial, reputable purposes, but equally for criminal or at least highly dubious ends.
For every positive example there is also a worrying but equally realistic scenario.
Danger of losing trust
A favourable legislative environment, ideally with conditions that apply on an international basis, would aim to strike the necessary balance: stimulating the development opportunities for big data without putting too much emphasis on the risks. This balancing act will not be an easy one, but it ought to be tackled at this early stage of development, and at an international level. There are not going to be any perfect solutions or definitive answers for many of the future problems and questions relating to big data. The challenge is to find a way of integrating modern technologies and methods into people’s everyday lives in a useful way
— without violating any civil rights and liberties or democratic principles,
— without discrimination or manipulation,
— and without making people more afraid of using digital channels because different stakeholders are conducting mass surveillance of private data.
It is up to us, as the valuable, creative suppliers of the ideas behind every (r)evolutionary technology or innovation, to keep a tight hold of the reins and set an appropriate course. May the Force be with us.
Thomas F. Dapp (+49 69 910 31752, thomas-frank.dapp@db.com)
Veronika Heine
Reference list
Bahr, F. et al. (2012). Schönes neues Internet? Chancen und Risiken für Innovation in digitalen Ökosystemen. (Great new internet? Opportunities and risks for innovation in digital ecosystems.) Policy Brief. Stiftung Neue Verantwortung. Berlin.
Beck, S., Grzegorzek, M. et al. (2013). Mit Robotern gegen den Pflegenotstand. (Using robots to alleviate the care crisis.) Policy Brief 04/13. Stiftung Neue Verantwortung. Berlin.
Big Data – Vorsprung durch Wissen. Innovationspotenzialanalyse. (Big Data – Getting ahead through knowledge. Analysis of innovation potential.) 2012.
Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS). Sankt Augustin near Bonn.
Boyd, D., Crawford, K. (2013). Critical questions for Big Data. Provocations for a cultural, technological and scholarly phenomenon. In Geiselberger, H., Moorstedt, T. (2013). Big Data. Das neue Versprechen der Allwissenheit. (Big Data: The new promise of omniscience.) eBook Suhrkamp Verlag. Berlin.
Brown, B. et al. (2011). Big data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute.
Dapp, T. (2012). Homo biometricus: Biometric recognition systems and mobile internet services. Current Issues. Deutsche Bank Research. Frankfurt am Main.
Dapp, T. (2013). The future of (mobile) payments: New (online) players competing with banks. Current Issues. Deutsche Bank Research. Frankfurt am Main.
Dapp, T. and Heymann, E. (2013). Dienstleistungen 2013. Heterogener Sektor verzeichnet nur geringe Dynamik. (Services 2013. Heterogeneous sectors record only slow growth.) Aktuelle Themen. Deutsche Bank Research. Frankfurt am Main.
Geiselberger, H., Moorstedt, T. (2013). Big Data. Das neue Versprechen der Allwissenheit. (Big Data: The new promise of omniscience.) eBook Suhrkamp Verlag. Berlin.
Ginsberg, J. et al. (2009). Detecting influenza epidemics using search engine query data. Google Inc., Centers for Disease Control and Prevention. In Nature. Vol. 457.
Hayek, F. A. von (2004). Der Weg zur Knechtschaft. (The Road to Serfdom.) 4th edition. Mohr Siebeck Verlag. Tübingen.
Heuer, S. (2013). Kleine Daten, große Wirkung. Big Data einfach auf den Punkt gebracht. (Small data, big impact. Big data in a nutshell.) DigitalKompakt LfM #06. Landesanstalt für Medien Nordrhein-Westfalen (LfM). Düsseldorf.
Kranzberg, M. (1986). Technology and history: Kranzberg’s laws. In Technology and Culture. 27/2. P. 544-560.
Manig, M. and Giere, J. (2012). Quo Vadis Big Data – Herausforderungen – Erfahrungen – Lösungsansätze.
(What next for big data – challenges – experiences – approaches.) TNS Infratest GmbH. Munich.
Mayer-Schönberger, V., Cukier, K. (2013). Big Data. Die Revolution, die unser Leben verändern wird. (Big Data. The revolution that will change our lives.) Redline Verlag. Munich.
Morozov, E. (2013). Big Data. Warum man das Silicon Valley hassen darf. (Why you’re allowed to hate Silicon Valley.) Frankfurter Allgemeine Zeitung review supplement of 10 November 2013.
Schaar, P. (2013). Big Brother und Big Data – Was heißt eigentlich Datenschutz auf Amerikanisch? (Big Brother and Big Data – How do you say ‘data protection’ in American?) Talk given as part of the 41st Römerberg debates on 26 October 2013 in Frankfurt am Main.
Schroeck, M. et al. (2012). Analytics: Big Data in der Praxis. Wie innovative Unternehmen ihre Datenbestände effektiv nutzen. (Analytics: practical uses of big data. How innovative companies are drawing benefit from their data.) IBM Institute for Business Value in partnership with the University of Oxford’s Saïd Business School.
Weichert, T. (2013). Big Data – eine Herausforderung für den Datenschutz. (Big Data – a challenge for data protection.) In Geiselberger, H., Moorstedt, T. (2013). Big Data. Das neue Versprechen der Allwissenheit. (Big Data: The new promise of omniscience.) eBook Suhrkamp Verlag. Berlin.
Weichert, T. (2013). Big Data und Datenschutz. (Big data and data privacy.) Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein. (Independent Regional Centre for Data Privacy, Schleswig-Holstein.)
Further reading
Becker, K., Stalder, F. (2010). Deep Search. Politik des Suchens jenseits von Google. (Deep Search. The politics of search beyond Google.) Federal Agency for Civic Education. www.bpb.de. Bonn.
Bloching, B. et al. (2012). Data Unser. Wie Kundendaten die Wirtschaft revolutionieren. (The Store’s Prayer.
How customer data is revolutionising the economy.) Redline Verlag. Munich.
Kurz, C., Rieger, F. (2012). Die Datenfresser. Wie Internetfirmen und Staat sich unsere persönlichen Daten einverleiben und wie wir die Kontrolle darüber zurückerlangen. (The data devourers. How internet firms and the state are feasting on our personal data and how we can regain control of it.) Fischer Verlag GmbH. Frankfurt am Main.
Schmidt, J., Weichert, T. (2012). Datenschutz. Grundlagen, Entwicklungen und Kontroversen. (Data privacy. Fundamentals, developments, controversies.) Federal Agency for Civic Education. Bonn.
Shapiro, C. and Varian, H. (1999). Information Rules. A Strategic Guide to the Network Economy. Harvard Business Review Press. Boston, Massachusetts.
© Copyright 2014. Deutsche Bank AG, Deutsche Bank Research, 60262 Frankfurt am Main, Germany. All rights reserved. When quoting please cite “Deutsche Bank Research”.
The above information does not constitute the provision of investment, legal or tax advice. Any views expressed reflect the current views of the author, which do not necessarily correspond to the opinions of Deutsche Bank AG or its affiliates. Opinions expressed may change without notice. Opinions expressed may differ from views set out in other documents, including research, published by Deutsche Bank. The above information is provided for informational purposes only and without any obligation, whether contractual or otherwise. No warranty or representation is made as to the correctness, completeness and accuracy of the information given or the assessments made. In Germany this information is approved and/or communicated by Deutsche Bank AG Frankfurt, authorised by Bundesanstalt für Finanzdienstleistungsaufsicht.
In the United Kingdom this information is approved and/or communicated by Deutsche Bank AG London, a member of the London Stock Exchange regulated by the Financial Services Authority for the conduct of investment business in the UK. This information is distributed in Hong Kong by Deutsche Bank AG, Hong Kong Branch, in Korea by Deutsche Securities Korea Co. and in Singapore by Deutsche Bank AG, Singapore Branch. In Japan this information is approved and/or distributed by Deutsche Securities Limited, Tokyo Branch. In Australia, retail clients should obtain a copy of a Product Disclosure Statement (PDS) relating to any financial product referred to in this report and consider the PDS before making any decision about whether to acquire the product. Printed by: HST Offsetdruck Schadt & Tetzlaff GbR, Dieburg Print: ISSN 1612-314X / Internet/E-mail: ISSN 1612-3158