Energy Issues in Data Analytics
Domenico Talia, Carmela Comito
Università della Calabria & CNR-ICAR, Italy
talia@deis.unical.it

Motivations for Taking Care of Data
• Data is everywhere (big, complex, real-time, unstructured).
• Putting data at the center of research on energy issues may bring some benefits (today the focus is on algorithms).
• Cost metrics for data management techniques (communication, storage, access, query, analysis) will help professionals and users save energy in data-intensive applications.
• Energy-scalable data management is important for sustainable data science.

Data Availability or Data Deluge?
• Every life process today is data intensive.
• The information stored in digital data archives is enormous, and its size is still growing very rapidly.

Data Availability or Data Deluge?
• Some decades ago the main problem was the shortage of information; now the challenge is
  • the very large volume of information to deal with, and
  • the associated complexity of processing it and extracting significant and useful parts or summaries.

Complex Big Problems …
• Bigger and more complex problems must be solved by using large-scale distributed computing systems.
• Data sources are larger and larger and ubiquitous (Web, sensor networks, mobile devices, telescopes, …).

… and Big Data
• Even where accessible, much data in many fields cannot be read by humans, so
• the huge amount of data available today requires smart data analysis techniques to help people deal with it, and
• scalable algorithms, techniques, and systems are needed (time and energy scalability).

Data: From Storing to Analysis
• Storing data is not the only problem.
• A key issue is to analyse, mine, and process data to make it useful.
Source: The Economist

Towards Models for Energy-aware Data Management
The main focus today is on energy-aware algorithms, tasks, and applications. The other side of the coin is data and the costs of operating on it.
Abstract energy-cost models for exchanging, accessing, and transforming data are primary elements of energy-aware data management at large scale. They are useful for sustainable data science.

An Example: Energy-aware Mining of Data
We evaluated the energy cost of analyzing data with some well-known data mining techniques on mobile devices. Our main interest was in how the energy consumed by the same technique changes as the dimensions of the data change.
Tests were run with different
• data set sizes,
• numbers of attributes,
• numbers of classes.

Data Mining Techniques
Energy characterization of data mining techniques running on mobile devices:
• k-means (data clustering)
• J48 (data classification)
• Apriori (association rules)
Common performance parameters:
• Number of instances (data set size)
• Number of attributes
Algorithm-specific performance parameters:
• k-means: number of clusters
• J48: decision tree size
• Apriori: number of rules, minimum support, and minimum confidence

k-means (1)
Increasing the number of instances, with different numbers of produced clusters

k-means (2)
Increasing the number of attributes, with different numbers of produced clusters

Apriori (1)
Increasing the number of instances, with different numbers of attributes

Apriori (2)
Increasing the data set size, with different numbers of rules

Apriori (3)
Increasing the data set size, with different minimum confidence values

J48
Increasing the number of instances, with different numbers of attributes
[Figure: three panels plotting Energy Consumption (Joules), Time (sec), and CPU % against Number of Instances (1620, 3601, 6341, 10826), with one series per attribute count: Attr_55, Attr_38, Attr_16, Attr_8.]

Results on different devices
Results obtained with different smartphones:
• Sony Xperia P: 1 GHz dual-core ARM processor and 1 GB RAM
• HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM
• Samsung Galaxy ACE: 800 MHz Qualcomm processor and 512 MB RAM

Concluding Remarks
• Data-intensive applications demand energy cost models based on data characteristics.
• Such models should be developed for sensors, smartphones, HPC servers, and clouds; in general, for large-scale computing systems.
• Sustainable data center services and applications may benefit from these models.
• Preliminary experiments have produced useful data.

Data Sets
Census (http://archive.ics.uci.edu/ml/datasets/Census+Income)
• Used with k-means
• Data set size: 14 MB
• Number of instances: 244348
• Number of attributes: 11
Census_disc (http://archive.ics.uci.edu/ml/datasets/Census+Income)
• Used with Apriori
• Data set size: 19 MB
• Number of instances: 333011
• Number of attributes: 11
Covertype (http://archive.ics.uci.edu/ml/datasets/Covertype)
• Used with J48
• Data set size: 14.5 MB
• Number of instances: 114556
• Number of attributes: 55

Measurements
Entries marked --- have no reported value.

Method: Association Rules (Rule Induction). Algorithm: Apriori. Data set: CENSUS_DISC.arff
Size (MB)  RAM (MB)  Virtual memory (MB)  CPU (%)  Charge depletion (mAh)  Energy (J)  Time (s)
0.1        15.86     95.19                96.92    0                       0           6
0.2        16.97     105.36               98.03    0                       0           12
0.4        18.06     104.95               98.24    0                       0           26
0.8        19.87     102.75               98.13    2.7                     35.964      73
1.6        23.32     103.99               96.87    13.5                    179.82      300
3.2        26.92     100.01               95.44    23.3                    310.356     3960
6.4        ---       ---                  ---      ---                     ---         ---

Method: Classification (Trees). Algorithm: J48. Data set: COVERTYPE.arff
Size (MB)  RAM (MB)  Virtual memory (MB)  CPU (%)  Charge depletion (mAh)  Energy (J)  Time (s)
0.1        19.47     104.94               96.23    13.4                    178.488     300
0.2        20.15     104.92               98.21    29.8                    396.936     540
0.4        23.87     105.6                97.43    59.4                    791.208     2040
0.8        27.68     103.87               97.36    194.64                  2592.6048   8160
1.6        ---       ---                  ---      ---                     ---         ---
3.2        ---       ---                  ---      ---                     ---         ---
6.4        ---       ---                  ---      ---                     ---         ---

Method: Clustering (Instance-based/Lazy Learning). Algorithm: K-Means. Data set: CENSUS.arff
Size (MB)  RAM (MB)  Virtual memory (MB)  CPU (%)  Charge depletion (mAh)  Energy (J)  Time (s)
0.1        16.73     96.56                98.03    6.75                    89.91       55
0.2        17.95     102.05               97.65    8.1                     107.892     150
0.4        19.72     102.16               97.02    18.9                    251.748     300
0.8        23.08     101.86               97.97    18.9                    251.748     600
1.6        26.4      95.96                97.82    43.2                    575.424     1320
3.2        ---       ---                  ---      ---                     ---         ---
6.4        ---       ---                  ---      ---                     ---         ---
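
The energy figures in the measurements appear to follow from the battery charge depletion by a fixed factor: every reported pair is consistent with 13.32 J per mAh, which is 3.6 C per mAh multiplied by 3.7 V. The sketch below illustrates that conversion in Python. The 3.7 V nominal battery voltage is an assumption inferred from that ratio and is not stated in the slides; the function names are hypothetical, and the average-power helper is an added illustration rather than a quantity reported in the slides.

```python
# Sketch: convert battery charge depletion (mAh) into energy (J).
# ASSUMPTION: a nominal battery voltage of 3.7 V, inferred from the ratio
# between the mAh and J values in the measurements; the slides do not state it.
NOMINAL_VOLTAGE_V = 3.7
COULOMBS_PER_MAH = 3.6  # 1 mAh = 0.001 A * 3600 s = 3.6 C


def charge_to_energy_joules(charge_mah, voltage_v=NOMINAL_VOLTAGE_V):
    """Energy (J) = charge (C) * voltage (V)."""
    return charge_mah * COULOMBS_PER_MAH * voltage_v


def average_power_watts(energy_j, time_s):
    """Average power (W) = energy (J) / elapsed time (s)."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    return energy_j / time_s


# (charge depletion in mAh, time in s) for the Apriori runs that
# report a non-zero depletion (0.8, 1.6, and 3.2 MB data sets).
apriori_rows = [(2.7, 73), (13.5, 300), (23.3, 3960)]

for charge_mah, time_s in apriori_rows:
    energy_j = charge_to_energy_joules(charge_mah)
    power_w = average_power_watts(energy_j, time_s)
    print(f"{charge_mah} mAh -> {energy_j:.3f} J, {power_w:.3f} W average")
```

Passing a different `voltage_v` shows how sensitive the energy estimate is to the assumed voltage, which matters when comparing devices with different batteries.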