Brief Summary on Big Data

advertisement
Brief Summary on Big Data
Abstract. This paper provides a brief summary on Big Data research.
Summary
The age of data science has just started and is experiencing the tactical evolution by leaps and bound in all
dimensions of science. The digital footprints and sizeable traces of the data are being produced and left to
deposit in massive quantities by various data outlet harbingers such as Facebook, Twitter, Wikipedia,
Google to name a few. The scientists from almost every community are expressing significant interest and
eagerness to access the spoors of the raw data to facilitate the optimal decision makings. Disparate
scientific communities unilaterally cherish the granary of such a voluminous data, but argue about the
accessibility mechanism and echo opinions to usher new technologies to cater the data processing needs of
this very hour. The fast-paced growth of the modern data - structured and unstructured - raises the doubts
on the computing capacity of the existing database models. In the last few years, due to the internet sites,
the sources of data collection have been increased. The data is inundated from many potential sources
such as sensors - gathering climatic information, power meter readings, social media sites, videos and
digital pictures, traffic data, etc. These sources are responsible for the rapid growth of voluminous data about 2.5 quintillion bytes every day. Because of the growth of the data with such a fast velocity, the size of
data is a central moving issue today. In 2012, in a single dataset, the size of the data likely to increase from
few dozen terabytes to many petabytes and about 90% of the whole data in world today is produced in the
last two years only. Not only for storage capacity, but also from processing aspects, the existing database
management systems fall short to handle them. In this era of data science, in its entirety, the term “data”
is further redefined, which is popularly known as “Big Data” today. This may help to convincingly
reformulate the operational-oriented definition of “Big Science” into data-centric definition and can be
rephrased as –“The science that deals with Big Data is Big Science.” The concept of Big Data is relatively
new and has a large scope of further research. This section highlights some important aspects about the
Big Data. Primarily, the Big Data is described to have five concerns: volume of the data, velocity of the
data, variety of the data, veracity of the data, and the value of the data. All of the concerns are briefly
described here. Today, the industries are inundated with voluminous data of all types. The data is growing
easily from terabytes to even petabytes. Tweets from the social network sites alone are responsible for up
to 12 terabytes of data every day. On an annual basis the power meters contribute to a huge amount of
data – 350 billion readings.
The data, getting collected at the enterprises is at a very fast pace.
Sometimes even the delay of a minute may cause the discrepancy in the analyzed output. For especially
time-sensitive systems such as fraud detection, the Big Data must be analyzed as it comes into the
enterprises to obtain the best results such as scanning though the 5 million trade transactions, produced
every day, to capture the potential frauds. As per the description about Big Data, it can be of any type,
regardless of its structured or unstructured nature. The data type majorly includes – text, audio and video
data, sensor data, log files, etc. One of the main reasons that the enterprises prefer to work with Big Data
is the finding of new insights, when analyzing all such data types together. The examples include –
monitoring live video feeds from surveillance cameras in order to narrow down to the point of interest; to
analyze the bulk data growth in the videos, documents and images to enhance the customer satisfaction.
In the business world, the leaders do not completely trust on the information extracted from Big Data.
Though, they exploit this information to obtain better decision making. If someone does not trust on the
input, how she/ he can trust on the output and changes that one will act of such outputs are next to
negligible. As the Big Data is growing in size and variety every day, instating the confidence in Big Data
poses greater challenges. Regardless the engineering arena, enormous efforts are invested for the curation
of Big Data with the expectations of being it content-rich and valuable. Though, the richness of the data
may be highly valuable for certain communities and may be of no use for others that do not see the apt
outcome from the extraction. So, in the latter case, the efforts investment does not warrant any return. So,
a primary careful analysis about the value of the Big Data before its curation may help building the
competitive advantage.
Conclusion. This paper was a one pager on Big Data research.
1
References
Sharma, S. (2016). Expanded cloud plumes hiding Big Data ecosystem. Future Generation Computer
Systems, 59, 63-92.
Sharma, S., Chang, V., Tim, U. S., Wong, S, Gadia, S. (2016). Cloud-based Emerging Services Systems.
International Journal of Information Management, Elsevier.
Sharma, S. (2016). Concept of Association Rule of Data Mining Assists Mitigating the Increasing Obesity.
International Journal of Information Retrieval Research (IJIRR), IGI Global, 2016.
Sharma, S., Chang, V., Tim, U. S., Wong, S, Gadia, S. (2016). Growing Cloud Density & as-a-Service
Modality and OTH-Cloud Classification in IOT Era.
Sharma, S., Tim, U. S., Gadia, S., & Wong, J. (2016). Proliferating Cloud Density through Big Data
Ecosystem, Novel XCLOUDX Classification and Emergence of as-a-Service Era.
Sharma, S. (2015). Evolution of as-a-Service Era in Cloud. arXiv preprint arXiv:1507.00939.
Sharma, S., Tim, U. S., Payton, M., Cohly, H., Gadia, S., Wong, J., & Karakala, S. (2015). Contextual
motivation in physical activity by means of association rule mining. Egyptian Informatics Journal, 16(3),
243-251.
Sharma, S., Gadia, S., Tim, U. S. (2016). SubVizCon: Visual Subsetting, Conversion and Complex Queries
Exploitation in Spatio-Temporal Compound of Big Data.
Sharma, S., Tim, U. S., Gadia, S., Wong, J., Shandilya, R., & Peddoju, S. K. (2015). Classification and
comparison of NoSQL big data models. International Journal of Big Data Intelligence, 2(3), 201-221.
Sharma, S., Shandilya, R., Patnaik, S., & Mahapatra, A. (2015). Leading NoSQL models for handling Big
Data: a brief review. International Journal of Business Information Systems, Inderscience, 18(4).
Sharma, S., Tim, U. S., Wong, J., Gadia, S., & Sharma, S. (2014). A brief review on leading big data
models. Data Science Journal, 13(0), 138-157.
Sharma, S., Tim, U. S., Gadia, S., & Wong, J. (2014). Does SNAP eligibility have racial or ethnic gradients:
a geospatial social exploratory. International Journal of Information and Communication Technology,
6(2), 189-212.
Sharma, S. (2014). Racial disparity in SNAP? No: a geospatial study for Iowa. International Journal of
Indian Culture and Business Management, 9(3), 340-352.
SUGAM, S., SHASHI, G., & UDOYARA SUNDAY, T. (2013). Parametric database approach integration for
handling temporal data in GIS. Geo-spatial Information Science, 16(2), 91-99.
SHARMA, S., TIM, U. S., & GADIA, S. (2012). AutoConViz: automating the conversion and visualization
of spatio-temporal query results in GIS. Geo-spatial Information Science, 15(2), 85-93.
Sharma, S., Wong, J., Tim, U. S., & Gadia, S. (2013). Bidirectional migration between variability and
commonality in product line engineering of smart homes. International Journal of System Assurance
Engineering and Management, 4(1), 1-12.
Sharma, S., Goyal, S. B., Shandliya, R., & Samadhiya, D. (2012). Towards XML Interoperability. In
Advances in Computer Science, Engineering & Applications (pp. 1035-1043). Springer Berlin Heidelberg.
Shandilya, R., Sharma, S., & Qamar, S. (2012). A Domain Specific Indexing Technique for Hidden Web
Documents. Communications in Information Science and Management Engineering.
Sharma, S., Tim, U. S., Gadia, S., & Smith, P. (2011). Geo-spatial Pattern Determination for SNAP
Eligibility in Iowa Using GIS. In Advances in Computing and Communications (pp. 191-200). Springer
Berlin Heidelberg.
Sharma, S., Yang, H. I., Wong, J., & Chang, C. K. (2011). Wrenching: transient migration from commonality
to variability in product line engineering of smart homes. In Toward Useful Services for Elderly and People
with Disabilities (pp. 230-235). Springer Berlin Heidelberg.
2
Meghanathan, N., Sharma, S., & Skelton, G. W. (2010). On Energy Efficient Dissemination in Wireless
sensor Networks using Mobile Sinks. Journal of Theoretical and Applied Information Technology, 19(2), 7991.
Sharma, S., Gadia, S., Kumar, N., Narayanan, V., & Zhao, X. (2010). Climate Analysis in IOWA Using
XML and Spatio-Temporal Dataset-NC94. Int. J. Database Manage. Syst, 2(3), 82-93.
Sharma, S., & Gadia, S. (2010). Perl Status Reporter (SRr) on Spatiotemporal Data Mining. International
Journal Computer Science and Engineering Survey, AIRCC-IJCSES (August 2010).
Sharma, S., & Gadia, S. K. (2010). On analyzing the degree of coldness in Iowa, a north central region,
United States: An XML exploitation in spatial databases. In Recent Trends in Network Security and
Applications (pp. 613-624). Springer Berlin Heidelberg.
Sharma, S., Gadia, S., & Goyal, S. B. (2010, September). On the Calculation of Coldness in Iowa, a North
Central Region, United States: A Summary on XML Based Scheme. In Information and Communication
Technologies (pp. 400-405). Springer Berlin Heidelberg.
Sharma, S., & Gadia, S. K. (2010, April). An XML-based range variation approach to render the coldness in
Iowa, a North Central region, United States. In Information Management and Engineering (ICIME), 2010
The 2nd IEEE International Conference on (pp. 567-571). IEEE.
Sharma, S., Goyal, S. B., & Qamar, S. (2009, December). Four-Layer Architecture Model for Energy
Conservation in Wireless Sensor Networks. In Embedded and Multimedia Computing, 2009. EM-Com
2009. 4th International Conference on (pp. 1-3). IEEE.
Sharma, S., & Cohly, H. (2009, December). On enhancing the energy conservation by ATIM window in
wireless sensor networks. In Internet Multimedia Services Architecture and Applications (IMSAA), 2009
IEEE International Conference on (pp. 1-4). IEEE.
Sharma, S., & Rani, M. (2009, October). Three-layer architecture model (TLAM) for energy conservation
in wireless sensor networks. In Ultra Modern Telecommunications & Workshops, 2009. ICUMT'09.
International Conference on (pp. 1-5). IEEE.
Sharma, S., Rani, M., & Goyal, S. B. (2009, October). Energy efficient data dissemination with ATIM
window and dynamic sink in wireless sensor networks. In Advances in Recent Technologies in
Communication and Computing, 2009. ARTCom'09. International Conference on (pp. 559-564). IEEE.
Sharma, S., Cohly, H., & Pei, T. (2010). On Generation of Firewall Log Status Reporter (SRr) Using Perl.
arXiv preprint arXiv:1004.0604.
Sharma, S., Pei, T., & Cohly, H. (2010). On Utilization and Importance of Perl Status Reporter (SRr) in
Text Mining. arXiv preprint arXiv:1001.3277.
Meghanathan, N., Sugam, S., & Skelton, G. W. (2008). Use of Mobile Sinks to Disseminate Data in WSNs.
International Journal of Information Processing, 2(2).
SHARMA, S., PEI, T., & COHLY, H. (2008). ToAccess PubMed Database to Extract Articles Using
Perl_Su. IJCSES, 2(2), 92.
Challa, V. S., Indracanti, J., White, L. D., Sharma, S., Baham, J. M., Rabarison, M. K., & Anjaneyulu, Y.
(2007, September). Study of Meso-Scale Coastal Circulations in Mississippi Gulf coast with Mesonet
Observations and Modeling. In Seventh Conference on Coastal Atmospheric and Oceanic Prediction and
Processes.
Singh, S. P., Bhanot, K., & Sharma, S. (2016). Critical Analysis of Clustering Algorithms for Wireless
Sensor Networks. In Proceedings of Fifth International Conference on Soft Computing for Problem
Solving (pp. 783-793). Springer Singapore.
Sharma, S. (2015). An Extended Classification and Comparison of NoSQL Big Data Models. arXiv
preprint arXiv:1509.08035.
3
4
Download