2018 IEEE International Parallel and Distributed Processing Symposium Workshops
Integrating Cyber Security and Data Science for Social Media:
A Position Paper
Bhavani Thuraisingham, Murat Kantarcioglu, Latifur Khan
Erik Jonsson School of Engineering and Computer Science
The University of Texas at Dallas
Richardson, TX, USA
Emails: {bhavani.thuraisingham, muratk, lkhan}@utdallas.edu
978-1-5386-5555-9/18/$31.00 ©2018 IEEE
DOI 10.1109/IPDPSW.2018.00178
Abstract-Cyber security and data science are two of the
fastest growing fields in computer science and more recently
they are being integrated for various applications. This position
paper will review the developments in applying data science
for cyber security and cyber security for data science and then
discuss the applications in social media.
Keywords-Cyber Security; Data Science; Social Media; Fake
News; Adversarial Machine Learning; Malware Analysis.
I. INTRODUCTION
Cyber security and data science are two of the most rapidly
growing fields in computer science as well as in related
areas such as statistics and social science. Data science
integrates areas such as data mining, data management,
statistical reasoning, machine learning and high-performance
computing. The goal is to analyze large volumes of heterogeneous
data and uncover hidden dependencies as well as
make predictions. Cyber security is about controlling access
to the data and ensuring that the data is not maliciously
corrupted.
Data science is being applied to cyber security in areas
such as intrusion detection, malicious code detection, and
insider threat detection among others [1], [2]. Cyber security
is also being applied to data science in areas like adversarial
machine learning [3], [4]. Some applications have been
discussed in social media [5], [6], [7]. This position paper
will examine the developments in applying one area to the
other (i.e., data science and cyber security) as well as discuss
the applications in social media.
The organization of the paper is as follows. Section
II describes Data Science for Cyber Security. Section III
discusses Cyber Security for Data Science. Some of the
applications in Social Media are discussed in Section IV.
The paper is concluded in Section V.
II. DATA SCIENCE FOR CYBER SECURITY
Data science techniques have been applied to cyber security
problems since the 1990s. For example, data mining
techniques have been applied in areas such as intrusion
detection, malware analysis, and insider threat detection. The
idea here is to train the model with the training data (e.g.,
past experiences) and then test the model with the test data.
The goal is to reduce false positives and negatives as well as
improve accuracy of the prediction. More recently,
with the ability to handle large volumes of data, there has been a
lot of interest in applying data science for cyber security.
The goal is to first develop a model using training data and
then apply this model to test data to determine, say, whether
the code is malicious or not or whether an employee in an
organization is a threat to the organization. One of the major
challenges that we have to address is that the malware may
change patterns. Therefore, we need solutions to handle such
zero-day attacks.
We have made many contributions to applying data
science to cyber security and in particular developed a
technique called novel class detection [8]. The idea is to
train the model using classification techniques to detect
pre-defined classes as well as new (called novel) classes.
We have applied our technique to streaming data for many
applications including the detection of new malware or insiders
[7]. As progress is made with the various data science
techniques, we believe that detecting unusual patterns in
large amounts of streaming data will be vastly improved with
fewer false positives and false negatives and higher accuracy.
While this area has a lot of promise, it is very difficult
to obtain datasets containing various malware. To address
this problem, we are collecting datasets and hosting them at
the University of Arizona (on a collaborative NSF project)
and some preliminary work is discussed in [9]. We need to
come together as a community to share the various datasets
at the unclassified level so that researchers can carry out
experimentation in, say, malware analysis and insider threat
detection.
III. CYBER SECURITY FOR DATA SCIENCE
While much progress has been made on applying data
science for cyber security, many of the techniques do not
take into consideration the attacker's behavior in developing
the models. We were one of the early teams to focus on this
area and develop techniques for what has come to be called
adversarial machine learning. In this technique we take into
consideration the potential attacks by the attacker such as
injecting good data from time to time as well as, say, varying
the packet lengths with respect to network traffic. We train
the model taking into consideration the attacker's behavior
during deployment time. We designed and developed a
technique called Adversarial Support Vector Machine (AD-SVM) and we have shown that AD-SVM detects many of
the attacks carried out by the attacker [3].
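As a rough illustration of training with the attacker's behavior in mind, the toy sketch below compares a detector fitted on unmodified data with one fitted on data that anticipates an evasion attack. It is not AD-SVM: the midpoint-threshold classifier, the packet-length values, the attack fraction `f`, and all function names are assumptions made purely for illustration.

```python
from statistics import mean

def fit_threshold(benign, malicious):
    """Midpoint between the class means; values above it are flagged as malicious."""
    return (mean(benign) + mean(malicious)) / 2

def attack(sample, benign_target, f):
    """Hypothetical attacker: move a malicious sample a fraction f toward a benign value."""
    return sample + f * (benign_target - sample)

def flag_rate(threshold, samples):
    """Fraction of samples that the detector flags as malicious."""
    return sum(x > threshold for x in samples) / len(samples)

# Made-up packet lengths, not taken from any real dataset.
benign = [100, 120, 140, 160, 180]
malicious = [800, 850, 900, 950, 1000]

f = 0.6  # assumed attack strength
evasive = [attack(x, mean(benign), f) for x in malicious]

naive_t = fit_threshold(benign, malicious)   # ignores the attacker
robust_t = fit_threshold(benign, evasive)    # anticipates the attacker

naive_rate = flag_rate(naive_t, evasive)     # evasive traffic caught by the naive detector
robust_rate = flag_rate(robust_t, evasive)   # evasive traffic caught by the robust detector
false_alarm = flag_rate(robust_t, benign)    # benign traffic wrongly flagged

print(naive_rate, robust_rate, false_alarm)
```

On this toy data the naive threshold misses every shifted sample, while the threshold fitted on anticipated attacks catches them without flagging benign traffic. Real attacks vary many features at once, which is why modeling them remains hard.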
More recently there has been a lot of research on adversarial machine learning and there are conferences devoted to cyber
security analytics [10]. The challenge we are faced with is
modeling all types of attacks. For example, in our work
we considered a limited number of features such as varying
packet lengths. But in reality, there could be hundreds of
attacks, each resulting in numerous features. Therefore, how
do we take into consideration the majority of the attacks?
How do we get datasets to test our results? We have many
challenging problems to work on in this area.
Another related area that is showing promise is trustworthy analytics. That is, how do we ensure that the data
analytics techniques are secure and trustworthy [11]? The
developments with the Intel SGX technology are providing
progress in this area. The challenges are to explore ways to
use such technologies to prevent attackers from modifying
the data analytics techniques.
A third area that has received attention for over a decade is
privacy-preserving data analytics [12]. The key idea is how
to carry out data analysis while at the same time ensuring
privacy. A promising area of research is secure multiparty
computation [13].
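To make the idea of secure multiparty computation concrete, the sketch below shows one classic building block, additive secret sharing, used to compute a sum without any party revealing its own input. It is a minimal single-process simulation that assumes honest-but-curious parties and no network layer. The modulus `PRIME`, the function names, and the input values are illustrative assumptions, not taken from [13].

```python
import secrets

PRIME = 2**61 - 1  # public modulus; assumed larger than any possible total

def share(value, n_parties):
    """Split value into n additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_sum(private_values):
    """Simulate the protocol: each party shares its input, then sums what it receives."""
    n = len(private_values)
    # Row i holds the shares produced by party i; column j is what party j receives.
    distributed = [share(v, n) for v in private_values]
    # Each party publishes only the sum of the shares it received.
    partial_sums = [sum(row[j] for row in distributed) % PRIME for j in range(n)]
    return sum(partial_sums) % PRIME

counts = [17, 42, 8]  # hypothetical private inputs, one per party
total = secure_sum(counts)
print(total)
```

Each individual share is uniformly random, so no single share reveals the input it came from. Only the combined partial sums disclose the total.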
IV. SOCIAL MEDIA APPLICATION
One application area that has shown a lot of promise to
integrate cyber security and data science is social media.
Social media applications have to be secure. Social media
data has to be analyzed to extract nuggets for social good
and numerous data science techniques have been applied to
social media applications for over a decade. At the same
time, it is important to ensure the privacy of the individuals.
Our early work focused on designing access control techniques
for social media applications [6]. Next, we designed
various social media analytics techniques. For example, our
research has focused on analyzing tweets for security as
well as marketing applications [7]. It is important for social
media analytics to preserve the privacy of the individuals.
Therefore, our work has also focused on privacy-enhanced
social media analytics [5]. We have shown that even though
social media users do not give out private information, it is
possible to extract private information from the information
posted on their social media sites. Attacks on social media
are also an active area of research [14].
One of the major challenges we are faced with today is
what has come to be known as Fake News. How do we
prevent false and damaging information from being spread
on social media? How do we ensure that the social media
data is accurate and has high integrity? While the idea is
the same for both, there are some serious differences. With
the first, malicious users are posting false information about
others that could damage their reputation. For example, a
man/woman may post a photograph of him/her with a black
eye they claim is inflicted by their spouse. The end result
could be especially damaging to the spouse, and he/she could
be fired from his/her job right away, well before the legal
process runs its course. This is very distressing. But, on the
other hand, if the photo is genuine and it is the spouse who
inflicted the injury, then appropriate actions have to be taken.
In the second case, some users may post false information
about themselves such as traveling to exotic cities and having
certain prestigious degrees. Then there are those who may
pretend to work for a company even though they have likely
been laid off or at least are no longer with the company.
This is a form of lying that can affect others. For example,
two people apply for a job interview and the candidate who
posts false information may get the job. That is, it is likely
that some users may post false information and as a result
serious decisions may be made based on such information
(e.g., job offers). How do we prevent such situations from
occurring? Some preliminary research is reported in [15].
Also, it is now well known that social media companies
are giving their data to researchers for experimentation and
this data is being misused as a result of the researchers
sharing the data with organizations without authorization.
This is one of the major challenges facing social
media companies. The question is also whether we need
regulations as to what information can be shared by the
social media companies. We need to focus on assured
information sharing for social media applications [16]. Some
are saying that once a person posts personal information
on social media then there is nothing the company can do.
However, just as in the case of an automobile company,
the social media company should discuss the policies with
the user and give the user sufficient warning of all the
consequences of posting the data. There is also the question
of whether a social media user may post information such
as photos only with the permission of those in the photos. Also,
should there be a policy that no photos should be posted of,
say, children under a certain age? Much of our research on
the inference problem is relevant to detecting violations that
can result from the inference and aggregation of the data
[17].
V. CONCLUSION
This paper has discussed the application of data science
to cyber security and cyber security to data science. We also
discussed the applications in social media.
There are many areas for future research. First, we need
improved data science techniques that can handle massive
amounts of data rapidly. We also need better models for
adversarial machine learning that take into consideration a
wider range of attacks. Finally, we need to continue working
on integrating cyber security and data science for social
media applications. This includes addressing the challenging
problem of spreading Fake News. Just as we have done in
data privacy during the past decade, we need technologists,
policy makers, social and political scientists, and legal
experts to work together to develop viable solutions to this
problem.
REFERENCES
[1] M. Masud, L. Khan, and B. Thuraisingham, Data Mining
Applications in Malware Detection. CRC Press, 2011.
[2] B. Thuraisingham, L. Khan, P. Parveen, and M. M. Masud,
Big Data Analytics with Applications in Insider Threat Detection. Auerbach Publications, 2017.
[3] Y. Zhou, M. Kantarcioglu, B. Thuraisingham, and B. Xi,
“Adversarial support vector machine learning,” in Proceedings of the 18th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. ACM, 2012, pp.
1059–1067.
[4] Y. Zhou and M. Kantarcioglu, “Modeling adversarial learning
as nested Stackelberg games,” in Pacific-Asia Conference on
Knowledge Discovery and Data Mining. Springer, 2016, pp.
350–362.
[5] R. Heatherly, M. Kantarcioglu, and B. Thuraisingham, “Preventing private information inference attacks on social networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 8, pp. 1849–1862, 2013.
[6] B. Carminati, E. Ferrari, R. Heatherly, M. Kantarcioglu,
and B. Thuraisingham, “Semantic web-based social network
access control,” Computers & Security, vol. 30, no. 2-3, pp.
108–115, 2011.
[7] B. Thuraisingham, S. Abrol, L. Khan, R. Heatherly,
M. Kantarcioglu, and V. Khadilkar, Analyzing and Securing
Social Networks. Auerbach Publications, 2016.
[8] M. Masud, J. Gao, L. Khan, J. Han, and B. M. Thuraisingham,
“Classification and novel class detection in concept-drifting
data streams under time constraints,” IEEE Transactions on
Knowledge and Data Engineering, vol. 23, no. 6, pp. 859–874, 2011.
[9] R. Paranthaman and B. Thuraisingham, “Malware collection
and analysis,” in Information Reuse and Integration (IRI),
2017 IEEE International Conference on. IEEE, 2017, pp.
26–31.
[10] R. Verma and B. Thuraisingham, “Privacy-preserving data
mining,” in International Workshop on Security And Privacy
Analytics, IWSPA@CODASPY. ACM, 2017.
[11] S. Chandra, V. Karande, Z. Lin, L. Khan, M. Kantarcioglu,
and B. Thuraisingham, “Securing data analytics on SGX with
randomization,” in European Symposium on Research in
Computer Security. Springer, 2017, pp. 352–369.
[12] R. Agrawal and R. Srikant, “Privacy-preserving data mining,”
in ACM SIGMOD Record, vol. 29, no. 2. ACM, 2000, pp.
439–450.
[13] M. Kantarcioglu and J. Vaidya, “Secure multiparty computation methods,” in Encyclopedia of Database Systems.
Springer, 2009, pp. 2535–2539.
[14] Y. Alufaisan, Y. Zhou, M. Kantarcioglu, and B. Thuraisingham, “Hacking social network data mining,” in Intelligence
and Security Informatics (ISI), 2017 IEEE International Conference on. IEEE, 2017, pp. 54–59.
[15] L. Fan, Z. Lu, W. Wu, B. Thuraisingham, H. Ma, and Y. Bi,
“Least cost rumor blocking in social networks,” in Distributed
Computing Systems (ICDCS), 2013 IEEE 33rd International
Conference on. IEEE, 2013, pp. 540–549.
[16] T. Cadenhead, M. Kantarcioglu, V. Khadilkar, and B. Thuraisingham, “Design and implementation of a cloud-based
assured information sharing system,” in International Conference on Mathematical Methods, Models, and Architectures
for Computer Network Security. Springer, 2012, pp. 36–50.
[17] B. Thuraisingham, T. Cadenhead, M. Kantarcioglu, and T. Cadenhead, “Access control and inference with semantic web,”
in CRC Press, 2014.