
Analysis of Tools Available for Use in Cybersecurity
Zachary Sherman-Burke and Aadam Bodunrin
Department of Computer Science
East Carolina University
Greenville, NC, 27858, USA
shermanburkez12@students.ecu.edu, bodunrina22@students.ecu.edu
Abstract
Cybersecurity is a critical part of almost all organizations' infrastructure in virtually every industry. With recent advancements in technology, specifically Large Language Models (LLMs), many new threats have arisen, along with many new adversaries who previously lacked the skill or knowledge to attack. Even before the rise of LLMs, cybersecurity experts struggled to keep up with attackers. Now that numerous LLMs, such as ChatGPT, have been released to the public, and models continue to be improved and released, that struggle has grown even more dire. As a result, there is an even greater need for cybersecurity experts to assess, understand, and fully utilize the tools available to them. The first part of this study examines the current tools available to experts for defending and fortifying a given network. The second part proposes an approach that uses reinforcement learning to further protect a system via anomalous activity detection.
Keywords:
1. Introduction
With recent advancements in Artificial Intelligence (AI) and Machine Learning (ML), there has been an increase in both cybersecurity tools and threats. Developments in ML, especially Large Language Models (LLMs) and derived tools such as ChatGPT, have not only given malicious actors the means to be much more efficient and dangerous, but have also created a new market that did not exist before. With tools like ChatGPT, it is now possible for people without technical knowledge or training to create tools and methods for attacking networks and proprietary systems from nothing more than a description of what they want to do.
The tools examined in this paper are Knowledge Graphs (KG), Docker, Long Short-Term Memory models (LSTM), and Transformer models. This paper aims to evaluate these tools, their uses, and new developments that have arisen as a result. Once these tools have been examined, the paper proposes a new approach, using reinforcement learning, for use by cybersecurity experts.

This paper is organized as follows: Section 2 discusses the research and applications that utilize these tools in cybersecurity and how they are implemented. Section 3 provides the underlying architecture of the tools discussed in this paper. Section 4 introduces a new tool utilizing reinforcement learning in cybersecurity. Section 5 outlines how the environment will be addressed. Section 6 addresses how the reward function will be created. Section 7 discusses the agent and how it all comes together. Section 8 discusses the containerization of the reinforcement learning program. We close the document with conclusions and suggested avenues of research.

2. Related Work

The experiments described in this study are aimed at examining how machine learning tools can be used in the defense of a network against attackers.

2.1 Detecting Insider Threats with LSTM
When a malicious actor attempts to attack an organization, they typically first target the network in some way, such as through phishing emails, vulnerability exploits, or improper use of credentials. Given the breadth of network activity (web browsing, authentication, daily job use, etc.), one of the most important aspects of a cybersecurity expert's job is understanding and fully utilizing the tools available to them. With the rise in power and capability of ML, researchers have started looking at ways it can be implemented in the defense of a system. One tool, developed by Lopez and Sartipi, utilizes an LSTM model to analyze electronic logs and create a probabilistic model that is used to analyze activity and determine the likelihood that a given event is a threat [1]. If an event is determined to be a threat, it can then be analyzed individually by an end user without having to go through all of the logs. This work shows the viability of LSTM models for parsing very large electronic logs to detect anomalous behavior.
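To make the idea concrete, the following is a minimal sketch of an LSTM-based log classifier, assuming each log entry has already been tokenized into a fixed-length sequence of integer event codes. The vocabulary size, sequence length, and layer sizes are illustrative assumptions, not the configuration used in [1].

# Minimal LSTM log-event classifier sketch (assumed encoding, not the model of [1]).
import tensorflow as tf

VOCAB_SIZE = 5000   # assumed number of distinct event tokens
SEQ_LEN = 64        # assumed fixed number of tokens per log entry

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),       # map event codes to vectors
    tf.keras.layers.LSTM(128),                       # summarize the event sequence
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability the event is a threat
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, ...) would then train on labeled log sequences.

A model of this shape outputs a threat probability per log entry, so only entries scoring above a threshold need to be escalated to an analyst.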
2.2 Paying Attention to the Insider Threat
In addition to the research performed with LSTMs, Lopez and Sartipi also researched the viability of using Bidirectional Encoder Representations from Transformers (BERT) as a cybersecurity tool for anomaly detection and user behavior prediction. The transformer is a newer type of machine learning model than the LSTM, and the two differ in underlying architecture, each with its own strengths and weaknesses. BERT is a special type of transformer that consists only of an encoder and can pass information forward and backward through the network. This research was performed by training BERT on the Los Alamos cybersecurity events data set [2]. Once BERT was initially trained on the Los Alamos data set, the model was updated and fine-tuned on 14 days of data, with a delay of one day. Finally, the model was updated and tested on a time frame of one second. The results showed that a model could be built that was able to go through activity logs, identify anomalous data, and alert a user to a threat within a second of the threat occurring, with a high degree of accuracy [3].
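As a rough illustration of this style of model, the sketch below scores a single log line with a BERT-based classifier using the HuggingFace transformers library. The base checkpoint, the two-class head, and the log-line format are assumptions for illustration; this is not the fine-tuning pipeline of [3].

# Hedged sketch: scoring one log line with a BERT sequence classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)          # 2 classes: normal vs. anomalous

line = "user U12 logon C3465 kerberos success"  # hypothetical log entry
inputs = tokenizer(line, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)           # [P(normal), P(anomalous)]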
2.3 Knowledge Graphs
Knowledge graphs are a powerful tool in the fight against hackers and are capable of network data aggregation, data integration, and knowledge discovery [4]. Knowledge graphs are also inherently helpful in cybersecurity because they allow the tracking of a network and possible hacker activity, making it possible to understand the breadth and depth of an attack. While a knowledge graph is more of a concept, tools like Neo4j exist which allow for easy knowledge graph creation, with querying, adding, removing, and filtering of nodes. A tool like Neo4j is helpful because it allows a knowledge graph to be visualized as connected nodes, which can aid in visually spotting outliers and tracking their activity throughout a network. Overall, knowledge graphs can serve as an excellent tool for those who need to defend a network from attackers by providing visual information that supports the decisions that can isolate a hacker and keep the rest of the network safe.
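As a small sketch of this workflow, the snippet below records host-to-host communications in Neo4j and queries the hosts reachable from a suspected compromised machine. It assumes the official neo4j Python driver and a local Neo4j instance; the node label, relationship type, and host names are hypothetical.

# Hedged sketch: tracking network connections in Neo4j (assumed local instance).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # Record that host C3465 communicated with host C625.
    session.run(
        "MERGE (a:Host {name: $src}) "
        "MERGE (b:Host {name: $dst}) "
        "MERGE (a)-[:CONNECTED_TO]->(b)",
        src="C3465", dst="C625")
    # Find hosts within two hops of a suspected compromised host.
    result = session.run(
        "MATCH (a:Host {name: $src})-[:CONNECTED_TO*1..2]->(b) "
        "RETURN DISTINCT b.name",
        src="C3465")
    for record in result:
        print(record["b.name"])
driver.close()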
2.4 Hybrid Approaches

While knowledge graphs and ML models are both powerful tools on their own merits, recent research has been performed in utilizing them together for a variety of purposes. One area of research, led by Antoine Bosselut, is the utilization of a transformer to automatically build a knowledge base from the relations between different concepts, attributes, etc., in effect creating a knowledge graph. The transformers created in that work are referred to as COMmonsense Transformers (COMET) and have shown promise for the automatic construction of knowledge graphs [5].

3. Background

3.1 Neural Networks

Artificial neural networks (ANN), also known as simple neural networks (NN), are a component of most machine learning models. NNs work by mimicking the way that biological neurons signal to one another. At its most basic, a neural network takes inputs through the input layer, sends them sequentially through a series of processing steps in the hidden layers, and then produces an output at the final layer, known as the output layer (Figure 1). The input layer takes external information, processes and analyzes it, and passes it to the next layer. A hidden layer takes input from the input layer or another hidden layer, processes it further, and passes it to the next layer. The output layer takes the final processed data and provides the final result. A simple ANN utilizes a feed-forward mechanism in which each layer always passes information to the next.

Figure 1. Overview of a Neural Network

An advancement on the simple neural network is the recurrent neural network (RNN). RNNs follow the same fundamental principles as the ANN but introduce the concept of memory by allowing the output of a layer to be used as input to previous layers.
Figure 2. Overview of a Simple Recurrent Neural Network
A shortcoming of RNNs is their inability to preserve information for later use, due to what are known as the gradient vanishing and gradient exploding problems [7]. The source of both problems is that, over time, the weights for each hidden-layer neuron are updated; if the weights get too small, the first part of the network is overshadowed by the latter half, and all of the initial information can be lost. The gradient exploding problem is the exact opposite: the weights get too big, and the first part of the network dominates the rest. LSTMs were developed, in part, to address these issues.
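A toy calculation makes the geometric nature of both problems visible: backpropagating through many time steps repeatedly multiplies the gradient by a recurrent weight, so the gradient shrinks or grows exponentially with sequence length. The weights 0.9 and 1.1 below are arbitrary illustrative values.

# Toy illustration of vanishing and exploding gradients over 50 time steps.
grad_small, grad_large = 1.0, 1.0
for step in range(50):
    grad_small *= 0.9   # recurrent weight < 1: gradient vanishes
    grad_large *= 1.1   # recurrent weight > 1: gradient explodes
print(grad_small)  # ~0.005 -> early inputs barely influence learning
print(grad_large)  # ~117.4 -> early inputs dominate learning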
3.2 LSTM
An LSTM is a type of recurrent neural network (RNN) that has "long-term memory" and "short-term memory", saving information for later use to avoid the gradient vanishing problem. At a high level, LSTMs analyze sequences both as a whole and element by element, and by doing this can support both long- and short-term memory to prevent gradient issues. Mathematically, LSTMs work by adding an additive update function, which provides better-defined behavior, as well as a gating function, which provides a direct mechanism for controlling how much the gradient vanishes or grows at each step [8].
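For reference, the standard LSTM update equations, as commonly given in the literature, make the additive update and the gating explicit. With sigmoid $\sigma$ and element-wise product $\odot$:

\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}

The cell state $c_t$ is updated additively: the forget gate $f_t$ controls how much of the old state survives, and the input gate $i_t$ controls how much new information enters, which is the direct control over gradient flow described above.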
3.3 Transformers
One of the newest advances in AI and ML is the Transformer, introduced by Vaswani et al. [9]. Transformers take the concepts of an RNN and build on them using a mechanism known as self-attention to allow for parallel processing. Self-attention works by "modulating the representation of a token by using the representations of related tokens in the sequence" [6]. Transformers are able to parallelize by performing multiple self-attention operations together in a mechanism known as Multi-Head Attention (Figure 3). The two main components of a transformer are the encoder and the decoder. The encoder extracts important features for use, and the decoder takes the output from the encoder, processes the data in multi-head attention layers, shifts the data by one position, and produces an output.
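The self-attention mechanism can be stated compactly. Following [9], scaled dot-product attention over query, key, and value matrices $Q$, $K$, and $V$ is:

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
\]

where $d_k$ is the key dimension; multi-head attention runs several of these in parallel over learned projections and concatenates the results.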
Figure 3. Transformer Architecture

A further refinement of the transformer model is Bidirectional Encoder Representations from Transformers (BERT). BERT differs from a typical transformer in a few ways: BERT drops the decoder and uses only an encoder; BERT is typically pre-trained on an unsupervised learning task, whereas a transformer typically uses supervised training; and BERT was developed specifically for NLP-based tasks, whereas transformers are focused on generating an output sequence from an input sequence. In addition to machine learning-based tools, there also exists a tool known as Docker.

3.4 Docker

Docker is software that uses OS-level virtualization to create environments, called containers, that are almost completely separate from the hardware and the host operating system (OS). A Docker container can contain scripts, programs, or even its own OS, all of which exist entirely within the container, separated from the host. By providing this separation, Docker also allows for the creation and use of multiple OSs on a single set of hardware. Docker's primary purposes are creating test environments that protect your system, completely separating systems with minimal hardware, and easily and quickly deploying all necessary tools, programs, and packages for a given role. All of these uses can be applied to cybersecurity, making Docker a useful tool. Docker can be used to create environments that simulate networks, attacks, or malware, and it can also be used to completely separate pieces of critical infrastructure even if they are located on the same hardware. Docker is also useful for creating entire deployments for cybersecurity employees or students wanting to learn more about the field. In addition to Docker, another non-ML-based tool is the knowledge graph.
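As a small sketch of the isolation Docker provides, the snippet below launches a throwaway container from Python. It assumes the Docker SDK for Python (the "docker" package) and a running Docker daemon; the image and command are illustrative.

# Hedged sketch: running an isolated test container via the Docker SDK.
import docker

client = docker.from_env()
# Run a throwaway container; anything executed inside is isolated from the host.
output = client.containers.run(
    "python:3.11-slim",                                   # image with its own userland
    ["python", "-c", "print('isolated test environment')"],
    remove=True)                                          # clean up the container afterwards
print(output.decode())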
3.5 Knowledge Graphs
Knowledge graphs are collections of data points (nodes) connected by given relations (edges). KGs serve purposes in many fields, including social media, organizational structure, question-answering systems (QAS), and cybersecurity. The strength of KGs lies in their ability to track and maintain relations and to communicate them to both humans and computers. In cybersecurity, one of the most valuable things an expert can have access to is an overview of their network and its connections. Understanding a network and the communications that take place on it helps build a strong, layered defense that can be adjusted per layer as needed. With an overview of your network, you can also better assess the potential risk associated with a compromised system based on where it lies in the network and what connections it makes.
3.6 Machine Learning Integrated Knowledge Graphs

Another approach that has been proposed is to use ML models and KGs in conjunction, so that each helps overcome the weaknesses of the other. As previously discussed, ML models and KGs are useful cybersecurity tools on their own, but the ability to use them together could provide even greater tools, and there have been recent research developments in integrating the two [5][10]. The strength of knowledge graphs is that they have clear, structured connections between data based on a given set of relations. That strength can also serve as a weakness: when a user does not have enough knowledge or information to supply what the knowledge graph needs to create or find the relevant relations, or when a user makes spelling or grammatical errors, the search results might be completely irrelevant to the user's desired inquiry. The converse is true of ML models: they excel at inference, understanding, and finding relevant information based on the context of a query, but the information might be only partially relevant, only partially true, or sometimes entirely fabricated, since the model is creating a response rather than pulling it from a database. KGs could provide a foundation for ML models, binding them to a set of rules so that the information provided is significantly more likely to be accurate and relevant rather than irrelevant or fabricated. Similarly, ML models can allow for better interpretation and understanding of a query. Using ML-integrated KGs in cybersecurity could allow for the automation and combination of the most commonly used tools in cybersecurity. As shown by Lopez and Sartipi [1][3], ML models can be built to parse logs, find anomalous activity, and predict user behavior in near real time. Using KGs, one could pull in and build out the known databases used for storing and tracking malicious sites, emails, software, etc., and combine them with the ML models to improve accuracy, continually build out the databases, and help keep track of activity.
4. Approach
The remaining sections outline how the cybersecurity tools discussed in Section 3 can be used to expand on the work introduced in Section 2. This study proposes the use of reinforcement learning to create a model that analyzes and detects anomalous network activity.

The first step is to outline the environment the model will operate in, and then to design a reward function and an agent. Once the model is complete and runs successfully, the next step is to take the model and its components and create a Docker container for use as a demonstration and teaching tool.
4.1 Reinforcement Learning
The approach proposed in this paper is to use reinforcement learning (RL) as a means of anomaly detection. The ML techniques discussed so far have fallen into one of two categories, supervised or unsupervised learning. RL, however, is a type of machine learning that falls outside of both of these categories (Figure 4).
Figure 4. Types of Machine Learning [11]
In supervised learning, the model is presented with a set of correct actions against which to compare its performance. In unsupervised learning, the data is unlabeled, and the goal is to group data based on inherent similarities and differences. RL differs from supervised learning in that the feedback provided consists of rewards and punishments as signals for positive and negative behaviors. Compared with unsupervised learning, RL differs in its goal: in RL the goal is to maximize the agent's total cumulative reward, whereas in unsupervised learning the goal is to group data based on its inherent properties.

The main components of an RL model are the agent, environment, state, action, and reward. The agent is the component that makes decisions, receives punishments and rewards, and interacts with the environment. The environment is the world in which the agent operates and with which it interacts. The state is the observation the agent makes of the environment after performing an action and is the current situation of the agent. The action is what the agent performs on the environment based on its observation. The reward is the feedback the agent receives based on the action performed; it can be either positive or negative. These components remain the same for each of the three types of reinforcement learning implementation: value-based, policy-based, and model-based. The approach behind value-based reinforcement learning is to find the optimal value function. In policy-based methods, the agent works to develop a policy so that the actions performed in each state help maximize future rewards. In model-based approaches, a virtual model is created for each environment, and the agent explores the model to learn it. The algorithms behind these implementations fall into one of two categories: model-free RL algorithms and model-based RL algorithms.

This paper proposes the utilization of Deep Q-Learning, a model-free, value-based algorithm that maximizes future rewards. The reason for focusing on Deep Q-Learning in this study is the desire to create a model that is able to quickly and efficiently parse authentication records and correctly identify activity. The goal of the model will be to classify whether an activity is normal or anomalous while building the most efficient algorithm for parsing the log.
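For context, tabular Q-learning, which Deep Q-Learning extends with a neural network approximator, updates its action-value estimates by:

\[
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\right]
\]

where $\alpha$ is the learning rate and $\gamma$ is the discount factor. Deep Q-Learning replaces the table with a network $Q_\theta$ trained to minimize the bracketed error.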
5. Environment

The environment will consist of a pseudo-continuous network communication space, the data log, in which the agent will be responsible for navigating entries to classify and identify anomalous behavior. At each state, the environment will present a timestamp, the communication (source and destination computer and user), the domain, the authentication and logon types, the orientation, and the communication outcome. While the primary goal of the agent is to detect anomalous activity, the key parameter monitored will be time. The goal will be to find the most efficient way to parse such a large file, but since we also heavily value accuracy, a time penalty will be applied to incorrectly labeled data. This leads to the next component of our model, the reward function.
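A minimal sketch of such an environment, assuming the gymnasium library, is shown below. The observation encoding (a fixed-length numeric feature vector per record) and the reward constants are assumptions for illustration, not a finished design.

# Hedged sketch of the proposed log environment (gymnasium-style).
import gymnasium as gym
import numpy as np

class AuthLogEnv(gym.Env):
    """Steps through authentication records; the agent labels each one."""

    def __init__(self, records, labels):
        self.records, self.labels = records, labels
        self.i = 0
        # Each record is assumed encoded as an 8-dimensional feature vector.
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(8,))
        self.action_space = gym.spaces.Discrete(2)  # 0 = normal, 1 = anomalous

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.i = 0
        return self.records[self.i], {}

    def step(self, action):
        # Base cost of -1 per step rewards fast parsing; a mislabel adds an
        # extra time penalty, mirroring the cost of verifying a false alarm.
        reward = -1.0 if action == self.labels[self.i] else -10.0
        self.i += 1
        done = self.i >= len(self.records)
        obs = self.records[self.i] if not done else self.records[-1]
        return obs, reward, done, False, {}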
6. Reward Function
As outlined in Section 5, the reward function will be built around time. The agent will receive a small cost for every record it processes, rewarding fast traversal of the log, and each incorrectly labeled communication will incur an additional time penalty, reflecting the real-world cost of verifying a false alarm. Since the agent seeks to maximize its total cumulative reward, this design drives it to label activity both quickly and accurately.
7. Agent
The agent will be a Deep Q-Network (DQN) agent, a value-based agent that operates in a discrete action space but can operate in a continuous or discrete observation space. A DQN agent is best suited to the goal of this study, as the actions our model performs are deciding whether an activity is normal or anomalous and deciding which communication activity to examine next. The primary goal of the agent is to examine each communication quickly and label it accurately. To simplify the agent's goals, the agent will look solely at time, and each incorrect labeling will incur a time penalty. We believe this is the best approach for the agent since, in a real situation, if the model falsely alerts someone to an issue, that person would need to spend time verifying the false alarm and would thus be delayed in the event of a real threat arriving shortly after. Overall, the agent will operate in a network communication environment that runs in real time, attempting to classify all communication in an efficient and accurate manner and alerting a third party when an anomaly is detected.
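The core pieces of such an agent are sketched below, assuming PyTorch: a Q-network over the 8-feature observation from Section 5, epsilon-greedy action selection, and the one-step temporal-difference loss. Layer sizes and hyperparameters are illustrative assumptions.

# Hedged sketch of the DQN agent's core components.
import random
import torch
import torch.nn as nn

q_net = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2))            # Q-values for: label-normal, label-anomalous

def act(obs, epsilon=0.1):
    # Epsilon-greedy: occasionally explore, otherwise take the greedy label.
    if random.random() < epsilon:
        return random.randrange(2)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax())

def td_loss(obs, action, reward, next_obs, done, gamma=0.99):
    # One-step TD target: r + gamma * max_a' Q(s', a'), zeroed at episode end.
    q = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = reward + gamma * q_net(next_obs).max(1).values * (1 - done)
    return nn.functional.mse_loss(q, target)

In a full implementation, transitions from the environment would be stored in a replay buffer and sampled in batches to minimize this loss.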
8. Containerization
The goal of this study is to create a tool that can be utilized by cybersecurity experts and educators to demonstrate the capabilities of reinforcement learning, provide a model to experiment with, and encourage further research in this area. In order to share this research, the code and data will be containerized in a Docker container that holds all of the resources needed to duplicate the work performed in this study. Docker was chosen because it provides a portable solution for others to copy this work and allows the necessary packages and dependencies to be installed once, at creation.
9. Conclusion
To conclude, the principles outlined in Section 3 provide the basis for the tools outlined in Section 2. The proposed research in Sections 4, 5, 6, and 7 provides a basis for continued research. Section 8 outlines the plan and methodology for making the work that will be performed as a result of this paper accessible and repeatable.

ML and KGs form the foundation for the majority of the cybersecurity tools put forth in this study. Due to the complex and interwoven nature of network communications, ML and KGs show great promise for the defense of modern networks.
10. References
[1] E. Lopez and K. Sartipi, "Detecting the Insider Threat with Long Short Term Memory (LSTM) Neural Networks," arXiv:2007.11956 [cs], Jul. 2020. Accessed: Apr. 13, 2023. [Online]. Available: https://arxiv.org/abs/2007.11956
[2] A. D. Kent, Comprehensive, Multi-Source Cybersecurity Events, 2015.
[3] E. Lopez and K. Sartipi, "Paying Attention to the Insider Threat," Proceedings of the 34th International Conference on Software Engineering and Knowledge Engineering, Jul. 2022, doi: 10.18293/seke2022-059.
[4] L. F. Sikos, "Cybersecurity knowledge graphs," Knowledge and Information Systems, vol. 65, no. 9, pp. 3511–3531, Apr. 2023, doi: 10.1007/s10115-023-01860-3.
[5] A. Bosselut, H. Rashkin, M. Sap, C. Malaviya, A. Celikyilmaz, and Y. Choi, "COMET: Commonsense Transformers for Automatic Knowledge Graph Construction," 2019. Accessed: Jul. 26, 2023. [Online]. Available: https://aclanthology.org/P19-1470.pdf
[6] F. Chollet, Deep Learning with Python. Shelter Island, NY, USA: Manning, 2018.
[7] S. Hochreiter, "The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 06, no. 02, pp. 107–116, Apr. 1998, doi: 10.1142/s0218488598000094.
[8] N. Arbel, "How LSTM networks solve the problem of vanishing gradients," Medium, May 16, 2020. [Online]. Available: https://medium.datadriveninvestor.com/how-do-lstm-networks-solve-the-problem-of-vanishing-gradients-a6784971a577
[9] A. Vaswani et al., "Attention Is All You Need," arXiv:1706.03762, 2017. [Online]. Available: https://arxiv.org/abs/1706.03762
[10] N. Rohrseitz, "Knowledge Graphs and Machine Learning," Medium, Feb. 13, 2022. [Online]. Available: https://towardsdatascience.com/knowledge-graphs-and-machine-learning-3939b504c7bc
[11] S. Bhatt, "Reinforcement Learning 101," Medium, Apr. 19, 2019. [Online]. Available: https://towardsdatascience.com/reinforcement-learning-101-e24b50e1d292