Uploaded by Riley Arneson

Intrusion Detection Systems using Machine Learning

Intrusion Detection Systems
using Machine Learning
By: Riley Arneson and Kellan Dempsey
Basics of Intrusion Detection Systems
● Intrusion
○ Defined as an attack that affects the reliability, privacy, or accessibility of a network or computer
○ Example attacks include
■ Denial of Service (DoS)
● Restricts access to a network or computer
■ User-to-root (U2R)
● Attempts to gain admin access when the attacker previously had only user access
■ Remote-to-local (R2L)
● Sends packets to a target machine to gain local access without having an account on it
● Intrusion Detection System (IDS)
○ A system designed to identify unauthorized activity and notify admins or hosts of the suspicious and potentially malicious activity
○ Typically positioned between the network switch and the firewall
Concerns about Adequate IDSs in New Technology
● Unmanned Aerial Vehicles (UAVs)
○ Used in military and industrial applications for essential tasks such as surveillance and mission control
○ Whelan et al. proposed a new IDS to handle the risks of wireless technology in high-threat environments, such as GPS spoofing and jamming
● Internet of Things (IoT)
○ Because IoT devices are popular and many of them exchange data with one another, there is a high possibility of threats
Types of IDSs
● Deployment Based IDS
○ Host-based IDS
○ Network-based IDS
● Detection Based IDS
○ Signature-based detection
○ Anomaly detection-based
● Statistics Based Technique
Deployment Based IDS
● Host-based IDS
○ Installed on a single machine
○ Monitors all activity on the host
○ Main drawback is that every computer must have the IDS installed, which can result in lower overall performance of the IDS due to the increased processing power needed on each node
● Network-based IDS
○ Deployed on a network
○ Monitors all traffic that occurs on the network
○ Often uses machine learning and deep learning
Machine Learning in Network-Based IDS
1. Preprocessing - Essential to ML, as it results in faster and more accurate outputs
a. Data Cleaning - Handles inconsistent data, missing data, and outliers
b. Feature Selection/Extraction - Selects or removes features based on relevance
c. Feature Scaling - Standardizes all features so that no feature dominates solely because of its magnitude
d. Handling Categorical Variables - Transforms categorical values (e.g., sex, country, grade) into numerical values
e. Data Splitting - Splits data into training, validation, and testing sets
f. Handling Data Imbalance - If the classes in a dataset are imbalanced, employ techniques such as oversampling or undersampling
2. Training - 80% of the original dataset is used to train the ML algorithm
3. Testing - The remaining 20% is used to test the accuracy of the ML algorithm
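The preprocessing, training, and testing steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the presenters' pipeline: the records, field names, and helper functions (`clean`, `split`, `fit`, `transform`, `oversample`) are all hypothetical. Scaling statistics and category lists are taken from the training split only, so nothing from the test split influences them.

```python
import random

# Hypothetical toy connection records: (duration_s, bytes_sent, protocol, label).
RECORDS = [
    (0.2, 300.0, "tcp", "normal"), (0.4, 500.0, "udp", "normal"),
    (None, 450.0, "tcp", "normal"),  # missing duration, repaired by clean()
    (0.3, 420.0, "icmp", "normal"), (0.5, 380.0, "tcp", "normal"),
    (0.1, 350.0, "udp", "normal"), (0.6, 410.0, "tcp", "normal"),
    (0.2, 390.0, "tcp", "normal"),
    (9.0, 90000.0, "tcp", "attack"), (8.0, 80000.0, "icmp", "attack"),
]

def clean(records):
    """Data cleaning: replace a missing duration with the mean of the known ones."""
    known = [r[0] for r in records if r[0] is not None]
    mean = sum(known) / len(known)
    return [((mean if r[0] is None else r[0]),) + r[1:] for r in records]

def split(records, train_fraction=0.8, seed=0):
    """Data splitting: shuffle, then cut into an 80% / 20% train/test split."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit(train):
    """Learn scaling statistics and the protocol categories from training data."""
    stats = {"categories": sorted({r[2] for r in train})}
    for name, col in (("duration", 0), ("bytes", 1)):
        values = [r[col] for r in train]
        mean = sum(values) / len(values)
        std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
        stats[name] = (mean, std or 1.0)  # avoid dividing by zero
    return stats

def transform(records, stats):
    """Feature scaling plus one-hot encoding of the categorical protocol field."""
    d_mean, d_std = stats["duration"]
    b_mean, b_std = stats["bytes"]
    rows = []
    for duration, size, protocol, label in records:
        one_hot = [1.0 if protocol == c else 0.0 for c in stats["categories"]]
        scaled = [(duration - d_mean) / d_std, (size - b_mean) / b_std]
        rows.append((scaled + one_hot, label))
    return rows

def oversample(rows, seed=0):
    """Handling imbalance: duplicate minority-class rows until classes match."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

train_raw, test_raw = split(clean(RECORDS))
stats = fit(train_raw)
train = oversample(transform(train_raw, stats))  # used to train the model
test = transform(test_raw, stats)                # held out to measure accuracy
```

Only the training split is oversampled here; the test split keeps its original class balance so that the measured accuracy reflects realistic traffic.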
Detection Based IDSs
● Signature-based Detection
○ Focuses on identifying known attack patterns by recognizing attack signatures
○ Requires a database of known attack signatures against which observed activity is compared
○ Extremely good at recognizing known attacks, but cannot detect novel attacks or patterns
● Anomaly detection-based
○ Establishes a profile of what a normal user does
○ Deviation from the profile is flagged as an attack
○ Can detect novel attacks
○ Has a higher false alarm rate
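To make the contrast between the two detection styles concrete, here is a small sketch. Everything in it is an assumption for illustration: the signature database, the single "requests per minute" feature, and the threshold of 3 standard deviations are hypothetical and far simpler than what a real IDS would use.

```python
# Hypothetical signature database: byte patterns of known attacks.
SIGNATURES = {
    "sql_injection": b"' OR 1=1",
    "path_traversal": b"../../",
}

def match_signatures(payload):
    """Signature-based detection: names of known patterns found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

class AnomalyDetector:
    """Anomaly-based detection on one hypothetical feature (requests per minute)."""

    def __init__(self, normal_samples, threshold=3.0):
        n = len(normal_samples)
        self.mean = sum(normal_samples) / n
        variance = sum((x - self.mean) ** 2 for x in normal_samples) / n
        self.std = variance ** 0.5 or 1.0  # avoid dividing by zero
        self.threshold = threshold

    def is_anomalous(self, value):
        """Flag values more than `threshold` standard deviations from the profile."""
        return abs(value - self.mean) / self.std > self.threshold

# Profile built from observations of a normal user.
detector = AnomalyDetector([10, 12, 11, 9, 13, 10, 11, 12])

known_attack = b"GET /index.php?id=1' OR 1=1"
novel_attack = b"GET /new-exploit"

print(match_signatures(known_attack))  # the known pattern is recognized
print(match_signatures(novel_attack))  # nothing matches: the novel attack is missed
print(detector.is_anomalous(50))       # a burst of 50 requests/min deviates strongly
print(detector.is_anomalous(15))       # a merely busy user is flagged too
```

The last call hints at why the false alarm rate is higher for anomaly detection: a legitimate but unusually busy user also deviates from the learned profile.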
Statistics-Based Technique
● Collects and analyzes data to build a statistical model of regular user behavior
○ Two types of models
○ Univariate
■ Models only a single feature in isolation
■ Disregards its relationship with other variables
○ Multivariate
■ Analyzes multiple variables at once and considers their relationships to one another
■ More common in complex systems with lots of moving parts
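The difference between the two model types can be illustrated with a small sketch. It assumes two hypothetical features (packets per second and bytes per second), uses a per-feature z-score as the univariate model, and uses the Mahalanobis distance as one common choice of multivariate model; the slides do not name a specific statistic.

```python
import math

# Hypothetical normal traffic: (packets_per_second, bytes_per_second).
NORMAL = [(10, 1000), (20, 2000), (30, 3100), (40, 3900), (50, 5000)]

def mean(values):
    return sum(values) / len(values)

def fit(samples):
    """Estimate means, variances, and the covariance of the two features."""
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    mx, my = mean(xs), mean(ys)
    var_x = mean([(x - mx) ** 2 for x in xs])
    var_y = mean([(y - my) ** 2 for y in ys])
    cov_xy = mean([(x - mx) * (y - my) for x, y in samples])
    return mx, my, var_x, var_y, cov_xy

def univariate_scores(point, model):
    """Univariate: z-score of each feature on its own, ignoring the other."""
    mx, my, var_x, var_y, _ = model
    return (abs(point[0] - mx) / math.sqrt(var_x),
            abs(point[1] - my) / math.sqrt(var_y))

def mahalanobis(point, model):
    """Multivariate: distance that accounts for how the features vary together."""
    mx, my, var_x, var_y, cov_xy = model
    dx, dy = point[0] - mx, point[1] - my
    det = var_x * var_y - cov_xy ** 2  # determinant of the 2x2 covariance matrix
    squared = (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.sqrt(squared)

model = fit(NORMAL)
suspicious = (15, 4500)  # few packets, yet a very large byte rate

print(univariate_scores(suspicious, model))  # each feature alone looks ordinary
print(mahalanobis(suspicious, model))        # the combination is far from normal
```

Each value of the suspicious point lies inside the range seen during normal operation, so the univariate scores stay small. Only the multivariate distance exposes that this combination of packet rate and byte rate breaks the usual relationship between the two.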
IDS Feature Engineering
● Used for multiple reasons
○ Reduces computational complexity, removes and simplifies redundant data, decreases false alarm rates, and improves the accuracy of machine learning algorithms
● Large amounts of data are generated across different fields
○ Causes issues, as large datasets require more computational resources
○ Creates a need for feature engineering to make the amount of data more manageable
● Feature engineering is a deeply researched and highly utilized field
○ Used for pattern recognition, machine learning, and intrusion detection, among other things
○ Has incorporated both simple and comprehensive models to improve the performance of feature engineering
Feature Engineering Algorithms
● Multiple advantages
○ Reduces the cost of data collection and classification model training, improves classification performance, allows smaller model sizes, and may improve the interpretability of a model
● Feature extraction
○ Combines features based on important characteristics
○ Effective method for improving a learning algorithm
○ Can complicate future analysis, since the new features no longer map directly to the original ones
● Feature selection
○ Chooses the most important features
○ Keeps the original features
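A short sketch may help separate the two approaches. It is illustrative only: the flow records and column names are hypothetical, a variance threshold stands in for whatever selection criterion a real system would use, and a hand-built ratio stands in for extraction methods that combine features automatically.

```python
# Hypothetical flow records; the column names are illustrative only.
FLOWS = [
    {"bytes": 1200, "packets": 10, "ttl": 64},
    {"bytes": 1500, "packets": 12, "ttl": 64},
    {"bytes": 40000, "packets": 1000, "ttl": 64},
]

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def select_features(rows, names, min_variance=1e-9):
    """Feature selection: keep the original columns that actually vary."""
    return [n for n in names if variance([r[n] for r in rows]) > min_variance]

def extract_bytes_per_packet(rows):
    """Feature extraction: combine two columns into one new derived feature."""
    return [r["bytes"] / r["packets"] for r in rows]

kept = select_features(FLOWS, ["bytes", "packets", "ttl"])
derived = extract_bytes_per_packet(FLOWS)

print(kept)     # "ttl" is constant in this toy data, so it is dropped
print(derived)  # one combined value per flow replaces two raw columns
```

Selection leaves the surviving columns untouched, so they remain easy to interpret. The extracted ratio is more compact, but a later analyst can no longer read the raw byte and packet counts from it.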
IDS Evasion Techniques
● Obfuscation
○ All techniques that aim to avoid detection by making malicious activity harder to recognize
○ Fragmentation, flooding, and encryption are methods of obfuscation
● Anonymizers
○ All techniques that attempt to hide information about malicious network traffic
○ Information manipulation (routing, IP, port), proxy servers, and VPNs are all anonymizers
○ The Tor network is an anonymization system that encrypts network traffic and routes it through nodes to make the original source harder to find
Obfuscation 1
● Fragmentation
○ Tries to bypass an IDS by splitting a packet into fragments that are reassembled after transmission
○ The IDS may scan the fragments individually and not recognize the danger
○ As a counter, the IDS may use algorithms that reassemble and analyze the fragmented packets
● Flooding
○ Attempts to overwhelm the system by creating a high volume of traffic
○ Once system resources are exhausted, harmful data may get past the IDS
○ DoS and DDoS attacks flood the system with a large amount of traffic in order to disrupt it
○ Protocol-level flooding generates TCP/IP or ICMP traffic to attack the protocol handlers within the IDS
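The fragmentation evasion and its counter can be sketched as follows. This is a simplified model under stated assumptions: fragments are hypothetical `(offset, payload)` pairs, the signature is made up, and overlapping or missing fragments are ignored, which a real reassembly algorithm would have to handle.

```python
SIGNATURE = b"/etc/passwd"  # hypothetical attack signature

def scan_individually(fragments):
    """Naive IDS: inspect each fragment on its own."""
    return any(SIGNATURE in payload for _, payload in fragments)

def reassemble(fragments):
    """Rebuild the original payload by ordering fragments by their offset."""
    return b"".join(payload for _, payload in sorted(fragments))

def scan_reassembled(fragments):
    """Countermeasure: inspect the payload only after reassembly."""
    return SIGNATURE in reassemble(fragments)

# The malicious request is split so that no single fragment holds the signature,
# and the fragments arrive out of order.
fragments = [(7, b"c/passwd"), (0, b"GET /et")]

print(scan_individually(fragments))  # False: the attack slips past
print(scan_reassembled(fragments))   # True: reassembly reveals the signature
```

Reassembly costs memory and processing time for every fragmented flow, which is one reason it also gives flooding attacks something to exhaust.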
Obfuscation 2
● Encryption
○ Transforms the data into an unreadable form
○ Encryption is also used legitimately for safety, so encrypted traffic that the IDS cannot read makes it difficult to ensure security
○ The IDS can use metadata to help identify suspicious patterns and behavior that may indicate malicious intent
○ Some encrypted data can be decrypted, analyzed, and re-encrypted to check it
Anonymizers 1
● Source Routing
○ Manipulates the routing path of network packets to slip past an IDS
○ Loose source routing specifies some routers along the path and may deviate from the most common path to confuse or bypass an IDS
○ Strict source routing specifies the exact router path the data must follow, allowing the packet to avoid security measures
● Source Port Manipulation
○ Alters the source port value of network packets to avoid appearing suspicious and to bypass an IDS that may apply less scrutiny to common source ports such as those for HTTP or HTTPS
Anonymizers 2
● Spoofing IP address
○ Alters the source IP address to make a packet look like it is coming from a trusted IP
○ Used in DDoS attacks to get around an IDS that limits traffic from any one IP
● Address Decoy
○ Manipulates network addresses during communication with a network by changing the destination IP address in the packet header
○ Modifying the destination can make the packet appear to be heading for a different IP address than its actual destination
● Proxy servers
○ Uses an intermediary server between the source and destination to attempt to hide activity from an IDS
○ Hides the source of the malicious data
○ Can use encryption and tunneling to further protect the source of the malicious data and allow the attackers to communicate over a secure channel
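As a rough illustration of why IP spoofing defeats a per-source traffic limit, consider the sketch below. The limit of 20 packets per window, the `flagged_sources` helper, and the addresses (taken from the reserved documentation ranges) are all hypothetical.

```python
from collections import Counter

def flagged_sources(packets, limit):
    """Return the source IPs that sent more than `limit` packets in the window."""
    counts = Counter(src for src, _ in packets)
    return {src for src, n in counts.items() if n > limit}

TARGET = "10.0.0.1"

# 100 packets sent from the attacker's real address.
honest = [("203.0.113.5", TARGET)] * 100

# The same 100 packets, each carrying a different forged source address.
spoofed = [(f"198.51.100.{i}", TARGET) for i in range(100)]

print(flagged_sources(honest, limit=20))   # the single real source is flagged
print(flagged_sources(spoofed, limit=20))  # empty: no one address exceeds the limit
```

The target receives the same volume in both cases, but the per-source counter never sees more than one packet from any forged address.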