On the Semantic Patterns of Passwords and their Security Impact

Rafael Veras, Christopher Collins, Julie Thorpe
University of Ontario Institute of Technology
Presenter: Kyle Wallace
A Familiar Scenario…
User Name:
CoolGuy90
Password:
“What should I pick as my new password?”
A Familiar Scenario…
“Musical!Snowycat90”
A Familiar Scenario…
But how secure is “Musical!Snowycat90” really? (18 chars)
◦ “Musical” – Dictionary word, possibly related to hobby
◦ “!” – Filler character
◦ “Snowy” – Dictionary word, attribute to “cat”
◦ “cat” – Dictionary word, animal, possibly pet
◦ “90” – Number, possibly truncated year of birth
15/18 characters are related to dictionary words!
Why do we pick the passwords that we do?
Password Patterns?
“Even after half a century of password use in computing, we
still do not have a deep understanding of how people create
their passwords” –Authors
Are there ‘meta-patterns’ or preferences that can be
observed across how people choose their passwords?
Do these patterns/preferences have an impact on security?
Contributions
Use NLP to segment, classify, and generalize semantic categories
Describe most common semantic patterns in RockYou database
A PCFG that captures structural, semantic, and syntactic patterns
Evaluation of security impact, comparison with previous studies
Segmentation
Decomposition of passwords into constituent parts
◦ Passwords contain no whitespace characters (usually)
◦ Passwords contain filler characters (“gaps”) between segments
Ex: crazy2duck93^ -> {crazy, duck} & {2,93^}
Issue: What about strings that parse multiple ways?
Coverage
Prefer fewer, smaller gaps to larger ones
Ex: Anyonebarks98 (13 characters long)
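The coverage idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `WORDS` set is a hypothetical toy dictionary standing in for the trimmed COCA word list, and the helper names (`splits`, `coverage`, `max_coverage_splits`) are invented for this example. It enumerates every split into word and gap segments and keeps those that cover the most characters with dictionary words.

```python
# Hypothetical toy dictionary; the slides use a trimmed COCA word list instead.
WORDS = {"any", "anyone", "one", "bar", "bark", "barks", "crazy", "duck"}

def splits(text, words=WORDS, after_gap=False):
    """Yield every split of text into dictionary words and gap runs."""
    if not text:
        yield []
        return
    for j in range(1, len(text) + 1):
        head, tail = text[:j], text[j:]
        if head.lower() in words:
            for rest in splits(tail, words, False):
                yield [("word", head)] + rest
        if not after_gap:  # two adjacent gaps would really be one gap
            for rest in splits(tail, words, True):
                yield [("gap", head)] + rest

def coverage(split):
    """Number of characters covered by dictionary words."""
    return sum(len(seg) for kind, seg in split if kind == "word")

def max_coverage_splits(password, words=WORDS):
    """Return all splits that reach the maximum coverage."""
    candidates = list(splits(password, words))
    best = max(coverage(s) for s in candidates)
    return [s for s in candidates if coverage(s) == best]

print(max_coverage_splits("crazy2duck93^"))
print(max_coverage_splits("Anyonebarks98"))
```

With this toy dictionary, "Anyonebarks98" has two splits with equal coverage (Anyone + barks and Any + one + barks). The sketch leaves such ties unresolved; exhaustive enumeration is only practical for short strings.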
Splitting Algorithm
Source corpora: Raw word list
◦ Taken from COCA (Contemporary
Corpus of American English)
Trimmed version of COCA:
◦ 3 letter words: Frequency of 100+
◦ 2 letter words: Top 37
◦ 1 letter words: a, I
Also collected list of names, cities,
surnames, months, and countries
Splitting Algorithm
Reference Corpus: Collection of NGrams, where N=3 (Full COCA)
◦ N-Gram: Sequence of tokens (words)
Ex: “I love my cats”
◦ Unigrams: I, love, my, cats (4)
◦ Bigrams: I love, love my, my cats (3)
◦ Trigrams: I love my, love my cats (2)
$P(w_1, w_2, \ldots, w_n) = \frac{f(w_1, w_2, \ldots, w_n)}{f(K_n)}$
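A small Python sketch of the N-gram counts and the relative-frequency estimate. It assumes that $f(K_n)$ means the total number of N-grams of order n in the reference corpus; the slides do not spell this out. The function names `ngrams` and `ngram_probability` are made up for this illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token sequences."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_probability(gram, corpus_tokens):
    """Relative frequency of `gram` among all n-grams of the same order.

    Assumption: the denominator f(K_n) is the total n-gram count.
    """
    counts = Counter(ngrams(corpus_tokens, len(gram)))
    total = sum(counts.values())
    return counts[tuple(gram)] / total if total else 0.0

tokens = "I love my cats".split()
for n in (1, 2, 3):
    print(n, ngrams(tokens, n))
print(ngram_probability(("love", "my"), tokens))
```

On the four-word example this yields 4 unigrams, 3 bigrams, and 2 trigrams, matching the counts on the slide.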
Common Words
Part-of-Speech Tagging
Necessary step for semantic
classification
◦ Ex: “love” is a noun (my true love)
and a verb (I love cats)
Given segments $s_1, s_2, \ldots, s_n$,
returns $[(s_1, t_1), (s_2, t_2), \ldots, (s_n, t_n)]$
Gap segments are not tagged
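To make the input and output shapes concrete, here is a toy tagger in Python. It is only a sketch: the `LEXICON` table and the single disambiguation rule are hypothetical stand-ins for a trained tagger, and anything missing from the lexicon is treated as a gap and left untagged.

```python
# Hypothetical candidate tags per word; a real tagger is trained on a corpus.
LEXICON = {
    "i": ["pronoun"],
    "my": ["possessive"],
    "true": ["adjective"],
    "love": ["noun", "verb"],
    "cats": ["noun"],
}

def tag_segments(segments):
    """Return [(segment, tag), ...] for word segments; gaps are skipped."""
    tagged = []
    prev_tag = None
    for seg in segments:
        candidates = LEXICON.get(seg.lower())
        if candidates is None:  # gap such as "2" or "93^": not tagged
            continue
        # Toy rule: an ambiguous word right after a pronoun is read as a verb.
        if len(candidates) > 1 and prev_tag == "pronoun" and "verb" in candidates:
            tag = "verb"
        else:
            tag = candidates[0]
        tagged.append((seg, tag))
        prev_tag = tag
    return tagged

print(tag_segments(["I", "love", "cats", "2"]))
print(tag_segments(["my", "true", "love"]))
```

The two calls tag "love" as a verb in the first case and as a noun in the second, mirroring the example on the slide.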
Semantic Classification
Assigns a semantic classifier to each password segment
◦ Only assigned to nouns and verbs
WordNet: A graph of concepts expressed as a set of synonyms
◦ “Synsets” are arranged into hierarchies, more general at top
Fall back to source corpora for proper nouns
◦ Tag with female name, male name, surname, country, or city
Semantic Classification
Tags represented as
word.pos.#, where # is the
WordNet ‘sense’
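A minimal Python sketch for handling tags in the word.pos.# format. The `SynsetTag` type and the helper names are assumptions made for illustration; the only detail taken from the slides is the three-part layout, with examples such as puppy.n.01.

```python
from typing import NamedTuple

class SynsetTag(NamedTuple):
    word: str
    pos: str    # part of speech, e.g. "n" or "v"
    sense: int  # WordNet sense number

def parse_tag(tag: str) -> SynsetTag:
    """Split 'word.pos.#' from the right, so dots inside the word survive."""
    word, pos, sense = tag.rsplit(".", 2)
    return SynsetTag(word, pos, int(sense))

def format_tag(tag: SynsetTag) -> str:
    """Render a tag with a two-digit sense number."""
    return f"{tag.word}.{tag.pos}.{tag.sense:02d}"

print(parse_tag("puppy.n.01"))
print(format_tag(SynsetTag("animal", "n", 1)))
```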
Semantic Generalization
Where in the synset hierarchy should we represent a word?
Utilize a tree cut model on synset tree
◦ Goal: Optimize between parameter & data description length
$L_{par}(M) + W \cdot \frac{\log |S|}{|S|} \cdot L_{dat}(M)$
W=1000 (gold), W=5000 (red), W=10000 (blue)
Contributions
Use NLP to segment, classify, and generalize semantic categories
Describe most common semantic patterns in RockYou database
A PCFG that captures structural, semantic, and syntactic patterns
Evaluation of security impact, comparison with previous studies
Classification
RockYou leak (2009) contained over
32 million passwords
Effect of generalization can be seen in
a few cases (in blue)
◦ Some generalizations better than
others (Ex: ‘looted’ vs ‘bravo100’)
Some synsets are not generalized (in
red)
◦ Ex: puppy.n.01 -> puppy.n.01
Summary of Categories
Love (6, 7)
Food (61, 66, 76, 82, 93)
Places (3, 13)
Alcohol (39)
Sexual Terms (29, 34, 54, 69)
Money (46, 74)
Royalty (25, 59, 60)
Profanity (40, 70, 72)
Animals (33, 36, 37, 92, 96, 100)
*Some categories expanded from two letter acronyms
+Some categories contain noise from names dictionary
Top 100 Semantic Categories
Contributions
Use NLP to segment, classify, and generalize semantic categories
Describe most common semantic patterns in RockYou database
A PCFG that captures structural, semantic, and syntactic patterns
Evaluation of security impact, comparison with previous studies
Probabilistic Context-Free Grammar
A CFG whose productions have associated probabilities
◦ A vocabulary set (terminals) $\Sigma = \{w_1, w_2, \ldots, w_m\}$
◦ A variable set (non-terminals) $V = \{N_1, N_2, \ldots, N_n\}$
◦ A start variable $N_1$
◦ A set of rules $N_i \to \zeta_j$ (terminals + non-terminals)
◦ A set of probabilities on rules, such that $\forall i \; \sum_j P(N_i \to \zeta_j) = 1$
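The normalization condition on rule probabilities can be checked mechanically. The Python sketch below uses a hypothetical toy grammar (`RULES`) whose symbols and numbers are invented for illustration; it only shows one possible way to store rules and verify that each non-terminal's probabilities sum to 1.

```python
import math
from collections import defaultdict

# Hypothetical toy grammar: (left-hand side, right-hand side) -> probability.
RULES = {
    ("N1", ("PRONOUN", "VERB")): 0.75,
    ("N1", ("NOUN",)): 0.25,
    ("PRONOUN", ("i",)): 0.5,
    ("PRONOUN", ("you",)): 0.5,
    ("VERB", ("love",)): 1.0,
    ("NOUN", ("football",)): 1.0,
}

def is_normalized(rules, tol=1e-9):
    """True if, for every non-terminal, its rule probabilities sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in rules.items():
        totals[lhs] += p
    return all(math.isclose(t, 1.0, abs_tol=tol) for t in totals.values())

print(is_normalized(RULES))
```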
Semantic PCFG
In the authors’ PCFG:
◦ $\Sigma$ comprises the source corpora and learned gap segments
◦ $V$ is the set of all semantic and syntactic categories
◦ All rules are of the form $N_j \to w_k$ or $N_1 \to \xi$, where $\xi$ is a sequence of non-terminals
This grammar is regular (described by a finite automaton)
Sample PCFG
Training data:
◦ iloveyou2
◦ ihatedthem3
◦ football3
𝑁1 rules are base structures
Grammar can generate passwords
Probability of a password is the
product of all rule probabilities
Ex: P(youlovethem2) = 0.0103125
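A rough Python sketch of how such a grammar could be estimated from the three training passwords by counting. The segmentation and the coarse tags (pronoun, verb, noun, number) are hand-assigned assumptions, not the categories from the slide's figure, so the resulting probability is not expected to match the 0.0103125 quoted above. Exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from math import prod

# Hand-segmented training passwords with hypothetical coarse tags.
TRAINING = [
    [("i", "pronoun"), ("love", "verb"), ("you", "pronoun"), ("2", "number")],
    [("i", "pronoun"), ("hated", "verb"), ("them", "pronoun"), ("3", "number")],
    [("football", "noun"), ("3", "number")],
]

def train(samples):
    """Estimate base-structure and terminal probabilities by relative frequency."""
    base = Counter(tuple(tag for _, tag in s) for s in samples)
    term = Counter((tag, seg) for s in samples for seg, tag in s)
    tag_totals = Counter(tag for s in samples for _, tag in s)
    base_p = {b: Fraction(c, len(samples)) for b, c in base.items()}
    term_p = {(t, w): Fraction(c, tag_totals[t]) for (t, w), c in term.items()}
    return base_p, term_p

def probability(tagged, base_p, term_p):
    """Product of the base-structure rule and every terminal rule used."""
    structure = tuple(tag for _, tag in tagged)
    factors = [base_p.get(structure, Fraction(0))]
    factors += [term_p.get((tag, seg), Fraction(0)) for seg, tag in tagged]
    return prod(factors)

base_p, term_p = train(TRAINING)
guess = [("you", "pronoun"), ("love", "verb"), ("them", "pronoun"), ("2", "number")]
print(probability(guess, base_p, term_p))
```

The guess "youlovethem2" never appears in the training data, yet it receives a non-zero probability because every rule it uses was observed.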
RockYou Base Structures (Top 50)
Contributions
Use NLP to segment, classify, and generalize semantic categories
Describe most common semantic patterns in RockYou database
A PCFG that captures structural, semantic, and syntactic patterns
Evaluation of security impact, comparison with previous studies
Building a Guess Generator
Cracking attacks consist of three steps:
◦ Generate a guess
◦ Hash the guess using the same algorithm as target
◦ Check for matches in the target database
Most popular methods (using John the Ripper program)
◦ Word lists (from previous breaches)
◦ Brute force (usually after exhausting word lists)
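The three cracking steps can be sketched as a short Python loop. This assumes unsalted hashes stored as hex digests and uses SHA-1 purely as an example algorithm; the `crack` function and its parameters are hypothetical names for this illustration.

```python
import hashlib

def crack(guesses, target_hashes, algorithm="sha1"):
    """Hash each guess and report which target hashes it matches.

    Assumes unsalted hashes stored as lowercase hex digests.
    """
    cracked = {}
    for guess in guesses:                                    # 1. generate a guess
        digest = hashlib.new(algorithm, guess.encode("utf-8")).hexdigest()  # 2. hash it
        if digest in target_hashes:                          # 3. check for a match
            cracked[digest] = guess
    return cracked

targets = {hashlib.sha1(b"iloveyou2").hexdigest()}
print(crack(["football3", "iloveyou2"], targets))
```

Any iterable works as the guess source, so a word list, a brute-force enumerator, or a grammar-based generator can be plugged in unchanged.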
Guess Generator
At a high level:
◦ Output terminals in highest
probability order
◦ Iteratively replaces higher
probability terminals with
lower probability ones
◦ Uses priority queue to
maintain order
Will this produce the same
list of guesses every time?
Guess Generator Example
Suppose only one base structure:
◦ $N_1 \to [PN][feeling.v.01][NP][animal.n.01]$
Initialized with most probable terminals: “I love Susie’s cat”
Pop first guess off queue (“IloveSusiescat”)
◦ Replace first segment: “youloveSusiescat”
◦ Replace second segment: “IhateSusiescat”
◦ Replace third segment: “IloveBobscat”
◦ Replace fourth segment: “IloveSusiesdog”
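The pop-and-replace procedure can be sketched with Python's `heapq`. This is an illustrative sketch, not the authors' generator: the terminals and probabilities in `SLOTS` are invented, a single base structure is assumed, and a `seen` set is used to avoid queuing the same guess twice, which is a simplification.

```python
import heapq
from math import prod

# Hypothetical terminals per segment, sorted by descending probability.
SLOTS = [
    [("I", 0.7), ("you", 0.3)],
    [("love", 0.6), ("hate", 0.4)],
    [("Susies", 0.8), ("Bobs", 0.2)],
    [("cat", 0.9), ("dog", 0.1)],
]

def guesses(slots):
    """Yield (guess, probability) in non-increasing probability order."""
    def score(idx):
        return prod(slots[k][i][1] for k, i in enumerate(idx))

    start = (0,) * len(slots)          # most probable terminal in every slot
    heap = [(-score(start), start)]    # min-heap on negated probability
    seen = {start}
    while heap:
        neg_p, idx = heapq.heappop(heap)
        yield "".join(slots[k][i][0] for k, i in enumerate(idx)), -neg_p
        # Replace each segment in turn with its next less probable terminal.
        for k in range(len(slots)):
            if idx[k] + 1 < len(slots[k]):
                nxt = idx[:k] + (idx[k] + 1,) + idx[k + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-score(nxt), nxt))

for guess, p in guesses(SLOTS):
    print(f"{p:.4f} {guess}")
```

Because every replacement can only lower the probability, popping the best entry from the queue each time produces guesses from most to least probable.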
Mangling Rules
Passwords aren’t always strictly
lowercase
◦ Beardog123lol
◦ bearDOG123LoL
◦ BearDog123LoL
Three types of rules:
◦ Capitalize first word segment
◦ Capitalize whole word segment
◦ CamelCase on all segments
Any others?
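One possible reading of the three rule types, as a Python sketch. The function name `mangle` and the choice to emit one variant per word segment for the "capitalize whole word segment" rule are assumptions; the slides do not specify these details.

```python
def mangle(segments):
    """Return case variants of a lowercase, already segmented password."""
    word_positions = [i for i, s in enumerate(segments) if s.isalpha()]
    variants = []
    if not word_positions:
        return variants
    first = word_positions[0]
    # Rule 1: capitalize the first word segment.
    variants.append("".join(
        s.capitalize() if i == first else s for i, s in enumerate(segments)))
    # Rule 2: upper-case one whole word segment (one variant per word).
    for pos in word_positions:
        variants.append("".join(
            s.upper() if i == pos else s for i, s in enumerate(segments)))
    # Rule 3: CamelCase across all word segments.
    variants.append("".join(
        s.capitalize() if s.isalpha() else s for s in segments))
    return variants

print(mangle(["bear", "dog", "123", "lol"]))
```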
Comparison to Weir Approach
Authors’ approach seen as an evolution of Weir
◦ Weir contains far fewer non-terminals (less precise estimates)
◦ Weir does not learn semantic rules (fewer overall terminals)
◦ Weir treats grammar and dictionary input separately
◦ Authors’ semantic classification needs to be re-run for changes
Password Cracking Experiments
Considered 5 methods:
◦ Semantic approach w/o mangling rules
◦ Semantic approach w/ custom mangling rules
◦ Semantic approach w/ JtR’s mangling rules
◦ Weir approach
◦ Wordlist w/ JtR’s default rules + incremental brute force
Attempted to crack LinkedIn and MySpace leaks
Experiment 1: RockYou vs LinkedIn
5,787,239 unique passwords
Results:
◦ Semantic outperforms nonsemantic versions
◦ Weir approach is worst (semantic
approach improves on it by 67%)
◦ Authors’ approach is more
robust against differing
demographics
Experiment 2: RockYou vs MySpace
41,543 unique passwords
Results:
◦ Semantic approach
outperforms all
◦ No-rules performs best
◦ Weir approach is worst (semantic
approach improves on it by 32%)
◦ Passwords were phished;
quality lowered?
Experiment 3: Maximum Crack Rate
Since method is based on
grammar, can build grammar
recognizer to check
Results:
◦ Semantic equivalent to brute
force, with $10^{30}$ fewer guesses
◦ Weir approach generates fewer
guesses, 30% fewer passwords guessed
Experiment 3: Time to Maximum Crack
Fit non-linear regression
to sample of guess probs.
Results:
◦ Semantic method has a
lower guess rate (guesses/second)
◦ Grammar is much larger
than Weir method
Issues with Semantic Approach
Further study needed into performance bottlenecks
◦ Though semantic method is more efficient (more hits per guess)
Approach requires a significant amount of memory
◦ Workaround involves probability threshold for adding to queue
Duplicates could be produced due to ambiguous splits
◦ Ex: (one, go) vs (on, ego)
Conclusions
There are underlying semantic patterns in password creation
These semantics can be captured in a probabilistic grammar
This grammar can be used to efficiently generate probable passwords
This generator shows (up to) a 67% improvement over previous efforts
Thank you!
QUESTIONS?