Uploaded by Keerthana Chirumamilla

intro to ml

Apologies and Announcements
Website will be up within this weekend
Apologies for delay in initiating the discussion
Please finalize your assignment groups asap
Groups can have no more than 6 members
Recommended to have at least 4 members
Groups cannot contain unregistered students
Course list finalized – will be put up to help identify group members
Course members who are unable to join a group will be clubbed
Must authenticate
your sensors so
that tampering
can be detected!
Couldn’t you have
told me earlier?!
Authentication by Secret Questions
Give me your A/C number and
answer the following questions
1. What is your date of birth?
2. What is your pet’s name?
3. How many marks did you
get in 10th standard exams?
4. How many cars do you own?
5. …
BANK
USER
SBI31415926535
1. 05th August 2000
2. Mr. Bud Bud
3. err … couldn’t
hear you clearly
4. None, so give me
that loan already!
5. …
Authentication by Secret Questions
Using PUFs
Give me your device ID and
answer the following questions
1. 10111100
2. 00110010
3. 10001110
4. 00010100
5. …
TS271828182845
SERVER
How to ensure that these
answers are unique and
unpredictable?
DEVICE
1. 1
2. 0
3. 1
4. 0
5. …
Physically Unclonable Functions
0.50ms
These tiny differences are
difficult to predict or clone
0.55ms
Then these could act
as the fingerprints
for the devices!
A simple Multiplexer PUF
“select”
bit
0
p ms delay
q ms delay
Multiplexers are basically
switching circuits
1
Correct. However, the devices
are consistent, i.e., their delays do
not change (too much) over time.
It is difficult to deliberately
create another mux that
exhibits the same delays
Arbiter PUFs
If the top signal reaches the finish line first,
the “answer” to this question is 0, else if the
bottom signal reaches first, the “answer” is 1
Question: 1011 (select bits 1, 0, 1, 1)
Answer: 1
Question: 0110 (select bits 0, 1, 1, 0)
Answer: 0
Some FAQs
Does it matter whether the “red” signal reaches first or the “blue”?
No, the color does not matter – the color was added just for explanation
Why go into all this fuss of having multiple multiplexers?
It was expected that it would make it more difficult to predict the answers.
Also, it increases the number of possible questions.
Is it compulsory to have only 4 multiplexers?
Absolutely not. It depends on how long your “questions” are
It is common
to have 64
multiplexers
Actually …
That would make
the total number
of challenges $2^{64}$
> 18 quintillion!!
By the way, people usually call
the questions “challenges”
and the answers “responses”
Good … even if an attacker knows the
responses to a few challenges, there is
no way to guess the other answers.
Right? Right? Hello! Melbo!!
A Twist in the Tale
An attacker can see responses on a few challenges and
use ML to predict responses on all other challenges
Does not matter if using 32-bit or 64-bit challenges
All muxes are different, so $p_1 \neq p_2 \neq \cdots$ and $q_1 \neq q_2 \neq \cdots$
[Figure: a chain of 64 muxes with select bits $c_0, c_1, \ldots, c_{63}$; mux $i$ has delays $p_i$, $q_i$ and output times $t_i^u$, $t_i^l$]
$t_i^u$ is the (unknown) time at which the upper signal leaves the $i$-th mux. $t_i^l$ is the time at which the lower signal leaves the $i$-th mux.
A Twist in the Tale
Observe that the answer is 0 if $t_{63}^u < t_{63}^l$ and 1 otherwise
Also note that $t_1^u$ and $t_1^l$ depend on $t_0^u$, $t_0^l$, $p_1$, $q_1$, $r_1$, $s_1$ and $c_1$
$c_1$ dictates which previous delay, $t_0^u$ or $t_0^l$, gets carried forward in which branch, and $p_1, q_1, r_1, s_1$ give us the delay introduced by mux 1 itself
A Twist in the Tale
$t_1^u = (1 - c_1) \cdot (t_0^u + p_1) + c_1 \cdot (t_0^l + s_1)$
$t_1^l = (1 - c_1) \cdot (t_0^l + q_1) + c_1 \cdot (t_0^u + r_1)$
When $c_1 = 0$ the signals pass straight through (delays $p_1$, $q_1$); when $c_1 = 1$ they swap branches (delays $s_1$, $r_1$)
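The two timing equations can be turned into a small simulation. The sketch below is only an illustration: the function name `arbiter_response`, the 4-stage size, and the randomly drawn delay values are assumptions made for the example, not something given in the slides.

```python
import random

def arbiter_response(challenge, delays):
    """Simulate an arbiter PUF stage by stage.

    challenge: list of select bits c_i (0 or 1), one per mux
    delays:    list of (p, q, r, s) tuples, one per mux
    Returns 0 if the upper signal finishes first, else 1.
    """
    t_u, t_l = 0.0, 0.0  # both signals start at the same time
    for c, (p, q, r, s) in zip(challenge, delays):
        if c == 0:
            # straight: upper stays up (+p), lower stays down (+q)
            t_u, t_l = t_u + p, t_l + q
        else:
            # swap: lower signal moves up (+s), upper signal moves down (+r)
            t_u, t_l = t_l + s, t_u + r
    return 0 if t_u < t_l else 1

# Hypothetical device: 4 muxes with delays drawn at random (illustrative only)
rng = random.Random(0)
device = [tuple(rng.uniform(0.45, 0.55) for _ in range(4)) for _ in range(4)]
print(arbiter_response([1, 0, 1, 1], device))
```

The same device always returns the same bit for the same challenge, while a second device with different delays may answer differently.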
A little bit of Math
Let us use the shorthand $\Delta_i = t_i^u - t_i^l$ to denote the lag
Recall: all that matters is whether the top signal reaches first or not
Thus, all that matters is whether $\Delta_{63} < 0$ or not
$\Delta_1 = (1 - c_1) \cdot (t_0^u + p_1 - t_0^l - q_1) + c_1 \cdot (t_0^l + s_1 - t_0^u - r_1)$
$= (1 - c_1) \cdot (\Delta_0 + p_1 - q_1) + c_1 \cdot (-\Delta_0 + s_1 - r_1)$
$= (1 - 2c_1) \cdot \Delta_0 + (q_1 - p_1 + s_1 - r_1) \cdot c_1 + (p_1 - q_1)$
To make notation simpler, let $d_i \overset{\text{def}}{=} 1 - 2c_i$
$d_i$ creates bits that take values $-1, +1$ instead of $0, 1$: that’s it!
$\Delta_1 = \Delta_0 \cdot d_1 + \alpha_1 \cdot d_1 + \beta_1$
$\alpha_1 = (p_1 - q_1 + r_1 - s_1)/2$
$\beta_1 = (p_1 - q_1 - r_1 + s_1)/2$
A little bit of Math
Note that a similar relation holds for any stage
$\Delta_i = d_i \cdot \Delta_{i-1} + \alpha_i \cdot d_i + \beta_i$
where $\alpha_i = (p_i - q_i + r_i - s_i)/2$ and $\beta_i = (p_i - q_i - r_i + s_i)/2$
We can safely take $\Delta_{-1} = 0$ (absorb initial delays into $p_0, q_0, r_0, s_0$)
We can keep going on recursively
$\Delta_0 = \alpha_0 \cdot d_0 + \beta_0$ (since $\Delta_{-1} = 0$)
$\Delta_1 = \Delta_0 \cdot d_1 + \alpha_1 \cdot d_1 + \beta_1$; now plug in the value of $\Delta_0$ to get
$\Delta_1 = \alpha_0 \cdot d_1 \cdot d_0 + (\alpha_1 + \beta_0) \cdot d_1 + \beta_1$
$\Delta_2 = \alpha_0 \cdot d_2 \cdot d_1 \cdot d_0 + (\alpha_1 + \beta_0) \cdot d_2 \cdot d_1 + (\alpha_2 + \beta_1) \cdot d_2 + \beta_2$
We can begin to see a pattern here
Linear Models
We have
$\Delta_{63} = w_0 \cdot x_0 + w_1 \cdot x_1 + \cdots + w_{63} \cdot x_{63} + \beta_{63} = \mathbf{w}^\top \mathbf{x} + b$
where
$x_i = d_i \cdot d_{i+1} \cdot \ldots \cdot d_{63}$
$w_0 = \alpha_0$
$w_i = \alpha_i + \beta_{i-1}$ for $i > 0$
$b = \beta_{63}$
If $\Delta_{63} < 0$, upper signal wins and answer is 0
If $\Delta_{63} > 0$, lower signal wins and answer is 1
Thus, answer is simply $\frac{\operatorname{sign}(\mathbf{w}^\top \mathbf{x} + b) + 1}{2}$
This is nothing but a linear classifier!
This means that if someone can find the $\mathbf{w}, b$ parameters, they would be able to predict response to any challenge!!
Exactly, this is why people stopped using arbiter PUFs for authentication after this was revealed
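The claim that the stage-by-stage recursion collapses into a single dot product can be checked numerically. The sketch below assumes randomly drawn $\alpha_i$, $\beta_i$ values (illustrative only) and uses helper names (`lag_by_recursion`, `features`, `linear_model`) that are made up for this example.

```python
import numpy as np

def lag_by_recursion(c, alpha, beta):
    """Delta_i = d_i * Delta_{i-1} + alpha_i * d_i + beta_i, with Delta_{-1} = 0."""
    delta = 0.0
    for c_i, a_i, b_i in zip(c, alpha, beta):
        d_i = 1 - 2 * c_i          # map bit 0/1 to +1/-1
        delta = d_i * delta + a_i * d_i + b_i
    return delta

def features(c):
    """x_i = d_i * d_{i+1} * ... * d_{n-1} (suffix products of the d values)."""
    d = 1 - 2 * np.asarray(c)
    return np.cumprod(d[::-1])[::-1]

def linear_model(alpha, beta):
    """w_0 = alpha_0, w_i = alpha_i + beta_{i-1} for i > 0, b = beta_{n-1}."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    w = alpha.copy()
    w[1:] += beta[:-1]
    return w, beta[-1]

rng = np.random.default_rng(0)
n = 64
alpha, beta = rng.normal(size=n), rng.normal(size=n)   # hypothetical device
w, b = linear_model(alpha, beta)

c = rng.integers(0, 2, size=n)                         # one random challenge
delta = lag_by_recursion(c, alpha, beta)
print(np.isclose(delta, w @ features(c) + b))          # both routes agree
print(int((np.sign(delta) + 1) // 2))                  # predicted response bit
```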
Linear/hyperplane Classifiers
The model is a single vector $\mathbf{w}$ of dimension $d$ (features are also $d$-dim), and a scalar term (called bias) $b$
Predict on a test point $\mathbf{x}$ by checking if $\mathbf{w}^\top \mathbf{x} + b > 0$
Decision boundary: hyperplane (where $\mathbf{w}^\top \mathbf{x} + b = 0$)
The vector $\mathbf{w}$ is called the normal or perpendicular vector of the hyperplane. Why?
Consider any two vectors $\mathbf{x}, \mathbf{y}$ on the hyperplane i.e. $\mathbf{w}^\top \mathbf{x} + b = 0 = \mathbf{w}^\top \mathbf{y} + b$. This means $\mathbf{w}^\top (\mathbf{x} - \mathbf{y}) = 0$.
Note that the vector $\mathbf{x} - \mathbf{y}$ is parallel to the hyperplane and $\mathbf{w}$ is perpendicular to all such vectors
The bias term $b$, if changed, shifts the plane. It can be thought of as a threshold as well: how large does $\mathbf{w}^\top \mathbf{x}$ have to be in order for the decision to be 1?
XOR PUF
XOR: given a bunch of 0/1 bits, output is 1 if an odd number of bits are 1; else, if an even number of bits (including no bits) are 1, output is 0
XOR is basically addition modulo 2: $(b_1 + \cdots + b_K) \bmod 2$
Cracking the XOR PUF
It turns out that the XOR PUF can also be cracked using a linear model, although one of a larger dimensionality
Key insight: if we have a bunch of $+1/-1$ values, their product is $+1$ if and only if an even number of them are $-1$, else the product is $-1$
We can crack the individual PUFs using linear models i.e., for the $i$-th PUF, the response is $\frac{1 + \operatorname{sign}(\mathbf{w}_i^\top \mathbf{x})}{2}$
Remember: sign value of $+1$ corresponds to bit 1 and $-1$ corresponds to bit 0
Note: $\prod_i \operatorname{sign}(\mathbf{w}_i^\top \mathbf{x})$ is $+1$ if an even number of the sign values are $-1$
However, XOR is concerned with parity of $+1$ bits
Solution: Flip the signs!
Cracking the XOR PUF
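The product $-\prod_i \left(-\operatorname{sign}(\mathbf{w}_i^\top \mathbf{x})\right) = (-1)^{K+1} \prod_i \operatorname{sign}(\mathbf{w}_i^\top \mathbf{x})$ is $-1$ if an even number of the sign values are $+1$, else the product is $+1$
The extra $-1$ is there since XOR is 0 if there are an even number of 1s
Here, $K$ is the number of PUFs
Thus, the output of $\frac{1 + (-1)^{K+1} \prod_i \operatorname{sign}(\mathbf{w}_i^\top \mathbf{x})}{2}$ is 0 if an even number of the sign values are $+1$, else the output is 1
This is exactly what we wanted!
All we need to do is find a way to compute $\prod_i \operatorname{sign}(\mathbf{w}_i^\top \mathbf{x})$
Although it does not seem so right away, there is a linear model hidden here
Observation: $\prod_i \operatorname{sign}(\mathbf{w}_i^\top \mathbf{x}) = \operatorname{sign}\left(\prod_i \mathbf{w}_i^\top \mathbf{x}\right)$
Find a way to simplify $\prod_i \mathbf{w}_i^\top \mathbf{x}$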
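Both identities on this slide can be sanity-checked numerically. The sketch below assumes $K = 3$ randomly drawn weight vectors in 8 dimensions (hypothetical values) and a helper name `xor_from_signs` introduced only for this example.

```python
import numpy as np

def xor_from_signs(signs):
    """XOR bit from K sign values: (1 + (-1)^(K+1) * prod(signs)) / 2."""
    K = len(signs)
    return int((1 + (-1) ** (K + 1) * np.prod(signs)) / 2)

rng = np.random.default_rng(0)
K, n = 3, 8                       # hypothetical number of PUFs and dimension
W = rng.normal(size=(K, n))       # row i plays the role of w_i

for _ in range(1000):
    x = rng.choice([-1.0, 1.0], size=n)
    scores = W @ x                # w_i^T x for every PUF
    signs = np.sign(scores)
    bits = ((1 + signs) // 2).astype(int)   # sign +1 -> bit 1, sign -1 -> bit 0
    # XOR of the individual response bits matches the sign-product formula
    assert bits.sum() % 2 == xor_from_signs(signs)
    # product of signs equals sign of the product
    assert np.prod(signs) == np.sign(np.prod(scores))
print("identities hold on all sampled inputs")
```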
Cracking the XOR PUF
Let’s take a toy example in 2 dims with $(\mathbf{w}_1^\top \mathbf{x}) \cdot (\mathbf{w}_2^\top \mathbf{x})$ where $\mathbf{w}_1 = (a, b)$, $\mathbf{w}_2 = (p, q)$, $\mathbf{x} = (x, y) \in \mathbb{R}^2$
$(\mathbf{w}_1^\top \mathbf{x}) \cdot (\mathbf{w}_2^\top \mathbf{x}) = (ax + by) \cdot (px + qy)$
$= ap \cdot x^2 + (aq + bp) \cdot xy + bq \cdot y^2$
$= W^\top X$,
where $W = (ap, aq + bp, bq)$, $X = (x^2, xy, y^2) \in \mathbb{R}^3$
Thus, we can just learn a linear model in 3D instead of 2D
Exercise: extend this intuition to more than 2 classifiers and higher dims
Try to do optimizations to reduce the dimensionality of $X$
Note: we are not assured that the linear model we learn will be of this form i.e., that for some $a, b, p, q$ we get $(ap, aq + bp, bq)$
However, we are assured that a linear model with 0 error does exist
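One possible way to carry the toy example to $K$ classifiers in any dimension is the Kronecker product, since $(\mathbf{w}_1 \otimes \mathbf{w}_2)^\top (\mathbf{x} \otimes \mathbf{x}) = (\mathbf{w}_1^\top \mathbf{x}) \cdot (\mathbf{w}_2^\top \mathbf{x})$. The sketch below is an illustration under that assumption, with made-up helper names; it does not merge duplicate terms such as $xy$ and $yx$, so in the toy case it yields 4 features rather than 3.

```python
import numpy as np
from functools import reduce

def product_features(x, K):
    """X = x (kron) x (kron) ... (kron) x with K copies; dimension len(x)**K."""
    return reduce(np.kron, [np.asarray(x, dtype=float)] * K)

def product_weights(ws):
    """W = w_1 (kron) w_2 (kron) ... (kron) w_K."""
    return reduce(np.kron, [np.asarray(w, dtype=float) for w in ws])

# Toy check in 2 dims with hypothetical numbers
w1, w2, x = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
lhs = np.dot(w1, x) * np.dot(w2, x)
rhs = product_weights([w1, w2]) @ product_features(x, 2)
print(lhs, rhs)
```

Merging repeated monomials is one of the optimizations the exercise hints at for reducing the dimensionality of $X$.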