Final presentation Encryption/Decryption on embedded system

Winter 2013
Part A
Supervisor: Ina Rivkin
Students: Chen Ponchek, Liel Shoshan
Motivation
• Nowadays there are many portable storage systems
with large memories that hold valuable data
(such as disk-on-key devices, tablets, etc.).
• Therefore there is a concrete need for portable
cryptography systems that are suitable for such
devices.
• In our project, we aim to provide a suitable
system that answers this need.
Project Goal
Main goal:
Implementing an embedded data cryptography system
based on the AES algorithm, and finding a suitable
architecture for a portable system.
Project Specifications
• Implementing on a Zynq SoPC by Xilinx.
• Suitable for portable systems (Disk-on-Key, tablets, etc.) - low power system.
• Transparent system (while storing/loading files) - the cryptography system won’t
create traffic bottlenecks.
• Finding the best architecture – according to the requirements above:
– Profiling the AES algorithm.
– Finding the balance between using the ARM processor and using the FPGA
(the hardware accelerator needs more power).
AES algorithm
• Advanced Encryption Standard, also known as “Rijndael”, is a block cipher.
• The cipher is iterative, fast, and easy to implement in both software
and hardware, and it doesn’t have high memory requirements.
• Most of the AES calculations are performed in 10 rounds (the round count for a 128-bit key).
• At every stage the data block (the “state”) is represented as a 2D, 4x4 array of bytes.
• Each round uses a “Round Key” produced by the key-expansion process.
• Each round consists of 4 steps (the final round omits MixColumns):
1. SubBytes
2. ShiftRows
3. MixColumns
4. AddRoundKey
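To make the 4x4 state and two of the round steps concrete, here is a minimal Python sketch. It is not the project's code (which ran on the ARM processor and is not shown in these slides); the function names are hypothetical, and only ShiftRows and AddRoundKey are covered because they need no lookup tables.

```python
# Illustrative sketch only; names are assumptions, not the project's code.

def bytes_to_state(block):
    """Map 16 bytes to a 4x4 state: byte i goes to row i % 4, column i // 4."""
    assert len(block) == 16
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]

def state_to_bytes(state):
    """Inverse of bytes_to_state: read the state column by column."""
    return bytes(state[row][col] for col in range(4) for row in range(4))

def shift_rows(state):
    """Rotate row r left by r positions (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def inv_shift_rows(state):
    """Rotate row r right by r positions, undoing shift_rows."""
    return [row[-r:] + row[:-r] if r else row[:] for r, row in enumerate(state)]

def add_round_key(state, round_key):
    """XOR every state byte with the matching round-key byte."""
    return [[s ^ k for s, k in zip(s_row, k_row)]
            for s_row, k_row in zip(state, round_key)]

block = bytes(range(16))
state = bytes_to_state(block)
shifted = shift_rows(state)
round_key = bytes_to_state(bytes([0xAA] * 16))  # dummy key, for illustration
masked = add_round_key(state, round_key)
```

Because AddRoundKey is a plain XOR, applying it twice with the same round key restores the state, which is why the same function can serve both encryption and decryption.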
System Block Diagram
[Figure: block diagram of the ZedBoard. Labels: Zynq (PS, PL), DDR, BRAM, RS232 UART, decrypted data, encrypted data.]
Zedboard Block Diagram
System Block Diagram
Project part A
[Figure: block diagram of the ZedBoard with "AES in software" on the PS. Labels: Zynq (PS, PL), DDR, BRAM, RS232 UART, decrypted data, encrypted data.]
Implementation of the AES algorithm on the ARM processor and code optimization.
Software Engineering
• Each step is implemented as a separate
function.
• Each function is independent of the other
functions.
• The program can encrypt and decrypt the
data.
Software Engineering
• The input data is entered by the user via a
PuTTY terminal.
• The program outputs the data after
encryption, followed by the encrypted data
after decryption.
Encryption Process
Development stages
XPS/EDK: configuring the ARM system
• Creation of the ARM processor
interface to the RS-232 UART.
• Addition of the BRAM and BRAM
controller IP and connection to the
AXI interconnect.
Development stages
PlanAhead
• Creation of the top-level entity in VHDL code.
• Generation of the bitstream.
• Exporting the hardware to SDK.
Development stages
SDK
• Generating the software platform project:
– Creating the Board Support Package (BSP).
– Selection of memory: DDR vs. BRAM.
• Test in hardware:
– Downloading the application to the
ARM processor.
– Running and profiling the application.
Profiling
BRAM vs. DDR
Encryption and decryption of 10x16 bytes:
• BRAM: 111.54 ms
• DDR: 2.754 ms
Software optimization #1
• The MixColumns and InvMixColumns
functions take around 65%-70% of the whole
process execution time.
• Improving them will significantly reduce the
total execution time.
Software optimization #1
• We will implement the MixColumns function
using lookup tables (LUTs) instead of arithmetic
operations and if/else statements.
• This should speed up the calculations.
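The idea behind this optimization can be sketched as follows. The original C code from the slides is not available here, so this is only an illustration in Python with hypothetical names: `xtime` is the arithmetic multiply-by-2 in GF(2^8) with its conditional XOR, and `MUL2` / `MUL3` are 256-entry tables that replace it with a single array lookup.

```python
# Illustrative sketch; names and structure are assumptions, not the slides' code.

def xtime(b):
    """Multiply by 2 in GF(2^8): shift left, then reduce if bit 8 is set."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B  # AES reduction polynomial x^8 + x^4 + x^3 + x + 1
    return b

def mix_column_arith(col):
    """One MixColumns column using arithmetic and a branch per multiply."""
    return [xtime(col[i]) ^ (xtime(col[(i + 1) % 4]) ^ col[(i + 1) % 4])
            ^ col[(i + 2) % 4] ^ col[(i + 3) % 4] for i in range(4)]

# Precomputed tables: one lookup replaces the shift, test and XOR.
MUL2 = [xtime(b) for b in range(256)]
MUL3 = [xtime(b) ^ b for b in range(256)]

def mix_column_lut(col):
    """Same column transform, using table lookups instead of arithmetic."""
    out = []
    for i in range(4):
        out.append(MUL2[col[i]] ^ MUL3[col[(i + 1) % 4]]
                   ^ col[(i + 2) % 4] ^ col[(i + 3) % 4])
    return out

column = [0xDB, 0x13, 0x53, 0x45]  # widely used MixColumns test column
mixed = mix_column_lut(column)
```

The two tables cost 512 bytes of memory in exchange for removing the data-dependent branch from the inner loop.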
MixColumns initial implementation
MixColumns improved implementation
Profiling
BRAM vs. DDR
With an improved MixColumns implementation:
• BRAM: 88.06 ms
• DDR: 2.626 ms
Software optimization #1
• BRAM:
– The total execution time decreased from
111.5 msec to 88 msec.
– A decrease of 21%.
• DDR:
– The total execution time decreased from
2.754 msec to 2.626 msec.
– A decrease of 5%.
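The percentages above follow from the measured times. A small Python check (the helper name `percent_decrease` is ours, introduced only for illustration):

```python
def percent_decrease(before_ms, after_ms):
    """Relative reduction in execution time, as a percentage of the baseline."""
    return 100.0 * (before_ms - after_ms) / before_ms

# Measured times from the profiling slides (ms).
bram_decrease = percent_decrease(111.54, 88.06)   # roughly 21%
ddr_decrease = percent_decrease(2.754, 2.626)     # roughly 4.6%, reported as 5%
```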
Software optimization #2
• We will implement the MixColumns and the
InvMixColumns functions using LUTs and
without for loops (fully unrolled).
• This should speed up the calculations even further.
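A loop-free version of both transforms might look like the sketch below. Again, this is an illustration in Python rather than the project's code: the table names (`M2`, `M3`, `M9`, `M11`, `M13`, `M14`) and the helper `gmul` are assumptions. Each output byte is written out explicitly, so there is no loop counter or index arithmetic at run time.

```python
# Illustrative sketch; table and function names are assumptions.

def gmul(a, b):
    """Multiply a and b in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

# One 256-entry table per constant used by MixColumns and InvMixColumns.
M2, M3, M9, M11, M13, M14 = (
    [gmul(x, c) for x in range(256)] for c in (2, 3, 9, 11, 13, 14)
)

def mix_column(a0, a1, a2, a3):
    """MixColumns on one column, fully unrolled."""
    return (M2[a0] ^ M3[a1] ^ a2 ^ a3,
            a0 ^ M2[a1] ^ M3[a2] ^ a3,
            a0 ^ a1 ^ M2[a2] ^ M3[a3],
            M3[a0] ^ a1 ^ a2 ^ M2[a3])

def inv_mix_column(a0, a1, a2, a3):
    """InvMixColumns on one column, fully unrolled."""
    return (M14[a0] ^ M11[a1] ^ M13[a2] ^ M9[a3],
            M9[a0] ^ M14[a1] ^ M11[a2] ^ M13[a3],
            M13[a0] ^ M9[a1] ^ M14[a2] ^ M11[a3],
            M11[a0] ^ M13[a1] ^ M9[a2] ^ M14[a3])

mixed = mix_column(0xDB, 0x13, 0x53, 0x45)
restored = inv_mix_column(*mixed)
```

InvMixColumns benefits most from the tables, since multiplying by 9, 11, 13 and 14 arithmetically takes several shift-and-reduce steps per byte.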
MixColumns optimized implementation
InvMixColumns optimized implementation
Profiling
BRAM vs. DDR
With optimized MixColumns and InvMixColumns implementations:
• BRAM: 47.427 ms
• DDR: 1.145 ms
Software optimization #2
• BRAM:
– The total execution time decreased from
111.5 msec to 47.427 msec.
– A decrease of 57%.
• DDR:
– The total execution time decreased from
2.754 msec to 1.145 msec.
– A decrease of 58%.
Hardware optimization
The ARM processor clock:
• At first, we used the default clock rate, which
was 160 MHz.
• We now set the clock rate to 225 MHz (the
maximum clock rate).
Profiling
BRAM vs. DDR
With a higher clock rate (160 MHz to 225 MHz):
• BRAM: 34.798 ms
• DDR: 0.819 ms
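As a rough sanity check, the measured gain can be compared with the ideal gain from the clock change alone. The sketch below assumes purely clock-bound code (execution time inversely proportional to the clock rate), which is our simplifying assumption and not a claim from the slides. The "before" times are the results of software optimization #2.

```python
# Ideal speedup if execution time scaled exactly with the clock rate.
ideal_speedup = 225.0 / 160.0        # about 1.41

# Measured speedups: time before the clock change / time after (ms).
bram_speedup = 47.427 / 34.798       # about 1.36
ddr_speedup = 1.145 / 0.819          # about 1.40
```

Under that assumption, both memories come close to the ideal figure, with DDR slightly closer than BRAM.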
Hardware optimization
• BRAM:
– The total execution time decreased from
111.5 msec to 34.8 msec.
– A decrease of 69%.
• DDR:
– The total execution time decreased from
2.754 msec to 0.82 msec.
– A decrease of 70%.
Optimizations
Execution time improvement (msec)
[Chart: execution time per optimization stage, BRAM and DDR plotted on separate scales.]
• Basic: BRAM 111.54, DDR 2.754
• Improved: BRAM 88.06, DDR 2.626
• Optimized: BRAM 47.427, DDR 1.145
• Higher clk rate: BRAM 34.798, DDR 0.819
Optimizations
Execution speed improvement (KB/sec)
[Chart: execution speed per optimization stage, BRAM and DDR plotted on separate scales.]
• Basic: BRAM 1.43, DDR 58.10
• Improved: BRAM 1.82, DDR 60.93
• Optimized: BRAM 3.37, DDR 139.74
• Higher clk rate: BRAM 4.60, DDR 195.36
Execution’s speed improvement
• Every optimization that we made
decreased the total time and improved the speed.
• The most significant improvement came from the
2nd SW optimization.
• Both DDR and BRAM
speeds eventually
increased more than
threefold.
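The speed figures appear to be the 160 bytes (10x16) of test data divided by the measured time, with 1 KB taken as 1000 bytes; that reading is our inference, and it reproduces the chart values. A short Python check, with `throughput_kb_per_s` as a hypothetical helper:

```python
DATA_BYTES = 10 * 16  # ten 16-byte blocks, as in the profiling runs

def throughput_kb_per_s(n_bytes, time_ms):
    """Throughput in KB/s, assuming 1 KB = 1000 bytes."""
    return (n_bytes / 1000.0) / (time_ms / 1000.0)

# Measured times per stage (ms): basic, improved, optimized, higher clock.
bram_times = [111.54, 88.06, 47.427, 34.798]
ddr_times = [2.754, 2.626, 1.145, 0.819]

bram_speeds = [round(throughput_kb_per_s(DATA_BYTES, t), 2) for t in bram_times]
ddr_speeds = [round(throughput_kb_per_s(DATA_BYTES, t), 2) for t in ddr_times]

# Overall gain from the basic version to the final one.
bram_gain = bram_times[0] / bram_times[-1]
ddr_gain = ddr_times[0] / ddr_times[-1]
```

Both gains come out slightly above 3, which matches the "3 times and more" statement.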
Bram vs. DDR
• In every optimization, running the application
from BRAM was significantly slower than
running from DDR.
• This is due to:
– DDR has its own dedicated bus.
– The DDR clock rate is 550 MHz, while the BRAM clock
rate is 160 MHz.
– DDR transfers data on both the rising and falling clock edges.
Transmission rate
• The typical maximum data rate in USB is 1.5 MB/s
(typical rates are around 0.5 MB/s).
• The encryption rate we were able to achieve in the
end is 323 KB/s, about 1.5 times slower than the typical rate.
• Conclusion:
A hardware accelerator is needed.
Project Specifications
• Implementing on a Zynq SoPC by Xilinx.
• Suitable for portable systems (Disk-on-Key, tablets, etc.) - low power
system.
• Finding the best architecture – according to the requirements
above:
– Profiling the AES algorithm.
Demonstration