FINAL REPORT
SECURE CLOUD COMPUTING
MACHINES
Under the Guidance of
Prof. Young Cho
Team: Cold Fire
Amit Jain
Pruthwin Kadmaje Giridhara
Shailesh Kayambady Sathyanarayana Bhat
https://coldfirenetworkprocessor.wordpress.com/
Table of Contents
1. Introduction
2. Motivation
3. Proposal
4. System Level Working
5. Implementation Details
5.1. RSA
5.2. Compiler
5.3. Hardware Details
5.4. Software Support
5.4.1. EE533_CF_SecuredProcessor
5.4.2. EE533_datapath
5.4.3. EE533_nf_config
5.4.4. EE533_HostSide_Config
5.4.5. mini_asm_compiler
6. Comparison with other secured Systems
7. Future Improvements
8. Datasheet
8.1. Front End Master Processor
8.2. Slave Crypto Processor
8.3. Instructions supported by our ISA
NOP
BEZ
ALU
SD
LD
ALUI
JUMP
JAL
JR
8.4. Network Interface Instructions
RHEAD
RTAIL
RXPKT
TXPKT
RINT
8.5. Multiplication/Division
MULT
DIV
8.6. Inter Processor Communication Instruction
PC_EN
PC_DIS
SD2
LD2
SD_SYM
References
1. Introduction
Cloud computing, put simply, means “Internet computing”. The Internet is commonly visualized as a
cloud; hence the term “cloud computing” for computation done through the Internet. With cloud
computing, users can access database resources via the Internet from anywhere, for as long as they
need, without worrying about the maintenance or management of the actual resources. Besides, databases
in the cloud are very dynamic and scalable. “Cloud computing is a model for enabling convenient,
on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers,
storage, applications, services) that can be rapidly provisioned and released with minimal management
effort or service provider interaction.” Cloud computing is typically defined as a type of computing that
relies on sharing computing resources rather than having local servers or personal devices to handle
applications.
Figure 1 : Top level view of Cloud system
There are mainly three kinds of service models offered by cloud providers, namely:
a) Software as a Service (SaaS)
In this type of model, the consumer can access applications hosted on the cloud using
standardized interfaces. The cloud provider is responsible for the management of the application,
operating systems and underlying infrastructure. The consumer can only control some of the
user-specific application configuration settings.
Example: Yahoo!, Gmail, Google Docs, etc.
b) Platform as a Service (PaaS)
The PaaS service model offers operation and development platforms as a service to the
consumer. The consumer can use the platform to develop and run his own applications,
supported by a cloud-based infrastructure. The consumer does not manage or control the
underlying cloud infrastructure including network, servers, operating systems, or storage, but
has control over the deployed applications and possibly application hosting environment
configurations.
Example: Google Apps, SQL Azure, etc.
c) Infrastructure as a Service (IaaS)
The IaaS service model is the lowest service model in the technology stack, offering
infrastructure resources as a service, such as raw data storage, processing power and network
capacity. The consumer can then use IaaS-based service offerings to deploy his own operating
systems and applications, which gives a wider variety of deployment possibilities
than the PaaS and SaaS models. “The consumer does not manage or control the underlying
cloud infrastructure but has control over operating systems, storage, deployed applications, and
possibly limited control of select networking components (e.g., host firewalls)”.
Example: Amazon (S3, EC2), Windows Azure, etc.
Note: Our project system is based on the IaaS model. We provide only infrastructure that is highly
secure. It is up to the user to run his/her own applications and operating system on our platform. The
application and operating system sent to our system are encrypted using the symmetric key chosen
by the user.
2. Motivation
While a cloud migration can enable a company to reduce capital expenditures and operating
costs while also benefiting from dynamic scaling and high availability, cloud computing also presents
numerous challenges and raises security concerns such as insider threats. A dissatisfied employee base
provides a vector for insider security events, while the inadvertent injection of malware through
removable media or web interconnections can make any employee the origination point for a network
security violation. There can be fatal consequences if an unauthorized person has access to the cloud
computing systems. For example, suppose a doctor wants to perform image processing on
brain scan images to detect tumors. Ideally he would send those images to the cloud for
analysis, either because the cloud systems host the required applications
or because it is more economical. But there are many risks associated with this scenario. If a person with
malicious intent has access to the cloud system, he can probe the hardware and obtain the code and
data. He can then inspect the code to understand what the computation is about and manipulate the
results for his own purpose. If the doctor receives a wrong result, the consequences can be
serious.
Security is usually implemented at the software level. But with software encryption and
decryption alone, the contents stored in memory systems such as the instruction cache or data cache are not
encrypted, so one can probe these devices and obtain the required information. This leads us to ask
what can be done at the hardware level to make the system secure.
3. Proposal
We propose a solution that, to our knowledge, is the first of its kind. Our idea is based on the fact that
security implemented at the software level is not as well protected as we think, so we want to do everything at
the hardware level. Implementing security in hardware has two main advantages. First,
encryption and decryption in hardware are very fast compared to implementing them in
software. Second, it is harder to break an encrypted hardware system than a secured software
solution.
We demonstrate our idea using a simple 5-stage pipelined processor. All the data coming to the
processor, including code, is encrypted. The data going out of the processor is also encrypted. We
have some design choices, such as at what stage to decrypt and encrypt the code and data, depending
on how secure our system must be. To make the processor highly secure and nearly impossible to
crack, we can put the decryption engine just before the execution stage of the 5-stage pipeline and
encrypt again before writing to the memory. All the memory systems, such as the I-cache, the D-cache and even the
register files, can be encrypted.
4. System Level Working
At a very high level, our system works as follows. We have implemented a two-level security
mechanism. The first level is used for channel encryption and the second level is used for encryption/decryption
of code by core 2.
- RSA keys are exchanged between the customer and the cloud computer.
- Using the public RSA keys, the first-level symmetric key is exchanged. From this point on, the whole channel
communication between the cloud and the host is encrypted using this first-level key.
- The second-level symmetric key is sent, encrypted using the first-level key. This key is
stored in a special register by core 1.
- Encrypted code is generated using our compiler. This code is encrypted twice: once using the first-level
symmetric key and then again using the second-level symmetric key.
- The code is broken into smaller packets, and patterns are appended so that the
receiving node can identify these packets.
- The encrypted packets are sent to the cloud computer one by one.
- One core in the cloud is dedicated and optimized for packet processing. Core 1 does the first-level
decryption and checks for the valid pattern (identifier).
- The encrypted instructions are extracted and loaded into the cache of the second core.
- The second processor contains a built-in encryption and decryption engine which isolates the memory
from the pipeline. Everything stored in the caches is encrypted using the symmetric key
provided by the customer.
- The program result is encrypted again and sent back to the customer by the first core.
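The two-level flow listed above can be sketched in a few lines of Python. This is only an illustrative model, not the code that runs on the host or the NetFPGA. It assumes the simple XOR symmetric cipher noted in the Future Improvements section, treats the payload as a list of 64-bit words, and uses hypothetical key values and function names (`xor_words`, `host_encrypt`, `core1_decrypt`, `core2_decrypt`).

```python
MASK64 = (1 << 64) - 1

def xor_words(words, key):
    """XOR every 64-bit word with the key (XOR is its own inverse)."""
    return [(w ^ key) & MASK64 for w in words]

def host_encrypt(code_words, first_level_key, second_level_key):
    """Host side: encrypt once with the first-level key, then again with the second."""
    once = xor_words(code_words, first_level_key)
    return xor_words(once, second_level_key)

def core1_decrypt(packet_words, first_level_key):
    """Core 1: remove the channel (first-level) encryption only."""
    return xor_words(packet_words, first_level_key)

def core2_decrypt(cache_word, second_level_key):
    """Core 2: decrypt one word on its way from the I-cache into the pipeline."""
    return (cache_word ^ second_level_key) & MASK64

# Hypothetical keys and program words, for illustration only.
FIRST_LEVEL_KEY = 0x0123456789ABCDEF
SECOND_LEVEL_KEY = 0x0F1E2D3C4B5A6978
program = [0x20123000, 0x50100005, 0x00000000]

sent = host_encrypt(program, FIRST_LEVEL_KEY, SECOND_LEVEL_KEY)
i_cache = core1_decrypt(sent, FIRST_LEVEL_KEY)      # still encrypted with the second-level key
executed = [core2_decrypt(w, SECOND_LEVEL_KEY) for w in i_cache]
```

With XOR the two layers commute, so the order in which they are applied does not affect the result. A block cipher such as AES would require core 1 to remove the outer layer before core 2 removes the inner one.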
We have demonstrated the two-level encryption system in our secured network processor. The use
case described below explains a real-life scenario where this level of encryption can be used.
Since we do not have multiple FPGAs to demonstrate the actual two-level encryption as described in
the use case, we emulate it using a single host and an FPGA. The host (compiler) encrypts the
code to be sent to the cloud using the two keys. The second key is sent to the FPGA in a packet
as normal data, but encrypted using the first-level key. There is no explicit RSA key exchange for
the second-level key. Only the first-level key is exchanged through RSA.
The block diagram below shows a use case of the two-level encryption system.
In the figure there are three main subsystems: the cloud system, the ISP (Internet service provider)
and the users. This use case shows how our two-level encryption system can be used in a
real-life scenario. Between the ISP and the cloud provider, channel encryption is maintained. This can be
achieved using RSA key exchange and symmetric key exchange. We demonstrate this by exchanging the
first-level symmetric key initially with the cloud (NetFPGA). All the packets going to the cloud provider
through the ISP from various users are encrypted using this key. In our case even the packet identifiers
are encrypted using the first-level key. Different ISPs can maintain their own keys to
encrypt the channel. The users can have their own encryption mechanism. In the use case above, all 5
users use different symmetric keys to encrypt the code and data to be sent to the cloud for
processing. This encryption is independent of the encryption used by the ISPs. The packets sent by a user
traverse the ISP, which encrypts them again using its own keys before sending them to the
cloud provider.
When the cloud system receives packets from a particular ISP, it knows which keys to use to decrypt all
the incoming packets. The cloud can then extract the packets and get all the required data. The initial
packets from the user contain the second-level key, encrypted with the first-level key. This key
can be decrypted and then used to decrypt the code and data and execute them.
5. Implementation Details
Let’s look into the implementation details of all the steps mentioned above.
5.1. RSA
RSA is one of the first practical public-key cryptosystems and is widely used for secure data
transmission. RSA stands for Ron Rivest, Adi Shamir and Leonard Adleman, who are the designers of this
cryptosystem. The block diagram below shows a typical RSA key exchange protocol.
Figure: RSA key exchange between the Host and the NetFPGA. Each side holds a public and private key
pair; the symmetric key is exchanged between them.
Both the sender and the receiver have a pair of keys, called the public and private keys.
RSA is an asymmetric cryptosystem and is very slow. A cryptosystem using symmetric keys is much
faster. So we use RSA to exchange the symmetric keys in a secure way. The following steps are followed in
the key exchange.
- Host sends its public key to the NetFPGA.
- NetFPGA receives the Host public key and encrypts its own public key using the Host public key.
The NetFPGA then sends the encrypted key back to the Host.
- Host receives the encrypted key and decrypts it using its private key.
- The Host encrypts the symmetric key using the public key obtained from the NetFPGA and sends it
to the NetFPGA.
Note: For demonstration purposes we use very simple 8-bit RSA keys. Our symmetric key is 64 bits
long.
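The key exchange steps above can be illustrated with textbook RSA on tiny numbers. The sketch below is an assumption-laden model rather than the project's implementation: the primes, exponents and helper names are hypothetical, the moduli are chosen to fit in 8 bits, and values are sent 4 bits at a time because every encrypted chunk must be smaller than the modulus.

```python
def make_keypair(p, q, e):
    """Return ((n, e), (n, d)) for textbook RSA with primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # modular inverse (Python 3.8+)
    return (n, e), (n, d)

def to_nibbles(value, count):
    """Split an integer into `count` 4-bit chunks, least significant first."""
    return [(value >> (4 * i)) & 0xF for i in range(count)]

def from_nibbles(nibbles):
    return sum(nib << (4 * i) for i, nib in enumerate(nibbles))

def rsa_encrypt(value, count, public_key):
    n, e = public_key
    return [pow(nib, e, n) for nib in to_nibbles(value, count)]

def rsa_decrypt(cipher, private_key):
    n, d = private_key
    return from_nibbles([pow(c, d, n) for c in cipher])

# Hypothetical 8-bit moduli: 11 * 17 = 187 and 13 * 19 = 247.
host_pub, host_priv = make_keypair(11, 17, 7)
fpga_pub, fpga_priv = make_keypair(13, 19, 5)

# Step 1: the Host sends host_pub to the NetFPGA in the clear.
# Step 2: the NetFPGA encrypts its own public key (n, e) with the Host public key.
enc_n = rsa_encrypt(fpga_pub[0], 2, host_pub)
enc_e = rsa_encrypt(fpga_pub[1], 2, host_pub)
# Step 3: the Host decrypts the NetFPGA public key with its private key.
recovered_fpga_pub = (rsa_decrypt(enc_n, host_priv), rsa_decrypt(enc_e, host_priv))
# Step 4: the Host sends the 64-bit symmetric key encrypted with the NetFPGA public key.
symmetric_key = 0x0123456789ABCDEF   # hypothetical value
enc_sym = rsa_encrypt(symmetric_key, 16, recovered_fpga_pub)
received_key = rsa_decrypt(enc_sym, fpga_priv)
```

Keys this small offer no real security; they only make the message flow easy to follow.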
5.2. Compiler
We have built a smart compiler which is compatible with our processor. Our compiler has many
features:
- It detects dependencies automatically and inserts only the required number of NOPs to
avoid hazards.
- It supports address resolution for all labels and branches.
- Unsupported MIPS instructions are mapped automatically to a combination of our
instructions.
- It generates output both in hex format, which is used to run the code on the
NetFPGA, and in binary format for simulation purposes.
- It generates encrypted instructions using a user-provided symmetric key. It
can encrypt the instructions twice to support our two-level security extension.
- It supports encoding of the instructions into ASCII characters. This feature is provided
to work around the “iperf” tool issue, which is discussed in later sections in
detail.
- The whole compiler is designed in a modular and flexible fashion. Additions or modifications
to the ISA can be handled by the compiler with minimal changes in the configuration
files.
The compiler usage is discussed in detail in the Software Support section 5.4 of the report.
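The dependency handling listed among the compiler features can be sketched as follows. This is a simplified model, not the actual mini_asm_compiler logic: it assumes the datasheet rule that a dependent instruction must be separated from its producer by at least 3 other instructions, and the tuple layout and the name `insert_nops` are hypothetical.

```python
MIN_GAP = 3                    # instructions required between producer and consumer
NOP = ("NOP", None, ())

def insert_nops(program):
    """program: list of (mnemonic, dest_register, source_registers) tuples."""
    scheduled = []
    for mnemonic, dest, sources in program:
        needed = 0
        recent = scheduled[-MIN_GAP:]
        # distance 1 means the producer is the instruction immediately before
        for distance, (_, prev_dest, _) in enumerate(reversed(recent), start=1):
            if prev_dest is not None and prev_dest in sources:
                needed = max(needed, MIN_GAP - distance + 1)
        scheduled.extend([NOP] * needed)
        scheduled.append((mnemonic, dest, sources))
    return scheduled

program = [
    ("ALUI", "r1", ("r0",)),        # produces r1
    ("ALU", "r4", ("r5", "r6")),    # independent
    ("ALU", "r2", ("r1", "r3")),    # consumes r1
]
padded = insert_nops(program)
mnemonics = [m for m, _, _ in padded]
```

Because one independent instruction already sits between the producer and the consumer, only two NOPs are inserted instead of three.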
5.3. Hardware Details
The block diagram below shows our secured dual-core processor.
Let us look into the processor in detail. As mentioned before, our processor consists of two cores. Both
cores use a simple 5-stage pipelined design.
Core 1:
This is the front-end core, and it absorbs all the traffic from the network. This core is mainly
responsible for capturing the network packets and processing them. It also handles the RSA key
exchange with the host and stores the symmetric key in a special register. During the initial phase of the
design, we came across a challenge: we need a scratch memory for packet processing in addition to the
FIFO support. So we decided to split the D-cache of core 1 into two parts. The first 128 words act as
scratch memory and the remaining 128 as a FIFO. All network packets arrive at the FIFO, and the core is
notified using a special instruction. When core 1 detects that a packet has arrived, it fetches the packet
from the FIFO and processes it in scratch memory. All incoming packets containing encrypted code carry
a fixed pattern. Core 1 searches for this pattern and, if the pattern is not found, sends the packet out
immediately. If the pattern is found, it removes the pattern and forwards the encrypted packet to
core 2.
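The packet filtering done by core 1 can be modelled roughly as below. The sketch assumes XOR first-level decryption, a pattern carried in the first word of the packet, and a made-up pattern value; `CODE_PATTERN` and `core1_handle_packet` are hypothetical names, not taken from the EE533_CF_SecuredProcessor code.

```python
CODE_PATTERN = 0xC0DEC0DEC0DEC0DE   # hypothetical identifier for code packets

def core1_handle_packet(packet_words, first_level_key):
    """Return ("forward", payload) for code packets, ("send_out", packet) otherwise."""
    # First-level decryption into scratch memory (XOR cipher assumed).
    scratch = [w ^ first_level_key for w in packet_words]
    if scratch and scratch[0] == CODE_PATTERN:
        # Strip the pattern; the payload is still encrypted with the second-level key.
        return "forward", scratch[1:]
    # Not a code packet: send it back out of the system untouched.
    return "send_out", packet_words

key = 0x0123456789ABCDEF            # hypothetical first-level key
payload = [0x1111, 0x2222, 0x3333]
code_packet = [w ^ key for w in [CODE_PATTERN] + payload]
other_packet = [0x5, 0x6]

code_result = core1_handle_packet(code_packet, key)
other_result = core1_handle_packet(other_packet, key)
```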
Hardware accelerator:
On the way to core 2, the packets go through a hardware accelerator. This hardware
accelerator performs ASCII-to-hex mapping. The UDP packets sent from
the host using iperf can only carry ASCII values, so the compiler on the host side
encodes the full program after encrypting the instructions.
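One possible encoding of this kind is shown below. The report does not specify the character mapping, so this sketch simply assumes that each 32-bit instruction is sent as eight ASCII hexadecimal digits; `encode_ascii` and `decode_ascii` are hypothetical names.

```python
def encode_ascii(words):
    """Host side: 32-bit words -> ASCII bytes, eight hex digits per word."""
    return "".join(f"{w & 0xFFFFFFFF:08X}" for w in words).encode("ascii")

def decode_ascii(payload):
    """Accelerator side: ASCII hex digits -> list of 32-bit words."""
    text = payload.decode("ascii")
    return [int(text[i:i + 8], 16) for i in range(0, len(text), 8)]

words = [0x20123000, 0xDEADBEEF]
wire_bytes = encode_ascii(words)
decoded = decode_ascii(wire_bytes)
```

This doubles the payload size, which is the price of restricting the packet contents to printable characters.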
Core 2:
The decoded packets from the hardware accelerator now arrive at core 2, but these packets are
still encrypted. They are stored in the instruction cache of core 2. When the entire program
has been sent to core 2, core 1 signals using a special instruction that the transfer is complete and
core 2 can start executing. Upon receiving the go-ahead signal from core 1, core 2 starts
fetching the encrypted instructions from its I-cache. A decryption engine lies between the
I-cache and the decode stage, so the instructions are decrypted just before the decode stage and then
passed on to the execution stage. Similarly, an encryption unit lies between the execution stage and the D-cache.
After the execution of an instruction, just before writing back to the memory, the data is encrypted
again and written to memory. In the write-back stage, register values are likewise encrypted
before being written to the register file.
Thus all the memory parts, namely the I-cache, the D-cache and the register file, are encrypted.
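The placement of the encryption and decryption units around core 2 can be pictured with a small software model. This is a sketch only: it assumes the XOR cipher noted in the Future Improvements section, 64-bit storage words and 16 registers (from the 4-bit register fields), and `Core2Model` with its methods is a hypothetical name.

```python
MASK64 = (1 << 64) - 1

class Core2Model:
    """Everything held in storage is encrypted; plaintext exists only in the pipeline."""

    def __init__(self, key, encrypted_program, num_regs=16):
        self.key = key & MASK64
        self.i_cache = list(encrypted_program)       # stays encrypted
        self.d_cache = {}                            # address -> encrypted word
        self.reg_file = [self._seal(0)] * num_regs   # encrypted zeros

    def _seal(self, plain):
        return (plain ^ self.key) & MASK64

    def _open(self, stored):
        return (stored ^ self.key) & MASK64

    def fetch(self, pc):
        """Decryption engine between the I-cache and the decode stage."""
        return self._open(self.i_cache[pc])

    def write_reg(self, index, value):
        """Write-back stage: encrypt before writing the register file."""
        self.reg_file[index] = self._seal(value)

    def read_reg(self, index):
        return self._open(self.reg_file[index])

    def store(self, address, value):
        """Encryption unit between the execution stage and the D-cache."""
        self.d_cache[address] = self._seal(value)

    def load(self, address):
        return self._open(self.d_cache[address])

key = 0xA5A5A5A5A5A5A5A5             # hypothetical second-level key
core = Core2Model(key, [0x20123000 ^ key])
core.write_reg(1, 42)
core.store(8, 7)
```

Probing `core.i_cache`, `core.reg_file` or `core.d_cache` in this model yields only encrypted words, which mirrors the property the hardware is designed to provide.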
5.4. Software Support
We have created extensive software support to test our system. Below is some of the available
software. We describe in detail how to use each program and what it is capable of.
Software Name               Location
EE533_CF_SecuredProcessor   \Software\AssemblyCode
EE533_datapath              \Software\Scripts
EE533_nf_config             \Software\Scripts
EE533_HostSide_Config       \Software\Scripts\HostSideConfig
mini_asm_compiler           \Software\Compiler
5.4.1. EE533_CF_SecuredProcessor
This is assembly code written using the ISA of our ColdFire Secured Processor. This code runs on
core 1 of our processor and acts like an operating system for our system. This code is
responsible for the following functions:
- Capture of packets from the network
- FIFO and memory management
- RSA key exchange with the Host
- First-level packet decryption in core 1
- Decrypting the encrypted symmetric key and saving it in a special register
- Pattern detection (code, keys, etc.)
- Transfer of encrypted code to core 2
- Booting of the encrypted code to be run on core 2
- Sending invalid packets out of the system
- Handling requests for results from users
5.4.2. EE533_datapath
This is a Perl script that behaves like a boot loader. It loads the “EE533_CF_SecuredProcessor” program
into core 1 and starts core 1. It provides the following functionality:
- Loading of the “EE533_CF_SecuredProcessor” program into the core 1 instruction cache
- Enabling/disabling of the program counter of core 1
- Reading/writing of the data cache and instruction cache
Below are the commands that enable the above functionality.
Commands              Functionality
Reset                 Resets the PC of core 1
Start                 Enables the PC of core 1
Read ic <address>     Reads the instruction cache of core 1 at the given address
Ic <File Name>        Writes into the instruction cache of core 1 from the input file provided
Read data <address>   Reads data from the data cache of core 1 at the given address
dc <FileName>         Writes into the data cache of core 1 from the input file
5.4.3. EE533_nf_config
This is a Perl script to configure and program the NetFPGA with the base “Reference Router” program.
Run this script with the path to the log file generated after the nodes are swapped in on DETER.
“perl EE533_nf_config /proj/USCEE533/exp/CFlab12/tbdata/tbreport.log”
5.4.4. EE533_HostSide_Config
This is a Perl script which runs on the Host side of the system. It provides a set of
functions to communicate with the NetFPGA, which acts as the cloud computer in our demo.
Below are the functions provided by this script:
- RSA key exchange with the cloud system (NetFPGA)
- Sending first- and second-level symmetric keys
- Sending encrypted data to the FPGA
- Sending encrypted code to the FPGA
- Requesting the results from the FPGA
The table below gives the usage information of this script.
Commands   Functionality
-pub       Send host public key to FPGA
-sym       Send encrypted first-level symmetric keys to FPGA
-sym2      Send encrypted second-level symmetric keys to FPGA
-data      Send any encrypted data to FPGA
-code      Send the encrypted code to FPGA. This command takes the encrypted
           code from a file “FileInp”, which should be located in the same path as
           this script.
-req       Request result from FPGA. This command takes the requesting address
           from a file “RequestAddress”, which should be located in the same
           path as this script.
5.4.5. mini_asm_compiler
This is a smart compiler written for our custom ISA. The features of the compiler are already discussed in
the Compiler section 5.2. The usage details of the compiler are as below.
Usage: perl mini_asm_compiler <-s> <-d> <-r> <-o> <-en> <-b/-h>
Commands   Functionality
-all       Run all the programs in correct order <input file> <output file>
-s         Generate assembly for custom ISA
-d         Resolve the instruction dependencies
-r         Resolve branch target address
-o         Generate output file for custom ISA
-b/-h      Generate output file in binary/hexadecimal format <default: hexadecimal>
-en        Generate encrypted instructions for custom ISA
-ed        Generate encoded instructions for custom ISA
En2        Generate 2nd level encrypted instructions for custom ISA
6. Comparison with other secured Systems
We believe that no systems similar to ours are available, so we cannot make a direct comparison to
evaluate the performance of our system. Currently, security is implemented at the software level, as
mentioned in the previous sections. The table below compares our system with a system in which security is
implemented at the software level.
Our System                                               Software level security
Encryption/decryption in hardware                        Encryption/decryption in software
Faster encryption/decryption                             Slower
I-Cache, D-Cache, Register file contents are encrypted   I-Cache, D-Cache, Register file contents are NOT encrypted
Probing the hardware to snoop data/code is very hard     Can easily probe memory devices and get the data
7. Future Improvements
- Currently we have implemented RSA using a simple 8-bit key. It can be extended to
768/1024 bits to provide a more secure channel.
- The symmetric encryption/decryption is implemented using a simple XOR. It can be replaced by
an AES/DES encryption engine.
- Incorporate more cores/threads to service more user-encrypted code simultaneously in the cloud
computer (NetFPGA).
8. Datasheet
8.1. Front End Master Processor
8.2. Slave Crypto Processor
8.3. Instructions supported by our ISA
Instruction  Opcode    AluCntrl  Rd        Rs        Rt        Remaining bits
             (4 bits)  (4 bits)  (4 bits)  (4 bits)  (4 bits)  (12 bits)
NOP          0000      -         -         -         -         -
BEZ          0001      -         -         Rs        Offset (16 bits)
ALU          0010      AluCntrl  Rd        Rs        Rt        -
SD           0011      0000      -         Rs        Rt        Offset
LD           0100      0000      Rd        Rs        Offset (16 bits)
ALUI         0101      AluCntrl  Rd        Rs        Immediate (16 bits)
JR           0110      0010      -         -         Rt        -
JAL          0110      0001      Rd        -         $0        Address
J            0110      0000      -         -         -         Address
RHEAD        0111      0000      Rd        -         -         -
RTAIL        0111      0001      Rd        -         -         -
RXPKT        0111      0101      -         -         -         -
TXPKT        0111      0110      -         -         -         -
RINT         1000      0000      Rd        -         -         -
MULT         1001      0000      Rd        Rs        Rt        -
DIV          1001      0001      Rd        Rs        Rt        -
PC_EN        1010      0000      -         -         -         -
PC_DIS       1010      0001      -         -         -         -
SD2          1011      0000      -         Rs        Rt        Offset
LD2          1100      0000      Rd        Rs        Offset
SD_SYM       1101      0000      -         Rs        Offset
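The field layout in the table above maps directly onto shifts and masks. The following sketch packs and unpacks 32-bit instruction words. The function names and the split into "register" and "immediate" formats are hypothetical, and only the bit positions are taken from the table.

```python
OPCODES = {"NOP": 0b0000, "BEZ": 0b0001, "ALU": 0b0010,
           "SD": 0b0011, "LD": 0b0100, "ALUI": 0b0101}

def encode_register_format(opcode, alu_cntrl, rd, rs, rt):
    """Opcode[31:28] AluCntrl[27:24] Rd[23:20] Rs[19:16] Rt[15:12], low 12 bits zero."""
    return ((opcode & 0xF) << 28 | (alu_cntrl & 0xF) << 24 |
            (rd & 0xF) << 20 | (rs & 0xF) << 16 | (rt & 0xF) << 12)

def encode_immediate_format(opcode, alu_cntrl, rd, rs, imm16):
    """Same upper fields, with a 16-bit immediate or offset in bits [15:0]."""
    return ((opcode & 0xF) << 28 | (alu_cntrl & 0xF) << 24 |
            (rd & 0xF) << 20 | (rs & 0xF) << 16 | (imm16 & 0xFFFF))

def decode_fields(word):
    """Split a 32-bit instruction word back into its fields."""
    return {
        "opcode": (word >> 28) & 0xF,
        "alu_cntrl": (word >> 24) & 0xF,
        "rd": (word >> 20) & 0xF,
        "rs": (word >> 16) & 0xF,
        "rt": (word >> 12) & 0xF,
        "low12": word & 0xFFF,
        "low16": word & 0xFFFF,
    }

# ADD r1 = r2 + r3 (ALU opcode, AluCntrl 0000)
add_word = encode_register_format(OPCODES["ALU"], 0b0000, 1, 2, 3)
# r1 = r0 + 5 using ALUI with AluCntrl 0000
addi_word = encode_immediate_format(OPCODES["ALUI"], 0b0000, 1, 0, 5)
fields = decode_fields(add_word)
```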
NOP
Field   Opcode   Other fields
Bits    31-28    27-0
Value   0000     XXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XXXX
This instruction does not change any architected state of the registers, and it does not modify any
memory content in the processor. The NOP instruction takes one clock cycle and increments the PC. It
is used between producer and consumer instructions so that the consumer gets the correct,
most up-to-date value of the register. It is also used after Jump/Branch instructions to prevent fetching
of incorrect instructions.
BEZ
Field   Opcode   AluCntrl   -       Rs      Offset
Bits    31-28    27-24      23-20   19-16   15-0
Value   0001     0000       XXXX    ####    ####-####-####-####
Branch if equal to zero (BEZ) is a conditional branch instruction. If the LSB of the value in the register
specified by Rs is equal to zero, the branch is taken. The values in bits [63:1] of the register are
ignored. This helps reduce the decision-making delay caused by the branch. The branch address is
calculated as PC + offset. The 16-bit offset is represented in 2's complement form. Hence the possible
values of the offset are -32768 to 32767. This instruction should be followed by 3 NOPs.
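The branch decision and target calculation described for BEZ can be written out as below. The sketch assumes a word-addressed PC that advances by 1 when the branch is not taken, and that the offset is added to the PC of the branch itself; both are assumptions, as are the function names.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a 2's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def bez_next_pc(pc, rs_value, offset16):
    """Only bit 0 of Rs is tested; bits [63:1] are ignored."""
    if rs_value & 1 == 0:
        return pc + sign_extend_16(offset16)
    return pc + 1

taken_backward = bez_next_pc(10, 4, 0xFFFE)   # LSB of 4 is 0, offset is -2
not_taken = bez_next_pc(10, 3, 5)             # LSB of 3 is 1
```

Note that a nonzero value with a zero LSB, such as 4, still takes the branch, so code that relies on BEZ should compare against values produced by SETG, SETL or SETE, which are always 0 or 1.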
ALU
Field   Opcode   AluCntrl   Rd      Rs      Rt      -
Bits    31-28    27-24      23-20   19-16   15-12   11-0
Value   0010     ####       ####    ####    ####    XXXX-XXXX-XXXX
The arithmetic and logical operations are collectively represented by the Opcode and AluCntrl bits. The
operation specified by AluCntrl is performed by the ALU on the values of registers Rs and Rt, and
the result is written back into register Rd. The operations performed by the ALU are shown below. The
arithmetic operations are unsigned and overflows are ignored. For the NOT operation, register Rs is ignored.
Dependent instructions should be separated by at least 3 independent intermediate instructions.
AluCntrl[3:0]   Operation     Operand relation
0000            ADD           Rd = Rs + Rt
0001            SUB           Rd = Rs - Rt
0010            Left Shift    Rd = Rs << Rt
0011            Right Shift   Rd = Rs >> Rt
0100            AND           Rd = Rs & Rt
0101            OR            Rd = Rs | Rt
0110            NOT           Rd = ~ Rt
0111            XOR           Rd = Rs ⊕ Rt
1000            XNOR          Rd = Rs ⊙ Rt
1001            SETG          Rd = ( Rs > Rt ) ? 1 : 0
1010            SETL          Rd = ( Rs < Rt ) ? 1 : 0
1011            SETE          Rd = ( Rs == Rt ) ? 1 : 0
11XX            ---           ---
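The operation table above can be expressed as a small reference model, which is handy for checking assembled programs in simulation. The sketch assumes 64-bit registers with wrap-around on overflow, as the text states, and additionally assumes that shift amounts of 64 or more produce 0, which the report does not specify. The name `alu` is hypothetical.

```python
MASK64 = (1 << 64) - 1

def alu(alu_cntrl, rs, rt):
    """Return the 64-bit value written to Rd for the given AluCntrl code."""
    rs &= MASK64
    rt &= MASK64
    operations = {
        0b0000: lambda: rs + rt,                       # ADD
        0b0001: lambda: rs - rt,                       # SUB
        0b0010: lambda: rs << rt if rt < 64 else 0,    # Left Shift (assumed saturation)
        0b0011: lambda: rs >> rt if rt < 64 else 0,    # Right Shift (assumed saturation)
        0b0100: lambda: rs & rt,                       # AND
        0b0101: lambda: rs | rt,                       # OR
        0b0110: lambda: ~rt,                           # NOT (Rs ignored)
        0b0111: lambda: rs ^ rt,                       # XOR
        0b1000: lambda: ~(rs ^ rt),                    # XNOR
        0b1001: lambda: int(rs > rt),                  # SETG
        0b1010: lambda: int(rs < rt),                  # SETL
        0b1011: lambda: int(rs == rt),                 # SETE
    }
    if alu_cntrl not in operations:
        raise ValueError("AluCntrl codes 11XX are not defined")
    # Masking implements "overflows are ignored" for unsigned 64-bit values.
    return operations[alu_cntrl]() & MASK64

wrapped_sum = alu(0b0000, MASK64, 1)
wrapped_difference = alu(0b0001, 0, 1)
```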
SD
Field   Opcode   AluCntrl   -       Rs      Rt      Offset
Bits    31-28    27-24      23-20   19-16   15-12   11-0
Value   0011     0000       XXXX    ####    ####    ####-####-####
The Store instruction transfers the 64-bit value in register Rt to the address specified by Rs and Offset. The ALU
adds Rs and Offset, and the result is used as the address for the D-cache. The upper bits of the
ALU result (beyond the size of the D-cache) are ignored. The offset is 12 bits and can vary from 0 to 4095.
LD
Field   Opcode   AluCntrl   Rd      Rs      Offset
Bits    31-28    27-24      23-20   19-16   15-0
Value   0100     0000       ####    ####    ####-####-####-####
The Load instruction transfers the 64-bit value from the memory (D-cache) to register Rd. The address of
the memory location is calculated by the ALU by adding Rs and Offset. Note that in the Load instruction the
offset can be a 16-bit value. Dependent instructions should be separated by at least 3
independent intermediate instructions.
ALUI
Field   Opcode   AluCntrl   Rd      Rs      Immediate
Bits    31-28    27-24      23-20   19-16   15-0
Value   0101     0000       ####    ####    ####-####-####-####
This instruction is similar to ALU, but the second operand is a 16-bit immediate value taken directly from the
instruction. The upper 48 bits of the immediate value are set to 0. Dependent instructions should be
separated by at least 3 independent intermediate instructions.
JUMP
Field   Opcode   Cntrl   -                Address
Bits    31-28    27-24   23-12            11-0
Value   0110     0000    XXXX-XXXX-XXXX   ####-####-####
The Jump instruction loads the PC with the address provided in the lower 12 bits of the instruction. No
ALU operations are performed. This instruction should be followed by 3 NOPs.
JAL
Field   Opcode   Cntrl   Rd      -           Address
Bits    31-28    27-24   23-20   19-12       11-0
Value   0110     0001    ####    XXXX-XXXX   ####-####-####
This instruction is used to jump to a subroutine. It stores the current PC in the link
register and replaces the PC with the jump address. This instruction should be followed by 3 NOPs.
JR
Opcode
31-28
0110
Cntrl
27-24
0010
--------23-16
XXXX
Rt
15-12
####
--------11-0
XXXX-XXXX
The Jump Register instruction replaces the PC with the value in register Rt. This instruction should be
followed by 3 NOPs.
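The three jump formats share opcode 0110 and differ only in the Cntrl field and in which operand fields are used. A small encoder, sketched here with hypothetical function names, makes the layouts concrete. Emitting the don't-care bits as 0 is an assumption; the hardware ignores them.

```python
OPCODE_JUMP = 0b0110
CNTRL_JUMP, CNTRL_JAL, CNTRL_JR = 0b0000, 0b0001, 0b0010


def _check(value: int, bits: int, name: str) -> int:
    """Validate that `value` fits in an unsigned field of `bits` bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} must fit in {bits} bits")
    return value


def encode_jump(address: int) -> int:
    """JUMP: 12-bit target address in bits 11-0."""
    return (OPCODE_JUMP << 28) | (CNTRL_JUMP << 24) | _check(address, 12, "address")


def encode_jal(rd: int, address: int) -> int:
    """JAL: Rd in bits 23-20, 12-bit target address in bits 11-0."""
    return ((OPCODE_JUMP << 28) | (CNTRL_JAL << 24)
            | (_check(rd, 4, "rd") << 20) | _check(address, 12, "address"))


def encode_jr(rt: int) -> int:
    """JR: register holding the target address in bits 15-12."""
    return (OPCODE_JUMP << 28) | (CNTRL_JR << 24) | (_check(rt, 4, "rt") << 12)
```

For instance, `encode_jump(0x123)` yields the word `0x60000123`.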
8.4. Network Interface Instructions
RHEAD
Field   Opcode   Cntrl   Rd      --------
Bits    31-28    27-24   23-20   19-0
Value   0111     0000    ####    XXXX-XXXX-XXXX-XXXX-XXXX
The DCache of the processor can be converted into a FIFO. RHEAD reads the pointer to the head of the
FIFO and writes it into register Rd. Dependent instructions should be separated by at least 3 independent
intermediate instructions.
RTAIL
Field   Opcode   Cntrl   Rd      --------
Bits    31-28    27-24   23-20   19-0
Value   0111     0001    ####    XXXX-XXXX-XXXX-XXXX-XXXX
RTAIL reads the pointer to the tail of the FIFO and writes it into register Rd. Dependent instructions
should be separated by at least 3 independent intermediate instructions.
RXPKT
Field   Opcode   Cntrl   --------
Bits    31-28    27-24   23-0
Value   0111     0101    XXXX-XXXX-XXXX-XXXX-XXXX-XXXX
The Receive Packet instruction signals the FIFO control to receive packets from the network and store
them in the FIFO. Upon receiving a packet, the pointer to the tail of the FIFO increments.
TXPKT
Field   Opcode   Cntrl   --------
Bits    31-28    27-24   23-0
Value   0111     0110    XXXX-XXXX-XXXX-XXXX-XXXX-XXXX
The Transmit Packet instruction signals the FIFO control to send the packets in the FIFO to the network.
While a packet is being sent, the pointer to the head of the FIFO increments until it becomes equal to the
tail pointer.
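The head and tail pointer behaviour of the four FIFO instructions can be modelled in software. The class below is a simplified sketch, not the hardware design: the FIFO depth, the wrap-around of the pointers, and the overflow check are all assumptions made for illustration.

```python
class PacketFifo:
    """Toy model of the DCache FIFO pointers (depth and wrap-around assumed)."""

    def __init__(self, depth: int = 256):
        self.depth = depth
        self.head = 0
        self.tail = 0

    def rxpkt(self, words: int) -> None:
        """A packet of `words` entries arrives: the tail pointer advances."""
        occupancy = (self.tail - self.head) % self.depth
        if words >= self.depth - occupancy:
            raise OverflowError("packet does not fit in the modelled FIFO")
        self.tail = (self.tail + words) % self.depth

    def txpkt(self) -> int:
        """Send everything queued: head advances until it equals tail."""
        sent = 0
        while self.head != self.tail:
            self.head = (self.head + 1) % self.depth
            sent += 1
        return sent

    def rhead(self) -> int:
        """RHEAD: value that would be written into Rd."""
        return self.head

    def rtail(self) -> int:
        """RTAIL: value that would be written into Rd."""
        return self.tail


fifo = PacketFifo(depth=8)
fifo.rxpkt(3)
```

After `rxpkt(3)`, software reading RHEAD and RTAIL would see 0 and 3, which tells it where the received packet lies in the DCache.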
RINT
Field   Opcode   Cntrl   Rd      --------
Bits    31-28    27-24   23-20   19-0
Value   1000     0000    ####    XXXX-XXXX-XXXX-XXXX-XXXX
The Read Interrupt instruction gives access to the internal Interrupt Register, which holds the interrupt
ID. When the RINT instruction is issued, the interrupt ID is transferred to Rd and the Interrupt Register
is reset. Dependent instructions should be separated by at least 3 independent intermediate
instructions.
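RINT behaves as a read-and-clear operation. A minimal Python model is shown below. The reset value of 0 and the `raise_interrupt` method are assumptions introduced only for this sketch.

```python
class InterruptRegister:
    """Read-and-clear model of the Interrupt Register."""

    RESET_VALUE = 0  # ASSUMPTION: the report does not state the reset value

    def __init__(self) -> None:
        self.value = self.RESET_VALUE

    def raise_interrupt(self, interrupt_id: int) -> None:
        """Hypothetical hardware-side event that latches an interrupt ID."""
        self.value = interrupt_id

    def rint(self) -> int:
        """RINT: return the interrupt ID for Rd and reset the register."""
        interrupt_id = self.value
        self.value = self.RESET_VALUE
        return interrupt_id


reg = InterruptRegister()
reg.raise_interrupt(5)
first_read = reg.rint()
second_read = reg.rint()
```

The second read returns the reset value, so software sees each interrupt ID at most once.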
8.5. Multiplication/Division
MULT
Field   Opcode   Cntrl   Rd      Rs      Rt      --------
Bits    31-28    27-24   23-20   19-16   15-12   11-0
Value   1001     0000    ####    ####    ####    XXXX-XXXX-XXXX
A 32-bit multiplier is integrated into the processor. The lower 32 bits of registers Rs and Rt are
multiplied and the 64-bit result is written back into register Rd. The multiplier unit has a latency of
4 clock cycles; hence a MULT instruction should be followed by 5 NOPs.
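The arithmetic performed by MULT can be written as a short reference model. The sketch assumes unsigned operands, which the report does not state explicitly.

```python
MASK32 = 0xFFFF_FFFF
MULT_NOPS = 5  # NOPs required after MULT, as stated in the report


def mult(rs_value: int, rt_value: int) -> int:
    """Multiply the lower 32 bits of Rs and Rt (treated as unsigned here)."""
    # The product of two 32-bit values always fits in 64 bits.
    return (rs_value & MASK32) * (rt_value & MASK32)


# Bit 32 of the first operand is ignored, so only the low word 3 is used.
product = mult(0x1_0000_0003, 5)
```

The largest possible product, `mult(MASK32, MASK32)`, is `0xFFFFFFFE00000001`, which still fits in the 64-bit destination register.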
DIV
Field   Opcode   Cntrl   Rd      Rs      Rt      --------
Bits    31-28    27-24   23-20   19-16   15-12   11-0
Value   1001     0001    ####    ####    ####    XXXX-XXXX-XXXX
A 16-bit division unit is integrated into the processor. Only the lower 16 bits of Rs and Rt are used, and
the integer quotient Rs/Rt is computed. The 16-bit result is written back into register Rd. The division
unit has a latency of 16 clock cycles; hence a DIV instruction should be followed by 17 NOPs.
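A reference model for DIV looks much the same as one for any truncating integer divider. The sketch below assumes unsigned operands and raises an exception on a zero divisor. Both choices are assumptions, since the report does not describe signedness or divide-by-zero behaviour.

```python
MASK16 = 0xFFFF
# NOP counts stated in the report: one more than each unit's latency.
NOPS_AFTER = {"MULT": 5, "DIV": 17}


def div(rs_value: int, rt_value: int) -> int:
    """Integer quotient of the lower 16 bits of Rs and Rt (unsigned model)."""
    divisor = rt_value & MASK16
    if divisor == 0:
        # ASSUMPTION: hardware behaviour for a zero divisor is not documented.
        raise ZeroDivisionError("lower 16 bits of Rt are zero")
    return (rs_value & MASK16) // divisor


# Only 0x2345 (9029) of the first operand is used.
quotient = div(0x12345, 7)
```

Note that a divisor such as `0x10000` counts as zero in this model, because only its lower 16 bits are examined.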
8.6. Inter Processor Communication Instruction
PC_EN
Field   Opcode   Cntrl   --------
Bits    31-28    27-24   23-0
Value   1010     0000    XXXX-XXXX-XXXX-XXXX-XXXX-XXXX
This instruction enables the program counter of the Slave processor attached to the Master. Once it is
issued, the slave processor starts executing instructions from its ICache.
PC_DIS
Field   Opcode   Cntrl   --------
Bits    31-28    27-24   23-0
Value   1010     0001    XXXX-XXXX-XXXX-XXXX-XXXX-XXXX
When the master issues the PC disable instruction, the program counter, stage registers, and register file
of the slave processor are reset.
SD2
Field   Opcode   AluCntrl   --------   Rs      Rt      Offset
Bits    31-28    27-24      23-20      19-16   15-12   11-0
Value   1011     0000       XXXX       ####    ####    ####-####-####
The SD2 instruction is similar to the SD instruction, but when the master executes SD2, it transfers the
value to the ICache of the slave processor instead of writing it into its own memory. The delays
associated with SD2 are similar to those of SD.
LD2
Field   Opcode   AluCntrl   Rd      Rs      Offset
Bits    31-28    27-24      23-20   19-16   15-0
Value   1100     0000       ####    ####    ####-####-####-####
The LD2 instruction is similar to LD, but when the master executes it, the DCache of the slave processor
is read instead of the master's own memory. The value is written back into register Rd. Dependent
instructions should be separated by at least 3 independent intermediate instructions.
SD_SYM
Field   Opcode   AluCntrl   --------   Rs      --------
Bits    31-28    27-24      23-20      19-16   15-0
Value   1101     0000       XXXX       ####    ####-####-####-####
The slave processor is a crypto engine. The master provides the symmetric key to be used by the slave
through the SD_SYM instruction, which updates an internal register of the slave with the value of Rs.
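Taken together, the inter-processor instructions describe a small master/slave protocol: load the slave's ICache, pass it a key, start it, read back its results, and reset it. The model below is a behavioural sketch only. The cache sizes, the attribute names, and the choice that PC_DIS also stops the slave are assumptions, and the pipeline stage registers are not modelled.

```python
MASK64 = (1 << 64) - 1


class Slave:
    """Toy slave processor state (cache sizes are assumed)."""

    def __init__(self, icache_size: int = 64, dcache_size: int = 64):
        self.icache = [0] * icache_size
        self.dcache = [0] * dcache_size
        self.regfile = [0] * 16
        self.pc = 0
        self.pc_enabled = False
        self.sym_key = 0


class Master:
    """Master-side view of the inter-processor instructions."""

    def __init__(self, slave: Slave):
        self.slave = slave

    def sd2(self, address: int, value: int) -> None:
        """SD2: store a value into the slave's ICache."""
        self.slave.icache[address % len(self.slave.icache)] = value & MASK64

    def ld2(self, address: int) -> int:
        """LD2: read a value from the slave's DCache (destined for Rd)."""
        return self.slave.dcache[address % len(self.slave.dcache)]

    def sd_sym(self, rs_value: int) -> None:
        """SD_SYM: hand the symmetric key in Rs to the slave."""
        self.slave.sym_key = rs_value & MASK64

    def pc_en(self) -> None:
        """PC_EN: the slave starts running from its ICache."""
        self.slave.pc_enabled = True

    def pc_dis(self) -> None:
        """PC_DIS: reset the slave's PC and register file."""
        self.slave.pc_enabled = False  # ASSUMPTION: disabling also halts it
        self.slave.pc = 0
        self.slave.regfile = [0] * 16


slave = Slave()
master = Master(slave)
master.sd2(2, 0xABCD)   # load one instruction word into the slave
master.sd_sym(0x1234)   # provide the symmetric key
master.pc_en()          # start the slave
```

A typical sequence on the master would be a series of SD2 stores, one SD_SYM, a PC_EN, and later LD2 reads followed by PC_DIS.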
References
[1] Lubos Gaspar, Viktor Fischer, Lilian Bossuet, Robert Fouquet. Secure extension of FPGA softcore
processors for symmetric key cryptography. 6th International Workshop on Reconfigurable
Communication-centric Systems-on-Chip, ReCoSoC 2011, Jun 2011, Montpellier, France.
[2] Eslami, Yadollah, et al. "An area-efficient universal cryptography processor for smart cards." Very
Large Scale Integration (VLSI) Systems, IEEE Transactions on 14.1 (2006): 43-56.
[3] Hu, Kekai, et al. "System-level security for network processors with hardware monitors." Design
Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE. IEEE, 2014.
[4] Fletcher, Christopher W., Marten van Dijk, and Srinivas Devadas. "A secure processor architecture for
encrypted computation on untrusted programs." Proceedings of the seventh ACM workshop on Scalable
trusted computing. ACM, 2012.