FINAL REPORT

SECURE CLOUD COMPUTING MACHINES

Under the Guidance of Prof. Young Cho

Team: Cold Fire
Amit Jain Pruthwin Kadmaje Giridhara Shailesh Kayambady Sathyanarayana Bhat

https://coldfirenetworkprocessor.wordpress.com/

Table of Contents

1. Introduction
2. Motivation
3. Proposal
4. System Level Working
5. Implementation Details
   5.1. RSA
   5.2. Compiler
   5.3. Hardware Details
   5.4. Software Support
      5.4.1. EE533_CF_SecuredProcessor
      5.4.2. EE533_datapath
      5.4.3. EE533_nf_config
      5.4.4. EE533_HostSide_Config
      5.4.5. mini_asm_compiler
6. Comparison with other secured Systems
7. Future Improvements
8. Datasheet
   8.1. Front End Master Processor
   8.2. Slave Crypto Processor
   8.3. Instructions supported by our ISA
      NOP, BEZ, ALU, SD, LD, ALUI, JUMP, JAL, JR
   8.4. Network Interface Instructions
      RHEAD, RTAIL, RXPKT, TXPKT, RINT
   8.5. Multiplication/Division
      MULT, DIV
   8.6. Inter Processor Communication Instruction
      PC_EN, PC_DIS, SD2, LD2, SD_SYM
References

1. Introduction

Cloud computing, to put it simply, means “Internet computing”. The Internet is commonly visualized as a cloud; hence the term “cloud computing” for computation done through the Internet. With cloud computing, users can access database resources via the Internet from anywhere, for as long as they need, without worrying about the maintenance or management of the actual resources. Besides, databases in the cloud are very dynamic and scalable. “
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” Cloud computing is typically defined as a type of computing that relies on sharing computing resources rather than having local servers or personal devices handle applications.

Figure 1: Top level view of Cloud system

Cloud providers offer three main kinds of service model:

a) Software as a Service (SaaS)
In this model, the consumer can access applications hosted on the cloud using standardized interfaces. The cloud provider is responsible for the management of the application, operating systems and underlying infrastructure. The consumer can only control some of the user-specific application configuration settings.
Example: Yahoo!, Gmail, Google Docs, etc.

b) Platform as a Service (PaaS)
The PaaS service model offers operation and development platforms to the consumer. The consumer can use the platform to develop and run his own applications, supported by a cloud-based infrastructure. The consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems or storage, but has control over the deployed applications and possibly the application hosting environment configurations.
Example: Google Apps, SQL Azure, etc.

c) Infrastructure as a Service (IaaS)
The IaaS service model is the lowest service model in the technology stack. It offers infrastructure resources as a service, such as raw data storage, processing power and network capacity. The consumer can then use IaaS-based service offerings to deploy his own operating systems and applications, which gives a wider variety of deployment possibilities than the PaaS and SaaS models.
“The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls)”.
Example: Amazon (S3, EC2), Windows Azure, etc.

Note: Our project system is based on the IaaS model. We provide only infrastructure that is highly secure. It is up to the user to run his/her own applications and operating system on our platform. The application and operating system sent to our system are encrypted using the symmetric key chosen by the user.

2. Motivation

A cloud migration can enable a company to reduce capital expenditures and operating costs while benefiting from dynamic scaling and high availability. However, cloud computing also presents numerous challenges and raises security concerns such as insider threats. A dissatisfied employee base provides a vector for insider security events, while the inadvertent injection of malware through removable media or web interconnections can make any employee the origination point for a network security violation. The consequences can be fatal if an unauthorized person has access to the cloud computing systems.

For example, suppose a doctor wants to perform image processing on brain scan images to detect tumors. Ideally he would send those images to the cloud for analysis, because the cloud systems may already host the required applications or because it may be more economical. But there are many risks associated with this scenario. If a person with malicious intent has access to the cloud system, he can probe the hardware and obtain the code and data. He can then inspect the code to understand what the computation is about and manipulate the results for his own purposes. If the doctor receives a wrong result, the consequences can be serious.

Security is usually implemented at the software level.
With software encryption and decryption alone, however, the contents stored in memory structures such as the instruction cache or data cache are not encrypted, so one can probe these devices and obtain the required information. This led us to ask what can be done at the hardware level to make the system secure.

3. Proposal

We have come up with a solution which we believe is the first of its kind. Our idea is based on the fact that security implemented at the software level is not as well protected as we think. So we want to do everything at the hardware level. Implementing security in hardware has two main advantages. First, encryption and decryption in hardware are much faster than in software. Second, an encrypted hardware system is harder to break than a secured software solution.

We demonstrate our idea using a simple 5-stage pipelined processor. All data coming into the processor, including code, is encrypted. The data going out of the processor is also encrypted. This leaves a design choice: at which stage to decrypt and encrypt the code and data, depending on how secure the system must be. To make the processor highly secure and nearly impossible to crack, we can put the decryption engine just before the execution stage of the 5-stage pipeline and encrypt again before writing to memory. All the memory structures, such as the I-cache, the D-cache and even the register file, can be encrypted.

4. System Level Working

At a very high level, our system works as follows. We have implemented a two-level security mechanism. The first level is used for channel encryption and the second level is used for encryption/decryption of code by core 2.

- An RSA key exchange takes place between the customer and the cloud computer.
- Using the public RSA keys, the first-level symmetric key is exchanged.
- From this point on, all channel communication between the cloud and the host is encrypted using this first-level key.
- The second-level symmetric key is sent, encrypted using the first-level key. Core 1 stores this key in a special register.
- Encrypted code is generated using our compiler. The code is encrypted twice: once using the first-level symmetric key and once using the second-level symmetric key.
- The code is broken into smaller packets, and patterns are appended so that the receiving node can identify these packets.
- The encrypted packets are sent to the cloud computer one by one.
- One core in the cloud is dedicated to and optimized for packet processing. Core 1 performs the first-level decryption and checks for the valid pattern (identifier).
- The encrypted instructions are extracted and loaded into the cache of the second core.
- The second processor contains a built-in encryption and decryption engine which isolates memory from the pipeline. Everything stored in the caches is encrypted using the symmetric key provided by the customer.
- The program result is encrypted again and sent back to the customer by the first core.

We have demonstrated the two-level encryption system in our secured network processor. The use case described below explains a real-life scenario where this level of encryption can be used. Since we do not have multiple FPGAs to demonstrate the actual two-level encryption as described in the use case, we emulate it using a single host and an FPGA. The host (compiler) encrypts the code to be sent to the cloud using the two keys. The second key is sent to the FPGA in a packet as normal data, but encrypted using the first-level key. There is no separate RSA key exchange for the second-level key. Only the first-level key is exchanged through RSA.

The below block diagram shows a use case of the two-level encryption system.

In the above figure there are mainly three subsystems: the cloud system, the ISP (Internet service provider) and the users.
This use case is mainly meant to show how our two-level encryption system can be used in a real-life scenario. Between the ISP and the cloud provider, channel encryption is maintained. This can be achieved using an RSA key exchange followed by a symmetric key exchange. We demonstrate this by initially exchanging the first-level symmetric key with the cloud (NetFPGA). All packets going to the cloud provider through the ISP from the various users are encrypted using this key. In our case even the packet identifiers are encrypted using the first-level key. Different ISPs can maintain their own keys to encrypt the channel.

The users can have their own encryption mechanism. In the above use case all 5 users use different symmetric keys to encrypt the code and data to be sent to the cloud for processing. This encryption is independent of the encryption used by the ISPs. The packets sent by a user traverse the ISP, which encrypts them again using its own keys before sending them to the cloud provider.

When the cloud system receives packets from a particular ISP, it knows which keys to use to decrypt the incoming packets. The cloud can then extract the packets and obtain all the required data. The initial packets from the user contain the second-level key, encrypted with the first-level key. This key can be decrypted and then used to decrypt the code and data and execute them.

5. Implementation Details

Let us look into the implementation details of all the steps mentioned above.

5.1. RSA

RSA is one of the first practical public-key cryptosystems and is widely used for secure data transmission. RSA stands for Ron Rivest, Adi Shamir and Leonard Adleman, the designers of this cryptosystem. The below block diagram shows a typical RSA key exchange protocol.

[Figure: key exchange between the NetFPGA and the Host. Each side holds a public key and a private key; the symmetric key is exchanged between them.]

Both the sender and the receiver have a pair of keys.
These keys are called the public key and the private key. RSA is an asymmetric cryptosystem and is very slow. A cryptosystem using symmetric keys is much faster. So we use RSA to exchange the symmetric keys in a secure way. The following steps are performed in order during the key exchange:

- The Host sends its public key to the NetFPGA.
- The NetFPGA receives the Host public key and encrypts its own public key using the Host public key.
- The NetFPGA sends the encrypted key back to the Host.
- The Host receives the encrypted key and decrypts it using its private key.
- The Host encrypts the symmetric key using the public key obtained from the NetFPGA and sends it to the NetFPGA.

Note: For demonstration purposes we use very simple 8-bit RSA keys. Our symmetric key is 64 bits long.

5.2. Compiler

We have built a smart compiler which is compatible with our processor. Our compiler has many features:

- It detects dependencies automatically and inserts only the required number of NOPs to avoid hazards.
- It supports address resolution for all labels and branches.
- Unsupported MIPS instructions are mapped automatically to a combination of our instructions.
- It generates output in hex format, which is used to run the code on the NetFPGA, and in binary format for simulation purposes.
- It can generate encrypted instructions using a user-provided symmetric key.
- It can encrypt the instructions twice to support our extension of security.
- It supports encoding of the instructions into ASCII characters. This feature compensates for the “iperf” tool issue, which is discussed in detail in later sections.

The whole compiler is designed in a modular and flexible fashion. Additions or modifications to the ISA can be handled with minimal changes to the configuration files. The compiler usage is discussed in detail in the Software Support section 5.4 of the report.

5.3. Hardware Details

The below block diagram shows our secured dual-core processor.

Let us look into the processor in detail. As mentioned before, our processor consists of two cores. Both cores use a simple 5-stage pipelined design.

Core 1:
This is the front-end core, and it handles all the traffic from the network. It is mainly responsible for capturing the network packets and processing them. It also handles the RSA key exchange with the host and stores the symmetric key in a special register.

During the initial phase of the design we came across a challenge: we need a scratch memory for packet processing in addition to the FIFO support. So we decided to split the D-cache of core 1 into two parts. The first 128 words act as memory and the remaining 128 as a FIFO. All network packets arrive at the FIFO, and the core is notified of their arrival through a special instruction. When core 1 sees that a packet has arrived, it fetches the packet from the FIFO and processes it in the scratch memory.

All incoming packets that contain encrypted code carry a fixed pattern. Core 1 searches for this pattern. If the pattern is not found, core 1 sends the packet out immediately. If the pattern is found, core 1 removes the pattern and forwards the encrypted packet to core 2.

Hardware accelerator:
On the way to core 2, the packets go through a hardware accelerator which performs an ASCII to hex mapping. The UDP packets sent from the host using iperf can only carry ASCII values. So the compiler on the host side encodes the full program into ASCII after it has encrypted the instructions.

Core 2:
The decoded packets from the hardware accelerator arrive at core 2. These packets are still encrypted. They are stored in the instruction cache of core 2. When the entire program has been sent to core 2, core 1 signals with a special instruction that the transfer is complete and core 2 can start executing.
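The packet path described so far, from the host's double encryption to the pattern check in core 1, can be sketched in a few lines of Python. This is only an illustrative model and not the assembly that runs on core 1: it uses the simple XOR cipher mentioned under Future Improvements, treats a packet as a list of 64-bit words, and omits the ASCII encoding step. The identifier value `PATTERN` and all function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
PATTERN = [0xC01DF17EC01DF17E]  # hypothetical packet identifier, not from the report


def xor_words(words, key):
    """XOR every 64-bit word with a 64-bit symmetric key."""
    return [(w ^ key) & MASK64 for w in words]


def host_build_packet(instructions, key1, key2):
    """Host side: second-level encrypt the code, prepend the pattern,
    then first-level encrypt the whole packet (pattern included)."""
    payload = xor_words(instructions, key2)
    return xor_words(PATTERN + payload, key1)


def core1_process(packet, key1):
    """Core 1: first-level decrypt, check the pattern, strip it.

    Returns ("forward", words) for core 2, still second-level encrypted,
    or ("send_out", packet) when the pattern is missing.
    """
    plain = xor_words(packet, key1)
    if plain[:len(PATTERN)] != PATTERN:
        return "send_out", packet
    return "forward", plain[len(PATTERN):]


def core2_decrypt(words, key2):
    """Core 2: second-level decryption, done between I-cache and decode."""
    return xor_words(words, key2)


if __name__ == "__main__":
    key1, key2 = 0x0123456789ABCDEF, 0x0FEDCBA987654321  # example keys only
    code = [0x52100005, 0x20312000]                      # arbitrary example words
    action, words = core1_process(host_build_packet(code, key1, key2), key1)
    print(action, [hex(w) for w in core2_decrypt(words, key2)])
```

In this model the words forwarded to core 2 never appear in plaintext outside `core2_decrypt`, which mirrors the intent that the I-cache only ever holds encrypted instructions.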
Upon receiving the go-ahead signal from core 1, core 2 starts fetching the encrypted instructions from its I-cache. A decryption engine lies between the I-cache and the decode stage. The instructions are decrypted just before the decode stage and then proceed to the execution stage. Similarly, an encryption unit lies between the execution stage and the D-cache. After an instruction has executed, the data is encrypted again just before it is written to memory. In the write-back stage the register values are also encrypted before they are written to the register file. Thus all the memory structures, the I-cache, the D-cache and the register file, hold encrypted contents.

5.4. Software Support

We have created extensive software support to test our system. The available software is listed below. We describe in detail how to use each item and what it is capable of.

Software Name               Location
EE533_CF_SecuredProcessor   \Software\AssemblyCode
EE533_datapath              \Software\Scripts
EE533_nf_config             \Software\Scripts
EE533_HostSide_Config       \Software\Scripts\HostSideConfig
mini_asm_compiler           \Software\Compiler

5.4.1. EE533_CF_SecuredProcessor

This is assembly code written using the ISA of our ColdFire Secured Processor. It runs on core 1 of our processor and acts like an operating system for our system. This code is responsible for the following functions:

- Capture of packets from the network
- FIFO and memory management
- RSA key exchange with the Host
- First-level packet decryption in core 1
- Decrypting the encrypted symmetric key and saving it in a special register
- Pattern detection (code, keys, etc.)
- Transfer of encrypted code to core 2
- Booting of the encrypted code to be run on core 2
- Sending invalid packets out of the system
- Handling requests for results from users

5.4.2. EE533_datapath

This is a perl script which behaves like a boot loader. It loads the “EE533_CF_SecuredProcessor” program into core 1 and starts core 1. It provides the following functions:
- Loading of the “EE533_CF_SecuredProcessor” program into the core 1 instruction cache
- Enabling/disabling the program counter of core 1
- Reading/writing the data cache and instruction cache

Below are the commands which enable the above functions.

Commands              Functionality
Reset                 Resets the PC of core 1
Start                 Enables the PC of core 1
Read ic <address>     Reads the instruction cache of core 1 at the given address
Ic <File Name>        Writes into the instruction cache of core 1 from the input file provided
Read data <address>   Reads data from the data cache of core 1 at the given address
dc <FileName>         Writes into the data cache of core 1 from the input file

5.4.3. EE533_nf_config

This is a perl script which configures and programs the NetFPGA with the base “Reference Router” program. Run the script with the path to the log file generated after the nodes are swapped in on DETER:

    perl EE533_nf_config /proj/USCEE533/exp/CFlab12/tbdata/tbreport.log

5.4.4. EE533_HostSide_Config

This is a perl script which runs on the Host side of the system. It provides a set of functions for communicating with the NetFPGA, which acts as the cloud computer in our demo. The script provides the following functions:

- RSA key exchange with the cloud system (NetFPGA)
- Sending the first- and second-level symmetric keys
- Sending encrypted data to the FPGA
- Sending encrypted code to the FPGA
- Requesting the results from the FPGA

The below table gives the usage information of this script.

Commands   Functionality
-pub       Send the host public key to the FPGA
-sym       Send the encrypted first-level symmetric keys to the FPGA
-sym2      Send the encrypted second-level symmetric keys to the FPGA
-data      Send any encrypted data to the FPGA
-code      Send the encrypted code to the FPGA. This command takes the encrypted code from a file “FileInp”, which should be located in the same path as this script
-req       Request the result from the FPGA. This command takes the requesting address from a file “RequestAddress”, which should be located in the same path as this script

5.4.5. mini_asm_compiler

This is a smart compiler written for our custom ISA. The features of the compiler are discussed in the Compiler section 5.2. The usage details of the compiler are as below.

Usage: perl mini_asm_compiler <-s> <-d> <-r> <-o> <-en> <-b/-h>

Commands   Functionality
-all       Run all the programs in correct order <input file> <output file>
-s         Generate assembly for custom ISA
-d         Resolve the instruction dependencies
-r         Resolve branch target address
-o         Generate output file for custom ISA
-b/-h      Generate output file in binary/hexadecimal format <default : hexadecimal>
-en        Generate encrypted instructions for custom ISA
-ed        Generate encoded instructions for custom ISA
En2        Generate 2nd level encrypted instructions for custom ISA

6. Comparison with other secured Systems

We believe that no systems similar to ours are available, so we cannot make a direct comparison to evaluate the performance of our system. Currently security is implemented at the software level, as mentioned in the previous sections. The below table compares our system with a system in which security is implemented at the software level.

Our System                                 Software level security
Encryption/decryption in hardware          Encryption/decryption in software
Faster encryption/decryption               Slower encryption/decryption
I-Cache, D-Cache and register file         I-Cache, D-Cache and register file
contents are encrypted                     contents are NOT encrypted
Probing the hardware to snoop data/code    Memory devices can easily be probed
is very hard                               to get the data

7. Future Improvements

- Currently we have implemented RSA using a simple 8-bit key. It can be extended to 768/1024 bits to provide a more secure channel.
- The symmetric encryption/decryption is implemented using a simple XOR. It can be replaced by an AES/DES encryption engine.
- More cores/threads can be incorporated to service more user-encrypted code simultaneously in the cloud computer (NetFPGA).

8. Datasheet

8.1. Front End Master Processor

8.2. Slave Crypto Processor

8.3. Instructions supported by our ISA

Instruction  Opcode    AluCntrl  Rd        Rs        Rt        Remaining bits
             (4 bits)  (4 bits)  (4 bits)  (4 bits)  (4 bits)  (12 bits)
NOP          0000      -         -         -         -         -
BEZ          0001      -         -         Rs        Offset (bits 15-0)
ALU          0010      AluCntrl  Rd        Rs        Rt        -
SD           0011      0000      -         Rs        Rt        Offset
LD           0100      0000      Rd        Rs        Offset (bits 15-0)
ALUI         0101      AluCntrl  Rd        Rs        Immediate (bits 15-0)
JR           0110      0010      -         -         Rt        -
JAL          0110      0001      Rd        -         $0        Address
J            0110      0000      -         -         -         Address
RHEAD        0111      0000      Rd        -         -         -
RTAIL        0111      0001      Rd        -         -         -
RXPKT        0111      0101      -         -         -         -
TXPKT        0111      0110      -         -         -         -
RINT         1000      0000      Rd        -         -         -
MULT         1001      0000      Rd        Rs        Rt        -
DIV          1001      0001      Rd        Rs        Rt        -
PC_EN        1010      0000      -         -         -         -
PC_DIS       1010      0001      -         -         -         -
SD2          1011      0000      -         Rs        Rt        Offset
LD2          1100      0000      Rd        Rs        Offset (bits 15-0)
SD_SYM       1101      0000      -         Rs        Offset    -

NOP

Field          Bits    Value
Opcode         31-28   0000
Other fields   27-0    XXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XXXX

This instruction does not change any architected state of the registers, and it does not modify any memory content in the processor. A NOP takes one clock cycle and increments the PC. It is used between producer and consumer instructions so that the consumer gets the correct and most up-to-date value of a register. It is also used after jump/branch instructions to prevent fetching of incorrect instructions.

BEZ

Field      Bits    Value
Opcode     31-28   0001
AluCntrl   27-24   0000
-          23-20   XXXX
Rs         19-16   ####
Offset     15-0    ####-####-####-####

Branch if equal to zero (BEZ) is a conditional branch instruction. The branch is taken if the LSB of the value in the register specified by Rs is equal to zero. Bits [63:1] of the register are ignored, which reduces the decision-making delay of the branch. The branch address is calculated as PC + offset. The 16-bit offset is represented in 2's complement form, so the possible offset values are -32768 to 32767. This instruction should be followed by 3 NOPs.
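As a cross-check of the instruction format table and the BEZ entry above, the following Python sketch packs the six fields into a 32-bit word and models the BEZ decision. The function names are illustrative and not part of the team's toolchain. The model also assumes that the PC counts in instruction words and that a branch that is not taken continues at PC + 1, neither of which is stated in the datasheet.

```python
def encode(opcode, alu_cntrl=0, rd=0, rs=0, rt=0, low12=0):
    """Pack fields as opcode[31:28] AluCntrl[27:24] Rd[23:20] Rs[19:16]
    Rt[15:12] remaining[11:0] and return the 32-bit instruction word."""
    fields = (("opcode", opcode, 4), ("alu_cntrl", alu_cntrl, 4),
              ("rd", rd, 4), ("rs", rs, 4), ("rt", rt, 4), ("low12", low12, 12))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return ((opcode << 28) | (alu_cntrl << 24) | (rd << 20)
            | (rs << 16) | (rt << 12) | low12)


def encode_bez(rs, offset):
    """BEZ: opcode 0001, Rs in bits 19-16, 2's complement offset in bits 15-0."""
    if not -32768 <= offset <= 32767:
        raise ValueError("offset out of range for 16 bits")
    off16 = offset & 0xFFFF
    # The 16-bit offset spans the Rt field and the remaining 12 bits.
    return encode(0b0001, rs=rs, rt=off16 >> 12, low12=off16 & 0xFFF)


def bez_next_pc(pc, rs_value, instr):
    """Next PC for a BEZ word: PC + offset if bit 0 of Rs is zero."""
    offset = instr & 0xFFFF
    if offset & 0x8000:            # sign-extend the 16-bit offset
        offset -= 0x10000
    taken = (rs_value & 1) == 0    # bits [63:1] of Rs are ignored
    return pc + offset if taken else pc + 1  # PC + 1 is an assumption


if __name__ == "__main__":
    word = encode_bez(rs=3, offset=-4)
    print(hex(word))                     # 0x1003fffc
    print(bez_next_pc(20, 0b10, word))   # 16: LSB of Rs is 0, branch taken
    print(bez_next_pc(20, 0b01, word))   # 21: LSB of Rs is 1, not taken
```

Note that, because only bit 0 is tested, a register holding 2 counts as "zero" for BEZ in this model, exactly as the entry describes.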
ALU

Field      Bits    Value
Opcode     31-28   0010
AluCntrl   27-24   ####
Rd         23-20   ####
Rs         19-16   ####
Rt         15-12   ####
-          11-0    XXXX-XXXX-XXXX

The arithmetic and logical operations are collectively represented by the Opcode and AluCntrl bits. The operation specified by AluCntrl is performed by the ALU on the values of registers Rs and Rt, and the result is written back into register Rd. The operations performed by the ALU are shown below. The arithmetic operations are unsigned and overflows are ignored. For the NOT operation, register Rs is ignored. Dependent instructions should be separated by at least 3 independent intermediate instructions.

AluCntrl[3:0]   Operation     Operand relation
0000            ADD           Rd = Rs + Rt
0001            SUB           Rd = Rs - Rt
0010            Left Shift    Rd = Rs << Rt
0011            Right Shift   Rd = Rs >> Rt
0100            AND           Rd = Rs & Rt
0101            OR            Rd = Rs | Rt
0110            NOT           Rd = ~Rt
0111            XOR           Rd = Rs ⊕ Rt
1000            XNOR          Rd = Rs ⊙ Rt
1001            SETG          Rd = (Rs > Rt) ? 1 : 0
1010            SETL          Rd = (Rs < Rt) ? 1 : 0
1011            SETE          Rd = (Rs == Rt) ? 1 : 0
11XX            ---           ---

SD

Field      Bits    Value
Opcode     31-28   0011
AluCntrl   27-24   0000
-          23-20   XXXX
Rs         19-16   ####
Rt         15-12   ####
Offset     11-0    ####-####-####

The store instruction transfers the 64-bit value in register Rt to the address specified by Rs and Offset. The ALU adds Rs and Offset, and the result is used as the address for the DCache. The upper bits of the ALU result (beyond the size of the DCache) are ignored. The offset is 12 bits and can vary from 0 to 4095.

LD

Field      Bits    Value
Opcode     31-28   0100
AluCntrl   27-24   0000
Rd         23-20   ####
Rs         19-16   ####
Offset     15-0    ####-####-####-####

The load instruction transfers the 64-bit value from memory (DCache) to register Rd. The address of the memory location is calculated by the ALU by adding Rs and Offset. Note that in the load instruction the offset can be a 16-bit value. Dependent instructions should be separated by at least 3 independent intermediate instructions.
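The AluCntrl operation table in the ALU entry above can also be read as a small reference model. The sketch below is a Python rendering under stated assumptions, not the hardware description: registers are taken to be 64 bits wide (as the SD and LD entries imply), results wrap modulo 2^64 because overflows are ignored, and returning 0 for shift amounts of 64 or more is this model's own choice, since the datasheet does not define that case.

```python
MASK64 = (1 << 64) - 1

# AluCntrl code -> operation on unsigned 64-bit operands (rs, rt).
ALU_OPS = {
    0b0000: lambda rs, rt: rs + rt,                      # ADD
    0b0001: lambda rs, rt: rs - rt,                      # SUB
    0b0010: lambda rs, rt: rs << rt if rt < 64 else 0,   # Left Shift
    0b0011: lambda rs, rt: rs >> rt if rt < 64 else 0,   # Right Shift
    0b0100: lambda rs, rt: rs & rt,                      # AND
    0b0101: lambda rs, rt: rs | rt,                      # OR
    0b0110: lambda rs, rt: ~rt,                          # NOT (Rs ignored)
    0b0111: lambda rs, rt: rs ^ rt,                      # XOR
    0b1000: lambda rs, rt: ~(rs ^ rt),                   # XNOR
    0b1001: lambda rs, rt: int(rs > rt),                 # SETG
    0b1010: lambda rs, rt: int(rs < rt),                 # SETL
    0b1011: lambda rs, rt: int(rs == rt),                # SETE
}


def alu(alu_cntrl, rs, rt):
    """Return the 64-bit value written to Rd for the given AluCntrl code."""
    if alu_cntrl not in ALU_OPS:
        raise ValueError("AluCntrl codes 11XX are not defined")
    rs &= MASK64
    rt &= MASK64
    # Masking the result models "overflows are ignored".
    return ALU_OPS[alu_cntrl](rs, rt) & MASK64


if __name__ == "__main__":
    print(hex(alu(0b0001, 0, 1)))   # SUB wraps around to all ones
    print(alu(0b1001, 7, 3))        # SETG
```

Because the comparisons operate on unsigned values, a result that wrapped around after SUB compares as a very large number rather than a negative one.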
ALUI

Field       Bits    Value
Opcode      31-28   0101
AluCntrl    27-24   0000
Rd          23-20   ####
Rs          19-16   ####
Immediate   15-0    ####-####-####-####

This instruction is similar to ALU, but the second operand is a 16-bit immediate value taken directly from the instruction. The upper 48 bits of the immediate value are set to 0. Dependent instructions should be separated by at least 3 independent intermediate instructions.

JUMP

Field     Bits    Value
Opcode    31-28   0110
Cntrl     27-24   0000
-         23-12   XXXX-XXXX-XXXX
Address   11-0    ####-####-####

The jump instruction loads the PC with the address provided in the lower 12 bits of the instruction. No ALU operations are performed. This instruction should be followed by 3 NOPs.

JAL

Field     Bits    Value
Opcode    31-28   0110
Cntrl     27-24   0001
Rd        23-20   ####
-         19-12   XXXX-XXXX
Address   11-0    ####-####-####

This instruction is used to jump to a subroutine. It stores the current PC in the link register and replaces the PC with the jump address. This instruction should be followed by 3 NOPs.

JR

Field    Bits    Value
Opcode   31-28   0110
Cntrl    27-24   0010
-        23-16   XXXX-XXXX
Rt       15-12   ####
-        11-0    XXXX-XXXX-XXXX

The jump register instruction replaces the PC with the value present in register Rt. This instruction should be followed by 3 NOPs.

8.4. Network Interface Instructions

RHEAD

Field    Bits    Value
Opcode   31-28   0111
Cntrl    27-24   0000
Rd       23-20   ####
-        19-0    XXXX-XXXX-XXXX-XXXX-XXXX

The DCache of the processor can be converted into a FIFO. The pointer to the head of the FIFO is read and written into register Rd. Dependent instructions should be separated by at least 3 independent intermediate instructions.

RTAIL

Field    Bits    Value
Opcode   31-28   0111
Cntrl    27-24   0001
Rd       23-20   ####
-        19-0    XXXX-XXXX-XXXX-XXXX-XXXX

The pointer to the tail of the FIFO is read and written into register Rd. Dependent instructions should be separated by at least 3 independent intermediate instructions.
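Several entries above require that a dependent instruction be separated from its producer by at least 3 independent instructions. One way a compiler might enforce that rule is sketched below in Python. This is a hypothetical illustration of the idea, not the mini_asm_compiler source, and the tuple representation of instructions is invented for the example. Instructions with longer latencies and the NOPs required after branches and jumps are not modelled.

```python
NOP = ("NOP", None, ())


def insert_nops(program, gap=3):
    """Pad a straight-line program so that every consumer has at least
    `gap` instructions between it and the latest producer of each
    register it reads.

    Each instruction is a (mnemonic, dest_reg_or_None, source_regs) tuple.
    """
    out = []
    last_write = {}  # register -> index in `out` of its latest producer
    for mnemonic, dest, sources in program:
        needed = 0
        for reg in sources:
            if reg in last_write:
                between = len(out) - last_write[reg] - 1
                needed = max(needed, gap - between)
        out.extend([NOP] * needed)          # only as many NOPs as required
        out.append((mnemonic, dest, sources))
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out


if __name__ == "__main__":
    program = [
        ("LD",  "r1", ("r0",)),
        ("ALU", "r5", ("r6", "r7")),   # independent of r1
        ("ALU", "r2", ("r1", "r5")),   # reads r1 and r5
    ]
    for instr in insert_nops(program):
        print(instr)
```

In the example, the last instruction reads `r5` from the instruction immediately before it, so three NOPs are inserted; an independent instruction already sitting between a producer and its consumer reduces the padding by one.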
RXPKT

Field   Opcode  Cntrl  --------
Bits    31-28   27-24  23-0
Value   0111    0101   XXXX-XXXX-XXXX-XXXX-XXXX-XXXX

The Receive Packet instruction signals the FIFO control to receive packets from the network and store them in the FIFO. Each time a packet is received, the pointer to the tail of the FIFO is incremented.

TXPKT

Field   Opcode  Cntrl  --------
Bits    31-28   27-24  23-0
Value   0111    0110   XXXX-XXXX-XXXX-XXXX-XXXX-XXXX

The Transmit Packet instruction signals the FIFO control to send the packets from the FIFO to the network. As packets are sent, the pointer to the head of the FIFO is incremented until it equals the tail pointer.

RINT

Field   Opcode  Cntrl  Rd     --------
Bits    31-28   27-24  23-20  19-0
Value   1000    0000   ####   XXXX-XXXX-XXXX-XXXX-XXXX

The Read Interrupt instruction gives access to the internal Interrupt Register, which holds the interrupt ID. When the RINT instruction is issued, the interrupt ID is transferred to Rd and the Interrupt Register is reset. Dependent instructions should be separated by at least 3 independent intermediate instructions.

8.5. Multiplication/Division

MULT

Field   Opcode  Cntrl  Rd     Rs     Rt     --------
Bits    31-28   27-24  23-20  19-16  15-12  11-0
Value   1001    0000   ####   ####   ####   XXXX-XXXX-XXXX

A 32-bit multiplier is integrated into the processor. The lower 32 bits of registers Rs and Rt are multiplied and the 64-bit result is written back into register Rd. The multiplier unit has a latency of 4 clock cycles. Hence a MULT instruction should be followed by 5 NOPs.

DIV

Field   Opcode  Cntrl  Rd     Rs     Rt     --------
Bits    31-28   27-24  23-20  19-16  15-12  11-0
Value   1001    0001   ####   ####   ####   XXXX-XXXX-XXXX

A 16-bit division unit is integrated into the processor. The lower 16 bits of Rs and Rt are used and the integer value of Rs/Rt is computed. The 16-bit result is written back into register Rd. The division unit has a latency of 16 clock cycles. Hence a DIV instruction should be followed by 17 NOPs.

8.6. Inter Processor Communication Instruction

PC_EN

Field   Opcode  Cntrl  --------
Bits    31-28   27-24  23-0
Value   1010    0000   XXXX-XXXX-XXXX-XXXX-XXXX-XXXX

This instruction enables the program counter of the Slave processor attached to the Master. Once it is issued, the slave processor starts running the instructions from its ICache.

PC_DIS

Field   Opcode  Cntrl  --------
Bits    31-28   27-24  23-0
Value   1010    0001   XXXX-XXXX-XXXX-XXXX-XXXX-XXXX

When the master issues the PC disable instruction, the program counter, stage registers, and register file of the slave processor are reset.

SD2

Field   Opcode  AluCntrl  --------  Rs     Rt     Offset
Bits    31-28   27-24     23-20     19-16  15-12  11-0
Value   1011    0000      XXXX      ####   ####   ####-####-####

The SD2 instruction is similar to the SD instruction. However, when the master executes SD2, instead of writing into its own memory it transfers the value to the ICache of the slave processor. The delays associated with SD2 are similar to those of SD.

LD2

Field   Opcode  AluCntrl  Rd     Rs     Offset
Bits    31-28   27-24     23-20  19-16  15-0
Value   1100    0000      ####   ####   ####-####-####-####

The LD2 instruction is similar to LD, but when the master executes it, the DCache of the slave processor is read instead of the master's own memory. The value read is written back into register Rd. Dependent instructions should be separated by at least 3 independent intermediate instructions.

SD_SYM

Field   Opcode  AluCntrl  --------  Rs     --------
Bits    31-28   27-24     23-20     19-16  15-0
Value   1101    0000      XXXX      ####   ####-####-####-####

The slave processor is a crypto engine. The master provides the symmetric key to be used by the slave through the SD_SYM instruction, which updates an internal register with the value of Rs.

References

[1] Lubos Gaspar, Viktor Fischer, Lilian Bossuet, Robert Fouquet. "Secure extension of FPGA softcore processors for symmetric key cryptography." 6th International Workshop on Reconfigurable Communication-centric Systems-on-Chip, ReCoSoC 2011, Jun 2011, Montpellier, France.
[2] Eslami, Yadollah, et al. "An area-efficient universal cryptography processor for smart cards." Very Large Scale Integration (VLSI) Systems, IEEE Transactions on 14.1 (2006): 43-56.
[3] Hu, Kekai, et al. "System-level security for network processors with hardware monitors." Design Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE. IEEE, 2014.
[4] Fletcher, Christopher W., Marten van Dijk, and Srinivas Devadas. "A secure processor architecture for encrypted computation on untrusted programs." Proceedings of the seventh ACM workshop on Scalable trusted computing. ACM, 2012.