Department of Electrical and Computer Engineering
Computer Architecture – Fall 2022
Instructor: Dr. Maria Abi Saad
Homework 1
Alex H. Ajamian
201804477

Table of Contents:
Table of Figures
Introduction
Textbook Assignments
  Problem 1
  Problem 2
  Problem 3
  Problem 4
  Problem 5
  Problem 6
  Problem 7
Non-Textbook Assignments
  Problem 1
  Problem 2
  Problem 3
  Problem 4
  Problem 5
Works Cited

Table of Figures:
Figure 1: Given of problem 4
Figure 2: Given of problem 5
Figure 3: Major features and differences of CISC and RISC architectures
Figure 4: Apple's chips' details

Introduction:
This document covers the first assignment of the computer architecture course. The format is as follows:
Large titles: Times New Roman, Font 16, blue, bold, underlined.
Small titles: Times New Roman, Font 14, blue, bold, underlined.
Regular text: Times New Roman, 12.
Line spacing is 1.5 throughout.
All calculations are rounded to the second decimal place. Kindly note that no particular referencing style (APA, MLA) was used; the reliable sources from which information was extracted are listed in the Works Cited page at the end.
Textbook Assignments:

Problem 1:
Computer architecture refers to the attributes of a system that are visible to the programmer; in other words, those attributes that have a direct impact on the logical execution of a program. Computer organization, by contrast, refers to the operational units and their interconnections that realize the architectural specifications. For example, whether a computer has a multiply instruction is an architectural design issue, whereas it is an organizational issue whether that instruction is implemented by a dedicated block that multiplies or by blocks that add the value repeatedly. The decision on how multiplication is implemented in the organization depends on factors such as how fast the multiplication must be and how often we expect to multiply.

Problem 2:
One notable performance advantage of the IBM 65 and 75 is a higher overall processing speed: because the addresses are staggered across even and odd placements in main memory, two words can be accessed in a single fetch, and hence in a single cycle. In addition, this design gains throughput by adding data lines rather than by increasing the clock speed, and the former is easier to achieve than the latter.

Problem 3:
To begin with, the ENIAC, in which each register is represented by a ring of ten vacuum tubes, is inefficient because only one of those tubes can be ON at any time. The representation is wasteful because four two-state tubes are enough to encode all ten decimal digits, so the other six tubes are redundant. As for the range of numbers that could be represented with all ten vacuum tubes: each tube has two states, so under a binary representation the ten tubes provide a total of 2^10 combinations.
Since these combinations must also include the value 0, the machine can account for a range of 0 to 2^10 - 1, that is, 0 to 1023.

Problem 4:
Given: both machines have a clock rate of 200 MHz.
Figure 1: Given of problem 4

We are asked to calculate the cycles per instruction (CPI), the execution time, and the MIPS rate.

CPI calculation:
$$CPI_A = \frac{(8 \times 1) + (4 \times 3) + (2 \times 4) + (4 \times 3)}{8 + 4 + 2 + 4} = 2.22 \text{ cycles per instruction}$$
$$CPI_B = \frac{(10 \times 1) + (8 \times 2) + (2 \times 4) + (4 \times 3)}{10 + 8 + 2 + 4} = 1.92 \text{ cycles per instruction}$$

Execution time calculation:
$$\text{Execution time} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{clock cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{clock cycle}}$$
$$\text{Execution time of A} = (8 + 4 + 2 + 4) \times 10^6 \times 2.22 \times \frac{1}{200 \times 10^6} = 0.2 \text{ seconds}$$
$$\text{Execution time of B} = (10 + 8 + 2 + 4) \times 10^6 \times 1.92 \times \frac{1}{200 \times 10^6} = 0.23 \text{ seconds}$$

MIPS calculation:
$$MIPS = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}$$
$$MIPS_A = \frac{18 \times 10^6}{0.2 \times 10^6} = 90$$
$$MIPS_B = \frac{24 \times 10^6}{0.23 \times 10^6} = 104.35$$

Conclusion:
Our primary indicator of performance is the execution time of the machines. At the same clock speed, machine A achieves an execution time that is lower by 0.03 seconds, so we can conclude that machine A is more efficient than machine B.
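The Problem 4 figures can be cross-checked with a short script. This is only a sketch: the instruction counts (in millions) and per-class CPI values are the ones that appear in the CPI equations, while the names `MIX_A`, `MIX_B`, and `summarize` are hypothetical and introduced purely for illustration.

```python
# Instruction mix per machine as (instruction count in millions, cycles per instruction).
# The values are taken from the CPI equations of Problem 4.
MIX_A = [(8, 1), (4, 3), (2, 4), (4, 3)]
MIX_B = [(10, 1), (8, 2), (2, 4), (4, 3)]
CLOCK_HZ = 200e6  # both machines run at 200 MHz


def summarize(mix, clock_hz):
    """Return (average CPI, execution time in seconds, MIPS rate) for one machine."""
    instructions = sum(count for count, _ in mix) * 1e6
    cycles = sum(count * cpi for count, cpi in mix) * 1e6
    average_cpi = cycles / instructions
    exec_time = cycles / clock_hz
    mips = instructions / (exec_time * 1e6)
    return average_cpi, exec_time, mips


for name, mix in (("A", MIX_A), ("B", MIX_B)):
    average_cpi, exec_time, mips = summarize(mix, CLOCK_HZ)
    print(f"Machine {name}: CPI = {average_cpi:.2f}, "
          f"time = {exec_time:.2f} s, MIPS = {mips:.2f}")
```

Computing the MIPS rate from the unrounded execution time avoids the small drift that appears when intermediate results are rounded to two decimals first.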
It is worth noting that machine B has a higher MIPS rate than machine A; however, the execution time remains our main factor in determining the better of the two, especially since both machines run at the same 200 MHz clock rate.

Problem 5:
Given:
Figure 2: Given of problem 5

A)
$$MIPS = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} \Rightarrow \text{Instruction count} = MIPS \times \text{Execution time} \times 10^6$$
$$\text{Instruction count for VAX} = 12x \times 1 \times 10^6$$
$$\text{Instruction count for IBM} = x \times 18 \times 10^6$$
If the same code runs on these two machines, the relative size of the instruction count (VAX relative to IBM) can be calculated as follows:
relative size = 12x/18x = 2/3 = 0.67

B)
$$MIPS = \frac{\text{Clock rate}}{CPI \times 10^6} \Rightarrow CPI = \frac{\text{Clock rate}}{MIPS \times 10^6}$$
$$CPI \text{ for VAX} = \frac{5 \times 10^6}{1 \times 10^6} = 5$$
$$CPI \text{ for IBM} = \frac{25 \times 10^6}{18 \times 10^6} = 1.39$$

Problem 6:

Problem 7:
Given: 32-bit microprocessor, 16-bit external data bus, 8 MHz input clock.
Microprocessor bus cycle = 4 * clock cycle

To find the maximum data transfer rate that the bus can sustain, we need to determine the number of bytes that can be transferred per bus cycle and the number of bus cycles per second.
Knowing that the frequency of the input clock is 8 MHz, we can determine the frequency of the data bus, which in turn tells us how much data the bus transfers. The bus cycle frequency is 8 MHz / 4 = 2 MHz. Given that the external data bus is 16 bits wide, the maximum transfer occurs when the full width of the external data bus is used in every bus cycle. Therefore, the maximum transfer rate of the bus = 16 bits/cycle * 2 * 10^6 cycles/sec = 32 Mbit/sec = 4 MB/sec.
To increase performance, we can either increase the bus width, so that more data is transferred per bus cycle, or increase the clock speed, so that the same data is transferred at a faster pace. Widening the external data bus to 32 bits is the more straightforward option, since it only requires a wider bus.
Increasing the clock speed is more difficult and less practical, but it brings more benefits than widening the data bus, since a faster clock speeds up every operation of the processor and not only the bus transfers.

Non-Textbook Assignments:

Problem 1:
1) To assess which machine has the higher performance, we look at the run time of each machine and compare the two. The overall run time is a sound performance metric that allows us to either prove or disprove the given fallacy.
$$\text{P1 run time} = \frac{0.9 \text{ cycles per instruction} \times 5 \times 10^9 \text{ instructions}}{4 \times 10^9 \text{ cycles per second}} = 1.125 \text{ sec}$$
$$\text{P2 run time} = \frac{0.75 \text{ cycles per instruction} \times 10^9 \text{ instructions}}{3 \times 10^9 \text{ cycles per second}} = 0.25 \text{ sec}$$
We can clearly see that P2 has the better run time by a significant margin: 1.125/0.25 = 4.5 times faster than P1, even though P1 has the higher clock rate. Hence the statement is indeed a fallacy.

2) To begin with, the starting assumption that the processor that executes the most instructions is the better performing one is not a valid performance measure; however, we will compute the performance indicators requested in the problem.
$$\text{P1 run time (1G instructions)} = \frac{0.9 \times 1 \times 10^9}{4 \times 10^9} = 0.225 \text{ seconds}$$
$$\text{Number of instructions on P2 using the run time of P1} = \frac{3 \times 10^9 \times 0.225}{0.75} = 9 \times 10^8 \text{ instructions}$$
As we can see from the outcome, the two processors execute a different number of instructions in the same run time; the difference in clock speed and CPI shows its effect here.

3) Find the MIPS calculations below:
$$MIPS_{P1} = \frac{4 \times 10^9}{0.9 \times 10^6} = 4444.44$$
$$MIPS_{P2} = \frac{3 \times 10^9}{0.75 \times 10^6} = 4000$$
We can clearly see that P1 has the higher MIPS, even though we established that P2 has a far better run time than P1.
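Parts 1 to 3 of this problem can be reproduced with a few lines of code. The following is a minimal sketch that assumes only the clock rates, CPI values, and instruction counts used in the calculations; the `Processor` class and its method names are hypothetical helpers, not part of the assignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    clock_hz: float  # clock rate in cycles per second
    cpi: float       # average cycles per instruction

    def run_time(self, instructions: float) -> float:
        """Seconds needed to execute the given number of instructions."""
        return instructions * self.cpi / self.clock_hz

    def instructions_in(self, seconds: float) -> float:
        """Number of instructions completed within the given time."""
        return seconds * self.clock_hz / self.cpi

    def mips(self) -> float:
        """Millions of instructions per second."""
        return self.clock_hz / (self.cpi * 1e6)


P1 = Processor(clock_hz=4e9, cpi=0.9)
P2 = Processor(clock_hz=3e9, cpi=0.75)

# Part 1: run times for the two programs (5G and 1G instructions).
t1 = P1.run_time(5e9)
t2 = P2.run_time(1e9)
print(f"P1: {t1:.3f} s, P2: {t2:.3f} s, ratio: {t1 / t2:.1f}")

# Part 2: instructions P2 completes in the time P1 needs for 1G instructions.
t1_1g = P1.run_time(1e9)
print(f"P1 time for 1G instructions: {t1_1g:.3f} s")
print(f"P2 instructions in that time: {P2.instructions_in(t1_1g):.3e}")

# Part 3: MIPS ratings.
print(f"MIPS P1: {P1.mips():.2f}, MIPS P2: {P2.mips():.2f}")
```

Keeping the three formulas as methods of one small class makes it easy to see that run time depends on the instruction count, while the MIPS rating does not.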
4) Find the MFLOPS calculations below:
$$MFLOPS_{P1} = \frac{0.4 \times (4 \times 10^9)}{0.9 \times 10^6} = 1777.78$$
$$MFLOPS_{P2} = \frac{0.4 \times (3 \times 10^9)}{0.75 \times 10^6} = 1600$$
We reach the same conclusion as in part 3: MFLOPS is not a good performance measure, since it ranks the processors in the opposite order to the run time.

5) When MIPS is used as a performance measure, the run time of the machine is not considered. MIPS only counts the number of instructions completed in a second and disregards how many instructions the program actually needs to execute.

Problem 2:
1) To find the clock writing period in ns, we first calculate the clock period T and then subtract 50 ns from it: T = 1/f = 1/5 MHz = 200 ns.
Hence our clock writing time is Tw = 150 ns (our window to write information reliably).
2) We simply need 20 ns more, since that is the duration of the writing signal that remains after fall-off: Twf = 150 ns + 20 ns = 170 ns.
3) Only one 190 ns interval fits in 200 ns, hence we can have only one wait state if 190 ns are needed for reliable data representation.

Problem 3:
A) To find the percent increase in time, we can either find the time each program takes and then compute the percent difference, or compute the percent difference directly from the number of bus cycles, since the result is the same.
Final = 4+2+3+2+3+2+3+2+3+3+2 = 29 bus cycles
Initial = 4+3+3+3+3+3 = 19 bus cycles
$$\text{Difference} = \frac{\text{final} - \text{initial}}{\text{initial}} \times 100 = \frac{29 - 19}{19} \times 100 = 52.63\% \text{ increase in time due to waits}$$
B) part 2

Problem 4:
We will now discuss both CISC and RISC architectures:
- Reduced Instruction Set Computer (RISC) is a type of processor where machines execute one instruction per clock cycle.
- Complex Instruction Set Computer (CISC) is a type of processor where machines can have special instructions as well as instructions that take more than one clock cycle.
- RISC is known as the streamlined version of CISC, since a single instruction of a CISC architecture might take several instructions to execute on a RISC machine.

Figure 3: Major features and differences of CISC and RISC architectures

Some known examples of RISC processors are:
o MIPS, PowerPC, Atmel's AVR, RISC-V, and most modern processors
Some known examples of CISC processors are:
o VAX, Motorola 68000 family, System/360, and AMD and Intel x86 CPUs

Problem 5:
We will conduct a short study of Apple's M1 microprocessor. It is important to note that some of the concepts covered in this course appear in real-life applications. One notable application is conducting research of this kind, that is, using the knowledge taught in this course to reach reasonable conclusions about certain purchases or decisions.

A) Some notable features of Apple's M1 processor that may be useful to us:
a. Clock speed
b. Amount of cache
c. Levels of cache
d. Power consumption
e. Typical uses

To begin with, the M1 processor has a clock speed of 3.2 GHz, with an 8-core CPU and a 7- or 8-core GPU; by comparison, the M1 is almost 3.5x faster than a quad-core 1.2 GHz Intel Core i7 processor.
Second, the Apple M1 processor has two kinds of cores, each with its own caches at different levels. The high-performance cores have a large 192 KB L1 instruction cache and a 128 KB L1 data cache, and share a 12 MB L2 cache; the more energy-efficient cores have a 128 KB L1 instruction cache and a 64 KB L1 data cache, and share a 4 MB L2 cache. The CPU and GPU can both use the 24 MB SLC (System Level Cache).
Hence, we have a total of 7 caches in the Apple M1, divided between the high-performance cores and the more energy-efficient cores:
o 3 caches for the high-performance cores (2 L1s and 1 L2)
o 3 caches for the energy-efficient cores (2 L1s and 1 L2)
o 1 system level cache shared between the CPU and GPU

Before getting into the typical uses of the Apple M1, we take a look at its power consumption: at peak, the M1 chip consumes around 30-31 W of power.
A small table from a reliable source lists the chips' details and significant figures:
Figure 4: Apple's chips' details

B) The research states that the Apple M1 was designed to run in mobile devices; Apple also offers the M1 Pro, M1 Max, and M1 Ultra. These other versions of the Apple M1 are used in machines that have higher performance requirements or in larger machines such as iPads and Macs.

C) Other processor brands in the market include Intel and AMD. Intel is an American multinational corporation that sells microprocessors to international computer companies, including Dell and Hewlett Packard (HP). AMD also produces and sells microprocessors and is considered to be Intel's main competitor. Apple shifted from Intel to its own ARM-based chips (M1) largely for three reasons: increasing performance, increasing efficiency, and maximizing profits. According to Apple, Macs with M1 chips are approximately 3 times faster than the older versions. Moreover, the M1 also ensures longer battery life. It is a given that every new product released should be better and more efficient than the preceding one, and this same reasoning is why Apple shifted from the M1 to the M2. The M2 chip is said to be 1.4 times faster than the M1. Also, the M2 chip offers up to a 10-core GPU, while the M1 offers a maximum of 8 cores.
The aforementioned improvements are only a few of the wide range of improvements seen in the shift from the M1 to the M2.

Works Cited:
(1) https://www.microcontrollertips.com/risc-vs-cisc-architectures-onebetter/#:~:text=RISC%2Dbased%20machines%20execute%20one,execute%20on%20a%20RISC%20machine.
(2) https://www.javatpoint.com/risc-vscisc#:~:text=Examples%20of%20CISC%3A%20VAX%2C%20Motorola,and%20the%20Intel%20x86%20CPUs.
(3) https://www.apple.com/euro/business/mac/pdf/Apple-at-Work-M1-Overview.pdf
(4) https://discussions.apple.com/thread/252061257
(5) https://www.notebookcheck.net/Apple-M1-Pro-Processor-Benchmarks-andSpecs.579915.0.html
(6) https://www.macrumors.com/guide/m1/
(7) https://www.investopedia.com/ask/answers/120114/who-are-intels-intc-maincompetitors.asp
(8) https://www.techtarget.com/searchenterprisedesktop/news/252491875/Apple-debutsMacBooks-Mac-mini-with-in-house-M1-chip
(9) https://www.apple.com/newsroom/2022/06/apple-unveils-m2-with-breakthroughperformance-and-capabilities/