Computer Architecture Homework Solution: RISC-V & AndeSight

Department of Computer Science
National Tsing Hua University
EECS403000 Computer Architecture
Spring 2024, Homework 1 Solution (Revised) and Rubric
Due date:
1. (40 points)
Installing and using AndeSight™ for RISC-V program development.
(1) See AndeSight_STD_v5.3_Installation_Guide.pdf for the installation guide.
(2) Create a new Andes C Project
(i) Click on File → New → Project → C/C++, and select “Andes C Project”.
(ii) Create a project with the project name “fast_power_recur”.
(iii) From Chip Profile, select chip profile “AE350” → “ADP-AE350-NX25F”.
(iv) From Project Type, select project type “Andes Executable” → “Hello World ANSI C Project”
(v) From Toolchains, select the “nds64le-elf-mculib-v5d”.
(vi) Other configurations are left as default.
(3) Replace fast_power_recur.c in the project with the one we provided.
(4) To build the project, click on the expanding arrow (a small triangle) beside “Build” in the
toolbar → “1 Debug” for project “fast_power_recur”.
(5) To execute the program, press “Debug” in the toolbar → “1 Application Program” and press
“Resume” in the debug window.
You can follow the same steps for other program codes.
To select (or check) the optimization setting, follow the figure below.
To inspect the Assembly code of the program, follow the figure below.
In this exercise, we will experiment with the naïve and fast power computation in two different
implementations (iterative and recursive) with AndeSight™. There are four source codes, namely
naive_power_iter.c, naive_power_recur.c, fast_power_iter.c, and fast_power_recur.c. The optimization
level -Og will be used by default in the following questions unless stated otherwise.
(a) (10 points) Effects of algorithms on performance
Press “Profile” in the toolbar and select Profile as “Application Program”. Press the “Resume”
button in the debug window, record the CycC and InsC for the four functions listed in the
table below and complete the table. Based on the characteristics of the programs, briefly compare
and explain the differences between the naïve and fast power algorithms in their profiles.
Function              Source                CycC   InsC
naive_power_iter()    naive_power_iter.c      83     50
naive_power_recur()   naive_power_recur.c    194    135
fast_power_iter()     fast_power_iter.c       42     27
fast_power_recur()    fast_power_recur.c     118     73
[Screenshots: profiles of naive_power_iter(), naive_power_recur(), fast_power_iter(), and
fast_power_recur()]
● InsC and CycC of the iterative naïve power > InsC and CycC of the iterative fast power
algorithm
● InsC and CycC of the recursive naïve power > InsC and CycC of the recursive fast power
algorithm
● The number of multiplication operations in the naïve implementation is directly proportional to
the exponent. In this case, both the iterative and recursive versions involve 11
multiplication operations.
● On the other hand, the number of multiplication operations in the fast power algorithm is at most
$2 \times \lceil \log_2(\mathit{power}) \rceil$. In this case, both the iterative and recursive
versions involve 7 multiplication operations.
● Therefore, the fast power algorithm has lower CycC and InsC than its naïve counterpart.
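The multiplication counts above can be checked with a small Python sketch. The provided C sources are not reproduced in this document, so the two functions below are assumed, textbook-style iterative versions, and the exponent 11 is inferred from the 11 multiplications reported for the naïve version. The names `naive_power` and `fast_power` are illustrative only.

```python
import math


def naive_power(base, exponent):
    """Multiply by base once per unit of the exponent; return (result, multiplications)."""
    result, mults = 1, 0
    for _ in range(exponent):
        result *= base
        mults += 1
    return result, mults


def fast_power(base, exponent):
    """Square-and-multiply (binary exponentiation); return (result, multiplications)."""
    result, mults = 1, 0
    while exponent > 0:
        if exponent & 1:      # low bit set: fold the current power into the result
            result *= base
            mults += 1
        base *= base          # square for the next bit
        mults += 1
        exponent >>= 1
    return result, mults


_, naive_mults = naive_power(2, 11)
_, fast_mults = fast_power(2, 11)
bound = 2 * math.ceil(math.log2(11))
print(naive_mults, fast_mults, bound)  # prints: 11 7 8
```

Under these assumptions the sketch gives 11 and 7 multiplications, and 7 stays within the bound of 8.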
Grading policy:
● Each CycC and InsC value is worth 1 point.
● Comparison and justification are worth 2 points.
(b) (10 points) Effects of programming on performance
From the table above and based on the characteristics of the programs, briefly compare and explain
the differences between the iterative and recursive implementations of the fast power algorithm in
their profiles. Suppose that they are executed in a processor with a clock rate of 3 GHz, what are the
average CPI and CPU execution time for the fast_power_iter() and fast_power_recur() functions?
Function             Average CPI                              Execution Time
fast_power_iter()    CPI = CycC / InsC = 42 / 27 ≈ 1.55556    ET = CycC / Clock Rate = 42 / (3×10^9) s = 14 ns
fast_power_recur()   CPI = CycC / InsC = 118 / 73 ≈ 1.61644   ET = CycC / Clock Rate = 118 / (3×10^9) s ≈ 39.3 ns
● CycC & InsC of the recursive fast power algorithm > CycC & InsC of the iterative fast power
algorithm.
● The recursive implementation requires more push and pop (memory access) instructions, which
leads to a higher cycle count.
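As a quick check of the arithmetic, the following Python sketch recomputes CPI and execution time from the profiled CycC and InsC values. The helper names `cpi` and `execution_time_ns` are illustrative and not part of AndeSight.

```python
CLOCK_RATE_HZ = 3e9  # 3 GHz, as stated in the question

# (CycC, InsC) taken from the profile table in (a)
PROFILES = {
    "fast_power_iter()": (42, 27),
    "fast_power_recur()": (118, 73),
}


def cpi(cycles, instructions):
    """Average cycles per instruction."""
    return cycles / instructions


def execution_time_ns(cycles, clock_rate_hz):
    """Execution time in nanoseconds: cycles divided by clock rate."""
    return cycles / clock_rate_hz * 1e9


for name, (cyc, ins) in PROFILES.items():
    print(f"{name}: CPI = {cpi(cyc, ins):.5f}, "
          f"ET = {execution_time_ns(cyc, CLOCK_RATE_HZ):.1f} ns")
```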
Grading policy:
● Each CPI and Execution Time value is worth 2 points.
● Comparison and justification are worth 2 points.
(c) (10 points) Effects of the compiler on performance
Compile fast_power_iter.c and fast_power_recur.c with two optimization levels, -O0 and -O1.
Record the CycC and InsC and compute their corresponding CPI for the two different optimization
levels. Furthermore, briefly compare and explain the differences in their profiles.
Function             Optimization level   CycC         InsC   CPI
fast_power_iter()    -O0                  179 or 176   83     179 / 83 = 2.15663 or 176 / 83 = 2.12048
fast_power_iter()    -O1                  42           27     42 / 27 ≈ 1.55556
fast_power_recur()   -O0                  271          166    271 / 166 ≈ 1.63253
fast_power_recur()   -O1                  118          73     118 / 73 ≈ 1.61644
[Screenshots: profiles of fast_power_iter() and fast_power_recur() with optimization flags -O0
and -O1]
● CycC & InsC of iterative with -O0 opt flag > CycC & InsC of iterative with -O1 opt flag.
● CycC & InsC of recursive with -O0 opt flag > CycC & InsC of recursive with -O1 opt flag.
● The -O1 optimization flag allows the compiler to optimize for speed.
● On the other hand, the -O0 optimization flag does not optimize the program.
● Code compiled with -O1 therefore has lower CycC and InsC than code compiled with -O0.
Note: the screenshot of fast_power_iter() with optimization flag -O0 with CycC 176 is unavailable.
Grading policy:
● Each CycC and InsC value is worth 0.5 points.
● Each CPI value is worth 1 point.
● Comparison and justification are worth 2 points.
(d) (10 points) Compilers versus hardware implementations
If we want to run the -O0 codes compiled in (c) on a faster processor to achieve the same speedup as
running the -O1 codes on the original processor with clock rate of 3 GHz in fast_power_iter() and
fast_power_recur(), what will the clock rates of the faster processor be for fast_power_iter.c and
fast_power_recur.c respectively?
Function             Clock Rate
fast_power_iter()    (179 / 42) × 3×10^9 Hz = 12.78571 GHz, or (176 / 42) × 3×10^9 Hz = 12.57143 GHz
fast_power_recur()   (271 / 118) × 3×10^9 Hz = 6.88983 GHz

$$\text{Execution Time}_{-O0,\ \text{new processor}} = \text{Execution Time}_{-O1,\ \text{old processor}}$$
$$\frac{\text{Clock Cycles}_{-O0}}{\text{Clock Rate}_{\text{new processor}}} = \frac{\text{Clock Cycles}_{-O1}}{\text{Clock Rate}_{\text{old processor}}}$$
$$\text{Clock Rate}_{\text{new processor}} = \frac{\text{Clock Cycles}_{-O0}}{\text{Clock Cycles}_{-O1}} \times \text{Clock Rate}_{\text{old processor}}$$
Grading policy:
● Each Clock Rate value is worth 4 points.
● Correct derivation of the formula is worth 2 points.
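The clock-rate relation can be evaluated directly. This is a minimal Python sketch using the CycC values recorded in (c); the function name `required_clock_rate_ghz` is illustrative.

```python
def required_clock_rate_ghz(cycles_o0, cycles_o1, old_clock_ghz=3.0):
    """Clock rate at which the -O0 code matches the -O1 execution time on the old processor."""
    return cycles_o0 / cycles_o1 * old_clock_ghz


# CycC values from (c); fast_power_iter() at -O0 was observed as 179 or 176
for label, o0, o1 in [
    ("fast_power_iter() (179)", 179, 42),
    ("fast_power_iter() (176)", 176, 42),
    ("fast_power_recur()", 271, 118),
]:
    print(f"{label}: {required_clock_rate_ghz(o0, o1):.5f} GHz")
```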
2. (25 points) Benchmarking
Below is a comparison between three mobile phones and their processors.
Product            Samsung Galaxy S24 Ultra   Apple iPhone 15 Pro Max   Google Pixel 8 Pro
SoC                Snapdragon 8 Gen 3         Apple A17 Pro             Google Tensor G3
Cores              8 (1+3+2+2)                6 (2+4)                   9 (1+4+4)
PDF renderer       227.4 Mpixels/sec          178.5 Mpixels/sec         153 Mpixels/sec
HDR                238.2 Mpixels/sec          232.4 Mpixels/sec         136.9 Mpixels/sec
Background blur    26.7 images/sec            27.9 images/sec           15 images/sec
Photo processing   64.9 images/sec            79.1 images/sec           47 images/sec
Ray tracing        7.38 Mpixels/sec           7.58 Mpixels/sec          4.52 Mpixels/sec
The information above is provided by https://nanoreview.net/en/soc-list/rating.
(a) (5 points) Follow the link https://nanoreview.net/en/soc-list/rating and fill in the table below.
SoC                  Cores         Core name     Peak frequency
Snapdragon 8 Gen 3   One core      Cortex-X4     3300 MHz
                     Three cores   Cortex-A720   3150 MHz
                     Two cores     Cortex-A720   2960 MHz
                     Two cores     Cortex-A520   2260 MHz
Apple A17 Pro        Two cores     Everest       3780 MHz
                     Four cores    Sawtooth      2110 MHz
Google Tensor G3     One core      Cortex-X3     2910 MHz
                     Four cores    Cortex-A715   2370 MHz
                     Four cores    Cortex-A510   1700 MHz
Note: 1 GHz = 1000 MHz
References: Snapdragon 8 Gen 3, Apple A17 Pro, and Google Tensor G3.
Grading policy:
● Each incorrect value is -1 point.
● A minimum of zero points is given.
(b) (10 points) Suppose that we run three computer graphics and multimedia programs on all three
smartphones:
Program A: Renders 114,000,000 pixels when viewing HW1.pdf.
Program B: Blurs the background of 2,000 images in the image gallery.
Program C: Processes 4,000 images in the image gallery.
For simplicity, we assume that the program only runs on a single core. The Samsung Galaxy S24
Ultra uses Cortex-X4, the Apple iPhone 15 Pro Max uses Everest, and the Google Pixel 8 Pro uses
Cortex-X3. Furthermore, there is no other overhead. We are interested in the execution time (in
seconds) and the clock cycles (in millions) of each smartphone. Use the provided information in the
table and your answer in (a) to complete the table below.
Smartphone            Program   Seconds                              Clock Cycles (millions)
Samsung (Cortex-X4)   A         (114×10^6) / (227.4×10^6) ≈ 0.501    (114 / 227.4) × 3300 ≈ 1654.354
Samsung (Cortex-X4)   B         2000 / 26.7 = 74.906                 (2000 / 26.7) × 3300 ≈ 247191.011
Samsung (Cortex-X4)   C         4000 / 64.9 = 61.633                 (4000 / 64.9) × 3300 ≈ 203389.831
Apple (Everest)       A         (114×10^6) / (178.5×10^6) ≈ 0.639    (114 / 178.5) × 3780 = 2414.118
Apple (Everest)       B         2000 / 27.9 = 71.685                 (2000 / 27.9) × 3780 ≈ 270967.742
Apple (Everest)       C         4000 / 79.1 = 50.569                 (4000 / 79.1) × 3780 ≈ 191150.442
Google (Cortex-X3)    A         (114×10^6) / (153×10^6) ≈ 0.745      (114 / 153) × 2910 = 2168.235
Google (Cortex-X3)    B         2000 / 15 = 133.333                  (2000 / 15) × 2910 = 388000
Google (Cortex-X3)    C         4000 / 47 = 85.106                   (4000 / 47) × 2910 ≈ 247659.574

$$\text{Execution Time} = \frac{\text{Objects}}{\text{Processing Speed}}$$
$$\text{Clock Cycles} = \text{Execution Time} \times \text{Clock Rate} = \frac{\text{Objects}}{\text{Processing Speed}} \times \text{Clock Rate}$$
Note: Some precision errors may exist if the rounded results of seconds are directly multiplied with
the peak frequency of the cores.
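The two formulas can be applied to every phone and program in one pass. The following Python sketch uses the benchmark rates and peak frequencies from the tables above; the dictionary layout and function names are illustrative. Multiplying seconds by a frequency in MHz yields clock cycles in millions.

```python
# Peak frequency (MHz) of the single core assumed to run each program
PEAK_MHZ = {
    "Samsung (Cortex-X4)": 3300,
    "Apple (Everest)": 3780,
    "Google (Cortex-X3)": 2910,
}

# Processing speed: Mpixels/sec for A (PDF renderer), images/sec for
# B (background blur) and C (photo processing)
RATES = {
    "Samsung (Cortex-X4)": {"A": 227.4, "B": 26.7, "C": 64.9},
    "Apple (Everest)": {"A": 178.5, "B": 27.9, "C": 79.1},
    "Google (Cortex-X3)": {"A": 153.0, "B": 15.0, "C": 47.0},
}

# Work per program: A in Mpixels, B and C in images
WORK = {"A": 114.0, "B": 2000.0, "C": 4000.0}


def seconds(phone, program):
    """Execution time = objects / processing speed."""
    return WORK[program] / RATES[phone][program]


def mega_cycles(phone, program):
    """Clock cycles in millions = execution time (s) x clock rate (MHz)."""
    return seconds(phone, program) * PEAK_MHZ[phone]


for phone in PEAK_MHZ:
    for program in WORK:
        print(f"{phone} {program}: {seconds(phone, program):.3f} s, "
              f"{mega_cycles(phone, program):.3f} Mcycles")
```

Computing the cycles from the unrounded seconds, as done here, avoids the precision errors mentioned in the note.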
Grading policy:
● Each correct Seconds and Clock Cycles value is worth 0.5 points.
● Correct derivation of the formula is worth 1 point.
(c) (10 points) We are interested in comparing the performances of the three smartphones. Calculate the
relative performance of the three smartphones, with each phone as the reference for comparison. Use
your answer in (b) to complete the table below and summarize the performance results by calculating
the geometric mean of the performance ratio of the three benchmark programs (Program A,
Program B, and Program C). Hint: you might only need to compute some of the six values from
scratch.
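Performance Ratio (rows: reference phone; columns: phone being compared)

Reference                  Samsung Galaxy S24 Ultra   Apple iPhone 15 Pro Max   Google Pixel 8 Pro
Samsung Galaxy S24 Ultra   1                          0.99990                   0.64930
Apple iPhone 15 Pro Max    1 / 0.99990 = 1.00010      1                         0.64936
Google Pixel 8 Pro         1 / 0.64930 = 1.54012      1 / 0.64936 = 1.53997     1

$$\text{Geometric Mean}_{\text{Apple},\ \text{Samsung}} = \sqrt[3]{\frac{178.5}{227.4} \times \frac{27.9}{26.7} \times \frac{79.1}{64.9}} = 0.99990$$
$$\text{Geometric Mean}_{\text{Google},\ \text{Samsung}} = \sqrt[3]{\frac{153}{227.4} \times \frac{15}{26.7} \times \frac{47}{64.9}} = 0.64930$$
$$\text{Geometric Mean}_{\text{Google},\ \text{Apple}} = \sqrt[3]{\frac{153}{178.5} \times \frac{15}{27.9} \times \frac{47}{79.1}} = 0.64936$$
where $\text{Geometric Mean}_{A,\ B}$ is the geometric mean of the performance ratio of machine A with
machine B as reference.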
Hint explanation: Observe that the geometric mean of A with reference B is the reciprocal of the
geometric mean of B with reference A. Thus, we only need to compute three values and take their
reciprocals.
Note: Some precision errors may exist if the rounded results of seconds are used to compute the
geometric means of performance ratio, but the ranking of the three smartphones should remain the
same.
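The reciprocal property from the hint is easy to confirm numerically. This Python sketch computes the geometric mean of the three performance ratios from the benchmark rates (equivalent to using the ratios of execution times from (b)); the names `RATES` and `geometric_mean_ratio` are illustrative.

```python
import math

# Processing speeds for Programs A, B, C (PDF renderer, background blur, photo processing)
RATES = {
    "Samsung": (227.4, 26.7, 64.9),
    "Apple": (178.5, 27.9, 79.1),
    "Google": (153.0, 15.0, 47.0),
}


def geometric_mean_ratio(phone, reference):
    """Geometric mean of the per-program performance of `phone` relative to `reference`."""
    ratios = [p / r for p, r in zip(RATES[phone], RATES[reference])]
    return math.prod(ratios) ** (1 / len(ratios))


for reference in RATES:
    row = [f"{geometric_mean_ratio(phone, reference):.5f}" for phone in RATES]
    print(reference, row)
```

Only three of the six off-diagonal values need to be computed from scratch, since `geometric_mean_ratio(a, b)` equals `1 / geometric_mean_ratio(b, a)`.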
Grading policy:
● Overall ranking is worth 5 points.
● Each incorrect geometric mean value is -1 point.
● A minimum of zero points is given.
3. (10 points) Performance and Speedup
Assume that a program requires the execution of $100 \times 10^6$ FP instructions, $140 \times 10^6$ INT
instructions, $110 \times 10^6$ L/S instructions, and $55 \times 10^6$ branch instructions. The CPI for each
type of instruction is 3, 2, 5, and 3, respectively. Assume that the processor has a 5 GHz clock rate.
(a) (5 points) By how much must we improve the CPI of INT instructions if we want the program to run
two times faster? Please show the calculation procedure.
$$\text{Clock Cycles}_{old} = CPI_{FP} \times \#FP + CPI_{INT} \times \#INT + CPI_{L/S} \times \#L/S + CPI_{Branch} \times \#Branch$$
$$= (3 \times 100 + 2 \times 140 + 5 \times 110 + 3 \times 55) \times 10^6 = 1295 \times 10^6$$
To achieve a two-times speedup, we must have:
$$\frac{\text{Execution Time}_{new}}{\text{Execution Time}_{old}} = \frac{\text{Clock Cycles}_{new}}{\text{Clock Cycles}_{old}} = \frac{1}{2}$$
To halve the number of clock cycles by improving only the CPI of INT instructions:
$$\frac{CPI_{FP} \times \#FP + CPI_{INT}^{new} \times \#INT + CPI_{L/S} \times \#L/S + CPI_{Branch} \times \#Branch}{\text{Clock Cycles}_{old}} = \frac{1}{2}$$
$$CPI_{INT}^{new} = \frac{\frac{\text{Clock Cycles}_{old}}{2} - \left(CPI_{FP} \times \#FP + CPI_{L/S} \times \#L/S + CPI_{Branch} \times \#Branch\right)}{\#INT} = \frac{647.5 - 1015}{140} < 0$$
Therefore, no improvement in the CPI of INT instructions alone can make the program run
two times faster.
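The infeasibility can be verified by solving for the required INT CPI numerically. This is a minimal Python sketch with instruction counts expressed in millions; the variable names are illustrative.

```python
# Instruction counts in millions and their CPIs, from the problem statement
COUNTS_M = {"FP": 100, "INT": 140, "L/S": 110, "Branch": 55}
CPI = {"FP": 3, "INT": 2, "L/S": 5, "Branch": 3}

# Total clock cycles (in millions) of the original program
old_cycles = sum(CPI[k] * COUNTS_M[k] for k in COUNTS_M)

# Cycles that do not depend on the INT CPI
other_cycles = sum(CPI[k] * COUNTS_M[k] for k in COUNTS_M if k != "INT")

# Cycle budget for a 2x speedup, and the INT CPI that would be needed to meet it
target_cycles = old_cycles / 2
required_int_cpi = (target_cycles - other_cycles) / COUNTS_M["INT"]

print(old_cycles, target_cycles, other_cycles, required_int_cpi)
```

A negative `required_int_cpi` means the non-INT instructions alone already exceed the cycle budget.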
Grading policy:
● Correct clock cycles is worth 2 points.
● Correct CPI is worth 2 points.
● Correct answer is worth 1 point.
(b) (5 points) By how much is the execution time of the program improved if the CPI of FP instructions
is reduced by 28%, the CPI of INT instructions is reduced by 32%, the CPI of L/S instructions is
reduced by 61%, and the CPI of branch instructions is reduced by 64%? Please show the calculation
procedure.
$$\frac{\text{Execution Time}_{new}}{\text{Execution Time}_{old}} = \frac{\text{Clock Cycles}_{new}}{\text{Clock Cycles}_{old}} = \frac{3 \times 100 \times 0.72 + 2 \times 140 \times 0.68 + 5 \times 110 \times 0.39 + 3 \times 55 \times 0.36}{3 \times 100 + 2 \times 140 + 5 \times 110 + 3 \times 55} = 0.53$$
The execution time of the program is reduced by about 47% (the new execution time is about 0.53
times the old one).
Grading policy:
● Correct formula is worth 2 points.
● Correct calculation is worth 1 point.
● Correct answer is worth 2 points.
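The ratio can be recomputed in a few lines. This Python sketch applies each CPI reduction and compares total cycles; instruction counts are in millions and the variable names are illustrative.

```python
# Instruction counts in millions, original CPIs, and fractional CPI reductions
COUNTS_M = {"FP": 100, "INT": 140, "L/S": 110, "Branch": 55}
CPI_OLD = {"FP": 3, "INT": 2, "L/S": 5, "Branch": 3}
REDUCTION = {"FP": 0.28, "INT": 0.32, "L/S": 0.61, "Branch": 0.64}

cycles_old = sum(CPI_OLD[k] * COUNTS_M[k] for k in COUNTS_M)
cycles_new = sum(CPI_OLD[k] * (1 - REDUCTION[k]) * COUNTS_M[k] for k in COUNTS_M)

# With an unchanged clock rate, the execution-time ratio equals the cycle ratio
time_ratio = cycles_new / cycles_old
print(f"new/old = {time_ratio:.4f}, reduction = {(1 - time_ratio) * 100:.1f}%")
```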
4. (10 points) Amdahl’s Law and the Eight Great Ideas of Computer Architecture
One of the great ideas of computer architecture is parallelization. Amdahl’s law can be used to calculate
the overall speedup of parallel executions. Amdahl's Law is defined as follows:
$$S_{latency} = \frac{1}{(1 - p) + \frac{p}{s}}$$
where
$S_{latency}$: the theoretical speedup of the execution of the whole task,
$s$: the speedup of the part of the task that benefits from improved system resources,
$p$: the proportion of the execution time that the part benefiting from improved resources originally
occupied.
The ideal speedup of a parallelized program equals the number of processors used. However, the
theoretical speedup is limited by the fraction of the application that cannot be parallelized, which
includes the communication costs. The problem is that the communication costs are not fixed but often
vary with the number of processors used. In the following, let us consider the communication costs
separately from the non-parallelizable execution of the program.
(a) (5 points) Suppose we have a method to parallelize the fast_power_iter() function in (1) using an
arbitrary number of processors. Moreover, the execution time 𝑇 on one processor is the result
obtained in (1)(b). Compute the parallel execution time of fast_power_iter() on 2, 4, 8 processors
assuming 75% of the function is parallelizable and there is no communication cost.
Number of Processors   Parallel Execution Time
2                      ((1 − 0.75) + 0.75 / 2) × 14×10^-9 s = 8.75 ns
4                      ((1 − 0.75) + 0.75 / 4) × 14×10^-9 s = 6.125 ns
8                      ((1 − 0.75) + 0.75 / 8) × 14×10^-9 s = 4.8125 ns

$$S_{latency} = \frac{\text{Execution Time}_{uniprocessor}}{\text{Execution Time}_{parallel}} = \frac{1}{(1 - p) + \frac{p}{s}}$$
$$\text{Execution Time}_{parallel} = \left((1 - p) + \frac{p}{s}\right) \times \text{Execution Time}_{uniprocessor}$$
Grading policy:
● Each Parallel Execution Time value is worth 1.5 points.
● The partially correct derivation of the formula is 0.5 points.
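The parallel execution times follow directly from Amdahl's Law. This is a minimal Python sketch, assuming the 14 ns single-processor time from 1(b); the function name `parallel_time_ns` is illustrative.

```python
def parallel_time_ns(t_uniprocessor_ns, p, s):
    """Parallel execution time: the serial part plus the parallel part divided by s."""
    return ((1 - p) + p / s) * t_uniprocessor_ns


T_UNIPROCESSOR_NS = 14.0   # fast_power_iter() on one processor, from 1(b)
PARALLEL_FRACTION = 0.75   # 75% of the function is parallelizable

for processors in (2, 4, 8):
    t = parallel_time_ns(T_UNIPROCESSOR_NS, PARALLEL_FRACTION, processors)
    print(f"{processors} processors: {t} ns")
```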
(b) (5 points) Assuming the communication costs are 4% of the original execution time regardless of the
number of cores, what is the speedup with 8 cores when 75% of the program is parallelizable?
Since communication costs are now considered, they must be added to the denominator of
Amdahl's Law. The communication cost is 0.04 of the original execution time.
$$S_{latency} = \frac{1}{(1 - 0.75) + 0.04 + \frac{0.75}{8}} \approx 2.606$$
The speedup with 8 cores when 75% of the program is parallelizable with 4% communication cost is
2.606.
Grading policy:
● Correct answer +5
● Minor error : -1
● Wrong answer : Only calculate execution time (if calculation is correct) +1
● Wrong answer : but partially correct formula +1
(c) (5 points) Assuming the communication costs are increased by 2% of the original execution time
every time the number of processors is doubled, what is the speedup with n cores when 75% of the
program is parallelizable? Furthermore, what is the specific speedup value when n = 8 in this
scenario?
The speedup with n cores when 75% of the program is parallelizable with 2% communication cost
for every doubled core is given by
$$S_{latency} = \frac{1}{(1 - 0.75) + 0.02 \times \log_2(n) + \frac{0.75}{n}}$$
Moreover,
$$S_{latency} = \frac{1}{(1 - 0.75) + 0.02 \times \log_2(8) + \frac{0.75}{8}} \approx 2.477$$
The speedup with 8 cores when 75% of the program is parallelizable with 2% communication cost
for every doubled core is 2.477.
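Both communication-cost scenarios fit one formula with an extra overhead term in the denominator. The following Python sketch is one way to express it; `speedup` and `doubling_overhead` are illustrative names, and the overhead is expressed as a fraction of the original execution time.

```python
import math


def speedup(n, p=0.75, overhead=0.0):
    """Amdahl speedup on n cores with an added communication overhead fraction."""
    return 1.0 / ((1 - p) + overhead + p / n)


def doubling_overhead(n, step=0.02):
    """Overhead that grows by `step` each time the core count doubles."""
    return step * math.log2(n)


# Fixed 4% overhead on 8 cores, as in (b)
print(f"{speedup(8, overhead=0.04):.3f}")
# 2% overhead per doubling on 8 cores, as in (c)
print(f"{speedup(8, overhead=doubling_overhead(8)):.3f}")
```

With the growing overhead, adding cores eventually stops helping, because the `log2(n)` term keeps increasing while `p / n` shrinks toward zero.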
Grading policy:
● Correct formula for speedup and correct calculation when n = 8 +5
● Minor error -1
● Wrong formula : Formula and calculation time for execution time (given it’s correct) +1
5. (10 points) Integrated Circuit Cost and Manufacturing
Assume that a 50 mm diameter wafer has a cost of $9 and contains 95 dies. The yield for this wafer is
90%.
(a) (4 points) Find the defects per area for this wafer using the Equation on Page 28 of the textbook (or
Page 45 of the slide of Computer Abstractions and Technology).
$$\text{Die Area} = \frac{\text{Wafer Area}}{\text{Dies per Wafer}} = \frac{\pi \times 25^2}{95} = 20.67\ \text{mm}^2$$
$$\text{Defects per Area} = \frac{\frac{1}{\sqrt{\text{Yield}}} - 1}{\frac{\text{Die Area}}{2}} = \frac{\frac{1}{\sqrt{0.9}} - 1}{10.335} = 0.005\ \text{defects/mm}^2$$
The defects per area for this wafer is 0.005 defects/mm².
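The two steps can be reproduced numerically. This Python sketch inverts the yield equation used in this problem, Yield = 1 / (1 + Defects per Area × Die Area / 2)²; the function names are illustrative.

```python
import math


def die_area_mm2(wafer_diameter_mm, dies_per_wafer):
    """Approximate die area: wafer area divided by the number of dies."""
    return math.pi * (wafer_diameter_mm / 2) ** 2 / dies_per_wafer


def defects_per_mm2(yield_fraction, die_area):
    """Solve Yield = 1 / (1 + d * A / 2)^2 for the defect density d."""
    return (1 / math.sqrt(yield_fraction) - 1) / (die_area / 2)


area = die_area_mm2(50, 95)
density = defects_per_mm2(0.9, area)
print(f"die area = {area:.2f} mm^2, defects per area = {density:.4f} defects/mm^2")
```

The unrounded density is slightly above 0.005; the solution rounds it to 0.005 defects/mm².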
Grading policy:
● Each correct formula is worth 1 point.
● Each correct answer is worth 1 point.
● No unit is -0.5 point.
(b) (2 points) Find the cost per die for this wafer.
$$\text{Cost per Die} = \frac{\text{Cost per Wafer}}{\text{Dies per Wafer} \times \text{Yield}} = \frac{\$9}{95 \times 0.9} = \$0.1053$$
The cost per die for this wafer is $0.1053.
Grading policy:
● Correct formula is worth 1 point.
● Correct answer is worth 1 point.
(c) (4 points) If the number of dies per wafer is increased by 10% and the defects per area unit increases
by 25%, find the new die area and new yield.
$$\text{Die Area} = \frac{\text{Wafer Area}}{\text{Dies per Wafer}} = \frac{\pi \times 25^2}{95 \times 1.1} = 18.79\ \text{mm}^2$$
$$\text{Yield} = \frac{1}{\left(1 + \left(\text{Defects per Area} \times \frac{\text{Die Area}}{2}\right)\right)^2} = \frac{1}{\left(1 + \left(0.005 \times 1.25 \times \frac{18.79}{2}\right)\right)^2} = 0.8922$$
The new die area is 18.79 mm² and the new yield is 0.8922.
Grading policy:
● Each correct formula is worth 1 point.
● Each correct answer is worth 1 point.
● No unit is -0.5 point.
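The new die area and yield can be checked with a short Python sketch. It assumes, as the solution does, the rounded defect density of 0.005 defects/mm² from (a); the function name `yield_fraction` is illustrative.

```python
import math


def yield_fraction(defects_per_mm2, die_area_mm2):
    """Yield = 1 / (1 + defects per area * die area / 2)^2."""
    return 1 / (1 + defects_per_mm2 * die_area_mm2 / 2) ** 2


WAFER_DIAMETER_MM = 50
new_dies = 95 * 1.1            # dies per wafer increased by 10%
new_density = 0.005 * 1.25     # defects per area increased by 25% (rounded value from (a))

new_area = math.pi * (WAFER_DIAMETER_MM / 2) ** 2 / new_dies
new_yield = yield_fraction(new_density, new_area)
print(f"new die area = {new_area:.2f} mm^2, new yield = {new_yield:.4f}")
```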