Record No.: 10783
Status: Cataloging complete
TA review: Review complete
Call number: G0496516011
School: Fu Jen Catholic University
Department: Department of Computer Science and Information Engineering
Former department name:
Student ID: 496516011
Student (Chinese name): 鍾華元
Student (English name): Hua-Yuan Chung
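Thesis title (Chinese): 用 VHDL 實做出已排班的資料流架構和暫存器內文 (Implementing the Scheduled Dataflow Architecture and the Register Context in VHDL)
Thesis title (English): VHDL Implementation of Scheduled Dataflow Architecture and the Register Context
Other title: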
Advisor (Chinese name): 周賜福
Advisor (English name): Joseph M. Arul
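On-campus full-text release date: 2012.8.31
Off-campus full-text release date: 2012.8.31
Reason for withholding full text:
Electronic full text submitted to the National Central Library: Agreed
National Central Library full-text release date: 2012.8.31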
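File description: Cover, Abstract, Acknowledgments, Table of Contents, Chapter 1, Chapter 2, Chapter 3, Chapter 4, Chapter 5, References
Electronic full text: 01 02 03 04 05 06 07 08 09 10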
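Degree: Master's
Graduation academic year: 98 (ROC calendar, i.e., 2009-2010)
Publication year: 99 (ROC calendar, i.e., 2010)
Language: English
Keywords (Chinese): 非阻斷, 多緒執行, 排班, 資料流架構 (non-blocking, multithreaded execution, scheduling, dataflow architecture)
Keywords (English): Nonblocking, Multi-threaded, Scheduled Dataflow Architecture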
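Abstract (Chinese, translated):
Since microprocessors began to be developed around 1970, the industry has improved CPU performance mainly by exploiting ILP. Around 2000, ILP development appeared to hit a bottleneck. Concerns over power consumption and CPU overheating then shifted the focus of CPU development from ILP to TLP and to the efficient use of multiple processors. However, current CPU designs still rely on complex hardware to detect RAW hazards, which increases power consumption and makes CPU design more complex. In this thesis, we propose an entirely different architecture and approach for resolving RAW hazards. By using the dataflow concept, RAW hazards are eliminated naturally. The architecture also improves both ILP and TLP by combining the control-flow and dataflow concepts. This architecture is the Scheduled Dataflow Architecture (SDF): a non-blocking, multithreaded dataflow architecture that decouples memory access from computation. Because memory access and arithmetic are decoupled, the Synchronization Processor (SP) handles all memory accesses for data, while the Execution Processor (EP) executes all arithmetic operations. SDF was previously simulated in C++ and C. To model hardware details more precisely, this thesis implements SDF in VHDL and simulates it with ModelSIM. Beyond simulation, we also test SDF on hardware. The thesis also measures how much performance improves when register contexts are increased. In a multithreaded architecture, threads normally exchange data through frame memory. If a thread can pass data or computation results to another thread through the register context instead, memory accesses are avoided and performance improves. To test SDF, we programmed it into the Cyclone II FPGA chip on the DE2 board. The results show that SDF consumes at least 50% of the Cyclone II's resources, and that the Cyclone II can synthesize SDF with at most four register sets. The study analyzes each Cyclone II synthesis configuration of SDF and finds that SDF needs at least two register sets to run multithreaded programs concurrently.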
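The abstract says the dataflow concept removes RAW hazards "naturally." This follows from the dataflow firing rule: a thread may run only after all of its input values have arrived, so no consumer can read an operand before its producer has written it. The Python sketch below illustrates the rule with a per-thread synchronization count that is decremented as each input is stored into the thread's frame. The thread becomes ready only when the count reaches zero. The names `ThreadFrame`, `store_input`, and `run_dataflow` are hypothetical, and the model is a simplification. It is not the thesis's VHDL design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ThreadFrame:
    """Hypothetical per-thread frame: input slots plus a synchronization count."""
    name: str
    sync_count: int                                  # inputs still outstanding
    slots: Dict[int, int] = field(default_factory=dict)

    def store_input(self, slot: int, value: int) -> bool:
        """Store one input value; return True when the last input has arrived."""
        if slot in self.slots:
            raise ValueError(f"slot {slot} of {self.name} written twice")
        self.slots[slot] = value
        self.sync_count -= 1
        return self.sync_count == 0


def run_dataflow(frames: List[ThreadFrame],
                 deliveries: List[Tuple[str, int, int]]) -> List[str]:
    """Deliver (frame_name, slot, value) tuples in order; return the firing order.

    A thread is appended to the firing order only after its final input is
    stored, so it can never read an operand before the producer has written
    it: the RAW dependence is enforced by the firing rule, not by hazard logic.
    """
    by_name = {f.name: f for f in frames}
    fired = []
    for name, slot, value in deliveries:
        if by_name[name].store_input(slot, value):
            fired.append(name)
    return fired


if __name__ == "__main__":
    frames = [ThreadFrame("add", 2), ThreadFrame("neg", 1)]
    order = run_dataflow(frames, [("add", 0, 3), ("neg", 0, 7), ("add", 1, 4)])
    print(order)  # ['neg', 'add']
```

In this run, "add" is listed first but fires last, because its second operand is the final value delivered. No hazard-detection logic decides this; the synchronization count alone does.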
Abstract (English):
Since the invention of microprocessors around 1970, improving CPU performance through ILP had been the main focus of the computer industry. Around the year 2000, ILP seemed to have reached a limit, and concerns over power consumption and heat dissipation ushered in the multi-core era. The focus has shifted from ILP to TLP and to the efficient use of multi-core processors. However, current computers rely on complex hardware to detect RAW hazards, which makes the CPU consume a lot of energy and makes its design more complex. In this research we propose a different architecture and a different way of resolving RAW hazards. By using the dataflow paradigm, we can naturally eliminate RAW hazards. In addition, the architecture closely links ILP and TLP by combining the sequential and dataflow paradigms. It is named the Scheduled Dataflow Architecture (SDF). SDF is a non-blocking, multithreaded, decoupled architecture whose main engine relies on the dataflow paradigm. Because it is a decoupled architecture, the synchronization processor (SP) is responsible for data access, and the execution processor (EP) is responsible for executing all arithmetic instructions.
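The SP/EP decoupling described above can be sketched by splitting one thread into three phases. As we understand the SDF literature (e.g., [18]), these are a preload phase on the SP, an execute phase on the EP, and a poststore phase on the SP. The preload copies the thread's inputs from frame memory into a register set. The execute phase works on registers only. The poststore writes results back to memory. The dict-based "memory" and "register set", the example arithmetic, and the `Counters` bookkeeping are assumptions for illustration, not the thesis's instruction set.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counters:
    sp_memory_ops: int = 0   # loads/stores issued on the Synchronization Processor
    ep_alu_ops: int = 0      # arithmetic operations issued on the Execution Processor


def sp_preload(memory: Dict[int, int], frame_addr: int, n_inputs: int,
               regs: Dict[int, int], c: Counters) -> None:
    """SP phase (hypothetical): copy the thread's inputs from its frame into registers."""
    for i in range(n_inputs):
        regs[i] = memory[frame_addr + i]
        c.sp_memory_ops += 1


def ep_execute(regs: Dict[int, int], c: Counters) -> None:
    """EP phase (hypothetical): register-only arithmetic, here r2 = (r0 + r1) * r0."""
    regs[2] = regs[0] + regs[1]
    c.ep_alu_ops += 1
    regs[2] = regs[2] * regs[0]
    c.ep_alu_ops += 1


def sp_poststore(memory: Dict[int, int], dest_addr: int,
                 regs: Dict[int, int], c: Counters) -> None:
    """SP phase (hypothetical): write the result register to a consumer's frame."""
    memory[dest_addr] = regs[2]
    c.sp_memory_ops += 1


def run_thread(memory: Dict[int, int], frame_addr: int, dest_addr: int) -> Counters:
    """Run one thread as preload (SP) -> execute (EP) -> poststore (SP)."""
    regs: Dict[int, int] = {}
    c = Counters()
    sp_preload(memory, frame_addr, 2, regs, c)
    ep_execute(regs, c)
    sp_poststore(memory, dest_addr, regs, c)
    return c


if __name__ == "__main__":
    mem = {100: 3, 101: 4}
    counts = run_thread(mem, frame_addr=100, dest_addr=200)
    print(mem[200], counts)  # 21 Counters(sp_memory_ops=3, ep_alu_ops=2)
```

In this model every memory operation is counted on the SP and every arithmetic operation on the EP. The EP therefore never waits on a load. In a decoupled design, that separation lets memory latency be handled on the SP side while the EP computes.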
SDF was previously simulated in C++ and C [19-20]. To model the hardware complexity more precisely, this work implements SDF in VHDL and simulates it with ModelSIM; we have also tested it on Altera DE2 hardware. The main focus of this research is to measure the performance gained by adding register contexts. In a multithreaded architecture, data can be passed between threads through the frame memory. If the register context is used instead to pass results directly to the following threads that need them, several memory accesses are avoided, which improves program performance. To test SDF, we programmed it into the Cyclone II FPGA chip on the DE2 board. SDF uses at least 50% of the Cyclone II's resources, and the Cyclone II can synthesize SDF with at most four register sets. We analyzed all of these synthesized configurations and found that SDF requires at least two register sets to run multithreaded programs concurrently.
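Here is a back-of-the-envelope model of the register-context saving. Suppose a chain of threads each adds its chunk of the input to a running total and hands that total to the next thread. Through frame memory, each handoff costs one store by the producer and one load by the consumer. Through a shared register context, it costs no memory access. The sketch below is only loosely inspired by the multithreaded summation experiment listed in the table of contents. The function name, the chunking scheme, and the 2-accesses-per-handoff cost are hypothetical simplifications, not the thesis's RTR/RTM code or its measured results.

```python
from typing import List, Tuple


def chained_partial_sums(data: List[int], n_threads: int,
                         use_register_context: bool) -> Tuple[int, int]:
    """Sum `data` with a chain of threads; each adds its chunk to a running total.

    Returns (total, handoff_accesses), where handoff_accesses counts only the
    frame-memory traffic caused by passing the running total from one thread to
    the next. Loading the input data costs the same under both policies, so it
    is left out of the count.
    """
    if n_threads < 1:
        raise ValueError("need at least one thread")
    chunk = -(-len(data) // n_threads)       # ceiling division
    running = 0
    handoff_accesses = 0
    for t in range(n_threads):
        if t > 0 and not use_register_context:
            handoff_accesses += 2            # producer stores the total, consumer loads it
        running += sum(data[t * chunk:(t + 1) * chunk])
    return running, handoff_accesses


if __name__ == "__main__":
    data = list(range(1, 101))
    print(chained_partial_sums(data, 4, use_register_context=False))  # (5050, 6)
    print(chained_partial_sums(data, 4, use_register_context=True))   # (5050, 0)
```

Under this model the saving grows linearly with the number of thread-to-thread handoffs. Whether it turns into fewer cycles depends on memory latency and on a register set being available for the consumer thread, neither of which the toy model captures.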
Table of contents:
Abstract (Chinese)
Abstract
Acknowledgments
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Introduction
  1.2 Introduction to FPGA Technology
  1.3 Motivation
  1.4 Organization of This Thesis
Chapter 2 Background and Related Work
  2.1 Background
    2.1.1 Background of Dataflow Architecture
    2.1.2 Background to Decoupled Memory Architecture
  2.2 Related Work
  2.3 SDF Background
Chapter 3 Scheduled Dataflow Architecture
  3.1 The Hardware Composition of SDF
    3.1.1 Control Unit
    3.1.2 Synchronization Pipeline
    3.1.3 Execution Pipeline
    3.1.4 Linking of Register Sets to SP and EP
  3.2 Register to Register Method
  3.3 Memory Management Unit
  3.4 Thread Status
  3.5 Implementation of Common Application
    3.5.1 The Detailed Explanations of RTM Method
    3.5.2 The Detailed Explanations of RTR Method
    3.5.3 Branch Implementation in an SDF Thread
    3.5.4 Loop Implementation in SDF by RTR and RTM Method
Chapter 4 Architecture Implementation Analysis
  4.1 SDF Environment
    4.1.1 Basic Elements of DE2
    4.1.2 SDF Processor Logic Elements
    4.1.3 Memory Bits Used by SDF
  4.2 Measure Multithreaded Program Run on DE2
    4.2.1 Experiment Method
    4.2.2 Multithread Summation Program
  4.3 Conclusion from the Experiments
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References
References:
[1] Richard M. Karp and Raymond E. Miller, “Properties of a Model for Parallel Computation: Determinacy, Termination, Queueing,” SIAM Journal on Applied Mathematics, Vol. 14, No. 6, pp. 1390-1411, Nov. 1966.
[2] Jack B. Dennis and David P. Misunas, “A Preliminary Architecture for a Basic Data-Flow Processor,” ACM SIGARCH Computer Architecture News, Vol. 3, Issue 4, pp. 126-132, 1975.
[3] K. Arvind and Rishiyur S. Nikhil, “Executing a Program on the MIT Tagged-Token Dataflow Architecture,” IEEE Transactions on Computers, Vol. 39, Issue 3, pp. 300-318, Mar. 1990.
[4] Gregory M. Papadopoulos and David E. Culler, “Monsoon: an explicit token-store architecture,” in Proceedings of the 17th Annual International Symposium on Computer Architecture, pp. 82-91, Seattle, WA, May 1990.
[5] Mitsuhisa Sato, Yuetsu Kodama, Shuichi Sakai, Yoshinori Yamaguchi, and Yasuhito Koumura, “Thread-based programming for the EM-4 hybrid dataflow machine,” ACM SIGARCH Computer Architecture News, Vol. 20, Issue 2, pp. 224-233, May 1992.
[6] James E. Smith, “Decoupled access/execute computer architectures,” in Proceedings of the 9th Annual Symposium on Computer Architecture, pp. 112-119, Austin, Texas, United States, Apr. 1982.
[7] J. Kreuzinger and T. Ungerer, “Context-switching techniques for decoupled multithreaded processors,” in Euromicro ’99, pp. 248-251, Milan, Italy, 1999.
[8] James E. Smith, G. E. Dermer, B. D. Vanderwarn, S. D. Klinger, C. M. Rozewski, D. L. Fowler, K. R. Scidmore, and J. P. Laudon, “The ZS-1 Central Processor,” ACM SIGOPS Operating Systems Review, Vol. 21, Issue 4, pp. 199-204, Oct. 1987.
[9] James E. Smith, Shlomo Weiss, and Nicholas Y. Pang, “A Simulation Study of Decoupled Architecture Computers,” IEEE Transactions on Computers, Vol. 35, No. 8, pp. 692-702, Aug. 1986.
[10] Won W. Ro, Stephen P. Crago, Alvin M. Despain, and Jean-Luc Gaudiot, “HiDISC: A Decoupled Architecture for Data-Intensive Applications,” in Proceedings of the 17th International Symposium on Parallel and Distributed Processing, pp. 3.2, Nice, France, Apr. 2003.
[11] Kyriakos Stavrou, Costas Kyriacou, Paraskevas Evripidou, and Pedro Trancoso, “Chip multiprocessor based on data-driven,” International Journal of High Performance Systems Architecture, Vol. 1, No. 1, pp. 34-43, 2007.
[12] Roberto M. Giorgi, Zdravko Popovic, and Nikola Puzovic, “DTA-C: A Decoupled multi-Threaded Architecture for CMP Systems,” in 19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2007), pp. 263-270, Gramado, RS, Brazil, Oct. 2007.
[13] John L. Hennessy and David A. Patterson, “Computer Architecture: A Quantitative Approach,” 4th ed., Elsevier, 2006.
[14] David M. Harris and Sarah L. Harris, “Digital design and computer architecture,” Elsevier, 2007.
[15] Michael Sung, Ronny Krashinsky, and Krste Asanović, “Multithreading decoupled architectures for complexity-effective general purpose computing,” ACM SIGARCH Computer Architecture News, Vol. 29, Issue 5, pp. 56-61, Dec. 2001.
[16] DE2 User Manual, ftp://ftp.altera.com/up/pub/Webdocs/DE2_UserManual.pdf
[17] Cyclone II Device Handbook, Volume 1, http://www.altera.com/literature/hb/cyc2/cyc2_cii5v1.pdf
[18] Krishna M. Kavi, Roberto M. Giorgi, and Joseph M. Arul, “Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation,” IEEE Transactions on Computers, Vol. 50, No. 8, pp. 834-846, Aug. 2001.
[19] Joseph M. Arul, Tso-Zen Yeh, Chia-Cheng Hsu, and Jan-Jr Li, “An Efficient Way of Passing of Data in a Multithreaded Scheduled Dataflow Architecture,” in Proceedings of the 8th International Conference on High-Performance Computing in Asia-Pacific Region, pp. 487-492, Beijing, China, Dec. 2005.
Number of pages: 51
Notes:
Full-text views:
Record created: 2010/8/31
File conversion date: 2010/09/01
Full-text file access log:
Account    Date       Time   IP address       Action  File
496516011  2010.8.31  16:27  140.136.149.190  new     01
496516011  2010.8.31  16:32  140.136.149.190  new     01
496516011  2010.8.31  16:32  140.136.149.190  new     02
496516011  2010.8.31  16:32  140.136.149.190  new     03
496516011  2010.8.31  16:51  140.136.149.190  new     04
496516011  2010.8.31  16:52  140.136.149.190  new     05
496516011  2010.8.31  16:52  140.136.149.190  new     06
496516011  2010.8.31  16:52  140.136.149.190  new     07
496516011  2010.8.31  16:52  140.136.149.190  new     08
496516011  2010.8.31  16:54  140.136.149.190  new     09
496516011  2010.8.31  16:54  140.136.149.190  new     10
Modification log:
Code  Account    Date       Time   IP address
C     496516011  2010.8.31  16:57  140.136.149.190
M     496516011  2010.8.31  16:58  140.136.149.190
M     inen3883   2010.8.31  17:02  140.136.149.190
M     inen3883   2010.9.1   9:11   140.136.148.222
M     inen3883   2010.9.1   9:12   140.136.148.222
M     inen3883   2010.9.1   9:12   140.136.148.222
M     030540     2010.9.1   9:30   140.136.209.41
M     030540     2010.9.1   9:33   140.136.209.41
M     030540     2010.9.1   9:33   140.136.209.41
I     030540     2010.9.1   9:35   140.136.209.41