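Record number: 10783
Status: G0496516011
Teaching assistant review: Record entry completed
Call number: Review completed
School: Fu Jen Catholic University (輔仁大學)
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Student ID: 496516011
Graduate student (Chinese): 鍾華元
Graduate student (English): Hua-Yuan Chung
Thesis title (Chinese): 用 VHDL 實做出已排班的資料流架構和暫存器內文
Thesis title (English): VHDL Implementation of Scheduled Dataflow Architecture and the Register Context
Advisor (Chinese): 周賜福
Advisor (English): Joseph M. Arul
On-campus full-text release date: 2012.8.31
Off-campus full-text release date: 2012.8.31
Electronic full text submitted to the National Central Library: Agreed
National Central Library full-text release date: 2012.8.31
Electronic full-text files: 01 Cover, 02 Abstract, 03 Acknowledgments, 04 Table of Contents, 05 Chapter 1, 06 Chapter 2, 07 Chapter 3, 08 Chapter 4, 09 Chapter 5, 10 References
Degree: Master's
Graduation academic year: 98 (ROC calendar)
Publication year: 99 (ROC calendar; 2010)
Language: English
Keywords (Chinese): 非阻斷; 多緒執行; 排班; 資料流架構
Keywords (English): Nonblocking; Multi-threaded; Scheduled; Dataflow Architecture

Abstract (Chinese, translated):
Since microprocessors began to develop around 1970, the industry has improved CPU performance mainly through instruction-level parallelism (ILP). Around 2000, ILP appeared to reach a bottleneck, and concerns about power consumption and overheating shifted the focus of CPU development from ILP to thread-level parallelism (TLP) and the effective use of multiple processors. Current CPU designs, however, still rely on complex hardware to detect RAW hazards, which increases power consumption and makes the design more complex.
In this thesis we propose an entirely different architecture and method for resolving RAW hazards. With the dataflow concept, RAW hazards are removed naturally. The architecture also improves ILP and TLP by combining the control-flow and dataflow concepts. This architecture is the Scheduled Dataflow architecture (SDF). SDF is a non-blocking multithreaded dataflow architecture that decouples memory access from computation: the synchronization processor (SP) handles memory accesses for data, and the execution processor (EP) performs all arithmetic operations.
SDF was previously simulated in C++ and C. To model hardware details more accurately, this thesis implements SDF in VHDL and simulates it with ModelSIM. Beyond simulation, we also test SDF on hardware. The thesis also measures how much performance improves when more register contexts are provided. In a multithreaded architecture, threads normally exchange data through frame memory. If a thread can pass data or results to another thread through a register context, memory accesses are avoided and performance improves. To test SDF, we programmed it onto the Cyclone II FPGA on a DE2 board. The results show that synthesizing SDF requires at least 50% of the Cyclone II's resources, and that the Cyclone II can hold an SDF with at most four register sets. The study analyzes each Cyclone II synthesis configuration and finds that SDF needs at least two register sets for a multithreaded program to run concurrently.

Abstract (English):
Since the invention of the microprocessor around 1970, improving CPU performance through ILP has been the main focus of the computer industry. Around the year 2000, ILP seemed to reach a limit, and together with concerns about power consumption and heat dissipation this ushered in the multi-core era. The focus has shifted from ILP to TLP and the efficient use of multi-core processors.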
However, RAW hazard detection in current computers relies on complex hardware, which makes the CPU consume more energy and makes its design more complex. In this research we propose a totally different architecture and a different way to resolve the RAW hazard. By using the dataflow paradigm, we can naturally eliminate RAW hazards. In addition, this architecture closely links ILP and TLP by combining the sequential and dataflow paradigms. It is named the Scheduled Dataflow Architecture (SDF). SDF is a non-blocking, multithreaded, decoupled dataflow architecture whose main engine relies on the dataflow paradigm. Because it is a decoupled architecture, the synchronization processor is responsible for data access and the execution processor is responsible for executing all the instructions. Previously, SDF was simulated in C++ and C [19-20]. To model the hardware complexity more precisely, this work implements SDF in VHDL and simulates it with ModelSIM. We have also tested it on Altera DE2 hardware. The main focus of this research is to measure the performance gain from having more register contexts. In a multithreaded architecture, data can be passed between threads through the frame memory. If we instead use the register context to pass data efficiently to the following threads that need the results of the previous thread, several memory accesses can be avoided, improving the performance of a program. To test SDF, we also loaded it onto the Cyclone II FPGA chip of the DE2 board. SDF uses at least 50% of the resources of the Cyclone II, and the Cyclone II can synthesize SDF with at most four register sets. We examined all of these synthesis configurations and found that SDF requires at least two register sets to run a multithreaded program concurrently.
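The saving that the abstract attributes to the register context can be illustrated with a toy count of frame-memory accesses. The sketch below is not taken from the thesis. The names `FrameMemory` and `chained_sum`, the single `"partial"` slot, and the cost model (one store by the producing thread plus one load by the consuming thread per hand-off) are assumptions chosen only to show why passing a result through registers removes memory traffic.

```python
from dataclasses import dataclass, field


@dataclass
class FrameMemory:
    """Hypothetical frame memory that counts every load and store."""
    cells: dict = field(default_factory=dict)
    accesses: int = 0

    def store(self, addr, value):
        self.accesses += 1
        self.cells[addr] = value

    def load(self, addr):
        self.accesses += 1
        return self.cells[addr]


def chained_sum(chunks, via_registers):
    """Sum the chunks with one simulated thread per chunk.

    Each thread adds its chunk to the running total and hands the total
    to the next thread. With via_registers=False the hand-off goes
    through frame memory (assumed cost: one store plus one load). With
    via_registers=True the total stays in a shared register context,
    modeled here as the local variable `carried`.
    Returns (total, number of frame-memory accesses).
    """
    memory = FrameMemory()
    carried = 0
    for index, chunk in enumerate(chunks):
        if index > 0 and not via_registers:
            carried = memory.load("partial")   # consumer reads the hand-off
        carried += sum(chunk)
        is_last = index == len(chunks) - 1
        if not is_last and not via_registers:
            memory.store("partial", carried)   # producer writes the hand-off
    return carried, memory.accesses


if __name__ == "__main__":
    data = [[1, 2], [3, 4], [5, 6]]
    print("frame memory:", chained_sum(data, via_registers=False))
    print("register context:", chained_sum(data, via_registers=True))
```

In this model both variants return the same total, and only the frame-memory variant pays for the hand-offs, two accesses for each pair of adjacent threads. Real gains depend on pipeline and memory timing that the sketch does not model.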

Table of Contents:
Abstract (Chinese)
Abstract
Acknowledgments
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Introduction
  1.2 Introduction to FPGA Technology
  1.3 Motivation
  1.4 Organization of This Thesis
Chapter 2 Background and Related Work
  2.1 Background
    2.1.1 Background of Data flow Architecture
    2.1.2 Background to Decoupled Memory Architecture
  2.2 Related Work
  2.3 SDF Background
Chapter 3 Scheduled Dataflow Architecture
  3.1 The Hardware Composition of SDF
    3.1.1 Control Unit
    3.1.2 Synchronization Pipeline
    3.1.3 Execution Pipeline
    3.1.4 Linking of Register Sets to SP and EP
  3.2 Register to Register Method
  3.3 Memory Management Unit
  3.4 Thread Status
  3.5 Implementation of Common Application
    3.5.1 The Detailed Explanations of RTM Method
    3.5.2 The Detailed Explanations of RTR Method
    3.5.3 Branch Implementation in an SDF Thread
    3.5.4 Loop Implementation in SDF by RTR and RTM Method
Chapter 4 Architecture Implementation Analysis
  4.1 SDF Environment
    4.1.1 Basic Elements of DE2
    4.1.2 SDF Processor Logic Elements
    4.1.3 Memory Bit used by SDF
  4.2 Measure multithreaded program run on DE2
    4.2.1 Experiment method
    4.2.2 Multithread Summation Program
  4.3 Conclusion from the Experiments
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
Reference

References:
[1] Richard M. Karp and Raymond E. Miller, “Properties of a Model for Parallel Computation: Determinacy, Termination, Queueing,” SIAM Journal on Applied Mathematics, Vol. 14, No. 6, pp. 1390-1411, Nov., 1966.
[2] Jack B. Dennis and David P. Misunas, “A Preliminary Architecture for a Basic Data-Flow Processor,” ACM SIGARCH Computer Architecture News, Vol. 3, Issue 4, pp. 126-132, 1975.
[3] K. Arvind and Rishiyur S. Nikhil, “Executing a Program on the MIT Tagged-Token Dataflow Architecture,” IEEE Transactions on Computers, Vol. 39, Issue 3, pp. 300-318, Mar., 1990.
[4] Gregory M. Papadopoulos and David E. Culler, “Monsoon: an explicit token-store architecture,” in Proceedings of the 17th annual international symposium on Computer Architecture, pp. 82-91, Seattle, WA, May, 1990.
[5] Mitsuhisa Sato, Yuetsu Kodama, Shuichi Sakai, Yoshinori Yamaguchi, and Yasuhito Koumura, “Thread-based programming for the EM-4 hybrid dataflow machine,” ACM SIGARCH Computer Architecture News, Vol. 20, Issue 2, pp. 224-233, May, 1992.
[6] James E. Smith, “Decoupled access/execute computer architectures,” in Proceedings of the 9th annual symposium on Computer Architecture, pp. 112-119, Austin, Texas, United States, Apr., 1982.
[7] J. Kreuzinger and T. Ungerer, “Context-switching techniques for decoupled multithreaded processors,” in Euromicro ’99, pp. 248-251, Milan, Italy, 1999.
[8] James E. Smith, G.E. Dermer, B.D. Vanderwarn, S.D. Klinger, C.M. Rozewski, D.L. Fowler, K.R. Scidmore, and J.P. Laudon, “The ZS-1 Central Processor,” ACM SIGOPS Operating Systems Review, Vol. 21, Issue 4, pp. 199-204, Oct., 1987.
[9] James E. Smith, Shlomo Weiss, and Nicholas Y. Pang, “A Simulation Study of Decoupled Architecture Computers,” IEEE Trans. Computers, Vol. 35, No. 8, pp. 692-702, Aug., 1986.
[10] Won W. Ro, Stephen P. Crago, Alvin M. Despain, and Jean-Luc Gaudiot, “HiDISC: A Decoupled Architecture for Data-Intensive Applications,” in Proceedings of the 17th International Symposium on Parallel and Distributed Processing, pp. 3.2, Nice, France, Apr., 2003.
[11] Kyriakos Stavrou, Costas Kyriacou, Paraskevas Evripidou, and Pedro Trancoso, “Chip multiprocessor based on data-driven,” International Journal of High Performance Systems Architecture, Vol. 1, No. 1, pp. 34-43, 2007.
[12] Roberto M. Giorgi, Zdravko Popovic, and Nikola Puzovic, “DTA-C: A Decoupled multi-Threaded Architecture for CMP Systems,” in Computer Architecture and High Performance Computing, SBAC-PAD 2007. 19th International Symposium, pp. 263-270, Gramado, RS, Brazil, Oct., 2007.
[13] John L. Hennessy and David A. Patterson, “Computer Architecture: A Quantitative Approach, 4th ed.,” ELSEVIER, 2006.
[14] David M. Harris and Sarah L. Harris, “Digital design and computer architecture,” ELSEVIER, 2007.
[15] Michael Sung, Ronny Krashinsky, and Krste Asanović, “Multithreading decoupled architectures for complexity-effective general purpose computing,” ACM SIGARCH Computer Architecture News, Vol. 29, Issue 5, pp. 56-61, Dec., 2001.
[16] DE2 User Manual, ftp://ftp.altera.com/up/pub/Webdocs/DE2_UserManual.pdf
[17] Cyclone II Device Handbook, Volume 1, http://www.altera.com/literature/hb/cyc2/cyc2_cii5v1.pdf
[18] Krishna M. Kavi, Roberto M. Giorgi, and Joseph M. Arul, “Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation,” IEEE Trans. on Computers, Vol. 50, No. 8, pp. 834-846, Aug., 2001.
[19] Joseph M. Arul, Tso-Zen Yeh, Chia-Cheng Hsu, and Jan-Jr Li, “An Efficient Way of Passing of Data in a Multithreaded Scheduled Dataflow Architecture,” in Proceedings of 8th International Conference on High-Performance Computing in Asia-Pacific Region, pp. 487-492, Beijing, China, Dec., 2005.

Thesis pages: 51
Record created: 2010/8/31
Conversion date: 2010/09/01