EE-748: Advanced Topics in Computer Architecture

Semester: Jan - Apr 2014

Class Timings:

Office Hours:

Syllabus:

Overview of superscalar and VLIW architectures. Limits of instruction-level parallelism (ILP). Simultaneous multi-threaded (SMT) architectures. Performance enhancement through branch prediction and value prediction, BulkSMT, thread-level speculation. Runahead execution, proactive instruction fetch, multi-core architectures, data marshaling for multi-core architectures, power-constrained CMPs, heterogeneous core design, Core Fusion, transactional memories. Performance evaluation of complex microarchitectures. On-chip interconnects (Network-on-Chip). Architectural vulnerabilities and reliable architectures. Patchable design. Secure architectures. Energy-efficient architectures. Power management. Cache design, energy-efficient cache partitioning, fast thread migration, thread throttling.

References:

Current Literature (Papers from ISCA, Micro, HPCA, ICCD, DSN, and IEEE Trans. on Computers, IEEE Computer Architecture Letters)

Must-read papers (before coming to the first class)

Shekhar Borkar and Andrew Chien, `The future of microprocessors`, Communications of the ACM, vol. 54, no. 5, May 2011
Tilak Agerwala and S. Chatterjee, `Computer architecture: challenges and opportunities for the next decade`, IEEE Micro, May-June 2005
Mark Hill and Michael Marty, `Amdahl's law in the multi-core era`, IEEE Computer, July 2008
Dong Hyuk Woo et al., `Extending Amdahl's Law for Energy Efficient Computing in the Many Core Era`, IEEE Computer, Dec 2008
21st Century Computer Architecture, a community white paper, Computing Community Consortium, May 2012

List of representative papers (To be discussed in the class)

1. Xian-He Sun et al., `Re-evaluating Amdahl's Law in the Multicore Era`, Journal of Parallel and Distributed Computing 2010
2. Pramod Subramanyan et al., `Energy-Efficient Fault Tolerance in Chip Multiprocessors using Critical Value Forwarding`, Proc. of DSN 2010
3. Pramod Subramanyan et al., `Multiplexed Redundant Execution (MRE): A Technique for Efficient Fault Tolerance in Chip Multiprocessors`, Proc. of DATE 2010
4. Amin Ansari et al., `Illusionist: Transforming Lightweight Cores into Aggressive Cores on Demand`, Proc. of HPCA 2013
5. Engin Ipek et al., `Core Fusion: Accommodating Software Diversity in Chip Multiprocessors`, Proc. of ISCA 2007
6. Andrew Lukefahr et al., `Composite Cores: Pushing Heterogeneity into a Core`, Proc. of Micro 2012
7. Khubaib et al., `MorphCore: An Energy-Efficient Microarchitecture for High Performance ILP and High Throughput TLP`, Proc. of Micro 2012
8. Pramod Subramanyan et al., `Adaptive Execution Assistance for Multiplexed Fault-Tolerant Chip Multiprocessors`, Proc. of ICCD 2011
9. Pejman Lotfi-Kamran et al., `Scale-Out Processors`, Proc. of ISCA 2012
10. K.V. Craeynest et al., `Scheduling Heterogeneous Multi-Cores through Performance Impact Estimation (PIE)`, Proc. of ISCA 2012
11. A. Nair et al., `A First-Order Mechanistic Model for Architectural Vulnerability Factor`, Proc. of ISCA 2012
12. Rami Sheikh et al., `Control-Flow Decoupling`, Proc. of Micro 2012
13. Akbar Sharifi et al., `Addressing End-to-End Memory Access Latency in NoC-Based Multicores`, Proc. of Micro 2012
14. Xuehai Qian et al., `BulkSMT: Designing SMT Processors for Atomic-Block Execution`, Proc. of HPCA 2012
15. Shekhar Srikantaiah et al., `MorphCache: A Reconfigurable Adaptive Multi-level Cache Hierarchy`, Proc. of HPCA 2011
16. Michael Ferdman et al., `Proactive Instruction Fetch`, Proc. of Micro 2012
17. Mark Gebhart et al., `Energy-efficient mechanisms for managing thread context in throughput processors`, Proc. of ISCA 2011
18. Omid Azizi et al., `Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis`, Proc. of ISCA 2010
19. M.A. Suleman et al., `Data marshaling for multi-core architectures`, Proc. of ISCA 2010
20. Indrani Paul et al., `Cooperative boosting: Needy versus greedy power management`, Proc. of ISCA 2013

21. A. Tumeo et al., `Designing next-generation massively multithreaded architectures for irregular applications`, IEEE Computer, Aug 2012

22. Emily Blem et al., `Power struggles: Revisiting the RISC vs CISC debate on contemporary ARM and x86 architectures`, Proc. of HPCA 2013

23. Manish Arora et al., `Redefining the role of the CPU in the era of CPU-GPU integration`, IEEE Micro magazine 2012

24. Onur Kayiran et al., `Neither more nor less: Optimizing thread level parallelism for GPGPUs`, Proc. of PACT 2013

25. Adwait Jog et al., `Orchestrated scheduling and prefetching for GPGPUs`, Proc. of ISCA 2013

26. Ankita Sethia et al., `A customized processor for energy efficient scientific computing`, IEEE Trans. on Computers, 2012

27. D. Lustig and M. Martonosi, `Reducing GPU offload latency via fine grained CPU-GPU synchronization`, Proc. of HPCA 2013

28. Y. Park et al., `Libra: Tailoring SIMD execution using heterogeneous hardware and dynamic configurability`, Proc. of Micro 2012

29. Jason Power et al., `Heterogeneous system coherence for integrated CPU-GPU systems`, IEEE Micro 2013

30. D. Zier and B. Lee, `Performance evaluation of dynamic speculative multithreading with Cascadia architecture`, IEEE Trans. on Parallel and Distributed Systems 2010

31. A. Basu et al., `Efficient virtual memory for big memory servers`, Proc. of ISCA 2013

32. S. Pai et al., `Improving GPGPU concurrency with elastic kernels`, Proc. of ASPLOS 2013

33. J. Lee et al., `CPU-GPU collaboration for data parallel kernels on heterogeneous systems`, Proc. of PACT 2013

Course Introduction

Pre-requisite: CS-683: Advanced Computer Architecture, or EE-739: Processor Design. Instructor's consent is mandatory.

Evaluation: Mid sem exam (10%), Quizzes (5%), End sem exam (15%), Course project (35%), and Class participation (35%)

Class Schedule:

13 Jan: Course Introduction

15 Jan: High Performance Computer Architecture Review - 1

22 Jan: High Performance Computer Architecture Review - 2

24 Jan: Future of microprocessors (Paper: The future of microprocessors, Communications of the ACM 2011)