EE-748: Advanced Topics in Computer Architecture

 

Semester: Jul - Nov 2014

 

Instructor: Virendra Singh

 

Class Timings:

 

Office Hours:

 

Syllabus:

 

Overview of superscalar and VLIW architectures. Limits of instruction-level parallelism (ILP). Simultaneous multi-threaded (SMT) architecture, performance enhancement through branch prediction and value prediction, BulkSMT, thread-level speculation. Runahead execution, proactive instruction fetch, multi-core architectures, data marshaling for multi-core architectures, power-constrained CMPs, heterogeneous core design, Core Fusion, transactional memories. Performance evaluation of complex microarchitectures. On-chip interconnects (Network-on-Chip). Architectural vulnerabilities and reliable architectures. Patchable design. Secure architectures. Energy-efficient architectures. Power management. Cache design, energy-efficient cache partitioning, fast thread migration, thread throttling.

 

 

References:

 

Current Literature (Papers from ISCA, Micro, HPCA, ICCD, DSN, and IEEE Trans. on Computers, IEEE Computer Architecture Letters)

 

Must-read papers (to be read before coming to the first class)

 

  1. Shekhar Borkar and Andrew Chien, `The future of microprocessors`, Communications of the ACM, vol. 54, no. 5, May 2011
  2. Tilak Agerwala and S. Chatterjee, `Computer architecture: challenges and opportunities for the next decade`, IEEE Micro, May-June 2005
  3. Mark Hill and Michael Marty, `Amdahl's law in the multi-core era`, IEEE Computer, July 2008
  4. Dong Hyuk Woo et al., `Extending Amdahl's Law for Energy Efficient Computing in the Many Core Era`, IEEE Computer, Dec 2008
  5. 21st Century Computer Architecture, a community white paper, Computing Community Consortium, May 2012

 

List of representative papers (To be discussed in the class)

 

  1. Xian-He Sun et al., `Re-evaluating Amdahl's Law in the Multicore Era`, Journal of Parallel and Distributed Computing, 2010
  2. Pramod Subramanyan et al., `Energy-Efficient Fault Tolerance in Chip Multiprocessors using Critical Value Forwarding`, Proc. of DSN 2010
  3. Pramod Subramanyan et al., `Multiplexed Redundant Execution (MRE): A Technique for Efficient Fault Tolerance in Chip Multiprocessors`, Proc. of DATE 2010
  4. Amin Ansari et al., `Illusionist: Transforming Lightweight Cores into Aggressive Cores on Demand`, Proc. of HPCA 2013
  5. Engin Ipek et al., `Core Fusion: Accommodating Software Diversity in Chip Multiprocessors`, Proc. of ISCA 2007
  6. Andrew Lukefahr et al. `Composite Cores: Pushing Heterogeneity into a Core`, Proc. of Micro 2012
  7. Khubaib et al., `MorphCore: An Energy-Efficient Microarchitecture for High Performance ILP and High Throughput TLP`, Proc. of Micro 2012
  8. Pramod Subramanyan et al., `Adaptive Execution Assistance for Multiplexed Fault-Tolerant Chip Multiprocessors`, Proc. of ICCD 2011
  9. D. Palframan et al., `Precision aware soft error protection for GPUs`, Proc. of HPCA 2014
  10. Pejman Lotfi-Kamran et al., `Scale-Out Processors`, Proc. of ISCA 2012
  11. K.V. Craeynest et al., `Scheduling Heterogeneous Multi-Cores through Performance Impact Estimation (PIE)`, Proc. of ISCA 2012
  12. A. Nair et al., `A First-Order Mechanistic Model for Architectural Vulnerability Factor`, Proc. of ISCA 2012
  13. Rami Sheikh et al., `Control-Flow Decoupling`, Proc. of Micro 2012
  14. Akbar Sharifi et al., `Addressing End-to-End Memory Access Latency in NoC-Based Multicores`, Proc. of Micro 2012
  15. Xuehai Qian et al., `BulkSMT: Designing SMT Processors for Atomic-Block Execution`, Proc. of HPCA 2012
  16. Shekhar Srikantaiah et al., `MorphCache: A Reconfigurable Adaptive Multi-level Cache Hierarchy`, Proc. of HPCA 2011
  17. Michael Ferdman et al., `Proactive Instruction Fetch`, Proc. of Micro 2012
  18. Mark Gebhart et al., `Energy-efficient mechanisms for managing thread context in throughput processors`, Proc. of ISCA 2011
  19. Omid Azizi et al., `Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis`, Proc. of ISCA 2010
  20. M.A. Suleman et al., `Data marshaling for multi-core architectures`, Proc. of ISCA 2010
  21. Indrani Paul et al., `Cooperative boosting: Needy versus greedy power management`, Proc. of ISCA 2013
  22. A. Tumeo et al., `Designing next-generation massively multithreaded architectures for irregular applications`, IEEE Computer, Aug 2012
  23. Emily Blem et al., `Power struggles: Revisiting the RISC vs CISC debate on contemporary ARM and x86 architectures`, Proc. of HPCA 2013
  24. Manish Arora et al., `Redefining the role of the CPU in the era of CPU-GPU integration`, IEEE Micro, 2012
  25. Onur Kayiran et al., `Neither more nor less: Optimizing thread level parallelism for GPGPUs`, Proc. of PACT 2013
  26. Adwait Jog et al., `Orchestrated scheduling and prefetching for GPGPUs`, Proc. of ISCA 2013
  27. Ankita Sethia et al., `A customized processor for energy efficient scientific computing`, IEEE Trans. on Computers, 2012
  28. D. Lustig and M. Martonosi, `Reducing GPU offload latency via fine grained CPU-GPU synchronization`, Proc. of HPCA 2013
  29. Y. Park et al., `Libra: Tailoring SIMD execution using heterogeneous hardware and dynamic configurability`, Proc. of Micro 2012
  30. Jason Power et al., `Heterogeneous system coherence for integrated CPU-GPU systems`, IEEE Micro 2013
  31. D. Zier and B. Len, `Performance evaluation of dynamic speculative multithreading with Cascadia architecture`, IEEE Trans. on PDS, 2010
  32. S. Eyerman and L. Eeckhout, `The benefit of SMT in multi-core era: Flexibility towards degree of thread level parallelism`, Proc. of ASPLOS 2014
  33. M. Hayenga et al., `Revolver: Processor architecture for power efficient loop execution`, Proc. of HPCA 2014
  34. A. Basu et al., `Efficient virtual memory for big memory servers`, Proc. of ISCA 2013
  35. S. Pai et al., `Improving GPGPU concurrency with elastic kernels`, Proc. of ASPLOS 2013
  36. J. Lee et al., `CPU-GPU collaboration for data parallel kernels on heterogeneous systems`, Proc. of PACT 2013

 

Course Introduction

 

Prerequisite: CS-683: Advanced Computer Architecture, or EE-739: Processor Design. Instructor's consent is mandatory.

 

Evaluation: Midsem exam (10%), Quizzes (10%), Assignment (10%), Endsem exam (15%), Course project (30%), and Class participation (25%)