EE-748: Advanced Topics in Computer Architecture
Semester: Jul - Nov 2014
Instructor: Virendra Singh
Class Timings:
Office Hours:
Syllabus:
Overview
Superscalar and VLIW architectures. Limits of instruction-level parallelism (ILP). Simultaneous multithreaded (SMT) architectures. Performance enhancement through branch prediction and value prediction. BulkSMT. Thread-level speculation. Runahead execution, proactive instruction fetch. Multi-core architectures, data marshaling for multi-core architectures, power-constrained CMPs, heterogeneous core design, Core Fusion. Transactional memories. Performance evaluation of complex microarchitectures. On-chip interconnects (Network-on-Chip). Architectural vulnerabilities and reliable architectures. Patchable design. Secure architectures. Energy-efficient architectures. Power management. Cache design, energy-efficient cache partitioning, fast thread migration, thread throttling.
References:
Current literature (papers from ISCA, MICRO, HPCA, ICCD, DSN, IEEE Trans. on Computers, and IEEE Computer Architecture Letters)
Must-read papers (to be read before the first class)
List of representative papers (to be discussed in class)
22. A. Tumeo et al., "Designing next-generation massively multithreaded architectures for irregular applications", IEEE Computer, Aug 2012
23. Emily Blem et al., "Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures", Proc. of HPCA 2013
24. Manish Arora et al., "Redefining the role of the CPU in the era of CPU-GPU integration", IEEE Micro, 2012
25. Onur Kayiran et al., "Neither more nor less: Optimizing thread-level parallelism for GPGPUs", Proc. of PACT 2013
26. Adwait Jog et al., "Orchestrated scheduling and prefetching for GPGPUs", Proc. of ISCA 2013
27. Ankit Sethia et al., "A customized processor for energy efficient scientific computing", IEEE Trans. on Computers, 2012
28. D. Lustig and M. Martonosi, "Reducing GPU offload latency via fine-grained CPU-GPU synchronization", Proc. of HPCA 2013
29. Y. Park et al., "Libra: Tailoring SIMD execution using heterogeneous hardware and dynamic configurability", Proc. of MICRO 2012
30. Jason Power et al., "Heterogeneous system coherence for integrated CPU-GPU systems", IEEE Micro 2013
31. D. Zier and B. Lee, "Performance evaluation of dynamic speculative multithreading with the Cascadia architecture", IEEE Trans. on PDS, 2010
32. S. Eyerman and L. Eeckhout, "The benefit of SMT in the multi-core era: Flexibility towards degrees of thread-level parallelism", Proc. of ASPLOS 2014
33. M. Hayenga et al., "Revolver: Processor architecture for power efficient loop execution", Proc. of HPCA 2014
34. A. Basu et al., "Efficient virtual memory for big memory servers", Proc. of ISCA 2013
35. S. Pai et al., "Improving GPGPU concurrency with elastic kernels", Proc. of ASPLOS 2013
36. J. Lee et al., "Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems", Proc. of PACT 2013
Prerequisite: CS-683: Advanced Computer Architecture or EE-739: Processor Design. The instructor's consent is mandatory.
Evaluation: Midsem exam (10%), quizzes (10%), assignment (10%), endsem exam (15%), course project (30%), and class participation (25%).