A brief overview of the present understanding of the tradeoff between energy and errors in computing will be presented. The prevailing understanding of a chip’s behavior under large process variations, based on statistical delay assumptions, leads one to conclude that a small number of errors become likely as we progress further along Moore’s Law. This understanding is challenged by a new hypothesis on the behavior of very large CMOS chips in the presence of process variations. A thought experiment is presented that leads to the new hypothesis. The hypothesis states that in every large CMOS chip there exist critical operating points (frequency, voltage) that divide the 2-D space (F, V) into two distinct regions: 1. error-free operation and 2. massive errors (i.e., completely inoperable). Two attempts at disproving this hypothesis with real physical experiments will be described. Some consequences of the hypothesis for energy savings in large data centers are also suggested.
Janak H. Patel is a Research Professor in the Coordinated Science Laboratory and the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. He received a BS degree in Physics from Gujarat University, a BTech in Electrical Engineering from IIT Madras, and an MS and PhD in Electrical Engineering from Stanford University. He is a Fellow of ACM and IEEE and a recipient of the 1998 IEEE Piore Award. He has supervised over 85 MS and PhD theses and published over 200 technical papers. He was a founding technical advisor to Nexgen Microsystems, which gave rise to the entire line of microprocessors from AMD. He was a founder of a successful startup, Sunrise Test, a CAD company for chip testing, now owned by Synopsys. His research contributions include Pipeline Scheduling, Cache Coherence, Cache Simulation, Interconnection Networks, On-line Error Detection, Reliability Analysis of Memories with ECC and Scrubbing, Design for Testability, Built-In Self-Test, Fault Simulation, and Automatic Test Generation.