Transistor scaled HPC application performance
MetadataShow full item record
Citation (published version)Appavoo, Jonathan; Schatzberg, Dan. "Transistor Scaled HPC Application Performance", Technical Report BUCS-TR-2012-009, Computer Science Department, Boston University, April 15, 2012. [Available from: http://hdl.handle.net/2144/11397]
We propose a radically new, biologically inspired, model of extreme scale computer on which ap- plication performance automatically scales with the transistor count even in the face of component failures. Today high performance computers are massively parallel systems composed of potentially hundreds of thousands of traditional processor cores, formed from trillions of transistors, consuming megawatts of power. Unfortunately, increasing the number of cores in a system, unlike increasing clock frequencies, does not automatically translate to application level improvements. No general auto-parallelization techniques or tools exist for HPC systems. To obtain application improvements, HPC application programmers must manually cope with the challenge of multicore programming and the significant drop in reliability associated with the sheer number of transistors. Drawing on biological inspiration, the basic premise behind this work is that computation can be dramatically accelerated by integrating a very large-scale, system-wide, predictive associative memory into the operation of the computer. The memory effectively turns computation into a form of pattern recognition and prediction whose result can be used to avoid significant fractions of computation. To be effective the expectation is that the memory will require billions of concurrent devices akin to biological cortical systems, where each device implements a small amount of storage, computation and localized communication. As typified by the recent announcement of the Lyric GP5 Probability Processor, very efficient scalable hardware for pattern recognition and prediction are on the horizon. One class of such devices, called neuromorphic, was pioneered by Carver Mead in the 80’s to provide a path for breaking the power, scaling, and reliability barriers associated with standard digital VLSI tech- nology. Recent neuromorphic research examples include work at Stanford, MIT, and the DARPA Sponsored SyNAPSE Project. These devices operate transistors as unclocked analog devices orga- nized to implement pattern recognition and prediction several orders of magnitude more efficiently than functionally equivalent digital counterparts. Abstractly, the devices can be used to implement modern machine learning or statistical inference. When exposed to data as a time-varying signal, the devices learn and store patterns in the data at multiple time scales and constantly provide predictions about what the signal will do in the future. This kind of function can be seen as a form of predictive associative memory. In this paper we describe our model and initial plans for exploring it.