In this dissertation, we present the VLSI implementation and evaluation of stream processors, which reduce this performance efficiency gap while retaining full programmability. Imagine is the first implementation of a stream processor. It contains 48 32-bit arithmetic units, supporting floating-point and integer data types, organized into eight SIMD arithmetic clusters. Imagine executes applications as stream programs, each consisting of a sequence of computation kernels operating on streams of data records. The prototype Imagine processor is a 21-million-transistor chip implemented in a 0.15-micron CMOS process, with a die measuring 16 mm on a side. At 232 MHz, it achieves a peak performance of 9.3 GFLOPS while dissipating 6.4 W.
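As a rough illustration of what these figures imply for performance efficiency (performance per unit power and per unit area), the following minimal sketch derives a few ratios from the numbers quoted above. It assumes a square die, as "16 mm on a side" suggests, and uses peak rather than sustained performance. The constant and function names are hypothetical and chosen only for this example.

```python
# Figures quoted in the abstract for the prototype Imagine processor.
PEAK_GFLOPS = 9.3        # peak performance at 232 MHz
POWER_WATTS = 6.4        # power dissipation at the same operating point
DIE_SIDE_MM = 16.0       # die edge length; a square die is assumed here
ARITHMETIC_UNITS = 48    # 32-bit arithmetic units on the chip
CLUSTERS = 8             # SIMD arithmetic clusters


def efficiency_metrics():
    """Derive simple efficiency ratios from the quoted peak figures."""
    die_area_mm2 = DIE_SIDE_MM ** 2
    return {
        "units_per_cluster": ARITHMETIC_UNITS // CLUSTERS,
        "die_area_mm2": die_area_mm2,
        "gflops_per_watt": PEAK_GFLOPS / POWER_WATTS,
        "gflops_per_mm2": PEAK_GFLOPS / die_area_mm2,
    }


if __name__ == "__main__":
    for name, value in efficiency_metrics().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the quoted figures correspond to six arithmetic units per cluster, a 256 mm² die, roughly 1.45 peak GFLOPS per watt, and roughly 0.036 peak GFLOPS per mm².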
Furthermore, we extend these experimental results from Imagine to stream processors designed with more area- and energy-efficient custom design methodologies and to future VLSI technologies in which thousands of arithmetic units on a single chip will be feasible. Two techniques for increasing the number of arithmetic units in a stream processor are presented: intracluster scaling, which adds ALUs to each cluster, and intercluster scaling, which adds clusters. These scaling techniques are shown to provide high performance efficiency up to tens of ALUs per cluster and hundreds of arithmetic clusters, demonstrating the viability of stream processing for many years to come.
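The two scaling dimensions can be pictured with a simple counting sketch: the total number of ALUs is the product of ALUs per cluster and the number of clusters, and each technique grows one factor. This is only an illustration. The function names and scaling factors are hypothetical, the baseline of six ALUs in each of eight clusters follows from the Imagine figures above (48 units, eight clusters), and the sketch ignores any hardware overhead that scaling incurs.

```python
def total_alus(alus_per_cluster: int, clusters: int) -> int:
    """Total ALU count for a configuration (a pure counting model)."""
    if alus_per_cluster < 1 or clusters < 1:
        raise ValueError("both dimensions must be positive")
    return alus_per_cluster * clusters


def intracluster_scale(alus_per_cluster: int, clusters: int, factor: int):
    """Intracluster scaling: more ALUs in each cluster, same cluster count."""
    return alus_per_cluster * factor, clusters


def intercluster_scale(alus_per_cluster: int, clusters: int, factor: int):
    """Intercluster scaling: more clusters, same ALUs per cluster."""
    return alus_per_cluster, clusters * factor


if __name__ == "__main__":
    baseline = (6, 8)  # Imagine: 48 units in eight clusters
    wider = intracluster_scale(*baseline, factor=2)    # illustrative factor
    more = intercluster_scale(*baseline, factor=16)    # illustrative factor
    both = intercluster_scale(*wider, factor=16)
    for label, cfg in [("baseline", baseline), ("intracluster", wider),
                       ("intercluster", more), ("combined", both)]:
        print(f"{label}: {cfg[0]} ALUs/cluster x {cfg[1]} clusters "
              f"= {total_alus(*cfg)} ALUs")
```

With these illustrative factors, the combined configuration has 12 ALUs per cluster and 128 clusters, or 1536 ALUs, which is in the range of "thousands of arithmetic units" the text refers to.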
@PhdThesis{Khailany:2003:VLSI,
  author = "Brucek Khailany",
  title = "The {VLSI} Implementation and Evaluation of Area- and Energy-Efficient Streaming Media Processors",
  school = "Stanford University",
  month = "June",
  year = "2003",
}