A Programming System for the Imagine Media Processor

Peter Mattson
Stanford University
Computer Systems Laboratory

Abstract:

Media processing applications, such as image processing, signal processing, and graphics, motivate new processor architectures that place new burdens on the compiler. These applications demand very high arithmetic rates and data bandwidth, but exhibit little data reuse. The Imagine Media Processor (Imagine) is a new architecture that introduces two innovations to meet these demands. First, to support a large number of arithmetic logic units (ALUs), it uses multiple register files connected to the ALUs by shared buses and register file ports. Second, instead of a cache, which relies on data reuse, it uses a large on-chip memory called the stream register file (SRF) to hold sequences of records called streams. The compiler is responsible for allocating both the shared interconnect and the SRF.

This thesis describes a programming system that enables efficient application development for Imagine and other architectures that include these innovations. It introduces an implementation of the stream programming model. The stream programming model divides a media processing application into a series of kernels, computation-intensive functions that operate on streams, and a stream program that defines the high-level control and data flow between kernels. This thesis introduces KernelC, a C-like language used to write kernels. It describes a KernelC compiler that uses a new technique called communication scheduling to allocate the shared buses and register file ports and to manage data flow between multiple functional units and register files. This thesis also introduces StreamC, a language extension used with C++ to write stream programs. It describes a StreamC compiler that uses a new technique called stream scheduling to allocate the stream register file and determine when to load and store streams.
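The split between kernels and a stream program can be illustrated with a minimal sketch in ordinary C++. This is not KernelC or StreamC syntax, which the abstract does not show. The `Stream` alias, the `Pixel` record, and the functions `luminance_kernel`, `threshold_kernel`, and `stream_program` are all hypothetical names chosen for illustration, and the sketch assumes each kernel emits one output record per input record.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a stream: an ordered sequence of records.
template <typename Record>
using Stream = std::vector<Record>;

// Hypothetical record type for an image-processing example.
struct Pixel {
    int r, g, b;
};

// Kernel 1 (illustrative): maps each pixel record to an integer luminance.
// Every record is processed independently, with no reuse across records.
Stream<int> luminance_kernel(const Stream<Pixel>& in) {
    Stream<int> out;
    out.reserve(in.size());
    for (const Pixel& p : in) {
        out.push_back((p.r * 299 + p.g * 587 + p.b * 114) / 1000);
    }
    return out;
}

// Kernel 2 (illustrative): maps each luminance to 1 if it exceeds the
// cutoff and to 0 otherwise.
Stream<int> threshold_kernel(const Stream<int>& in, int cutoff) {
    Stream<int> out;
    out.reserve(in.size());
    for (int v : in) {
        out.push_back(v > cutoff ? 1 : 0);
    }
    return out;
}

// Stream program (illustrative): defines only the high-level flow of
// streams between kernels. The intermediate stream `luma` is the kind of
// data that, per the abstract, would be held in the SRF on Imagine.
Stream<int> stream_program(const Stream<Pixel>& pixels, int cutoff) {
    Stream<int> luma = luminance_kernel(pixels);
    Stream<int> mask = threshold_kernel(luma, cutoff);
    // Both kernels in this sketch emit one record per input record.
    assert(mask.size() == pixels.size());
    return mask;
}
```

In this sketch the kernels contain all of the arithmetic, while the stream program only sequences kernels and names the streams passed between them, which mirrors the division of labor described above.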

The Imagine programming system has been used to implement sophisticated, high-performance applications for Imagine such as stereo depth extraction, MPEG2 encoding, and polygon rendering. Experimental results for a set of image processing, signal processing, and graphics benchmarks show that, for an Imagine Media Processor with multiple partitioned register files and shared interconnect, communication scheduling delivers schedule lengths within 1% of those for the same architecture with a single multi-ported register file, which is estimated to be eight times larger. For these benchmarks, stream scheduling allocates the SRF as well as or better than experienced Imagine programmers can by hand using assembly language.


