EE482C Project Proposal

Chaiyasit Manovit
John J Kim
Sanjit Zubin Biswas
Zi-Bin Yang

"Compiling/running stream programs on legacy architectures"

In this project we will pick a few representative stream programs and make
them run on a chosen legacy architecture, for which the following are assumed:

- Impulse memory controller with a software control interface
  [http://www.cs.utah.edu/impulse/]
- non-blocking L1 & L2 caches (with hardware support for L2 cache prefetch --
  we plan to turn L2 into SRFs, and L1 into stream buffers)
- scalar operations only (ignoring MMX/SIMD/VIS ISA extensions)

Approach
--------

1. We will take source code in Brook and transform it into legacy C that uses
   the software control interface of the Impulse memory controller to perform
   gather/scatter memory accesses.

2. We will then compile this C into assembly with software prefetching into
   both the L1 and L2 caches, overlapped with normal computation, by accessing
   the desired data in advance and exploiting the non-blocking cache accesses.
   The read-ahead data can be kept (buffered) in registers as if they were in
   stream buffers. Because the number of registers is limited, however, we may
   have to throw these data away and reread them when we actually need them,
   this time with guaranteed L1 cache hits.

3. We will then need to schedule this code well to hide as much memory latency
   as we can, through software pipelining, stream scheduling, etc. We may also
   consider strip-mining at the C level using knowledge of the cache size
   (but at first we may simply assume a data set small enough that everything
   fits in the L2 cache).

We expect to see a performance improvement from this explicit organization of
the bandwidth hierarchy, which uses knowledge of the memory access pattern
available from the source language.
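
As a rough illustration of the three steps above, the following C sketch shows
what the generated code for a simple reduction kernel might look like. It is
only a sketch under stated assumptions: the gather is done in plain software
as a stand-in for the Impulse controller (whose actual interface is not
modeled here), the prefetch uses the GCC/Clang builtin __builtin_prefetch
rather than hand-scheduled assembly, and the names gather_strip,
sum_squares_strip, stream_sum_squares, STRIP_ELEMS and PREFETCH_DIST are
hypothetical, chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning constants, not taken from the proposal. */
#define STRIP_ELEMS   1024  /* elements per strip, assumed to fit in L2 */
#define PREFETCH_DIST 16    /* how many elements ahead to prefetch */

/* Step 1 stand-in: gather scattered elements into a dense buffer.
 * The proposal delegates this to the Impulse memory controller;
 * here it is an ordinary software loop. */
static void gather_strip(float *dst, const float *src,
                         const size_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[idx[i]];
}

/* Step 2: kernel over one dense strip. Each iteration touches data
 * PREFETCH_DIST elements ahead so the fetch overlaps with computation
 * (relies on non-blocking caches; the prefetch is only a hint). */
static float sum_squares_strip(const float *buf, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&buf[i + PREFETCH_DIST], 0, 3);
        acc += buf[i] * buf[i];
    }
    return acc;
}

/* Step 3: strip-mine the stream so each strip stays cache resident.
 * The static buffer plays the role of the SRF; it makes this function
 * non-reentrant, which is acceptable for a sketch. */
float stream_sum_squares(const float *src, const size_t *idx, size_t n)
{
    static float strip[STRIP_ELEMS];
    float total = 0.0f;

    assert(src != NULL && idx != NULL);

    for (size_t base = 0; base < n; base += STRIP_ELEMS) {
        size_t len = (n - base < STRIP_ELEMS) ? n - base : STRIP_ELEMS;
        gather_strip(strip, src, idx + base, len);
        total += sum_squares_strip(strip, len);
    }
    return total;
}
```

A caller would pass the source array, an index stream describing the gather
pattern, and the stream length, e.g. stream_sum_squares(src, idx, n). The
prefetch distance and strip size would in practice be tuned to the memory
latency and cache sizes of the target machine.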