ELM Circuits
We are exploring the benefits of supplementing a standard cell library with custom circuits. While CAD tools have made great strides in the synthesis, placement, and routing of structures such as our Ensemble Processor (EP), custom circuits provide a flexibility that standard cells cannot. This flexibility includes the ability to change signaling methodologies, use circuit styles other than static CMOS, and design large specialized structures. Since our architecture is tiled, any change we make to the EP is multiplied across the whole chip.

The figure below shows where much of our overhead occurs when compared to an ASIC implementation of the Advanced Encryption Standard (AES) benchmark. Much of our circuits research is focused on reducing the energy consumed by data and instruction supply and movement.

Figure 1: Energy Comparison between ELM and ASIC

The goal of this research is to provide a perspective on the benefits of custom circuits. By incrementally replacing standard cell circuits with custom ones, we will be able to show which areas benefit most from customization. We feel that custom circuits have an advantage over standard cells, since standard cells are often designed with goals different from ours. Three aspects in which we feel standard cells incur overhead are:
Specialization: When targeting the EP with customized circuits, we have the benefit of specialization. By running benchmarks on the placed and routed EP, we are able to pinpoint which structures and gates in the chip dissipate the most energy. For example, we found that many of the flops in the system take their inputs from, or drive their outputs into, multiplexers; these flop-multiplexer structures account for about 2-5% of our total energy dissipation. By combining such a structure into one custom cell, we estimate that this energy can be halved. Pragmatically, standard cell developers cannot do this, since the number of possible cell combinations grows exponentially. Another example of specialization is the tailoring of the datapath to our specific operand size and throughput requirements.

Signaling: Assuming a second supply, reducing the voltage swing needed to communicate a bit gives a quadratic energy reduction. Through the use of low-swing signaling, we hope to significantly reduce wire energy dissipation while still maintaining high performance. One of the drawbacks of low-swing signaling, however, is that it requires a differential wire infrastructure. For long wires with a large amount of capacitance, we found that this overhead is worth the potential energy savings. Signals targeted to become low-swing include the writeback bus and the communication buses.

Memories: Since much of our design is focused around dual-ported register files, we hope to improve on both the standard cell (flip-flop and latch) and memory compiler (SRAM) versions. D flip-flops are not a good option, since they consume much more area and energy than SRAM cells. The SRAM compiler, on the other hand, may not be able to tailor a system that interfaces perfectly with our other custom circuits, such as the low-swing wires. We will explore a variety of options here, since the potential SRAM structures in our project range from 128 bits to kilobits. Topics we have explored include a low-swing write topology and a hierarchical register file.
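
The quadratic energy argument in the Signaling item can be illustrated with a rough first-order model. The sketch below is not taken from the ELM design: it assumes the common approximation that charging a wire of capacitance C by a swing V_swing from a supply V_supply draws E = C * V_swing * V_supply, and the function names, the voltages, and the `wires_per_bit` factor (a crude, pessimistic stand-in for the differential wire overhead) are all hypothetical.

```python
def wire_energy(cap_farads, v_swing, v_supply):
    """First-order energy drawn from the supply to raise a wire by v_swing.

    Approximation: E = C * V_swing * V_supply (activity factor ignored).
    """
    return cap_farads * v_swing * v_supply


def full_swing_energy(cap_farads, vdd):
    """Conventional static CMOS signaling: the wire swings rail to rail."""
    return wire_energy(cap_farads, vdd, vdd)


def low_swing_energy(cap_farads, v_low, wires_per_bit=2):
    """Low-swing signaling driven from a dedicated second supply at v_low.

    wires_per_bit=2 is an assumed, pessimistic model of differential
    signaling in which both wires of the pair are charged for each bit.
    """
    return wires_per_bit * wire_energy(cap_farads, v_low, v_low)


def energy_ratio(vdd, v_low, wires_per_bit=2):
    """Low-swing energy divided by full-swing energy (capacitance cancels)."""
    return low_swing_energy(1.0, v_low, wires_per_bit) / full_swing_energy(1.0, vdd)


if __name__ == "__main__":
    vdd, v_low = 1.0, 0.2  # illustrative voltages, not ELM values
    print("single wire, second supply:", energy_ratio(vdd, v_low, wires_per_bit=1))
    print("differential pair, second supply:", energy_ratio(vdd, v_low))
    # Without a second supply the charge still comes from vdd, so the
    # saving is only linear in the swing.
    print("single wire, no second supply:",
          wire_energy(1.0, v_low, vdd) / full_swing_energy(1.0, vdd))
```

Under these assumptions, dropping the swing from 1.0 V to 0.2 V with a second supply cuts per-wire energy to (0.2/1.0)^2 = 4% of the full-swing value, and even the pessimistic two-wire differential model stays at 8%. Without the second supply the same swing reduction only reaches 20%, which is why the quadratic claim is conditioned on a second supply.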