William J. Dally
Last updated July 27, 2010
Bill Dally is the Willard R. and Inez Kerr Bell Professor of Computer Science and Electrical Engineering and former Chairman of the Computer Science Department at Stanford University. He is a member of the Computer Systems Laboratory, leads the Concurrent VLSI Architecture Group, and teaches courses on Computer Architecture, Computer Design, and VLSI Design.
He is a Member of the National Academy of Engineering, a Fellow of the American Academy of Arts & Sciences, a Fellow of the IEEE, and a Fellow of the ACM, and he received the ACM Maurice Wilkes Award in 2000, the IEEE Seymour Cray Award in 2004, and the ACM Eckert-Mauchly Award in 2010. He has an h-index of 60.
Before coming to Stanford, Bill was a Professor in the Department of Electrical Engineering and Computer Science at MIT.
Current Projects
- ELM: The Efficient Low-Power Microprocessor
- We are developing an architecture that is easily programmed in a high-level language (C) yet delivers performance per unit power competitive with hard-wired logic and 20-30x better than that of conventional embedded RISC processors. These power savings come from more efficient mechanisms for instruction supply, based on compiler-managed instruction registers, and for data supply, based on a deeper register hierarchy with indexable registers.
- Enabling Technology for On-Chip Networks
- As CMPs and SoCs scale to include large numbers of cores and other modules, the on-chip network (NoC) that connects them becomes a critical system component. We are developing enabling technology for on-chip networks, including network topologies, flow control mechanisms, and router organizations. For example, our flattened butterfly topology offers both lower latency and substantially reduced power compared to conventional on-chip mesh or ring networks (see the hop-count sketch after this list).
- Sequoia: Programming the Memory Hierarchy
- Sequoia is a programming language designed to facilitate the development of memory-hierarchy-aware parallel programs that remain portable across modern machines with different memory hierarchy configurations. Sequoia exposes hierarchical memory abstractly in the programming model and provides language mechanisms to describe communication vertically through the machine and to localize computation to particular memory locations within the machine (a C-style sketch of this blocking idea appears after this list). A complete Sequoia programming system, including a compiler and runtime systems for both Cell processors and distributed-memory clusters, delivers efficient performance on both platforms. An alpha version of this programming system will soon be made public.
- Scalable Network Fabrics
- We are developing architectures and technologies to enable large,
scalable high-performance interconnection networks to be used in
parallel computers, network switches and routers, and high-performance
I/O systems. Recent results include the development of a hierarchical
network topology that makes efficient use of a combination of
electrical and optical links, a locality-preserving randomized
oblivious routing algorithm, a method for scheduling constrained
crossbar switches, new speculative and reservation-based flow control
methods, and a method for computing the worst-case traffic pattern for
any oblivious routing function.
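The latency claim for the flattened butterfly above can be sanity-checked with a back-of-the-envelope hop-count comparison. The sketch below is illustrative only: it assumes a 64-router network (an 8x8 array, a size chosen for the example, not taken from the project) and counts only minimum router-to-router hops, ignoring serialization, router pipeline, and wire delay.

/*
 * Illustrative comparison of average hop count for an 8x8 mesh versus an
 * 8x8 two-dimensional flattened butterfly (every router directly linked to
 * all routers sharing its row or column).
 */
#include <stdio.h>
#include <stdlib.h>

#define K 8   /* routers per dimension (assumed size for illustration) */

/* Minimal hop count in a K x K mesh: Manhattan distance. */
static int mesh_hops(int ax, int ay, int bx, int by) {
    return abs(ax - bx) + abs(ay - by);
}

/* Minimal hop count in a 2-D flattened butterfly: same row or column is
 * one hop; any other router is reachable in two hops. */
static int fbfly_hops(int ax, int ay, int bx, int by) {
    if (ax == bx && ay == by) return 0;
    if (ax == bx || ay == by) return 1;
    return 2;
}

int main(void) {
    long mesh_total = 0, fbfly_total = 0, pairs = 0;
    for (int ax = 0; ax < K; ax++)
        for (int ay = 0; ay < K; ay++)
            for (int bx = 0; bx < K; bx++)
                for (int by = 0; by < K; by++) {
                    if (ax == bx && ay == by) continue;  /* skip self-pairs */
                    mesh_total  += mesh_hops(ax, ay, bx, by);
                    fbfly_total += fbfly_hops(ax, ay, bx, by);
                    pairs++;
                }
    printf("average hops, 8x8 mesh:                %.2f\n",
           (double)mesh_total / pairs);
    printf("average hops, 8x8 flattened butterfly: %.2f\n",
           (double)fbfly_total / pairs);
    return 0;
}

Hop count is only one component of latency and power; the flattened butterfly's low diameter is bought with higher-radix routers and longer wires, which is part of the trade-off the project studies.

The Sequoia item above refers to localizing computation to particular levels of the memory hierarchy. The fragment below is not Sequoia syntax; it is a plain-C sketch of the underlying idea, with a matrix multiply decomposed twice using made-up block sizes (OUTER for an assumed outer memory level, INNER for an assumed inner one) so that each leaf call touches only data that fits one level down. In Sequoia the decomposition is expressed portably and the binding of each level to a real memory (cluster node, Cell local store, cache) is specified separately.

/*
 * NOT Sequoia syntax: a plain-C sketch of hierarchy-aware blocking.
 * Assumes n is divisible by OUTER and OUTER by INNER, and that the caller
 * has zero-initialized C.
 */
#define OUTER 128   /* block edge assumed to fit the outer memory level */
#define INNER 32    /* block edge assumed to fit the inner memory level */

/* Leaf task: runs entirely out of the innermost memory level. */
static void matmul_leaf(int n, const float *A, const float *B, float *C,
                        int i0, int j0, int k0) {
    for (int i = i0; i < i0 + INNER; i++)
        for (int j = j0; j < j0 + INNER; j++) {
            float acc = C[i * n + j];
            for (int k = k0; k < k0 + INNER; k++)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}

/* Inner task: carves an OUTER block into INNER blocks for the leaf. */
static void matmul_inner(int n, const float *A, const float *B, float *C,
                         int i0, int j0, int k0) {
    for (int i = i0; i < i0 + OUTER; i += INNER)
        for (int j = j0; j < j0 + OUTER; j += INNER)
            for (int k = k0; k < k0 + OUTER; k += INNER)
                matmul_leaf(n, A, B, C, i, j, k);
}

/* Root task: carves the whole problem into OUTER blocks. */
void matmul(int n, const float *A, const float *B, float *C) {
    for (int i = 0; i < n; i += OUTER)
        for (int j = 0; j < n; j += OUTER)
            for (int k = 0; k < n; k += OUTER)
                matmul_inner(n, A, B, C, i, j, k);
}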
Recent Projects
- Streaming Supercomputer
- We are developing a streaming supercomputer (SS) that scales from a single chip to thousands of chips and that we estimate will achieve an order of magnitude or more improvement in performance per unit cost on a wide range of demanding numerical computations compared to conventional cluster-based supercomputers. The SS combines stream processing with a high-performance network that provides access to a globally shared memory.
- Imagine: A High-Performance Image and Signal Processor
- Imagine is a programmable signal and image processor that provides the performance and performance density of a special-purpose processor. Imagine achieves a peak performance of 20 GFLOPS (single-precision floating point) and 40 GOPS (16-bit fixed point) and sustains over 12 GFLOPS and 20 GOPS on key signal processing benchmarks. On these same benchmarks Imagine sustains a power efficiency of 3.7 GFLOPS/W, a factor of 20 better than the most efficient conventional signal processors.
- Smart Memories
- We are investigating combined processor/memory architectures that are best able to exploit the semiconductor technologies of 2009. We envision these architectures being composed of tens to hundreds of processors and memory banks on a single semiconductor chip. Our research addresses the design of the processors and memories, the architecture of the interconnection network that ties them together, and mechanisms to simplify programming of such machines.
- High-Speed Signalling
- We are developing methods and circuits that stretch the performance bounds of electrical signalling between chips, boards, and cabinets in a digital system. A prototype 0.25um 4Gb/s CMOS transceiver has been developed that dissipates only 130mW and is amenable to large-scale integration (see the energy-per-bit sketch after this list). Future chips include a 20Gb/s 0.13um CMOS transceiver.
- The M-Machine
- is an experimental parallel computer that demonstrated highly efficient mechanisms for parallelism, including two-level multithreading, efficient network interfaces, fast communication and synchronization, and support for efficient shared-memory protocols.
- The Reliable Router
- is a high-performance multicomputer router that demonstrates
new technologies ranging from architecture to circuit design. At
the architecture level the router uses a novel adaptive routing
algorithm, a link-level retry protocol, and a unique token
protocol. Together the two protocols greatly reduce the cost of
providing reliable, exactly-once end-to-end communication. At
the circuit level the router demonstrates the latest version of
our simultaneous bidirectional pads and a new method for
plesiochronous synchronization.
- The J-Machine
- is an experimental parallel computer, in operation since July
1991, that demonstrates mechanisms that greatly reduce the
overhead involved in inter-processor interaction.
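As a quick check of the signalling figures quoted above, dividing transceiver power by bit rate gives the energy cost per transmitted bit. The snippet below simply evaluates that ratio for the 130 mW, 4 Gb/s prototype; it is arithmetic on the numbers already stated, not a new measurement.

/* Energy per bit implied by the 4 Gb/s, 130 mW transceiver figures above. */
#include <stdio.h>

int main(void) {
    double power_w   = 0.130;   /* 130 mW transceiver power */
    double rate_bps  = 4.0e9;   /* 4 Gb/s signalling rate   */
    double j_per_bit = power_w / rate_bps;
    printf("energy per bit: %.1f pJ\n", j_per_bit * 1e12);  /* ~32.5 pJ */
    return 0;
}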
Publications
A complete list of publications and citations is available from Google Scholar.
Publications can also be found on the CVA group publications page.
Some selected publications are included below:
- William J. Dally et al., "Stream Processors: Programmability with Efficiency," ACM Queue, March 2004, pp. 52-62.
- William J. Dally et al., "Merrimac: Supercomputing with Streams," Supercomputing 2003.
- William J. Dally and Brian Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, 2004.
- William J. Dally and John W. Poulton, Digital Systems Engineering, Cambridge University Press, 1998.
- Marco Fillo, Stephen W. Keckler, William J. Dally, Nicholas P. Carter, Andrew Chang, Yevgeny Gurevich, and Whay S. Lee, "The M-Machine Multicomputer," International Journal of Parallel Programming, Special Issue on Instruction-Level Parallel Processing Part II, Vol. 25, No. 3, 1997, pp. 183-212.
- William J. Dally, Andrew Chang, Andrew Chien, Stuart Fiske, Waldemar Horwat, John Keen, Richard Lethin, Michael Noakes, Peter Nuth, Ellen Spertus, Deborah Wallach, and D. Scott Wills, "The J-Machine," Retrospective in 25 Years of the International Symposia on Computer Architecture: Selected Papers, pp. 54-58.
- William J. Dally, "Virtual-Channel Flow Control," IEEE Transactions on Parallel and Distributed Systems, March 1992, pp. 194-205.
Companies
Bill is Chief Scientist and Senior Vice President of Research at NVIDIA, a role he has held while on leave from Stanford during 2009 and 2010.
Bill has played a key role in founding several companies, including:
- Stream Processors, Inc. (2004-2009)
- Founded to commercialize stream processors for embedded applications.
- Velio Communications (CTO, 1999-2003)
- Velio pioneered high-speed I/O circuits and applied this technology to integrated TDM and packet switching chips. Velio's I/O technology was acquired by Rambus, and Velio itself was acquired by LSI Logic.
- Avici Systems, Inc. (1997-present)
- Manufactures core Internet routers with industry-leading scalability and reliability.
Bill has also worked with Cray since 1989 on the development of many of their supercomputers, including the T3D and T3E.
William J. Dally
<dally "at" stanford "dot" edu>
Stanford University
Computer Systems Laboratory
Gates Room 301
Stanford, CA 94305
(650) 725-8945
FAX: (650) 725-6949