THE JELLYBEAN MACHINE

<TITLE>J-Machine Project Page</TITLE>

<H1>THE JELLYBEAN MACHINE</H1> 
<H2>MIT Artificial Intelligence Laboratory</H2><P>
<IMG SRC="j-machine-tiny.gif"> <P>
<I>This is http://www.cva.stanford.edu/projects/j-machine/cva_j_machine.html</I><P>
<I>Last updated July 7, 1998 </I>  <P>

<hr>

The J-Machine is a fine-grained concurrent computer designed by the
MIT <a href="http://cva.stanford.edu/cva_home_page.html">Concurrent
VLSI Architecture</A> group (now located at Stanford University) in
conjunction with Intel Corporation.

<P>

<H2> Pictures of J-machine Hardware </H2>
<menu>
  <LI> <A HREF="j-machine-1.gif"> 1024-node J-machine with cover removed.</A> </LI>
  <LI> <A HREF="j-machine-2.gif"> 64-node J-machine board.</A> </LI>
  <LI> <A HREF="j-machine-3.gif"> SCSI Interface for the J-machine.</A></LI>
  <LI> <A HREF="j-machine-4.gif"> Host Interface board for the J-machine.</A></LI>
  <LI> <A HREF="j-machine-5.gif"> Detail Showing SPARC host and interface to board stack.</A></LI>
</menu>

<H2>Note</H2>
<P>
At the moment, we're trying to bring a few more users on-line with creative
applications.  If you have a cycle- or bandwidth-hungry application, please
contact
<a href="http://www.ai.mit.edu/projects/cva/none.html">Andrew Chang</a> or
<a href="http://www.ai.mit.edu/people/lethin/lethin.html">Richard Lethin</a>.
<H2>Overview</H2>
<P>
The J-machine project was started at MIT in about 1988 as an experiment in
message-passing computing based on work that Bill Dally did at Caltech for
his doctoral dissertation.
<P>
The work was driven by the VLSI philosophy "processors are cheap" and
"memory is expensive".  This philosophy is based on an idealistic view
of VLSI economics, in which the cost of a function is based on the
VLSI area dedicated to it.  Although the standard view is that
processors are much more expensive than memory (and this standard view
was very much true before levels of VLSI integration allowed
processors to be integrated on a single chip), if we look at a typical
workstation with 32 Mbyte of memory, the amount of silicon area
dedicated to memory is roughly 100 times that for the CPU.  A bit of
DRAM is about 100 lambda^2, so 32 Mbyte (about 268M bits) is roughly
27G lambda^2, versus the arithmetic units in the CPU, which are about
300M lambda^2.
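<P>
The area argument can be checked with a few lines of arithmetic.  The
sketch below uses only the rough figures quoted above and assumes binary
megabytes (2^20 bytes), which the text does not specify.  It gives a
memory area of a few tens of G lambda^2 and a memory-to-CPU ratio of the
same order as the "roughly 100 times" figure.

```python
# Back-of-the-envelope silicon area comparison, in units of lambda^2.
# All inputs are the rough figures quoted in the text.
LAMBDA2_PER_DRAM_BIT = 100       # area of one DRAM bit
MEMORY_BYTES = 32 * 2**20        # 32 Mbyte, assuming binary megabytes
CPU_ARITHMETIC_AREA = 300e6      # arithmetic units of the CPU

memory_bits = MEMORY_BYTES * 8
memory_area = memory_bits * LAMBDA2_PER_DRAM_BIT
ratio = memory_area / CPU_ARITHMETIC_AREA

print(f"memory area: {memory_area / 1e9:.1f}G lambda^2")
print(f"memory / CPU area ratio: about {ratio:.0f}x")
```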
<P>
Of course, this ignores issues related to the relative production
volumes and process technologies for logic vs. memory, and runs against
the current "wisdom" that the best way to build a fast parallel processor
is to bolt a network-interface and coherent-cached-shared-memory hardware
onto a standard microprocessor.  However, we're interested in technology
imperatives much more than market imperatives.
<P>
With CPUs so cheap, in the silicon-area sense, the J-machine project set
out to explore an architecture in which the processors are more "liberally
scattered" through the machine.  We envisioned a component with economies
of scale like that for DRAMs: a "message-driven processor" with a small
processor and network interface integrated <I>with</I> the memory.  The "J" in
"J-machine" stands for "Jellybean", in the sense that the processors would
be cheap and plentiful, like jellybean candies.
<P>
The design of the J-machine incorporated several novel technologies.  The
machine architects immediately realized that a key to performance would be
fast, low-overhead messaging.  The processor and network interface are
tightly coupled, so that user-level messages can be sent with
very little overhead for copying.  The network is a 3-dimensional
deterministic wormhole-routed mesh.  User-level message handlers dispatch
on message arrival, with a small amount of queuing in on-chip memory
at the destination.  We also dispatch to handlers before the message
has completely arrived, to speed up the dispatch process.
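<P>
The text says only that routing in the mesh is deterministic.  One common
deterministic scheme for meshes is dimension-order routing, in which a
message fully corrects its x offset, then y, then z.  The sketch below
illustrates that scheme; treating it as the J-machine's exact routing
function is an assumption, and the function name is hypothetical.

```python
def dimension_order_route(src, dst):
    """Return the list of nodes visited going from src to dst in a 3-D mesh.

    Offsets are corrected one dimension at a time (x, then y, then z), so
    the path between any pair of nodes is always the same.
    """
    path = [tuple(src)]
    current = list(src)
    for dim in range(3):
        while current[dim] != dst[dim]:
            # Move one hop toward the destination along this dimension.
            current[dim] += 1 if dst[dim] > current[dim] else -1
            path.append(tuple(current))
    return path


route = dimension_order_route((0, 0, 0), (2, 1, 3))
print(len(route) - 1, "hops:", route)
```

Because the path depends only on the source and destination, two messages
between the same pair of nodes always follow the same sequence of links.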
<P>
The communication capacity of the J-machine is pattern-dependent, of course.
Each node can inject into the network at 2 words (72 bits) per clock.
Messages travel over the links in 18-bit "flits" at 12.5 MHz.  Message
reception is into on-chip memory, buffered in "4-word (144-bit) queue row buffers"
that write in one clock cycle to minimize interference with processing and
instruction fetching.  The major bisection of a 1024-node machine is 8 by 8 channels,
each 18 bits wide and running at 12.5 MHz, for a total of 1.8 Gbyte/sec.  Microbenchmark
studies have shown that random traffic patterns can achieve about 40% utilization
of this bandwidth.
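<P>
The bisection figure follows directly from the channel count, width, and
clock rate given above.  A minimal sketch of the arithmetic, which also
applies the 40% utilization reported for random traffic:

```python
# Bisection bandwidth of the 1024-node machine, from the figures in the text.
CHANNELS = 8 * 8            # channels crossing the major bisection
CHANNEL_WIDTH_BITS = 18     # each channel is 18 bits wide
CHANNEL_CLOCK_HZ = 12.5e6   # channels run at 12.5 MHz
RANDOM_UTILIZATION = 0.40   # reported for random traffic patterns

peak_bytes_per_sec = CHANNELS * CHANNEL_WIDTH_BITS * CHANNEL_CLOCK_HZ / 8
random_bytes_per_sec = peak_bytes_per_sec * RANDOM_UTILIZATION

print(f"peak bisection bandwidth: {peak_bytes_per_sec / 1e9:.2f} Gbyte/sec")
print(f"random-traffic bandwidth: {random_bytes_per_sec / 1e9:.2f} Gbyte/sec")
```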
<P>
Each processor node has about 4kb of on-chip memory and 1Mbyte of off-chip
external memory. The off-chip memory was added when we discovered the
amount of memory we could put on-chip in the available ~1989 process
technology was too small.  We'd have preferred to have put the memory
on-chip and have had more processors.  Furthermore, pinout constraints
on our packaging technology restricted us to a narrow interface to
external memory.  This remote and narrow path results in an external
access latency of 6 clock cycles, vs. one clock for the on-chip memory.
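<P>
To see how much the external path costs, the two latencies above can be
combined into an average access time.  The access mixes in this sketch
are hypothetical; only the 1-cycle and 6-cycle latencies come from the
text.

```python
ON_CHIP_CYCLES = 1     # on-chip memory access, from the text
EXTERNAL_CYCLES = 6    # external memory access, from the text


def average_access_cycles(on_chip_fraction):
    """Mean cycles per access when this fraction of accesses stays on-chip."""
    off_chip_fraction = 1 - on_chip_fraction
    return on_chip_fraction * ON_CHIP_CYCLES + off_chip_fraction * EXTERNAL_CYCLES


for fraction in (1.0, 0.9, 0.5):   # hypothetical access mixes
    cycles = average_access_cycles(fraction)
    print(f"{fraction:.0%} on-chip: {cycles:.1f} cycles per access")
```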
<P>
The node's processor itself is very modest: it runs at a 10 MHz internal clock, with
a limited number of registers, and no floating point hardware, for performance
that's about equivalent to a 25 MHz 386.
<P>
The J-machine also incorporates mechanisms to support a concurrent
object-oriented programming model over a global object space.  Instructions
are included for a quick hashed lookup in an on-chip table. This can be
used for caching, roughly equivalent to a TLB, except that it is managed by
the compiler/OS and policy is not set by the hardware.  In a single
instruction, it translates an "object identifier" (equivalent to a segmented
global virtual address) to the node which handles methods for the object and
to the memory address on the local node where the object resides.
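<P>
The translation mechanism can be pictured as a small direct-mapped table
that software fills and the hardware probes.  The following is a minimal
sketch under that assumption; the class, its method names, the table size,
and the dictionary standing in for the OS's full mapping are all
hypothetical and are not the machine's actual instruction set.

```python
class TranslationTable:
    """Direct-mapped, software-managed cache of object-ID translations."""

    def __init__(self, num_slots, directory):
        self.slots = [None] * num_slots   # each slot: (object_id, node, address)
        self.directory = directory        # stand-in for the OS's full mapping
        self.misses = 0

    def _index(self, object_id):
        return hash(object_id) % len(self.slots)

    def enter(self, object_id, node, address):
        # Software chooses what to cache; a colliding entry is overwritten.
        self.slots[self._index(object_id)] = (object_id, node, address)

    def translate(self, object_id):
        """Return (node, address), refilling from software on a miss."""
        entry = self.slots[self._index(object_id)]
        if entry is not None and entry[0] == object_id:
            return entry[1], entry[2]     # hit: a single table probe
        self.misses += 1                  # miss: fall back to the slow path
        node, address = self.directory[object_id]
        self.enter(object_id, node, address)
        return node, address


directory = {0x1234: (17, 0x0400), 0x5678: (903, 0x0080)}
table = TranslationTable(num_slots=64, directory=directory)
print(table.translate(0x1234), table.misses)   # first lookup misses, then refills
print(table.translate(0x1234), table.misses)   # second lookup hits
```

The replacement policy lives entirely in <I>enter</I>, which is the sense
in which policy is set by software rather than by the hardware.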
<P>
Hardware also supports dynamic typing: each 32-bit word is tagged with 4 bits.
Hardware instructions like "add" will trap if they see a type other than integer.
The tag is also used to identify "futures", which serve as
placeholders for the results of asynchronous method calls.  An attempt to
use a future by an arithmetic instruction results in a fault; the OS
suspends the thread waiting for the arrival of the result.
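<P>
The tag checks described above can be sketched in software.  In this
minimal model the tag encodings are hypothetical (the text gives only the
4-bit tag width), and exceptions stand in for the hardware trap and
future fault.

```python
from dataclasses import dataclass

TAG_INT = 0      # hypothetical tag encodings; the text gives only the width
TAG_FUTURE = 1
TAG_BITS = 4


class TypeTrap(Exception):
    """Raised when an operand does not carry the integer tag."""


class FutureFault(Exception):
    """Raised when an operand is a future whose value has not arrived."""


@dataclass(frozen=True)
class Word:
    tag: int     # 4-bit type tag
    data: int    # 32-bit payload

    def __post_init__(self):
        if self.tag not in range(2 ** TAG_BITS):
            raise ValueError("tag must fit in 4 bits")


def add(a, b):
    """Tag-checked add: faults on futures, traps on other non-integers."""
    for operand in (a, b):
        if operand.tag == TAG_FUTURE:
            raise FutureFault("suspend thread until the result arrives")
        if operand.tag != TAG_INT:
            raise TypeTrap(f"add applied to tag {operand.tag}")
    return Word(TAG_INT, (a.data + b.data) % 2 ** 32)


print(add(Word(TAG_INT, 2), Word(TAG_INT, 3)))
try:
    add(Word(TAG_INT, 2), Word(TAG_FUTURE, 0))
except FutureFault as fault:
    print("fault:", fault)
```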
<P>
We currently have two programming environments for the J-machine.
<menu>
  <LI> Concurrent Smalltalk is a concurrent object-oriented language
       which looks more like Scheme than Smalltalk.  It offers support
       for concurrent distributed objects, a global address space,
       object migration, explicit object placement,
       inheritance, concurrency, futures, locks, etc.  All computation
       is done in user-level message handlers.  This is a really neat
       and underrated language, but it requires a bit of an adventurous
       spirit since it is experimental.
       <A href="ftp://ftp.ai.mit.edu/pub/cva/CST.ps.Z">
       Waldemar Horwat's revised CST
       manual is available by clicking here.  (Note: ghostview has problems with this;
       this HREF prints OK though.)</A>
       Someone should port this
       to something like the CM-5 or T3D to expand its user base.
       </LI>
  <LI> Message-Driven C was developed by our collaborators at Caltech.
       The programming model is MIMD with asynchronous function call
       invocations around the machine.
       <A href="ftp://ftp.ai.mit.edu/pub/cva/mdc.ps.Z">
       Daniel Maskit's MS thesis discussing this can be accessed
       by clicking here.</A>
       </LI>
</menu>
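<P>
Both environments share one execution model: computation runs in handlers
that are invoked by message arrival, and a call to another node is a
message whose result comes back in a second message.  The sketch below is
a plain Python simulation of that model, not CST or Message-Driven C
syntax; every class, function, and handler name in it is hypothetical.

```python
from collections import deque


class Node:
    """One processing node: a message queue plus a table of handlers."""

    def __init__(self, node_id, machine):
        self.node_id = node_id
        self.machine = machine
        self.queue = deque()
        self.handlers = {}
        self.results = []

    def step(self):
        """Dispatch the oldest queued message to its handler, if any."""
        if not self.queue:
            return False
        handler_name, args = self.queue.popleft()
        self.handlers[handler_name](self, *args)
        return True


class Machine:
    def __init__(self, num_nodes):
        self.nodes = [Node(i, self) for i in range(num_nodes)]

    def send(self, dest, handler_name, *args):
        # Sending never blocks the sender; the call runs later at dest.
        self.nodes[dest].queue.append((handler_name, args))

    def run(self):
        # Step every node once per round until all queues are empty.
        while any([node.step() for node in self.nodes]):
            pass


def square(node, value, reply_to):
    node.machine.send(reply_to, "collect", node.node_id, value * value)


def collect(node, sender, result):
    node.results.append((sender, result))


machine = Machine(num_nodes=4)
for n in machine.nodes:
    n.handlers.update(square=square, collect=collect)
for dest in (1, 2, 3):
    machine.send(dest, "square", dest + 10, 0)   # asynchronous calls from node 0
machine.run()
print(sorted(machine.nodes[0].results))
```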
<P>
Three 1024-node J-machine systems have been built, and live at MIT,
Caltech, and Argonne National Laboratory.
<P>
The 1024-node J-machine at MIT is hosted by the machine jelly-donut.ai.mit.edu,
i.e., it's on the Internet. The 1024-node machine has a peak performance of
1G instructions/sec, peak memory bandwidth of 6 GB/sec to external memory, and
1.28 GB/sec bandwidth across the central bisection.  The J-machine also
includes a dedicated filesystem and a
<A HREF="ftp://ftp.ai.mit.edu/pub/cva/video-wrkshop93.ps.Z">
distributed graphics system.</a>

<H2>Architectural Evaluation</H2>
<menu>
  <LI> Noakes, Michael D. and Wallach, Deborah A. and Dally, William J.,
       "The J-Machine Multicomputer: An Architectural Evaluation",
       Proceedings of the 20th International Symposium on Computer Architecture,
       1993.
       </LI>
  <LI> Spertus, Ellen and others,
       "Evaluation of Mechanisms for Fine-Grained Parallel Programs in the J-Machine and the CM-5",
       Proceedings of the International Symposium on Computer Architecture,
       1993.
       </LI>
  <LI> Fatovic, Jerko,
       "A Ray Tracer for the J-Machine",
       MS Thesis, Massachusetts Institute of
       Technology Department of Electrical Engineering,
       May,
       1992.
       </LI>
  <LI> Shaun Yoshie Kaneshiro,
       "Branch and Bound Search on the J-Machine",
       MS Thesis, Massachusetts Institute of
       Technology Department of Electrical Engineering,
       September,
       1993.
       </LI>
</menu>
<H2>Programming Systems</H2>
<menu>
  <LI> Chien, Andrew and Dally, William J.,
       "CST: An Object-Oriented Concurrent Language",
       Object-Based Concurrent Programming Workshop,
       September,
       1988,
       Conference held at San Diego, CA.  SIGPLAN Notices,
       February 1989.
       </LI>
  <LI> Dally, William J.,
       "The J-Machine: System Support for Actors",
       in Towards Open Information Science,
       Editors, Hewitt, Carl and Agha, Gul,
       MIT Press,
       1992.
       </LI>
  <LI> Horwat, Waldemar,
       "A Concurrent Smalltalk Compiler for the Message-Driven Processor",
       MIT AI Memo,
       545 Technology Sq., Cambridge, MA 02139,
       May,
       SB Thesis.
       </LI>
  <LI> Horwat, Waldemar,
       "Concurrent Smalltalk on the Message-Driven Processor",
       Master's Thesis, MIT, May 1989.
       </LI>
  <LI> Horwat, Waldemar and Totty, Brian and Dally, William J.,
       "COSMOS: An Operating System for a Fine-Grain Concurrent Computer",
       ?unpublished?
       </LI>
  <LI> Horwat, Waldemar and Andrew Chien and William J. Dally,
       "Experience with CST: Programming and Implementation",
       Proceedings of the ACM SIGPLAN 89 Conference on
       Programming Language Design and Implementation,
       1989.
       </LI>
  <LI> <A href="ftp://ftp.ai.mit.edu/pub/cva/CST.ps.Z">
       Horwat, Waldemar, "Revised CST Manual Version ?", CVA Memo #?.
       VERY WORTHWHILE
       (Note: this HREF points to a .ps file which prints but
       gives trouble to ghostview).
       </A>
       </LI>
  <LI> Totty, Brian,
       "An Operating Environment for the Jellybean Machine",
       SB Thesis, Massachusetts Institute of Technology
       Department of Electrical Engineering and Computer Science,
       May, 1988.
       </LI>
</menu>
<H2>Design and Development</H2>
<menu>
  <LI> Dally, William J.
       and Fiske, J.A. Stuart
       and Keen, John S.
       and Lethin, Richard A.
       and Noakes, Michael D.
       and Nuth, Peter R.
       and Davison, Roy E.
       and Fyler, Gregory A.,
       "The Message-Driven Processor: A Multicomputer
       Processing Node with Efficient Mechanisms",
       IEEE Micro,
       April,
       1992.
       </LI>
  <LI> Dally, William J. and others,
       "Design and Implementation of the Message-Driven Processor",
       Proceedings of the 1992 Brown/MIT Conference on Advanced Research in VLSI and Parallel Systems,
       MIT Press,
       March,
       1992.
       </LI>
  <LI> Dally, William J. and others,
       "The Message-Driven Processor: An Integrated Multicomputer Processing Element",
       Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors,
       IEEE Press,
       1992.
       </LI>
  <LI> Lethin, Richard A. and Dally, William J.,
       "MDP Tools and Methods",
       Proceedings of the International Conference on
       Computer Design: VLSI in Computers and Processors,
       1992.
       </LI>
  <LI> Richard A. Lethin,
       "A Simulator for the Message-Driven Processor",
       Master's Thesis, MIT,
       1991.
       </LI>
  <LI> Noakes, Michael and Dally, William J.,
       "System Design of the J-Machine",
       Sixth MIT Conference on Advanced Research in VLSI,
       The MIT Press,
       1990.
       </LI>
  <LI> Nuth, Peter R. and Dally, William J.,
       "The J-Machine Network",
       Proceedings of the International Conference
       on Computer Design: VLSI in Computers and Processors,
       October,
       1992.
       </LI>
</menu>
<H2>Early Architecture</H2>
<menu>
  <LI> Chao, Linda,
       "Architectural Features of a Message-Driven Processor",
       MIT SB Thesis, May, 1987.
       </LI>
  <LI> Dally, William J. and others,
       "Architecture of a Message-Driven Processor",
       Proceedings of the 14th International Symposium on Computer Architecture,
       1987.
       </LI>
  <LI> Dally, William J. and Seitz, Charles L.,
       "Deadlock Free Message Routing in Multiprocessor Interconnection Networks",
       IEEE Transactions on Computers,
       Volume C-36,
       May,
       1987.
       </LI>
  <LI> Dally, William J.,
       "Fine-Grain Message Passing Concurrent Computers",
       Proceedings of the Third Conference on Hypercube Concurrent Computers,
       Pasadena, CA,
       1988.
       </LI>
  <LI> Dally, William J.,
       "The J-Machine System",
       in Artificial Intelligence at MIT: Expanding Frontiers,
       editor Patrick Winston with Sarah A. Shellard,
       MIT Press,
       1990.
       </LI>
  <LI> Dally, William J. and Kajiya, James T.,
       "An Object Oriented Architecture",
       Proceedings of the 12th International Symposium on Computer Architecture,
       1985.
       </LI>
  <LI> Dally, William J.,
       A VLSI Architecture for Concurrent Data Structures,
       Kluwer Academic Publishers,
       1987.
       </LI>
</menu>
<H2>Miscellaneous</H2>
<menu>
<LI>
An experimental QCD code has been implemented on the J-machine by <a href="http://www.ai.mit.edu/people/lethin/lethin.html">Richard Lethin</a> and <a href="http://www.ai.mit.edu/people/rippel/rippel.html">Robert Rippel</a> in order to evaluate communication and computation behavior (<A HREF="http://www.ai.mit.edu/people/rippel/jqcd.ps">report in PostScript</A>).   

  <LI> Marcus Alexander von Kapff,
       "Industry Assessment and Market Analysis of Massively Parallel Computers",
       Master's Thesis, Sloan School at Massachusetts Institute of Technology,
       May,
       1993.
       </LI>
</menu>
<hr>
<ADDRESS> lethin@ai.mit.edu </ADDRESS>