EE482A Spring Quarter 1999/2000
Reading List Version 1.1

Mattan Erez and William Dally

The setting: technology, market trends, critical issues (4.3)

Required Reading
  1. K. Diefendorff, "PC Processor Microarchitecture", Microprocessor Report, Vol. 13 No. 9, July 12, 1999.
  2. S. Hamilton, "Taking Moore's Law into the Next Century", Computer, Vol. 32 Issue 1, January 1999.
  3. C. Kozyrakis and D. Patterson, "A New Direction for Computer Architecture Research", Computer, Vol. 31 Issue 11, November 1998.
Branch Prediction (4.5)

Required Reading

  1. J. Lee and A.J. Smith, "Branch Prediction Strategies and Branch Target Buffer Design", Computer, Vol. 17 Issue 1, January 1984.
  2. S. McFarling, "Combining Branch Predictors", Technical Note TN-36, DEC WRL, June 1993.
  3. E. Federovsky, M. Feder, and S. Weiss, "Branch prediction based on universal data compression algorithms", in Proceedings of the 25th International Symposium on Computer Architecture, June 1998.
Highly Recommended Reading
  1. M. Evers, S. Patel, R. Chappell, and Y. Patt, "An analysis of correlation and predictability: what makes two-level branch predictors work", in Proceedings of the 25th International Symposium on Computer Architecture, June 1998.
Recommended Reading
  1. A. Eden and T. Mudge, "The YAGS branch prediction scheme", in Proceedings of the 31st International Symposium on Microarchitecture, December 1998.
  2. T. Yeh and Y. Patt, "Alternative Implementations of Two-Level Adaptive Branch Prediction", in Proceedings of the 19th International Symposium on Computer Architecture, 1992.
  3. R. Nair, "Dynamic Path-Based Branch Correlation", in Proceedings of the 28th International Symposium on Microarchitecture, December 1995.
  4. J. Kalamatianos and D. Kaeli, "Predicting indirect branches via data compression", in Proceedings of the 31st International Symposium on Microarchitecture, December 1998.
  5. K. Driesen and U. Holzle, "The Cascaded Predictor: Economical and Adaptive Branch Target Prediction", in Proceedings of the 31st International Symposium on Microarchitecture, December 1998.
  6. D. Grunwald, A. Klauser, S. Manne, and A. Pleszkun, "Confidence Estimation for Speculation Control", in Proceedings of the 25th International Symposium on Computer Architecture, June 1998.
  7. J. E. Smith, T. Heil, S. Sastry, and T. Bezenek, "Improving Branch Predictors by Correlating on Data Values," in Proceedings of the 32nd Annual International Symposium on Microarchitecture, November 1999.
Fetch Issues (4.10)

Required Reading

  1. G. Reinman, B. Calder, and T. Austin, "A Scalable Front-End Architecture for Fast Instruction Delivery", in Proceedings of the 26th International Symposium on Computer Architecture, May 1999.
  2. C.K. Luk and T. Mowry, "Cooperative Prefetching: Compiler and Hardware Support for Effective Instruction Prefetching in Modern Processors", in Proceedings of the 31st International Symposium on Microarchitecture, December 1998.
  3. T. Conte, K. Menezes, P. Mills, and B. Patel, "Optimization of Instruction Fetch Mechanisms for High Issue Rates," in Proceedings of the 22nd International Symposium on Computer Architecture, June 1995.
Recommended Reading
  1. M. Slater, "Rise Joins x86 Fray With mP6", Microprocessor Report Vol. 12 Issue 15, November 16, 1998.
  2. C. Lefurgy, E. Piccininni, and T. Mudge, "Evaluation of a high performance code compression method", in Proceedings of the 32nd International Symposium on Microarchitecture, November 1999.
  3. S. McFarling, "Program Optimization For Instruction Caches", in Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, April 1989.
  4. J. Turley, "PowerPC Adopts Code Compression", Microprocessor Report Vol. 12 Issue 14, October 26, 1998.
  5. J. Bondi, A. Nanda, and S. Dutta, "Integrating a Misprediction Recovery Cache (MRC) into a Superscalar Pipeline", in Proceedings of the 29th International Symposium on Microarchitecture, December 1996.
Trace Cache (4.12)

Required Reading

  1. A. Peleg and U. Weiser, "Dynamic Flow Instruction Cache Memory Organized around Trace Segments Independent of Virtual Address Line," U.S. Patent Number 5,381,533, Intel Corporation, 1994.
  2. D. Friendly, S. Patel, and Y. Patt, "Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism", in Proceedings of the 30th International Symposium on Microarchitecture, November 1997.
  3. Q. Jacobson, E. Rotenberg, and J. Smith, "Path-Based Next Trace Prediction," in Proceedings of the 30th International Symposium on Microarchitecture, November 1997.
Recommended Reading
  1. R. Nair and M. Hopkins, "Exploiting Instruction Level Parallelism in Processors by Caching Scheduled Groups", in Proceedings of the 24th International Symposium on Computer Architecture, June 1997.
  2. S. Dutta and M. Franklin, "Control Flow Prediction with Tree-Like Subgraphs for Superscalar Processors," in Proceedings of the 28th International Symposium on Microarchitecture, November 1995.
  3. S. Jourdan, L. Rappoport, Y. Almog, M. Erez, A. Yoaz, and R. Ronen, "eXtended Block Cache", in Proceedings of the 6th International Symposium on High-performance Computer Architecture, January 2000.
  4. S. Patel, M. Evers, and Y. Patt, "Improving Trace Cache Effectiveness with Branch Promotion and Trace Packing," in Proceedings of the 25th International Symposium on Computer Architecture, June 1998.
Predication and Eager Execution (4.17)

Required Reading

  1. S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann, "Effective Compiler Support for Predicated Execution Using the Hyperblock", in Proceedings of the 25th International Symposium on Microarchitecture, December 1992.
  2. A. Uht and V. Sindagi, "Disjoint Eager Execution: An Optimal Form of Speculative Execution", in Proceedings of the 28th International Symposium on Microarchitecture, November 1995.
  3. A. Klauser, A. Paithankar, and D. Grunwald, "Selective Eager Execution on the PolyPath Architecture", in Proceedings of the 25th International Symposium on Computer Architecture, June 1998.
Recommended Reading
  1. G.S. Tyson, "The Effects Of Predicated Execution On Branch Prediction", in Proceedings of the 27th International Symposium on Microarchitecture, November 1994.
  2. D.I. August, W.W. Hwu, and S.A. Mahlke, "A Framework for Balancing Control Flow and Predication", in Proceedings of the 30th International Symposium on Microarchitecture, November 1997.
  3. A. Klauser, T. Austin, D. Grunwald, and B. Calder, "Dynamic Hammock Predication for Non-Predicated Instruction Set Architectures", in Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 1998.
  4. D. Pnevmatikatos and G. Sohi, "Guarded Execution and Branch Prediction in Dynamic ILP Processors", in Proceedings of the 21st International Symposium on Computer Architecture, June 1994.
Memory Systems and Memory Latency (4.19 (papers 1-3), 4.24 (papers 4-6))

Required Reading

  1. A. Saulsbury, F. Pong, and A. Nowatzyk, "Missing the Memory Wall: The Case for Processor/Memory Integration", in Proceedings of the 23rd International Symposium on Computer Architecture, May 1996.
  2. N. Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers", in Proceedings of the 17th International Symposium on Computer Architecture, June 1990.
  3. S. Rixner, W. Dally, U. Kapasi, P. Mattson, and J. Owens, "Memory Access Scheduling", in Proceedings of the 27th International Symposium on Computer Architecture, June 2000.
  4. T.F. Chen and J.L. Baer, "Effective Hardware Based Prefetching for High-Performance Processors", IEEE Transactions on Computers, Vol. 44 No. 5, May 1995.
  5. T. Mowry, M.S. Lam, and A. Gupta, "Design and Evaluation of a Compiler Algorithm for Prefetching", in Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, October 1992.
  6. D. Joseph and D. Grunwald, "Prefetching Using Markov Predictors", in Proceedings of the 24th International Symposium on Computer Architecture, June 1997.
Recommended Reading
  1. C.K. Luk and T. Mowry, "Compiler-Based Prefetching for Recursive Data Structures", in Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, October 1996.
  2. A. Roth and G. Sohi, "Effective Jump-Pointer Prefetching for Linked Data Structures", in Proceedings of the 26th International Symposium on Computer Architecture, June 1999.
  3. T. Alexander and G. Kedem, "Distributed Prefetch-buffer/Cache Design for High Performance Memory System", in Proceedings of the 2nd International Symposium on High-performance Computer Architecture, February 1996.
  4. M. Bekerman, S. Jourdan, R. Ronen, G. Kirshenboim, L. Rappoport, A. Yoaz, and U. Weiser, "Correlated Load-Address Predictors", in Proceedings of the 26th International Symposium on Computer Architecture, June 1999.
Memory Disambiguation and Speculation (4.26)

Required Reading

  1. A. Nicolau, "Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies", IEEE Transactions on Computers, Vol. 38 No. 5, May 1989.
  2. G. Reinman and B. Calder, "Predictive Techniques for Aggressive Load Speculation", in Proceedings of the 31st International Symposium on Microarchitecture, December 1998.
  3. A. Moshovos and G. Sohi, "Streamlining Inter-operation Memory Communication via Data Dependence Prediction", in Proceedings of the 30th International Symposium on Microarchitecture, December 1997.
Recommended Reading
  1. T. Austin and G. Sohi, "Zero-Cycle Loads: Microarchitecture Support for Reducing Load Latency", in Proceedings of the 28th International Symposium on Microarchitecture, December 1995.
  2. A. Yoaz, M. Erez, R. Ronen, and S. Jourdan, "Speculation Techniques for Improving Load Related Instruction Scheduling", in Proceedings of the 26th International Symposium on Computer Architecture, June 1999.
  3. M. Franklin and G. Sohi, "ARB: A Hardware Mechanism for Dynamic Memory Disambiguation", IEEE Transactions on Computers, Vol. 45 No. 5, May 1996.
  4. G. Chrysos and J. Emer, "Memory Dependence Prediction Using Store Sets", in Proceedings of the 25th International Symposium on Computer Architecture, July 1998.
  5. J. Hesson, J. LeBlanc, and S. Ciavaglia, "Apparatus to Dynamically Control the Out-Of-Order Execution of Load-Store Instructions", US Patent 5,615,350, Filed December 1995, Issued March 1997.
Value Prediction (5.1)

Required Reading

  1. S. P. Harbison, "An Architectural Alternative to Optimizing Compilers," in Proceedings of the First Symposium on Architectural Support for Programming Languages and Operating Systems, 1982.
  2. M. H. Lipasti and J. P. Shen, "Exceeding the Dataflow Limit via Value Prediction," in Proceedings of the 29th Annual International Symposium on Microarchitecture, December 1996.
  3. A. Sodani and G. S. Sohi, "An Empirical Analysis of Instruction Repetition," in Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, October 1998.
Highly Recommended Reading
  1. S. Jourdan, R. Ronen, M. Bekerman, B. Shomar, and A. Yoaz, "A Novel Renaming Scheme to Exploit Value Temporal Locality through Physical Register Reuse and Unification," in Proceedings of the 31st Annual International Symposium on Microarchitecture, November 1998.
Recommended Reading
  1. F. Gabbay and A. Mendelson, "Using Value Prediction to Increase the Power of Speculative Execution Hardware," in ACM Transactions on Computer Systems, Vol. 16 No. 3, August 1998.
  2. Y. Sazeides and J. E. Smith, "The Predictability of Data Values," in Proceedings of the 30th Annual International Symposium on Microarchitecture, December 1997.
  3. A. Sodani and G. S. Sohi, "Dynamic Instruction Reuse," in Proceedings of the 24th Annual International Symposium on Computer Architecture, June 1997.
  4. B. Calder, G. Reinman, and D. Tullsen, "Selective Value Prediction," in Proceedings of the 26th International Symposium on Computer Architecture, May 1999.
  5. K. Wang and M. Franklin, "Highly Accurate Data Value Prediction using Hybrid Predictors", in Proceedings of the 30th Annual International Symposium on Microarchitecture, December 1997.
  6. B. Rychlik, J. Faistl, B. Krug and J. Shen, "Efficacy and Performance Impact of Value Prediction", in Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, October 1998.
  7. Y. Sazeides and J. E. Smith, "Modeling Program Predictability", in Proceedings of the 25th Annual International Symposium on Computer Architecture, July 1998.
ILP Execution (5.3)

Required Reading

  1. R. M. Tomasulo, "An Efficient Algorithm for Exploiting Multiple Arithmetic Units", IBM J. Research and Development 11:1, January 1967.
  2. S. Palacharla, N. Jouppi, and J. Smith, "Complexity-Effective Superscalar Processors", in Proceedings of the 24th International Symposium on Computer Architecture, June 1997.
  3. J. Smith and A. Pleszkun, "Implementation of Precise Interrupts in Pipelined Processors", in Proceedings of the 12th International Symposium on Computer Architecture, June 1985.
  4. L. Gwennap, "VLIW: The Wave of the Future?", Microprocessor Report, Vol. 8 No. 2, February 14, 1994.
Recommended Reading
  1. S. Jourdan, R. Ronen, M. Bekerman, B. Shomar, and A. Yoaz, "A Novel Renaming Scheme to Exploit Value Temporal Locality through Physical Register Reuse and Unification," in Proceedings of the 31st Annual International Symposium on Microarchitecture, November 1998.
  2. T. Monreal, A. Gonzalez, M. Valero, J. Gonzalez, and V. Vinals, "Delaying Physical Register Allocation through Virtual-Physical Registers", in Proceedings of the 32nd Annual International Symposium on Microarchitecture, November 1999.
  3. J. Fisher, "Very Long Instruction Word Architectures and the ELI-512", in Proceedings of the 10th International Symposium on Computer Architecture, June 1983.
  4. R. Colwell, R. Nix, J. O'Donnell, D. Papworth, and P. Rodman, "A VLIW Architecture for a Trace Scheduling Compiler", IEEE Transactions on Computers, Vol. 37 No. 8, August 1988.
Beyond ILP (5.8 (papers 1-4), 5.10 (papers 5-7))

Required Reading

  1. W.D. Weber and A. Gupta, "Exploring The Benefits Of Multiple Hardware Contexts In A Multiprocessor Architecture: Preliminary Results", in Proceedings of the 16th International Symposium on Computer Architecture, May 1989.
  2. S. Keckler and W. Dally, "Processor Coupling: Integrating Compile Time and Runtime Scheduling for Parallelism", in Proceedings of the 19th International Symposium on Computer Architecture, June 1992.
  3. L. Hammond, M. Willey, and K. Olukotun, "Data Speculation Support for a Chip Multiprocessor", in Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, October 1998.
  4. G. Sohi, S. Breach, and T.N. Vijaykumar, "Multiscalar Processors", in Proceedings of the 22nd International Symposium on Computer Architecture, June 1995.
  5. D. Tullsen, S. Eggers, and H. Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism", in Proceedings of the 22nd International Symposium on Computer Architecture, June 1995.
  6. H. Akkary and M. Driscoll, "A Dynamic Multithreaded Processor", in Proceedings of the 31st Annual International Symposium on Microarchitecture, November 1998.
  7. E. Rotenberg, Q. Jacobson, Y. Sazeides, and J. Smith, "Trace Processors", in Proceedings of the 30th Annual International Symposium on Microarchitecture, November 1997.
Recommended Reading
  1. P. Marcuello and A. Gonzalez, "Clustered Speculative Multithreaded Processors", in Proceedings of the 1999 International Conference on Supercomputing, April 1999.
  2. S. Keckler, W. Dally, D. Maskit, N. Carter, A. Chang, and W.S. Lee, "Exploiting Fine-Grain Thread Level Parallelism on the MIT Multi-ALU Processor", in Proceedings of the 25th International Symposium on Computer Architecture, June 1998.
Vectors and Streams (5.15)

Required Reading

  1. S. Rixner, W. Dally, U. Kapasi, B. Khailany, A. Lopez-Lagunas, P. Mattson, and J. Owens, "A Bandwidth-Efficient Architecture for Media Processing", in Proceedings of the 31st Annual International Symposium on Microarchitecture, November 1998.
  2. C. Kozyrakis, S. Perissakis, D. Patterson, T. Anderson, K. Asanovic, M. Cardwell, R. Fromm, J. Golbus, B. Gribstad, K. Keeton, R. Thomas, N. Treuhaft, and K. Yelick, "Scalable Processors in the Billion-Transistor Era: IRAM", Computer, Vol. 30 Issue 9, September 1997.
  3. K. Diefendorff, "Sony's Emotionally Charged Chip", Microprocessor Report, Vol. 13 No. 5, April 19, 1999.
Recommended Reading
  1. P. Glaskowsky, "Media Processors Redefined", Microprocessor Report, January 24, 2000.
  2. P. Glaskowsky, "MAP1000 Unfolds at Equator", Microprocessor Report, Vol. 12 No. 16, December 7, 1998.
  3. P. Glaskowsky, "Philips Advances TriMedia Architecture", Microprocessor Report, Vol. 12 No. 14, October 26, 1998.
Low Power Design (5.17)

Required Reading

  1. J. Montanaro et al., "A 160MHz, 32b, 0.5W CMOS RISC Microprocessor", IEEE Journal of Solid-State Circuits, Vol. 31 No. 11, November 1996, pp. 1703-1714.
  2. A. Sinha and A. Chandrakasan, "Energy Aware Software", in Proceedings of the 13th International Conference on VLSI Design, January 2000.
  3. S. Manne, A. Klauser, and D. Grunwald, "Pipeline Gating: Speculation Control for Energy Reduction", in Proceedings of the 25th International Symposium on Computer Architecture, June 1998.
Recommended Reading
  1. T. Halfhill, "Transmeta Breaks x86 Low-Power Barrier", Microprocessor Report, February 14, 2000.
  2. R. Gonzalez and M. Horowitz, "Energy dissipation in general purpose microprocessors," IEEE Journal of Solid-State Circuits, September 1996, pp. 1277-1284.
  3. T. Burd and R. Brodersen, "Energy Efficient CMOS Microprocessor Design," in Proceedings of the 28th Annual HICSS Conference, January 1995, Vol. I, pp. 288-297.
Reliability, Availability, and Serviceability (5.22)

Required Reading

  1. "Ultra Enterprise 10000 Server: SunTrust Reliability, Availability, and Serviceability", Sun Microsystems Technical White Paper, 1997.
  2. T. Slegel, R. Averill III, M. Check, B. Krumm, C. Krygowski, W. Li, J. Liptay, J. MacDougall, T. McPherson, J. Navarro, E. Schwarz, K. Shum, and C. Webb, "IBM's S/390 G5 Microprocessor Design", IEEE Micro, Vol. 19 No. 2, March-April 1999.
  3. T. Austin, "DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design", in Proceedings of the 32nd Annual International Symposium on Microarchitecture, November 1999.