This is my place on the web.
My research focus is on the interaction of architecture, operating systems, and networking. I strive to come up with new hardware/software interfaces that can solve existing problems and potentially open up new opportunities for both architects and OS designers. Additionally, as a computer architect, I am cognizant of the fact that architectures are changed relatively infrequently and are hampered by a great deal of legacy software. I endeavor to produce designs that are both novel in their approach and practical in their utility. Below are some of the projects and tools that I have worked on. Just about all of my research has required me to write software in order to carry out my experiments.
Increasing bandwidth requirements and power efficiency demands have led architects to frequently use foreboding terminology such as "power wall" and "memory wall". Recent innovation in integrated optics has the promise to help solve these difficult problems. I have been working on several aspects of using optics in computer systems, ranging from near-term system designs using VCSELs to future designs using integrated silicon nanophotonics.

The above figure shows the basic structures that I work with: the silicon ring resonator and the silicon waveguide. When you put these two things next to each other, really neat things happen. If you have light of a particular frequency travelling down the silicon waveguide (like travelling through a fiber optic cable), it would normally pass right by one of these rings, as shown in the leftmost picture. If the size of the ring is just right (its circumference is an integer multiple of the wavelength of the light), then the ring will be resonant and it will suck all of the light from the waveguide into the ring (shown in the second picture). It turns out that it's not just the size of the ring that affects resonance, but also the speed of light in the ring. The speed of light that most people think of as "c" is the speed of light in a vacuum; the index of refraction of a material determines how much slower light travels in it. So, if we can change the index of refraction of the silicon ring resonator, then we can change the resonant frequency and move the ring between the first and second pictures. It turns out that this can be done with an electrical charge and that it can be done very quickly (think 10GHz). So, the structure shown can modulate light and is extremely bandwidth and power efficient, even at millimeter-scale distances. The third picture shows how the addition of a second waveguide allows us to build a switch that moves light from one waveguide to another. The fourth picture shows that we can build a detector: we dope the ring with germanium, which causes some of the photons travelling around the ring to generate a current.
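To put rough numbers on the resonance condition, here is a minimal Python sketch. It assumes the textbook form of the condition: a whole number m of wavelengths fits around the ring, i.e. m × (vacuum wavelength) = (effective index) × (circumference). The ring radius, index values, and wavelength band are made-up illustrative numbers, not measurements from any real device, and the function name is hypothetical.

```python
import math

def resonant_wavelengths(radius_m, n_eff, lo_m, hi_m):
    """Vacuum wavelengths in [lo_m, hi_m] that resonate in a ring.

    Textbook condition: m * wavelength = n_eff * circumference,
    where m is a positive integer and n_eff is the ring's effective index.
    """
    optical_path = n_eff * 2.0 * math.pi * radius_m
    m_min = math.ceil(optical_path / hi_m)   # smallest mode number in the band
    m_max = math.floor(optical_path / lo_m)  # largest mode number in the band
    return [optical_path / m for m in range(m_min, m_max + 1)]

# Illustrative, made-up numbers: a 5 micron ring and a near-infrared band.
RADIUS = 5e-6
before = resonant_wavelengths(RADIUS, 2.500, 1.50e-6, 1.60e-6)
after = resonant_wavelengths(RADIUS, 2.501, 1.50e-6, 1.60e-6)

# A small change in the index shifts every resonance slightly.
for b, a in zip(before, after):
    print(f"{b * 1e9:.2f} nm -> {a * 1e9:.2f} nm")
```

Light at a fixed wavelength that sat on a resonance before the index change no longer does afterwards, which corresponds to moving the ring between the first and second pictures.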
While all of the above stuff is really cool, I don't actually build these devices. I'm trying to figure out how they can change how we build computers. An obvious application is to use this technology to replace the electrical wires between chips with optical connections which can operate at a much higher bandwidth and at a much lower power. I want to know if there are really new things that can be done.

One of the primary ideas is optical arbitration. Arbitration is a mutual exclusion problem where there is one resource (say a data channel) and many requestors for that resource. If we use the presence of light to represent the availability of a resource and a resonant ring detector to indicate interest in that resource, we can build a very basic optical arbiter. In the above example, there are six nodes (Node 1 is on the left) that can make requests for the resource. Nodes 1, 2, 4, and 6 are not interested in the resource and turn their resonators off. Nodes 3 and 5 are interested and turn their resonators on. With the light entering from the left, node 3 is the first resonant ring detector encountered by the light, so the light is removed from the waveguide and the detector associated with node 3 sees this light. No light ever makes it to node 5. Since node 3 sees light, it knows that no upstream node is requesting the resource. It also knows that since it has successfully removed the light, no downstream node can possibly see the light. Thus we have a simple arbiter. An astute reader will notice that this arbiter is unfair since Node 1 always has the highest priority. Part of my research is to look at how to solve the priority problem, as well as how to add other features to this simple arbiter.
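The behaviour of this simple arbiter can be mimicked in a few lines of Python. This is only a software analogy of the six-node example, with a hypothetical function name; the real arbiter is an optical structure, not code.

```python
def arbitrate(requests):
    """Return the index of the node that captures the light, or None.

    requests[i] is True when node i has turned its resonant ring detector on.
    Light enters at index 0 and travels toward higher indices; the first
    resonant ring removes it from the waveguide, so no later node sees it.
    """
    light_present = True
    winner = None
    for node, wants_resource in enumerate(requests):
        if light_present and wants_resource:
            winner = node
            light_present = False  # the ring diverts the light into its detector
    return winner

# The six-node example: nodes 3 and 5 request (1-based numbering in the text).
requests = [False, False, True, False, True, False]
winner = arbitrate(requests)
print("winner: node", winner + 1)  # prints "winner: node 3"
```

The unfairness is visible here too: whenever the leftmost node requests, `arbitrate` returns index 0 regardless of what the other nodes do.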
This is just a small example of the type of problem I work on. With my research group, I'm investigating the use of ring resonators for: optical arbitration, on-chip optical interconnects for multicore systems, optically connected memory, and network switch chips using optics for both inter- and intra-chip communication. In addition, I work on trying to use nearer-term optical technology (e.g. VCSELs) to build inter-chip interconnects for next generation HP products.
Many critical workloads today, such as web-hosted services, are limited not by raw CPU processing power but by interactions between the CPU cores, the memory system, I/O devices such as disks and network interfaces, and the complex software (applications, middleware, operating systems, virtual machines) that ties all these components together. To improve the efficiency of these workloads and systems, designers and developers need tools to identify the bottlenecks so that they can address them. However, existing performance analysis tools such as software profilers cannot account for hardware bottlenecks or for situations where software overheads are hidden due to overlap with other operations. [ISCA 2009] [ISPASS 2008]
I am a principal developer of the M5 simulator. M5 is a modular platform for computer architecture research, encompassing system-level architecture as well as processor microarchitecture. It is intended for use by researchers in academia or industry looking for a free, open-source, full-system simulation environment for processor, system, or platform architecture studies. The M5 simulator is written in both C++ and Python, with models written in C++ for performance and as much as possible written in Python for simplicity. The code is freely distributable under a BSD-style license and does not depend on any commercial or restricted-license software.
M5, being a system simulator, is in a lot of ways like a virtual machine (e.g. VMware) or a platform emulator (e.g. Bochs), with the main difference being that measurement is the primary goal, not performance. Like these other systems, M5 can boot an operating system and do networking and disk access; it just does so much more slowly, and it can tell you all sorts of things about what happened. Furthermore, M5 supports architectural research, so we try to make it easy to do things that typical systems don't do. It's been used to model integrated network interface controllers, many-core server systems that incorporate 3D stacked memory, and future nanophotonic interconnects.
Because the initial focus of the M5 development team has been simulation of network-oriented server workloads, M5 incorporates several features not commonly found in other simulators including:
The wheel of reincarnation refers to a process in which a computer architect chooses to design a peripheral device, adds some intelligence to that peripheral device, decides that the intelligence should be more general, and realizes that the end result is a simple peripheral device attached to an additional general-purpose processor. The caNIC project (and my previous SINIC work) is an instance of going around the wheel of reincarnation when designing a faster network interface controller (NIC).
The caNIC (coherent-attached NIC) project seeks to improve the performance of networked hosts by throwing away traditional hardware and software interfaces between network interface controllers and CPUs and replacing them with more efficient ones that exploit characteristics of network processing. We are able to do this by taking advantage of the increasing number of cores per socket that industry has promised, dedicating compute resources (e.g. a processor core or maybe a hardware thread) to network processing in an on-load style. Because we're using a normal processor core, we can embed our software in the operating system or virtual machine and reduce the overheads involved.
The on-load approach is in contrast to a current trend in the NIC industry: TCP offload engines (TOEs). TOEs go almost all the way around the wheel by adding more intelligence to the NIC---enough intelligence to run the entire TCP/IP protocol stack. The caNIC design comes full circle by moving the special functionality of the NIC back to one of the central processor cores. There are multiple reasons to believe that this is a good approach:
I have demonstrated the importance of tighter integration between the NIC and CPU. I have shown that simple integration of a traditional NIC onto the CPU die can result in bandwidth improvements of more than a factor of two relative to more conventional designs. Tighter integration alone provides significant benefits, but also enables a redesign of the NIC itself to take advantage of the new properties of the interactions between the NIC and CPU, particularly lower latency. This leads to a NIC design that is significantly simpler than current high performance NICs. This design, which I call the simple integrated NIC (SINIC), moves more of the intelligence of network processing to the CPU core allowing system software programmers significantly more flexibility in the way that system software uses the NIC. Thus, a suitably redesigned NIC enables software optimizations not possible with traditional NIC designs. V-SINIC, an extended version of SINIC, provides virtual per-packet registers, enabling packet-level parallel processing while maintaining a FIFO model. V-SINIC also enables deferring the copy of the packet payload on receive, which I exploit to implement a zero-copy receive optimization in the Linux 2.6 kernel.
CPUs consume too much power. Modern complex cores sometimes waste power on functions that are not useful for the code they run. In particular, operating system kernels do not benefit from many power-consuming features that were intended to improve application performance. We propose using asymmetric single-ISA CMPs (ASISA-CMPs), multicore CPUs where all cores execute the same instruction set architecture but have different performance and power characteristics, to avoid wasting power on operating systems code. We modeled an ASISA-CMP using M5 and made extensive modifications to the Linux scheduler that allowed us to optimize power utilization by running code on the cores most suited to it.
PicoServer is a hypothetical server chip that employs 3D technology to bond one die containing several simple slow processing cores to multiple DRAM dies sufficient for a primary memory. The 3D technology also enables wide low-latency buses between processors and memory. These remove the need for an L2 cache, allowing its area to be re-allocated to additional simple cores. The additional cores allow the clock frequency to be lowered without impairing throughput. Lower clock frequency in turn reduces power and means that thermal constraints, a concern with 3D stacking, are easily satisfied. The PicoServer architecture specifically targets server applications which exhibit a high degree of thread-level parallelism.
Large numbers of logical registers can improve performance by allowing fast access to multiple subroutine contexts (register windows) and multiple thread contexts (multithreading). Support for both of these together requires a multiplicative number of registers that quickly becomes prohibitive. The virtual context architecture (VCA), a new register-file architecture that virtualizes logical register contexts, overcomes this limitation. VCA works by treating the physical registers as a cache of a much larger memory-mapped logical register space. Complete contexts, whether activation records or threads, are no longer required to reside in their entirety in the physical register file. A VCA implementation of register windows on a single-threaded machine reduces data cache accesses by 20%, providing the same performance as a conventional machine while requiring one fewer cache port. Using VCA to support multithreading enables a four-thread machine to use half as many physical registers without a significant performance loss. VCA naturally extends to support both multithreading and register windows, providing higher performance with significantly fewer registers than a conventional machine.
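The idea of treating the physical registers as a cache of a larger logical register space can be illustrated with a toy Python model. This is a sketch under loud assumptions: the class and method names are hypothetical, the least-recently-used replacement policy is chosen only for simplicity, and a plain dictionary stands in for the memory-mapped register space. It is not the mechanism from the VCA papers.

```python
from collections import OrderedDict

class VirtualContextRegisters:
    """Toy model: a small physical register file caching a large logical space."""

    def __init__(self, num_physical):
        self.num_physical = num_physical
        self.physical = OrderedDict()  # (context, reg) -> value, in LRU order
        self.backing = {}              # stand-in for the memory-mapped space
        self.spills = 0
        self.fills = 0

    def _touch(self, key):
        """Make sure key is resident, evicting the LRU entry if needed."""
        if key in self.physical:
            self.physical.move_to_end(key)
            return
        if len(self.physical) >= self.num_physical:
            victim, value = self.physical.popitem(last=False)
            self.backing[victim] = value  # spill the victim to memory
            self.spills += 1
        if key in self.backing:
            self.fills += 1               # bring the register back from memory
        # Registers never written before read as 0 (an assumption of this toy).
        self.physical[key] = self.backing.pop(key, 0)

    def write(self, context, reg, value):
        key = (context, reg)
        self._touch(key)
        self.physical[key] = value

    def read(self, context, reg):
        key = (context, reg)
        self._touch(key)
        return self.physical[key]

# Three contexts with two registers each share only four physical registers.
regs = VirtualContextRegisters(num_physical=4)
for ctx in range(3):
    for r in range(2):
        regs.write(ctx, r, 100 * ctx + r)
values = [regs.read(ctx, r) for ctx in range(3) for r in range(2)]
print(values)  # prints [0, 1, 100, 101, 200, 201]
```

No context has to be fully resident: every value is recovered correctly even though the six logical registers never fit in the four physical ones at the same time.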
Before the era of multicore, architects were still working hard at increasing the IPC of systems. Out-of-order execution was the norm and people were working on trying to figure out how to build bigger instruction queues (IQ) to build larger instruction windows so as to expose more instruction-level parallelism and thus higher performance. Increasing a conventional IQ's physical size leads to larger latencies and slower clock speeds. We introduced a new IQ design that divided a large queue into small segments, which can be clocked at high frequencies. We use dynamic dependence-based scheduling to promote instructions from segment to segment until they reach a small issue buffer. Our segmented IQ was designed specifically to accommodate variable-latency instructions such as loads. Despite its roughly similar circuit complexity, simulation results (using a very early version of M5) indicated that our segmented instruction queue with 512 entries and 128 chains improved performance by up to 69% over a 32-entry conventional instruction queue for SpecINT 2000 benchmarks, and up to 398% for SpecFP 2000 benchmarks. The segmented IQ achieved from 55% to 98% of the performance of a monolithic 512-entry queue while providing the potential for much higher clock speeds.
As a researcher, my job is to come up with cool ideas and write papers about what I've done. In architecture research, software is mostly considered simply a means to an end, but it is often the only means to the end and requires a substantial fraction of the effort on most research projects. So, in the end, I end up writing far more code than I write English.
I actually enjoy reading books about advanced programming topics and I really like to learn about everything a language has in it (e.g. C++ template metaprogramming and Python metaclasses) even if I don't use them frequently. (As an aside, I generally think these things are cool, but unless you work on a team of excellent programmers, they can be daunting and are often best used sparingly.) I've done kernel hacking on Linux, *BSD (I was an OpenBSD committer for several years), and Tru64 UNIX. I've written user level code for Linux/Unix, Windows (including 3.1), MacOS, DOS, and Apple II. I've written embedded code for microcontrollers (including smart cards). I'm fluent in C, C++, and Python and feel most comfortable using these. Though, I've written thousands of lines of code in PERL, AWK, shell (I prefer bourne shell variants and especially zsh), Pascal, Java, Verilog, VHDL, Lisp, various flavors of BASIC (Visual BASIC, FutureBASIC, QuickBASIC, Applesoft BASIC), Assembly language (x86, Alpha, PowerPC, and 6502), and even Logo. If you count all of this stuff, this equates to a new language every year and a half to two years depending on how you count. I guess it's time to go learn something new. I have not yet jumped on the Ruby bandwagon, but since I'm sometimes forced to read Ruby code, maybe it will be next (though I'm quite fond of Python, so it would take something really special to convert me).
I use M5 regularly in my own research, but I work hard (with the help of others) to make M5 a successful open source project. To that end, my primary role in the project is to work with people to ensure that the code is high quality. So, I spend a lot of time trying to get graduate students to write better code; there are many who will testify that I am anal but that I have made them better programmers. I have to give a lot of credit to Theo de Raadt for my becoming a better programmer. He doesn't stand for people putting bad code into the OpenBSD source tree and I certainly got chewed out more than once. To be honest, I think he's mostly right, though I try to be a bit more diplomatic when I go about things.
Initially, I developed the first Full System version of the simulator by booting Tru64 on the Alpha ISA CPU model. I had to implement models for the TurboLaser chipset, PCI, an Ethernet controller, and a disk controller. I then moved on to working with some others in booting SMP Linux using the more common Tsunami/Typhoon chipset (Linux didn't support TurboLaser and as a bonus, we could actually afford to buy a real Tsunami based machine, an XP1000.) Since my current employer restricts the code I can release publicly, my coding role in M5 has lately been that of infrastructure code and project maintenance. The infrastructure I've done or am currently working on includes (these are things where I'm the primary developer, though I certainly had help with some):
You can also check out my current wish list for M5.
I was a committer to OpenBSD for about five years. I mostly did device driver work for Gigabit Ethernet, RAID controllers, and USB. Doing this sort of development for an open source operating system was a great way to get tons of free gear. I regularly ran OpenBSD on i386, amd64, Alpha, Sparc, Sparc64, and PowerPC systems and tested my device drivers on as many of these platforms as possible. This often involved adding 64-bit support, big-endian support, and IOMMU support. Since I was working at Compaq when I started hacking OpenBSD, I also put a lot of effort into improving the Alpha platform support. I spent time hacking on the Alpha-dependent bits of the kernel and I did a lot of scrounging around at Compaq (making deals with my boss and anyone that wanted to get rid of machines) to provide as many developers as possible with Alpha systems to play with. While in grad school, I also managed a semi-regular consulting gig with Arbor Networks doing OpenBSD hacking for them.
While I was an intern at Compaq, I was one of the initial developers of ASIM. ASIM, like M5, is a modular platform for computer architecture research. It focused on processor and memory system modeling and has since become one of the primary simulation tools at Intel. The simulator was written in C++ and my role involved refining interfaces for simple plug-in of new architectural features.
I developed a natural language parser for English syntax based on residential grammar theory. The parser accepts an English sentence and produces a residential grammar parse tree. This software is currently used for teaching students English syntax and parsing.
| The University of Michigan, Ann Arbor, Ph.D., Computer Science & Engineering | April 2006 |
|---|---|
| Thesis: “Integrated System Architectures for High-Performance Internet Servers” | |
| Advisor: Dr. Steven K. Reinhardt | |
| M.S.E., Computer Science & Engineering | April 2000 |
| B.S.E., Electrical Engineering (Magna Cum Laude) | April 1998 |

| Senior Research Scientist, Exascale Computing Lab - Hewlett-Packard Labs | Sep. 2006–Present |
|---|---|
| Manager: Moray McLaren, Distinguished Technologist | |

| Adjunct Lecturer - Stanford University | Spring 2009 |
|---|---|

| Senior Software Engineer - Arbor Networks, Inc. | 2002–2005 (Part-time Consulting) |
|---|---|
| Manager: Dr. Larry Houston, Director of Architecture | Dec. 2005–Sep. 2006 |

| Research Assistant, Advanced Computer Architecture Lab - The University of Michigan | Jan 2000–Dec 2005 |
|---|---|
| Advisor: Dr. Steven K. Reinhardt, Associate Professor | |

| Student Intern, VSSAD Research Group - Compaq Computer Corporation | May–Dec 1999, May–Aug 2000, May–Aug 2001 |
|---|---|
| Supervisor: Dr. Joel S. Emer, Compaq Fellow | |

| Research Assistant, Software Systems Research Lab - University of Michigan | Jan 1999–Apr 1999 |
|---|---|
| Advisor: Dr. Farnam Jahanian, Professor | |

| Teaching Assistant, EECS - University of Michigan | Sep 1998–Dec 1998 |
|---|---|
| Instructor: Dr. Steven K. Reinhardt, Assistant Professor | |

| Summer Intern, Performance Microprocessor Division - Intel Corporation | May 1998–Sep 1998 |
|---|---|
| Supervisor: Alex Henstrom | |
HyperX Networks for Datacenters and Systems.
Nathan Binkert, Moray McLaren, Robert Schreiber, Al Davis, and Jung Ho Ahn.
Hewlett-Packard Tech Con. May 2010.
All Optically Interconnected Data Center Switches.
Moray McLaren, Mike Tan, Charles Clark, Nate Binkert, Al Davis, David Warren, Paul Rosenberg, Wayne Sorin, Sagi Mathai, Lennie Kiyama, and Joe Straznicky.
Hewlett-Packard Tech Con. May 2010.
End-To-End Performance Forecasting: Finding Bottlenecks Before They Happen.
Ali G. Saidi, Nathan L. Binkert, Steven K. Reinhardt, and Trevor N. Mudge.
The 36th International Symposium on Computer Architecture (ISCA). June 2009.
Fast Switching of Threads Between Cores.
Richard Strong, Jayaram Mudigonda, Jeffrey C. Mogul, Nathan Binkert, and Dean Tullsen.
Operating Systems Review special issue on The Interaction Among the OS, the Compiler, and Multicore Processors. April 2009.
Devices and architectures for photonic chip-scale integration.
Jung Ho Ahn, Marco Fiorentino, Raymond G. Beausoleil, Nathan Binkert, Al Davis, David Fattal, Norman P. Jouppi, Moray McLaren, Charles M. Santori, Robert S. Schreiber, Sean M. Spillane, Dana Vantrease, and Qianfan Xu.
Applied Physics A: Materials Science & Processing.
PicoServer: Using 3D Stacking Technology to Build Energy Efficient Servers.
Taeho Kgil, Ali Saidi, Nathan Binkert, Steve Reinhardt, Krisztian Flautner, and Trevor Mudge.
ACM Journal on Emerging Technology in Computers. October 2008.
A Nanophotonic Interconnect for High-Performance Many-Core Computation.
Raymond G. Beausoleil, Jung Ho Ahn, Nathan Binkert, Al Davis, David Fattal, Marco Fiorentino, Norman P. Jouppi, Moray McLaren, Charles M. Santori, Robert S. Schreiber, Sean M. Spillane, Dana Vantrease, and Qianfan Xu.
(Invited) Proceedings of the IEEE Lasers and Electro-Optics Society 5th International Conference on Group IV Photonics. September 17, 2008.
A Nanophotonic Interconnect for High-Performance Many-Core Computation.
Raymond G. Beausoleil, Jung Ho Ahn, Nathan Binkert, Al Davis, David Fattal, Marco Fiorentino, Norman P. Jouppi, Moray McLaren, Charles M. Santori, Robert S. Schreiber, Sean M. Spillane, Dana Vantrease, and Qianfan Xu.
16th IEEE Symposium on High Performance Interconnects (HOTI). August 2008.
A Nanophotonic Interconnect for High-Performance Many-Core Computation.
Raymond G. Beausoleil, Jung Ho Ahn, Nathan Binkert, Al Davis, David Fattal, Marco Fiorentino, Norman P. Jouppi, Moray McLaren, Charles M. Santori, Robert S. Schreiber, Sean M. Spillane, Dana Vantrease, and Qianfan Xu.
Integrated Photonics and Nanophotonics Research and Applications, (Optical Society of America, 2008), paper ITuD2. July 2008.
Corona: System Implications of Emerging Nanophotonic Technology.
Dana Vantrease, Robert Schreiber, Matteo Monchiero, Moray McLaren, Norman P. Jouppi, Marco Fiorentino, Al Davis, Nathan Binkert, Raymond G. Beausoleil, and Jung Ho Ahn.
The 35th International Symposium on Computer Architecture (ISCA). June 2008.
A Nanophotonic Interconnect for High-Performance Many-Core Computation.
Raymond G. Beausoleil, Jung Ho Ahn, Nathan Binkert, Al Davis, David Fattal, Marco Fiorentino, Norman P. Jouppi, Moray McLaren, Charles M. Santori, Robert S. Schreiber, Sean M. Spillane, Dana Vantrease, and Qianfan Xu.
(Invited) IEEE LEOS Newsletter. June 2008.
Using Asymmetric Single-ISA CMPs to Save Energy on Operating Systems.
Jeffrey C. Mogul, Jayaram Mudigonda, Nathan Binkert, Parthasarathy Ranganathan, and Vanish Talwar.
IEEE Micro Special Issue: Interaction of Computer Architecture and Operating Systems in the Many-core Era. May 2008.
A Nanophotonic Interconnect for High-Performance Many-Core Computing.
Jung Ho Ahn, Raymond G. Beausoleil, Nathan Binkert, Al Davis, Marco Fiorentino, David A. Fattal, Norman P. Jouppi, Moray McLaren, Matteo Monchiero, Charles M. Santori, Robert S. Schreiber, Sean M. Spillane, Dana Vantrease, and Qianfan Xu.
Hewlett-Packard Tech Con. May 2008.
Full-system Critical Path Analysis.
Ali G. Saidi, Nathan L. Binkert, Steven K. Reinhardt, and Trevor N. Mudge.
Eighth International Symposium on Performance Analysis of Systems and Software (ISPASS). April 2008.
High-performance Ethernet-Based Communications for Future Multi-core Processors.
Michael Schlansker, Nagabhushan Chitlur, Erwin Oertli, Paul M. Stillwell, Jr, Linda Rankin, Dennis Bradford, Richard J. Carter, Jayaram Mudigonda, Nathan Binkert, and Norman P. Jouppi.
Proceedings of the 2007 ACM/IEEE conference on Supercomputing. November 2007.
The JNIC High-Performance Communication Software Architecture.
Mike Schlansker, Dick Carter, Jayaram Mudigonda, Nathan Binkert, and Norm Jouppi.
Hewlett-Packard Tech Con. April 2007.
PicoServer: The Benefits of 3D Stacking Technology for Low-Power High-Throughput Tier 1 Servers.
Taeho Kgil, Shaun D’Souza, Nathan Binkert, Ali Saidi, Ronald Dreslinski, Steve Reinhardt, and Trevor Mudge.
The Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). October 2006.
Integrated Network Interfaces for High-Bandwidth TCP/IP.
Nathan L. Binkert, Ali G. Saidi, and Steven K. Reinhardt.
The Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). October 2006.
The M5 Simulator: Modeling Networked Systems.
Nathan L. Binkert, Ronald G. Dreslinski, Lisa R. Hsu, Kevin T. Lim, Ali G. Saidi, and Steven K. Reinhardt.
IEEE Micro Special Issue: Architecture Simulation and Modeling. July 2006.
Integrated System Architectures for High-Performance Internet Servers.
Nathan Lorenzo Binkert
Ph.D. Thesis. Department of Electrical Engineering & Computer Science, The University of Michigan. February 2006.
How to Fake 1000 Registers.
David W. Oehmke, Nathan L. Binkert, Steven K. Reinhardt, and Trevor Mudge.
The 38th Annual International Symposium on Microarchitecture (MICRO). November 2005.
Performance Analysis of System Overheads in TCP/IP Workloads.
Nathan L. Binkert, Lisa R. Hsu, Ali G. Saidi, Ronald G. Dreslinski, Andrew L. Schultz, and Steven K. Reinhardt.
The 14th International Conference on Parallel Architectures and Compilation Techniques (PACT). September 2005.
A Scalable Instruction Queue Design Using Dependence Chains.
Steven E. Raasch, Nathan L. Binkert, and Steven K. Reinhardt.
The 29th Annual International Symposium on Computer Architecture (ISCA). May 2002.
ASIM: A performance model framework.
Joel Emer, Pritpal Ahuja, Eric Borch, Artur Klauser, Chi-Keung Luk, Srilatha Manne, Shubhendu S. Mukherjee, Harish Patil, Steven Wallace, Nathan Binkert, Roger Espasa, and Toni Juan.
IEEE Computer. February 2002.
A Comparison of AES Candidates on the Alpha 21264.
Richard Weiss and Nathan Binkert.
The Third AES Candidate Conference. April 2000.
HyperX: Topology, Routing, and Packaging of an Efficient Exascale Network.
Jung Ho Ahn, Nathan Binkert, Al Davis, Moray McLaren, and Rob Schreiber.
The International Conference for High Performance Computing, Networking, Storage and Analysis.
Nanophotonic Barriers.
Nathan Binkert, Al Davis, Mikko Lipasti, Robert Schreiber, and Dana Vantrease.
Workshop on Photonic Interconnects & Computer Architecture. December 2009.
Performance Validation of Network-Intensive Workloads on a Full-System Simulator.
Ali G. Saidi, Nathan L. Binkert, Lisa R. Hsu, and Steven K. Reinhardt.
The First Annual Workshop on Interaction between Operating System and Computer Architecture (IOSCA). October 2005.
Sampling and Stability in TCP/IP Workloads.
Lisa R. Hsu, Ali G. Saidi, Nathan L. Binkert, and Steven K. Reinhardt.
The First Annual Workshop on Modeling, Benchmarking and Simulation (MoBS). June 2005.
Analyzing NIC Overheads in Network-Intensive Workloads.
Nathan L. Binkert, Lisa R. Hsu, Ali G. Saidi, Ronald G. Dreslinski, Andrew L. Schultz, and Steven K. Reinhardt.
The Eighth Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW). February 2005.
The Performance Potential of an Integrated Network Interface.
Nathan L. Binkert, Ronald G. Dreslinski, Erik G. Hallnor, Lisa R. Hsu, Steven E. Raasch, Andrew L. Schultz, and Steven K. Reinhardt.
The Advanced Networking and Communications Hardware Workshop (ANCHOR). June 2004.
Network-Oriented Full-System Simulation using M5.
Nathan L. Binkert, Erik G. Hallnor, and Steven K. Reinhardt.
The Sixth Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW). February 2003.
HyperX: Topology, Routing, and Packaging of Efficient Large-Scale Networks.
Jung Ho Ahn, Nathan Binkert, Al Davis, Moray McLaren, and Robert S. Schreiber.
HP Labs Technical Report HPL-2009-184. August 21, 2009.
Initial Experiments in Visualizing Fine-Grained Execution of Parallel Software Through Cycle-Level Simulation.
Rick Strong, Jayaram Mudigonda, Jeffrey Mogul, and Nathan Binkert.
HP Labs Technical Report HPL-2008-210. December 2008.
Operating Systems and Asymmetric Single-ISA CMPs: The Potential for Saving Energy.
Jeffrey C. Mogul, Jayaram Mudigonda, Nathan Binkert, Partha Ranganathan, and Vanish Talwar.
HP Labs Technical Report HPL-2007-140. August 2007.
A Simple Integrated Network Interface for High-Bandwidth Servers.
Nathan L. Binkert, Ali G. Saidi, and Steven K. Reinhardt.
University of Michigan Technical Report CSE-TR-514-06. January 2006.
Performance Validation of Network-Intensive Workloads on a Full-System Simulator.
Ali G. Saidi, Nathan L. Binkert, Lisa R. Hsu, and Steven K. Reinhardt.
University of Michigan Technical Report CSE-TR-511-05. July 2005.
Analyzing NIC Overheads in Network-Intensive Workloads.
Nathan L. Binkert, Lisa R. Hsu, Ali G. Saidi, Ronald G. Dreslinski, Andrew L. Schultz, and Steven K. Reinhardt.
University of Michigan Technical Report CSE-TR-505-04. December 2004.
Design and Applications of a Virtual Context Architecture.
David W. Oehmke, Nathan L. Binkert, Steven K. Reinhardt, and Trevor Mudge.
University of Michigan Technical Report CSE-TR-497-04. September 2004.
Light Speed Arbitration and Flow Control for Nanophotonic Interconnects.
ON*VECTOR Workshop. February 8, 2010.
Nanophotonic Barriers.
Workshop on Photonic Interconnects & Computer Architecture. December 13, 2009.
HPC System Architectures Panel.
ON*VECTOR Workshop. February 24, 2009.
Photonic Interconnects for HPC Architectures.
ON*VECTOR Workshop. February 24, 2009.
Photonic Interconnects for HPC.
Terabit Networking Workshop. December 17, 2008.
Performance Prediction and Simulation.
Institute for Advanced Architectures and Algorithms Interconnection Networks Workshop. July 21, 2008.
Using the M5 Simulator.
Tutorial in conjunction with the Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). March 2, 2008.
Integrated Network Interfaces for High-Bandwidth TCP/IP.
Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). October 2006.
Integrated Network Interfaces for High-Bandwidth TCP/IP.
Invited talk at Intel Research China. September 24, 2006.
How to Fake 1000 Registers.
Invited talk at Intel Research China. September 25, 2006.
Integrated System Architectures for High-Performance Internet Servers.
Invited talk at Intel. January 16, 2006.
How to Fake 1000 Registers.
Paper presentation at the 38th Annual International Symposium on Microarchitecture (MICRO). November 14, 2005.
Best Presentation
Using the M5 Simulator.
Invited talk at the First Workshop on Interaction between Operating System and Computer Architecture. October 8, 2005.
Performance Analysis of System Overheads in TCP/IP Workloads.
Paper presentation at the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT). September 20, 2005.
Redesigning Systems for High-speed Networking.
Invited talk at Advanced Micro Devices. May 20, 2005.
Using the M5 Simulator.
Tutorial in conjunction with the 33rd Annual International Symposium on Computer Architecture (ISCA). June 8, 2005.
Analyzing NIC Overheads in Network-Intensive Workloads.
Paper presentation at the Eighth Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW). February 12, 2005.
The Performance Potential of an Integrated Network Interface.
Paper presentation at the Advanced Networking and Communications Hardware Workshop (ANCHOR). June 19, 2004.
Towards Integrated System Architectures for High-Speed Networking.
Invited talk at IBM. January 22, 2004.
Network-Oriented Full-System Simulation using M5.
Sixth Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW). February 9, 2003.
Operating System Changes for SMT Processors.
Invited Presentation at Intel. October 11, 2002.
EE382C: Interconnection Networks (Course Lecturer).
Stanford University. Spring 2009.
EE282: Computer Systems Architecture (Guest lectures on I/O and System Software).
Stanford University. Fall 2008.
EE282: Computer Systems Architecture (Guest Lecture on I/O).
Stanford University. Spring 2007.
US Patent #7,532,785: Photonic interconnects for computer system devices.
Raymond G. Beausoleil, Marco Fiorentino, Norman Paul Jouppi, Nathan Lorenzo Binkert, Robert Samuel Schreiber, and Qianfan Xu.
17 Patents pending and 6 Patents under Preparation
I am a principal developer of the M5 simulator. M5 is a modular platform for computer system architecture research, encompassing system-level architecture as well as processor microarchitecture. My contributions to M5 include the implementation of:
M5 is freely available, enabling other researchers to build upon it and potentially foster collaboration in a common framework.
OpenBSD is a freely available open source operating system focusing on security. My role as an OpenBSD developer focused on device driver support for Gigabit Ethernet adapters, RAID controllers, and USB devices. In addition, I was involved in support for the Alpha platform.
I was one of the initial developers of the ASIM simulator. ASIM, like M5, is a modular platform for computer architecture research. It focused on processor and memory system modeling and has since become one of the primary simulation tools at Intel. The simulator was written in C++ and my role involved refining interfaces for simple plug-in of new architectural features.
I developed a natural language parser for English syntax based on residential grammar theory. The parser accepts an English sentence and produces a residential grammar parse tree. This software is currently used for teaching students English syntax and parsing.
Nathan L. Binkert