networkZONE Products for the week of May 24, 2004


Chelsio Communications Says…
No Waiting: Chelsio's 10G Ethernet Adapters Slash Latency With On-Chip TCP Offload Engine
ASIC-Based TCP Processor Accelerates Most High-User-Density Applications By Delivering Sub-10 Microsecond User-to-User Latency For Standard Ethernet Frames

Chelsio Communications, Inc. has broken through the 10 microsecond (µs) latency barrier for 10-Gigabit Ethernet. The company is the first 10G Ethernet adapter vendor delivering a TCP offload engine (TOE) in silicon, significantly raising the performance and latency bar for the 10G Ethernet adapter industry. Chelsio is also the first to deliver 10G iSCSI in silicon.

"Chelsio is focused on accelerating the convergence of network and storage applications using 10-Gigabit Ethernet technology," said Kianoosh Naghshineh, founder and CEO of Chelsio Communications. "Simple, managed networks of integrated storage and network traffic will be made possible by 10-Gigabit Ethernet, and we are using this high-performance computing conference to show the latest advancements possible in the throughput, latency and scalability of these adapters."

In a demonstration at "Grid Today 2004," Chelsio's host bus adapter, called the T110, was shown transmitting standard 1500-byte Ethernet frames in a peer-to-peer configuration at 7.8Gbps throughput with less than 10 microseconds latency from user space to user space and 50% CPU utilization with a 2.2GHz Opteron-based server. The line-rate performance of the adapter stays consistent, with equal and stable bandwidth per TCP connection, whether there is one connection or 10,000.

The best performance other 10GE adapters on the market can claim in transferring standard Ethernet frames is only 3 to 4Gbps, with higher latency and more than 100% CPU utilization. This limitation of 10GE adapters has hindered the deployment of the otherwise ubiquitous Ethernet technology in HPC facilities. Higher performance claims are sometimes made, but these rely on jumbo Ethernet (9000-byte) frames, and even then cannot match the performance of the Chelsio solution using standard Ethernet frames.
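To put the frame-size comparison in perspective, the per-frame arithmetic can be sketched as follows. This is a back-of-the-envelope illustration, not a Chelsio benchmark: it takes the 7.8Gbps figure quoted above, treats each frame as exactly 1500 (or 9000) bytes, and ignores preamble, inter-frame gap, and header overhead. The function names are invented for this example.

```python
# Rough per-frame arithmetic for the figures quoted in the press release.
# Assumption: a "frame" is counted as exactly frame_bytes on the wire,
# with no allowance for preamble, inter-frame gap, or protocol headers.

def frames_per_second(throughput_bps: float, frame_bytes: int) -> float:
    """Frames that must be handled each second to sustain the throughput."""
    return throughput_bps / (frame_bytes * 8)


def per_connection_bps(throughput_bps: float, connections: int) -> float:
    """Bandwidth per TCP connection if the link is shared equally."""
    return throughput_bps / connections


THROUGHPUT_BPS = 7.8e9   # demonstrated throughput from the press release
STANDARD_FRAME = 1500    # bytes, standard Ethernet frame as quoted
JUMBO_FRAME = 9000       # bytes, jumbo frame as quoted

if __name__ == "__main__":
    std = frames_per_second(THROUGHPUT_BPS, STANDARD_FRAME)
    jumbo = frames_per_second(THROUGHPUT_BPS, JUMBO_FRAME)
    print(f"standard frames/s: {std:,.0f}")
    print(f"jumbo frames/s:    {jumbo:,.0f}")
    print(f"per-frame work ratio: {std / jumbo:.1f}x")
    share = per_connection_bps(THROUGHPUT_BPS, 10_000)
    print(f"equal share across 10,000 connections: {share / 1e3:,.0f} kbps")
```

Under these assumptions, 7.8Gbps works out to roughly 650,000 standard frames per second against about 108,000 jumbo frames per second, so standard frames demand six times as many per-frame operations for the same throughput.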

"Our benchmark tests show that Chelsio Communications has delivered the first 10G Ethernet adapter card that simultaneously achieves high throughput, low latency, and more importantly, low CPU utilization -- all while using the ubiquitous TCP/IP protocol suite with standard 1500-byte packets. Keeping CPU utilization low frees the CPU to work on other important computing tasks in parallel," said Wu Feng, team leader of research & development in Advanced Network Technologies (RADIANT) at Los Alamos National Laboratory. "Using high-speed Ethernet as the interconnect technology is preferred for many reasons, particularly its ubiquity and ease of deployment."

Chelsio's host bus adapter card, the T110, is built with Chelsio's Terminator ASIC, a deeply-pipelined VLIW (Very Long Instruction Word) architecture that delivers numerous high-bandwidth and low-latency advantages over RISC-based multi-processor system-on-chip implementations. It is the first chip on the market to include a 10G TOE, which is required for high-speed Ethernet networks. The Terminator ASIC has a capacity of one million sessions, while the T110 card can support up to 64,000 connections. The card is sampling now, and is priced at $4,900 each in small quantities.

Performance and Latency Demo at GT'04
At the Grid Today 2004 conference Chelsio demonstrated 7.8Gbps at 50% CPU utilization with an interconnect latency from user-to-user application space of 9.7 µs. Pallas MPI benchmarks were shown to demonstrate the MPI capabilities of the T110 for HPC environments. The demonstration used HP Integrity servers running the Itanium 2 processor and unbranded servers running Opteron processors, each fitted with Chelsio's 10G Ethernet adapter, with Long Reach XPAKs for fiber connectivity and Marvell's CX4 for copper. A Fujitsu 10-Gigabit Ethernet switch, which has both fiber and copper modules, was used to connect all these servers.

"High performance computing applications need speed and agility from the Information Technology infrastructure in order to meet the demanding needs of engineers and scientists," stated Brian Cox, worldwide product line manager for Hewlett Packard's Business Critical Servers. "10 Gigabit Ethernet technology greatly accelerates the performance flow of information across the fabric in data centers and the computing Grid which dramatically improves time to solution."

"10-Gigabit Ethernet is an important technology for the evolution of next generation infrastructure and client applications," said Kamal Dalmia, director of product marketing for Marvell's Connectivity Business Unit. "As the leader in physical layer technology, we are providing comprehensive solutions for Gigabit and 10-Gigabit Ethernet applications and are happy to be a part of the performance demonstration with Chelsio."

analogZONE Says . . .

Because I write for the electronic design community, I normally confine my reviews to the chip level, but I make exceptions, such as this one, where the technology or application is important enough and a board-level solution would make sense for a design engineer to consider. Broadcom has already begun to offer merchant TCP offload chips that support Microsoft's TCP Chimney Architecture running as high as 1 Gbit/s (with others to follow soon), but the order-of-magnitude speed increase Chelsio offers is sufficient to make it worthy of your consideration for extreme applications.

Chelsio has recognized an important emerging trend to converge LAN, SAN, and SCSI networks on a single 10 Gbit/s Ethernet connection. This approach is especially interesting in server blades that could have a single "spigot" that supports storage, networking and clustering applications. Protocols that overlay SCSI on IP (iSCSI), or that encapsulate Fibre Channel traffic within IP streams, can eliminate the need for a second dedicated network to move data to and from storage arrays. The same phenomenon should also emerge in clustering traffic that will eventually have its own exchange protocol, while everything else is handled in native Ethernet format. Using TCP as a common denominator eliminates the need for costly separate switches for each network, and allows the use of "generic" Ethernet switch silicon. This allows designers to leverage the heavily-developed technologies and products created for LANs in storage and clustering networks, and perhaps even serial backplane applications.

Much like the 1-Gbit/s solutions on the market today, Chelsio is intent on addressing the headaches that arise when unifying lots of different traffic types on a single, heavily-subscribed pipeline. In these applications, the protocol translation and encapsulation take lots of MIPS from a processor-based host and can cause major bottlenecks well before the pipeline is saturated.

To get around this, Chelsio uses its own custom "Terminator" chip, an ASIC-based offload engine that relieves host system congestion by implementing all Layer-4 (transport) functions, plus limited Layer-5 (session) functions, in silicon. It uses a single processor with a VLIW-like architecture, coupled with a big chunk of dedicated hardware logic (mostly state machines) to handle repetitive operations like CRC and paging calculations. This combination allows the chip to be programmed by setting hundreds of programmable registers (accessible via the host bus) that allow fine-tuning for a particular application. And in case you were worried, the TCP rules are all handled in a programmable fashion to ensure maximum flexibility.

The resulting processor can do all the header inspection and manipulation operations necessary to terminate a TCP connection, plus some processor-intensive iSCSI functions (CRC and PDU recovery). It also accelerates iSCSI, direct data placement (DDP), and remote DMA (RDMA) functions without any host intervention (RDMA is not quite standardized yet). It uses one SPI-4.2 bus as its interface to the line, and a second SPI-4.2 as a system-side connection when used in a service blade. Its PCI-X interface serves as the host interface when the chip is used in an HBA.

The card hosting the "Terminator" ASIC features an XPAK 10G optical interface, a PCI-X host interface, and enough external table and buffer memory to let the TOE engine handle up to 64k simultaneous sessions (with additional RAM, the chip can handle up to 1M connections).

The offload results appear to be dramatic, with a 10-Gbit connection producing a CPU utilization level of less than 50% using a single 2.2-GHz Opteron, and less than 15% with a 1.5-GHz Itanium chip. It also boasts 200 kconnections/s worth of set-up/teardown capabilities.
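A rough way to read these utilization figures is as a host cycle budget per frame. The sketch below assumes the 7.8Gbps, 1500-byte-frame demonstration figures from the press release above and treats "50% of a 2.2-GHz Opteron" as half of one core's clock cycles; it says nothing about how those cycles are actually spent, and the function name is invented for this example.

```python
# Back-of-the-envelope host cycle budget implied by the quoted figures.
# Assumptions: utilization is a fraction of a single core's clock cycles,
# and frames are counted as exactly frame_bytes with no overhead.

def host_cycles_per_frame(clock_hz: float, utilization: float,
                          throughput_bps: float, frame_bytes: int) -> float:
    """Host CPU cycles consumed per frame at the given load."""
    cycles_used_per_second = clock_hz * utilization
    frames_per_second = throughput_bps / (frame_bytes * 8)
    return cycles_used_per_second / frames_per_second


if __name__ == "__main__":
    per_frame = host_cycles_per_frame(
        clock_hz=2.2e9,        # 2.2-GHz Opteron
        utilization=0.50,      # quoted upper bound on CPU utilization
        throughput_bps=7.8e9,  # demonstrated throughput
        frame_bytes=1500,      # standard Ethernet frame as quoted
    )
    print(f"host cycles per frame: {per_frame:,.0f}")
    print(f"host cycles per byte:  {per_frame / 1500:.2f}")
```

On these assumptions the host spends on the order of 1,700 cycles per frame, or a little over one cycle per byte, which leaves no room for the host to walk the payload itself and suggests why the checksum and protocol processing have to live on the adapter.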

Currently, Chelsio is targeting both normal networks and, especially, high-performance computing (HPC) applications where latency is an enormous issue. This includes real-time processing applications, such as real-time modeling and simulations that use massive databases. As the price goes down, this technology should also begin to find widespread use in more mainstream applications, such as accelerating the response time of large commercial databases.

Chelsio is currently selling the Terminator only as part of a board-level solution for $4,900 in sample quantities. The company says it has no plans for chip sales yet, but if one reads between the lines a little, it has not said "no" to the idea if the "right customer" comes along.


analogZONE
(c) 2004. All rights reserved.