networkZONE Products for the week of August 7, 2006


Tehuti Networks Says…
Tehuti Networks 10GbE Controller Powers Silicom Connectivity Solutions' First 10 Gigabit Ethernet Server Networking Adapter
Tehuti Networks now shipping single-chip, low-power, high-performance 10GbE components

Tehuti Networks Inc. has announced that Silicom Connectivity Solutions will incorporate Tehuti Networks' chip in its first 10GbE Server Networking Adapter products. Tehuti Networks also announced that the company is now shipping the TN3014 and the Silicom-built NICs based on this chip.

"Silicom has been a long-time technology partner, and we are proud they have included Tehuti Networks technology in their high-performance 10 Gigabit Ethernet product line," said Arie Brish, CEO of Tehuti Networks. "As we begin shipping our solutions, Tehuti Networks looks forward to working with ODM and OEM customers to help support growing market demand for 10 Gigabit Ethernet solutions."

"Silicom is excited to integrate Tehuti Networks' 10 GbE Network Traffic Accelerator chip into our Ethernet Server Adapter product line," said Mr. Shaike Orbach, president and CEO of Silicom. "Tehuti's solution will help drive Silicom's expansion into additional market segments that need cost-effective, high-performance connectivity solutions."

Tehuti Networks' innovative 10GbE solutions allow servers to add 10 Gigabit Ethernet performance to increase network throughput by up to five times compared to conventional approaches -- without increasing cost, power or complexity. Tehuti Networks' solution supports all major operating systems as well as both hardware (Shared I/O) and software (VMware, Xen, MS, Solaris, etc.) virtualization schemes.

analogZONE Says . . .

There are several critical issues that need to be resolved in order to push 10G Ethernet past the tipping point where it achieves real market traction, not the least of which is coming up with cost-effective TCP offload techniques. It's become pretty obvious that unless networks move most of the low-level TCP/IP termination tasks as close to the PHY layer as possible, the transaction overhead will absorb most of the processing power and bus bandwidth of all but the most powerful servers. That's why I've chosen to review Tehuti's TN3014 single-chip 10G Ethernet MAC and TCP acceleration engine, despite the company's reluctance to share many important details of their device's inner workings. The press release above is the closest thing to a product announcement that the company has issued to date, but I'll share with you the few new facts I was able to glean from the minimal briefing I received.

In contrast to full-blown TCP Offload Engines (TOEs) such as those made by Broadcom, Chelsio (reviewed here in 2004) and NetEffect, Tehuti has chosen a less radical offload approach that does not bypass the host's TCP/IP stack. This local, partial hardware-assist approach is intended to attack bottlenecks in the host's CPU, bus, and system memory in a transparent manner using a collection of techniques, including accelerated checksum calculations (using "Smart NIC" mode) and TCP reassembly techniques that aggregate packets and pass larger "chunks" to the server for more efficient transfers and processing.
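To make those two assists concrete, here is a minimal Python sketch of the work involved. It is not Tehuti's implementation, which the company has not disclosed. The checksum follows the standard Internet checksum (RFC 1071); the function names and the (sequence number, payload) segment representation are assumptions made for illustration only.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement Internet checksum (RFC 1071) over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def coalesce(segments):
    """Merge in-order, contiguous (seq, payload) segments into larger chunks.

    Hypothetical model of receive-side aggregation: a segment is appended
    to the previous chunk only if it starts exactly where that chunk ends.
    """
    chunks = []
    for seq, payload in segments:
        if chunks and chunks[-1][0] + len(chunks[-1][1]) == seq:
            start, data = chunks[-1]
            chunks[-1] = (start, data + payload)
        else:
            chunks.append((seq, payload))
    return chunks


# Three 1460-byte segments arrive back to back, then one after a gap.
segments = [(0, b"a" * 1460), (1460, b"b" * 1460),
            (2920, b"c" * 1460), (9000, b"d" * 100)]
for start, data in coalesce(segments):
    print(start, len(data), hex(internet_checksum(data)))
```

In this toy run the host would see two deliveries instead of four, which is the point of the aggregation: fewer, larger transfers mean fewer bus transactions and fewer per-packet trips through the stack.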

Besides reducing the number of bus transactions, the MAC's accelerators handle most of the nasty, time-consuming details involved with complex TCP/IP transactions and produce simplified, but legitimate, packet headers that allow the upper layers of the software stack to remain blissfully ignorant of what's going on below. This includes dealing with VPN tags, encapsulation and other connection-related overhead that can be dealt with at the MAC layer. The Tehuti MAC also automatically performs connection search and fetch tasks that identify and manage multiple TCP connections -- a real work-saver for servers, load balancing boxes, security appliances and other equipment that supports large numbers of connections.
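Tehuti did not describe how its connection search works, so the following Python sketch only illustrates the general idea of keying per-connection state on the TCP 4-tuple. The class and field names are hypothetical, and real hardware would more likely use a CAM or a hashed on-chip cache than a software dictionary.

```python
from dataclasses import dataclass


@dataclass
class ConnState:
    """Hypothetical per-connection context a NIC might cache."""
    next_seq: int = 0
    bytes_seen: int = 0


class ConnectionTable:
    """Toy connection search: find (or create) state by TCP 4-tuple."""

    def __init__(self):
        self._table = {}

    def lookup(self, src_ip, src_port, dst_ip, dst_port, create=True):
        key = (src_ip, src_port, dst_ip, dst_port)
        state = self._table.get(key)
        if state is None and create:
            state = self._table[key] = ConnState()
        return state

    def __len__(self):
        return len(self._table)


table = ConnectionTable()
conn = table.lookup("10.0.0.1", 40000, "10.0.0.2", 80)
conn.bytes_seen += 1460  # account for one received segment
same = table.lookup("10.0.0.1", 40000, "10.0.0.2", 80)
print(same.bytes_seen, len(table))
```

Doing this search in the MAC spares the host a lookup on every packet, which adds up quickly on boxes that juggle thousands of simultaneous connections.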

Because this partial approach does not require modification of driver software, it's a natural drop-in for legacy applications, especially single-ended upgrades. The lack of protocol stack mods also means that the chip works with many of the popular server OSs that do not currently support offload engines, such as EMC's VMware.

A quick look at a high-level block diagram reveals that the chip uses several hardware accelerator cores to support TCP cache and search functions, and a more general-purpose "accelerator engine" which one presumes is responsible for some of the more complex operations. The MAC's business end sports a pair of 10GbE ports (see Fig. 1) and an x8 PCIe host system interface. The production chip will have both XFI and XAUI interfaces as well. Tehuti claims their MAC will support both 10GbE ports at full rate, but I encountered some vague mumbling when I pressed for a definition of what full rate meant and what the actual capacity of the device was.
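A quick back-of-the-envelope calculation shows why the definition of "full rate" matters. The sketch below assumes the host interface is first-generation PCIe (2.5 GT/s per lane with 8b/10b encoding), which Tehuti did not confirm, and it ignores packet and protocol overhead on the bus, which would lower the usable figure further.

```python
# Assumption: first-generation PCIe signalling; not confirmed by Tehuti.
LANES = 8
GT_PER_LANE = 2.5            # gigatransfers per second, per lane, per direction
ENCODED_BITS_PER_BYTE = 10   # 8b/10b: 10 line bits carry 8 data bits

# Raw data bandwidth of the host link in each direction, in Gb/s.
pcie_gbps = LANES * GT_PER_LANE * 8 / ENCODED_BITS_PER_BYTE

# Two 10GbE ports running at line rate in one direction.
ports_gbps = 2 * 10.0

print(f"x8 PCIe link:  {pcie_gbps:.1f} Gb/s per direction")
print(f"2 x 10GbE:     {ports_gbps:.1f} Gb/s per direction")
print(f"Shortfall:     {ports_gbps - pcie_gbps:.1f} Gb/s")
```

Under these assumptions the host link tops out below the combined line rate of the two ports, so "full rate" on both ports at once would have to mean something other than moving every bit across the bus to the host.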

Another interesting thing to note about this product is that it's FPGA based. Given the NRE required to tool up for a complex merchant chip these days, and the fact that TCP offload technology is still evolving, I think Tehuti was smart to go this route. Since their FPGA vendor (Altera) offers the option, I'm sure we'll see them move an updated version of the design to a structured ASIC once the market matures a bit more and sales volumes can justify the tooling cost. For the moment, however, keeping the product in an FPGA makes it easy for Tehuti to add new features as needed or tweak its design for a particular customer. This is the second FPGA-based networking product I've encountered in the past several months (see my February 2006 review of Tarari's content processor), and I would not be surprised if we see many chips addressing emerging markets making their debuts in FPGAs. For early market and limited volume applications, the unit cost and power penalties involved should be more than outweighed by the quick time to market, easy upgrades and lower development costs that they afford. About the only concern I'd have would be the potential for IP theft via the FPGA's programming memory.

Despite the fact that their solution still relies on the host for a good chunk of the TCP termination, Tehuti claims that it delivers equivalent or better line utilization and host system offload than most current full-up TOE bypass devices have demonstrated. I did not get to see the tests themselves to verify that they were true apples-to-apples comparisons, but did see some genuine-looking screen shots (see Fig. 2) that showed a Tehuti MAC achieving 85% line rate utilization using about 60% of a 3.6 GHz Intel Xeon's CPU cycles (with Hyper-Threading enabled).

I'll also include the results of the benchmark tests against a couple of competing chips which Tehuti showed me so you can draw your own conclusions. The baseline configuration for the tests is as follows:

If one can believe the data from the tests, it would appear that one of the full-TOE bypass MACs only manages to deliver around 40% line utilization with the same CPU MIPS. Since I don't have any way to tell whether the test conditions Tehuti used were truly representative of real-world conditions and did not use a corner-case scenario that biased the results, you should dig a little deeper before coming to any firm conclusions. Nevertheless, the claims being made here are the basis for some good questions to ask when you're shopping for a 10G MAC for your next design.
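One way to put the vendor's figures on a common footing is to express them as throughput per unit of CPU consumed. The sketch below takes the reported numbers at face value and makes several assumptions: a single port at a 10 Gb/s line rate, "same CPU MIPS" meaning the same 60% load on the same 3.6 GHz Xeon, and no adjustment for Hyper-Threading. It is a rough normalization of unverified vendor data, not a benchmark.

```python
LINE_RATE_GBPS = 10.0  # assumption: one 10GbE port under test
CPU_GHZ = 3.6          # Xeon clock quoted in the screen shots


def gbps_per_ghz(line_utilization: float, cpu_fraction: float) -> float:
    """Throughput delivered per GHz of CPU consumed."""
    return (LINE_RATE_GBPS * line_utilization) / (CPU_GHZ * cpu_fraction)


tehuti = gbps_per_ghz(0.85, 0.60)    # figures reported for the Tehuti MAC
full_toe = gbps_per_ghz(0.40, 0.60)  # figures reported for one full-TOE MAC

print(f"Tehuti (claimed):   {tehuti:.2f} Gb/s per GHz")
print(f"Full TOE (claimed): {full_toe:.2f} Gb/s per GHz")
print(f"Ratio:              {tehuti / full_toe:.2f}x")
```

Because both runs are assumed to burn the same CPU, the ratio reduces to 85/40. The number to ask a vendor about is therefore less the ratio itself than the conditions under which each line utilization figure was measured.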

While I think Tehuti is on the right track here and has a very promising product, the limited technical information I was permitted to have kept me from reaching a definitive conclusion on this part and adds a saltshaker to its Vapor Index Rating.

The Tehuti TN3014 10GbE MAC/TCP engine is packaged in a 35mm x 35mm Fineline BGA. Rated power consumption is 7 W. General availability is slated for September 2006. Despite repeated badgering, Tehuti declined to provide even rough chip-level pricing. They were more forthcoming about evaluation kits, which are available now in both CX-4 and XFP configurations. Both kits include controller and adapter boards (in PCI half-length form factor) and all the niceties such as cables, BOMs, reference schematics, and data sheets. Driver support includes all major Microsoft Server OSs, Linux 2.4 / 2.6 / 64-bit, VMware ESX 3, Xen, FreeBSD, and OpenSolaris.

The dual XFP kit costs $3,499 and the dual CX-4 kit costs $1,999.


Lee's Saltshaker Rating


analogZONE
(c) 2006. All rights reserved.