networkZONE Products for the week of May 12, 2003
Xelerated Says
Proof In The Pudding - Xelerated's Data Flow Architecture
Enables Quantum Power And Price Drops For Multi-Line 10Gbit/s Ethernet
With working silicon to prove it, the X10q-e network processor
sets new cost/performance standard at $245 per 10G port
Xelerated, a fabless semiconductor company, is now shipping its X10q-e
Network Processor. The X10q-e enables system vendors to build cost effective,
flexible, wire-speed solutions for the enterprise backbone and metro Ethernet
markets. The X10q reference design system running at 4 x 10 Gbits/s was
demonstrated at NetWorld+Interop 2003 in Las Vegas, April 29 - May 1, 2003.
Low-cost solutions for gigabit connectivity to the edge of the enterprise network are now available, which is driving the need for cost-effective, multi-port GbE and 10GbE line-card solutions for the enterprise backbone and metropolitan area networks. Merchant silicon has played a leading role in driving down the cost of edge connectivity, but enterprise backbone and metropolitan area networks require a combination of wire-speed performance and flexibility not offered by standard, off-the-shelf ASICs. Xelerated's new patent-pending data flow architecture allows the X10q-e to meet these requirements while maintaining the strict cost and power requirements of the enterprise.
"With the recent set of new product announcements it is clear that the multi-gigabit Ethernet wave is starting," says Michael Howard, principal analyst at Infonetics Research Inc. "The availability of innovative technology like the X10q-e will further fuel that wave, enabling more cost effective solutions for multi-gigabit enterprise backbone and metro Ethernet systems."
Until now, programmable architectures have been burdened either by low performance, or high power dissipation, making them unsuitable for enterprise backbone equipment, leaving expensive custom ASICs as the only development choice. The X10q-e offers a unique combination of flexibility and efficiency that is particularly well suited for this market.
"The speed with which Xelerated has moved from the SONET to the Ethernet market is a testament to the flexibility of their architecture," says Linley Gwennap, principal analyst at the Linley Group. "And at less than 6.5 watts per 40Gbits/s they have set a new mark for network-processor efficiency that traditional programmable architectures will be hard pressed to match."
"The lower packet rates of Ethernet relative to SONET have allowed us to scale back processing power and improve bandwidth efficiency relative to our SONET offerings - while still maintaining 40Gbit/s wire speed performance," says Gary Lidington, VP of marketing at Xelerated. "The benefit we bring to our customers is that we have helped to level the playing field. Now they can build cost competitive, yet differentiated products for the enterprise - without spending tens of millions of dollars to develop all their own ASICs."
Xelerated's X10q range now includes two new offerings: the X10q-e and the X10q-m, each priced based on packet processing rate. The original 4 x 10Gbit/s SONET offering has been re-named the X10q-w. The original 2x10 and 1 x 10Gbit/s SONET offerings will be phased out over time.
As a part of these new offerings, the X10 will also support a new advanced search capability solution that leverages standard DRAM technology. This will reduce the cost of supporting large tables without stealing valuable processing cycles from the processor cores.
First silicon for the X10 was received in early January and customer demonstrations have been taking place since early February. "The response from our customer base has been very positive, and we are now working in a number of customer projects," says Johan Börje, CEO of Xelerated.
Enabled by the availability of .13 micron process technology, the X10 architecture is the first known commercial implementation of a data flow processor. This unique architecture is optimized for the efficient movement of data, eliminating the need for complex inter-processor interconnects as well as redundant data and instruction storage. This allows 200 data flow processors and ten I/O processors to fit on a single, low-cost chip providing unparalleled cost/performance. The 200 data flow processors are organized in a synchronous pipeline with each processor executing a single instruction. This makes them appear to the programmer as a single 80 BOPs processor with a fully deterministic execution time. The benefit over conventional multi-RISC based architectures is a much simpler programming model and guaranteed wire speed performance.
"We are very impressed with the X10q-e," says John G. Metz
of Metz International in Harvard, MA. "Xelerated has taken race bred
technology and made it practical enough for use on the street - no one we
know can match their level of programmability, performance, power and price."
analogZONE Says . . .
Before I say all the nice things about Xelerated, let's get a couple of less palatable items out of the way. First, while this appears to be a new product announcement it is really only a release of a slower speed grade of their original X10q-w SONET-grade processor. Now, some cynics might conclude that this tactic is one way Xelerated is coping with yield problems with its design by offering "reject" chips that run much slower at a much lower price. And they might be right. Be this as it may, the price/performance/power ratio that this device provides at any of its speed grades is impressive, and a real validation of the unique data flow architecture used to build it.
The other thorny issue is that Xelerated is using "Cisco Math" (a close cousin to Bistro Math) which double-counts a switch's input and output capacity, allowing their 20-Gbit/s (full-duplex) processor to be advertised as a 40-Gbit/s machine. So don't get confused when you see me referring to a 20 Gbit/s connection and the manufacturer referring to it as 40 Gbit/s: it's the same thing.
With those prickly little points behind us, let's take a look at the device itself, and the new application Xelerated envisions for it. When I reviewed the X10q nearly a year ago it was intended for supporting SONET-framed IP in metro and high-capacity access applications, something that takes considerably more processing power than Ethernet-based IP. This is less due to the framing structure and more due to the shorter (40 bytes vs. 64), more closely spaced packets that SONET can generate under certain conditions.
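To put rough numbers on that packet-size argument, here is a back-of-the-envelope sketch in Python. It is my own arithmetic, not anything Xelerated supplied. It assumes a nominal 10 Gbit/s line rate in both cases, ignores SONET framing overhead entirely, and charges each Ethernet frame the standard 20 bytes of preamble and inter-frame gap. The function name is purely illustrative.

```python
def packets_per_second(line_rate_bps, packet_bytes, per_packet_overhead_bytes=0):
    """Worst-case packet rate when every packet is the minimum size."""
    bits_on_wire = (packet_bytes + per_packet_overhead_bytes) * 8
    return line_rate_bps / bits_on_wire


# Assumption: a nominal 10 Gbit/s; the usable OC-192 payload rate is a bit lower.
LINE_RATE = 10e9

# 40-byte IP packets over SONET (framing overhead ignored for simplicity).
sonet_pps = packets_per_second(LINE_RATE, 40)

# 64-byte minimum Ethernet frames plus 8-byte preamble and 12-byte inter-frame gap.
ethernet_pps = packets_per_second(LINE_RATE, 64, per_packet_overhead_bytes=20)

print(f"SONET, 40-byte packets:   {sonet_pps / 1e6:.2f} Mpackets/s")
print(f"Ethernet, 64-byte frames: {ethernet_pps / 1e6:.2f} Mpackets/s")
print(f"Ratio: {sonet_pps / ethernet_pps:.1f}x")
```

Under these simplifying assumptions the SONET case has to handle roughly twice as many packets per second per 10 Gbit/s port as the Ethernet case. Real figures will shift somewhat once framing overhead is included, but the direction of the argument holds.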
The resulting lower clock speed translates into a drop in power consumption to 6.5 W in their 20 Gbit/s (full-duplex) processor. This compares nicely to Intel's IXP 2800 chip which draws about 30 W when cranked up fast enough to terminate a 10-Gbit/s half-duplex connection.
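For readers who want to see the comparison spelled out, the sketch below works through watts per Gbit/s using only the figures quoted in this article. It takes those figures at face value, and it shows the X10q-e number under both my full-duplex convention and the vendor's double-counted one. Whether the Intel figure is measured on a truly comparable basis is an open question, so treat the ratio as indicative at best.

```python
def watts_per_gbit(power_w, throughput_gbps):
    """Power cost of each Gbit/s of throughput."""
    return power_w / throughput_gbps


# Figures as quoted in this article.
x10q_power_w = 6.5
x10q_full_duplex_gbps = 20                     # this review's convention
x10q_vendor_gbps = 2 * x10q_full_duplex_gbps   # "Cisco Math": input and output both counted

ixp2800_power_w = 30
ixp2800_gbps = 10                              # half-duplex, as quoted

x10q = watts_per_gbit(x10q_power_w, x10q_full_duplex_gbps)
x10q_vendor = watts_per_gbit(x10q_power_w, x10q_vendor_gbps)
ixp = watts_per_gbit(ixp2800_power_w, ixp2800_gbps)

print(f"X10q-e (20 Gbit/s full-duplex): {x10q:.3f} W per Gbit/s")
print(f"X10q-e (40 Gbit/s vendor math): {x10q_vendor:.4f} W per Gbit/s")
print(f"IXP 2800 (10 Gbit/s):           {ixp:.3f} W per Gbit/s")
print(f"IXP 2800 vs X10q-e, full-duplex basis: {ixp / x10q:.1f}x")
```

On the full-duplex basis the gap is a little over 9x; the vendor's accounting doubles it.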
Their newer, slower chip is targeted at the enterprise market where most, or all, of the traffic will be Ethernet, which will require less processing power for an equivalent bit rate stream. It's unclear whether a stockpile of slow chips or the recognition of a new market drove their decision to offer the X10q-e, but in either case the move was well-timed, as other semiconductor makers' push to get Gigabit on desktops will almost certainly cause a surge in demand for something to populate the blades of enterprise switches that can handle the onslaught flowing from the backbone.
With commodity silicon creating a "race to the bottom" in work group products, Xelerated is trying to bring its technology to bear on the enterprise chassis-level switch market, where lots of layer 2-4 processing power is needed for the high levels of Ethernet switching going on there. The aim is to displace the ASICs that have "owned" the high and low end of the market by offering a significantly more efficient solution than a conventional RISC-based NP in terms of both die area and power, i.e. cost.
They also feel that, especially at the high end, their programmable processor can provide the market differentiation capabilities at a good price and with much quicker time-to-market than an ASIC-based solution. The programmability will allow designers to add features at will, including support for legacy protocols, advanced box management capabilities that reduce operating labor expenses, support for much larger tables, as well as proprietary trunking protocols, and support for IPv6 & MPLS.
At this point, it might be wise to look at the chip's architecture, and why I'm reasonably confident it can meet the extravagant claims being made for it. Much of this is thanks to Xelerated's unique data flow architecture which was mostly an academic curiosity until now. I spent a fair amount of time explaining some of its finer points in my earlier review but will touch on some of the highlights here.
For one thing, most (if not all) layer 2-4 processing tasks are very predictable and routine, and involve relatively short sequences - something that makes them ideal for simple-minded data flow machines. Their 200-stage pipelined architecture uses a string of very simple processors capable of executing only a single instruction at each stage. As a result the design only puts transistors and logic where they are needed, unlike RISC or CISC cores, which must include logic that might or might not be used for a given application. Also, data parallelism is high and the ratio of instructions to data is low in layer 2-3. In other words, it's optimized for moving data rather than running complex programs.
It's also helpful to note that the layered nature of a networking packet lends itself nicely to the sequential (and highly deterministic) processing scheme used in data flow machines. If the processing were random, RISC architectures, which can cheerfully branch and loop at will, would have an advantage. Xelerated wisely gave up on the idea they had to do everything and concentrated on doing a select group of well-defined tasks in layers 2-4. This leaves the layer 4-7 tasks, which tend to be more varied, to the more flexible RISC-type architectures.
The upshot of all of this is that a data flow machine trades off flexibility for efficiency and deterministic behavior. While it can't do all the functions of a RISC machine, a data flow processor can do a large portion of them with far fewer gates (less power and real estate), and at a slower clock rate (less power again). This allows the 2X OC-192 machine to be very competitive in terms of price & power, even against OC-48 products.
Of course, development tools and interoperability are a big concern when any radical architecture is introduced. The X10's SPI 4.2 interfaces take care of some of these issues by allowing it a glueless interface with many (not all) standard framers, traffic managers, and security chips. Xelerated is working with several major chip makers, which I cannot reveal at this time, on reference designs and other cooperative efforts that will tie their processor more closely into whole-system solutions.
As far as development tools go, Xelerated says that the processor's layer 2-4 focus significantly limits the scope of what's needed. The fact that there is little in the way of big protocol stacks at this level allows more self-written software, and lessens the need for third-party support. The 200 processors inside the chip execute 1 instruction per CPU, allowing a developer to program the processor by writing a straightforward sequential list of 200 instructions. No fat, no waste, very efficient.
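To make that programming model a little more concrete, here is a toy Python simulation of a synchronous pipeline in which each stage applies exactly one instruction to whatever packet is passing through. It is strictly my own illustration: the instruction names (`dec_ttl`, `mark_expired`, `pick_port`), the packet fields, and the helper functions are all hypothetical and bear no relation to Xelerated's actual instruction set or tools.

```python
from typing import Callable, Dict, List, Optional

Packet = Dict[str, int]
Instruction = Callable[[Packet], None]


# Hypothetical single-step "instructions"; each one touches the packet once.
def dec_ttl(pkt: Packet) -> None:
    pkt["ttl"] -= 1


def mark_expired(pkt: Packet) -> None:
    pkt["drop"] = int(pkt["ttl"] <= 0)


def pick_port(pkt: Packet) -> None:
    pkt["out_port"] = pkt["dst"] % 4


def nop(pkt: Packet) -> None:
    pass


def build_program(n_stages: int, instructions: List[Instruction]) -> List[Instruction]:
    """The program is a flat list: one instruction per stage, padded with no-ops."""
    return instructions + [nop] * (n_stages - len(instructions))


def run_pipeline(program: List[Instruction], packets: List[Packet]) -> List[Packet]:
    stages: List[Optional[Packet]] = [None] * len(program)
    pending = list(packets)
    finished: List[Packet] = []
    tick = 0
    while pending or any(p is not None for p in stages):
        tick += 1
        # Every packet moves one stage forward; a new one may enter stage 0.
        entering = pending.pop(0) if pending else None
        if entering is not None:
            entering["entered"] = tick
        stages = [entering] + stages[:-1]
        # Each occupied stage executes its single instruction this tick.
        for instruction, pkt in zip(program, stages):
            if pkt is not None:
                instruction(pkt)
        # The packet in the last stage is done and leaves the pipeline.
        leaving = stages[-1]
        if leaving is not None:
            leaving["latency"] = tick - leaving["entered"] + 1
            finished.append(leaving)
            stages[-1] = None
    return finished


program = build_program(200, [dec_ttl, mark_expired, pick_port])
results = run_pipeline(program, [{"dst": 7, "ttl": 1}, {"dst": 10, "ttl": 64}])
for pkt in results:
    print(pkt)
```

The point of the toy is the shape, not the details: there are no branches or loops to reason about, a new packet can enter on every tick, and every packet spends exactly as many ticks in the pipeline as there are stages, which is where the deterministic execution time comes from.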
For this reason, Xelerated did not supply a high-level compiler and instead concentrated on good coding tools. A third-party IDE (development environment) from a "name brand" supplier uses a graphical interface to give programmers a friendly, efficient environment to work with. Xelerated says that their synchronous architecture makes debug relatively simple and quick, but I don't have enough insight here to agree or disagree.
By not trying to do everything, they have simplified their design and limited their applications to well-defined areas - fortunately for them, it's an area where there will be lots of sockets. The opportunities for design-ins should be significant as the industry picks itself up and tries to ready itself for the long-delayed Gigabit-to-the-Desktop revolution (scheduled for some time before Hades freezes over.) I expect them to be looking for a home as a forwarding engine on OC-48 through OC-768 line cards where their seriously low power per BOP or Gbit/s (4X-10X better than competition, close to that of ASICs) will allow them to achieve high density on power-limited blades.
The new chip simply scales down the clock speed of the SONET processor to accommodate lower Ethernet packet rates, moving from 200 MHz and 180 MHz down to 120 MHz. There are no changes in gate count or architecture - which may allow Xelerated to use chips that would have been binned out before.
Xelerated's chip gets a record low Vapor Index Rating for a chip of this complexity because its faster counterpart has been back from the fab and working since late last year. I'd even take another half-saltshaker off of this, except I have not talked to any of Xelerated's customers yet and have some nagging doubts about programming and development efforts.
Pricing for the X10q-e runs as low as $490 today. Once the market volume ramps up, we can expect these slower chips to go into plastic packages for added savings.