networkZONE Products for the week of August 29, 2005
Cavium Networks Says
Cavium's Multi-Core Processors Integrate Control And Data Plane Processing At Multi-Gigabit Speeds For Networking & Storage Apps
Octeon EXP offers up to 16 MIPS64-based cores in a single chip to reduce cost and increase performance and functionality for converged control and data plane applications
Cavium Networks has announced the OCTEON Multi-core MIPS64-based EXP Processor Family, which enables the integration of control and data plane processing in next-generation service provider, enterprise and storage networking equipment. The OCTEON EXP multi-core MIPS64-based processor family addresses network equipment vendors' need to merge control and data plane processing with richer functionality at unprecedented multi-gigabit rates, using a standard C/C++ software programming model. The OCTEON EXP processor family is being used by networking and storage equipment vendors building Routers, Intelligent Switches, Multi-service Access Equipment, Storage Servers, Multi-protocol Storage Switches, Border Session Gateways and other Wireless Infrastructure Equipment.
Traditionally, control plane and data plane processing architecture has been fragmented across performance ranges, requiring disparate software architectures and multiple software development efforts. In low-end routers, both control and data plane functions are implemented in a single CPU, called a communication processor. In mid-range to high-end modular routers, multi-service access equipment and wireless base station equipment, the control and data plane processing is done in multiple chips consisting of a general-purpose processor, a micro-code based network processor and/or fixed-function ASICs. The new OCTEON EXP processors enable integrated control and data plane functionality to scale from rates of several hundred Mbit/s to multi-Gigabit rates.
"As OEMs are adding voice and video functions to both enterprise and infrastructure equipment, the greater routing, provisioning, and quality-of-service requirements are driving a convergence of control and data plane processing," commented Linley Gwennap, principal analyst at The Linley Group. "OCTEON's multiple general-purpose processors and extensive data-plane acceleration support this convergence at multi-gigabit speeds while improving time to market with a simple programming model."
OCTEON EXP Multi-core MIPS Processor for Control and Data Plane Applications
For the high-performance demands of control plane applications, the OCTEON EXP offers an unparalleled aggregate 9.6GHz and 19.2 billion instructions/s of general-purpose processing, spread across 16 dual-issue, memory-coherent, MIPS64 Release 2-based cores. Each 600MHz MIPS64 core in OCTEON is built from the ground up with additional instructions for packet acceleration, a 32KB I-cache, an 8KB D-cache and a 2KB write-back buffer.
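The headline figures follow from simple multiplication. The sketch below assumes two things: "9.6GHz" is the sum of all sixteen core clocks, and "dual-issue" means a peak of two instructions per core per cycle. Sustained throughput on real packet workloads will be lower than these peaks.

```python
# Back-of-the-envelope check of the aggregate figures quoted above.
# Assumptions: "9.6GHz" is the sum of all core clocks, and "dual-issue"
# means a peak of two instructions issued per core per cycle.
CORES = 16
CLOCK_HZ = 600_000_000
ISSUE_WIDTH = 2

aggregate_clock_hz = CORES * CLOCK_HZ        # 16 x 600 MHz
peak_ips = aggregate_clock_hz * ISSUE_WIDTH  # two instructions per cycle per core

print(f"aggregate clock: {aggregate_clock_hz / 1e9:.1f} GHz")
print(f"peak throughput: {peak_ips / 1e9:.1f} billion instructions/s")
```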
For high-performance network throughput, OCTEON EXP integrates dedicated packet processors for layer 2 to layer 4 parsing, error checking, tagging and memory allocation. Additionally, OCTEON EXP has three high-performance, on-chip memory controllers. The first memory interface supports 144-bit wide, ECC-protected DDR II DRAM at data rates up to 800Mbit/s, with a capacity of up to 16Gbits and bandwidth greater than 100Gbit/s. Two additional memory interfaces support 18-bit wide, low-latency RLDRAM2 / FCRAM2 with a capacity of up to 1GB. These same memory interfaces can be used to connect one or more TCAMs for offloading lookups to an external hardware device. For higher-layer data plane processing, OCTEON has dedicated hardware for TCP acceleration and flow management to scale performance across multiple cores.
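Here is a quick check of the DDR II bandwidth claim, as a sketch. It assumes the 144-bit bus carries 128 data bits plus 16 ECC check bits; that split is our assumption, since the release does not state it. It also assumes every pin runs at the full 800 Mbit/s.

```python
# Assumptions (not stated in the release): the 144-bit DDR II bus is
# 128 data bits + 16 ECC bits, and every pin transfers at 800 Mbit/s.
BUS_WIDTH_BITS = 144
ECC_BITS = 16
PIN_RATE_BPS = 800_000_000

raw_bps = BUS_WIDTH_BITS * PIN_RATE_BPS                   # includes ECC bits
payload_bps = (BUS_WIDTH_BITS - ECC_BITS) * PIN_RATE_BPS  # data bits only

print(f"raw bus bandwidth:     {raw_bps / 1e9:.1f} Gbit/s")
print(f"usable data bandwidth: {payload_bps / 1e9:.1f} Gbit/s")
```

Both numbers clear the "greater than 100Gbit/s" claim. These are peak figures, before DRAM refresh and bank-conflict overheads.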
To reduce BOM cost, OCTEON EXP integrates multiple standard external networking interfaces, with 4 to 8 Gigabit Ethernet ports (RGMII) or dual (2x) SPI-4.2 interfaces, along with a host/slave 64-bit, 133MHz PCI-X interface that can serve as both a data and a control interface. OCTEON EXP also offers auxiliary interfaces such as GPIO, Flash, MDIO, dual UARTs and two-wire serial interfaces.
"The OCTEON EXP is built using proven technology first delivered to market with the highly successful Cavium Networks OCTEON NSP family," said Rajiv Khemani, Vice President of Marketing, Cavium Networks. "With OCTEON EXP, we have optimized the product specifically for integrated control and data applications by reducing power, delivering solutions for TCAM connectivity and critical 3rd-party control plane software. Furthermore, OCTEON EXP is available worldwide without any security-related export control restrictions."
Scalable Family and Standard Software Programming Model
The OCTEON EXP family offers a complete software- and footprint-compatible architecture that scales from 4 to 16 MIPS64-based cores on a single chip. It includes an array of integrated networking interfaces, memory controllers and co-processors that enable 2Gbps to 10Gbps of application performance at power levels ranging from under 10 watts to 25 watts. The standard software programming model includes C/C++, MIPS64 and MIPS32 compatibility, Linux operating system support, and the GNU tool chain and development environment. It also supports third-party commercial operating systems and tools for porting proprietary operating systems.
analogZONE Says . . .
I've always been impressed by Cavium, both by the very powerful chips they cobble together and the fact that they (almost always) deliver on their performance and availability claims. That's one of the reasons I was so eager to review their new Octeon EXP family, a radical variant of the original NSP series of network services processors I reviewed around this time last year (2004). Like its predecessors it employs a cluster of highly-customized MIPS64 RISC cores tightly connected to a set of specialized logic elements to do wicked-fast, wicked-smart packet inspection and processing, but it's targeted at a very different set of applications. Whereas the earlier NSP series was designed to provide "one-stop inspection" for Layer-3-7 threats including viruses, spyware, intrusion protection, firewall, and VPN/IPsec/SSL support, the EXP series' architecture was developed to provide a single chip that performs both control and data plane processing at much higher speeds than traditionally thought possible.
By offering a one-chip multi-Gigabit solution for both control and data plane functions in so-called "high-touch" applications, they hope to lower the BOM costs of routers, smart (L4+) switches, multi-service access equipment, border switches and multi-protocol storage switches. And, as we'll see in a bit, they also think the chip's power and flexibility make it a good candidate for more specialized IP-oriented equipment like session gateways (for wireless infrastructure), where multimedia services require QoS and priority awareness. Equally important, its use of an industry-standard instruction set allows it to be programmed with many industry-standard tools, keeping development cost and schedule to a minimum.
While using a single chip to process both tasks is not new, this approach has usually been limited to much lower-bandwidth (sub-100 Mbit/s) processors used in SoHo, SME and edge applications. Until now most designs running much above the 100 Mbit/s level have split the data plane traffic off for processing in a specialized network processor or ASIC and passed the control plane functions off-chip to a conventional general-purpose processor. Cavium has gone a different route that allows any of its 64-bit embedded MIPS RISC cores (up to 16) to be assigned at will to either data plane or control plane tasks.
Needless to say, the Octeon EXP packs lots of "crunch," but many data plane-only network processors can boast the same. The difference here is that Cavium has added lots of architectural features that enhance both its packet processing and control plane abilities, and that let the two functions coexist smoothly within the same array of processors. On the data plane side, the chip's already-voracious packet processing capacity is further enhanced by the specialized packet processing functions they've added to the standard MIPS instruction set. In addition to the standard MIPS Release 2 instruction extensions (such as bit-field insert/extract, byte swapping, and 8/16-bit sign extension), they've added several custom packet analysis and manipulation instructions usually found only in network processors or other dedicated packet processing engines.
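Here is a small Python model of four of the standard Release 2 extensions, applied to the first word of an IPv4 header:

- EXT extracts a bit field.
- INS inserts a bit field.
- WSBH swaps the bytes within each halfword.
- SEB sign-extends a byte.

It only emulates the published MIPS32 semantics for illustration. Cavium's custom packet instructions are not documented here and are not modeled. The function names simply mirror the instruction mnemonics.

```python
MASK32 = 0xFFFFFFFF


def ext(rs, pos, size):
    """EXT: extract a size-bit field starting at bit pos, zero-extended."""
    return (rs >> pos) & ((1 << size) - 1)


def ins(rt, rs, pos, size):
    """INS: replace bits pos..pos+size-1 of rt with the low size bits of rs."""
    mask = ((1 << size) - 1) << pos
    return ((rt & ~mask) | ((rs << pos) & mask)) & MASK32


def wsbh(rt):
    """WSBH: swap the two bytes inside each 16-bit halfword."""
    return (((rt & 0x00FF00FF) << 8) | ((rt & 0xFF00FF00) >> 8)) & MASK32


def seb(rt):
    """SEB: sign-extend the low byte (returned as a signed Python int)."""
    b = rt & 0xFF
    return b - 0x100 if b & 0x80 else b


word0 = 0x45000054  # IPv4 header word 0: version/IHL, TOS, total length
print(ext(word0, 28, 4), ext(word0, 24, 4), ext(word0, 0, 16))  # 4 5 84
print(hex(ins(word0, 0xB8, 16, 8)))  # 0x45b80054: TOS field set to 0xB8
print(hex(wsbh(word0)))              # 0x455400
print(seb(0x80), seb(0x7F))          # -128 127
```

In software, each of these operations costs several shift-and-mask instructions. A single-instruction form is what makes header parsing cheap on a general-purpose core.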
With all that horsepower surging through its innards, Octeon's architecture does not leave the task of keeping all those processors working in close harmony to chance. Rather than rely on software to ensure the packet streams being processed in the multi-core chip emerge on time and in the right order, the Octeon uses a dedicated schedule/synchronizer/ordering block (see Fig. 1). It works in conjunction with the packet input processor block to steer data to the proper CPU using a "work tag" that's appended onto the incoming packet. This tagging scheme allows packets to be assigned to a particular processor (or group of processors) while maintaining "atomic ordering" without a lot of software overhead. When it's set up correctly, it lets the MIPS cores run full-bore without stalling on empty or clogged pipelines. On the back end, a packet output processor uses the same queuing tags to take the packets from the MIPS cores, re-order them into the proper sequence and pass them to the proper port, all without burdening the processor cores.
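The ordering idea can be modeled in a few lines. This is a hypothetical software analogue, not Cavium's hardware interface, and the class and method names are invented for illustration. It works in three steps:

1. The input side stamps each packet with a per-tag sequence number.
2. Cores may finish packets in any order.
3. The output side holds each completed packet until everything earlier with the same tag has left.

Packets with different tags are free to overtake each other.

```python
import heapq
from collections import defaultdict


class TagOrderer:
    """Hypothetical model of tag-based ordering: packets that share a tag
    leave in arrival order, while packets with different tags may overtake."""

    def __init__(self):
        self.next_in = defaultdict(int)   # next sequence number to stamp, per tag
        self.next_out = defaultdict(int)  # next sequence number allowed to leave, per tag
        self.done = defaultdict(list)     # min-heap of finished (seq, packet), per tag

    def ingress(self, tag, pkt):
        """Input side: stamp the packet with its per-tag sequence number."""
        seq = self.next_in[tag]
        self.next_in[tag] += 1
        return tag, seq, pkt

    def complete(self, tag, seq, pkt):
        """A core finished this packet; return whatever can now be sent, in order."""
        heap = self.done[tag]
        heapq.heappush(heap, (seq, pkt))
        released = []
        while heap and heap[0][0] == self.next_out[tag]:
            released.append(heapq.heappop(heap)[1])
            self.next_out[tag] += 1
        return released


orderer = TagOrderer()
work = [orderer.ingress(tag, name) for tag, name in
        [("flowA", "A0"), ("flowB", "B0"), ("flowA", "A1"), ("flowA", "A2")]]

out = []
for i in (3, 1, 0, 2):          # cores finish A2, B0, A0, A1 in that order
    out += orderer.complete(*work[i])
print(out)  # ['B0', 'A0', 'A1', 'A2']
```

B0 leaves immediately even though it arrived after A0, because it carries a different tag. A2 waits until A0 and A1 have gone.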
The Octeon also has several specialized hardware accelerator cores hanging off its high-speed I/O bus. Of particular note is a compress/decompress engine that implements GZIP, allowing the processor to inspect the payload of compressed application-layer data streams. Depending on the version of the GZIP algorithm being used, the core can handle up to a 4:1 compression ratio. Running full tilt, it can accept 4 Gbit/s at its input while producing 8 to 10 Gbit/s of decompressed data at its output.
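Software can do the same kind of inspection on a conventional CPU, just far more slowly. As a rough illustration of what the engine offloads, this sketch uses Python's standard `zlib` module to inflate a gzip stream chunk by chunk and search it for a signature. It keeps a small overlap between chunks so that a match straddling two chunks is not missed. The function name, payload and signature string are hypothetical.

```python
import gzip
import zlib


def scan_gzip_stream(chunks, needle: bytes) -> bool:
    """Inflate gzip-framed chunks incrementally and report whether needle
    appears in the decompressed payload."""
    inflater = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # |16 = expect a gzip header
    keep = len(needle) - 1     # bytes to carry over so boundary-straddling matches are found
    tail = b""
    for chunk in chunks:
        data = tail + inflater.decompress(chunk)
        if needle in data:
            return True
        tail = data[-keep:] if keep else b""
    return needle in tail + inflater.flush()


# Hypothetical test payload: a signature buried in a compressible HTTP-like body.
payload = b"GET /index.html " + b"x" * 5000 + b"malware-signature" + b"y" * 5000
compressed = gzip.compress(payload)
chunks = [compressed[i:i + 64] for i in range(0, len(compressed), 64)]
print(scan_gzip_stream(chunks, b"malware-signature"))  # True
print(scan_gzip_stream(chunks, b"not-present"))        # False
```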
Some critics will rightly point out that general-purpose processors (even with added custom instructions) may not be as efficient at data plane packet processing as a dedicated network processor. But while I cannot verify it without running a benchmark test, Cavium's claim that it has significantly narrowed the performance gap seems quite believable. I'd also agree with them that any remaining disadvantage is far outweighed by the huge assortment of development tools and code libraries available for the MIPS core, which can dramatically slash development time and cost. Beyond the commercial tools, Octeon's creators have also done their part on the software front. They have seen to it that all of their MIPS cores' standard and custom instructions are integrated into the complete set of GNU development and debug tools that come with the chip.
Cavium's tool set also allows you to program the EXP in several different ways to take full advantage of its multi-processor architecture. In symmetric multi-processor (SMP) mode, you write your application for a single Linux kernel running across all 16 MIPS cores. In SMP mode, all processors share the same address and data space and rely on a single central scheduler for work assignment. SMP also supports a special mode called "processor affinity," in which a particular process or group of tasks is dedicated to a specific core or group of cores.
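On a generic Linux SMP system, processor affinity is exposed through the standard `sched_setaffinity` call, which Python wraps as `os.sched_setaffinity`. The sketch below pins the calling process to a single core, the way a control-plane task might be confined to a reserved core. It assumes a Linux host. Whether Cavium's Linux port is set up the same way is an assumption, and no Cavium SDK calls are shown.

```python
import os


def pin_to_cores(cores):
    """Restrict the calling process (pid 0) to the given CPU numbers and
    return the affinity set the kernel actually applied."""
    os.sched_setaffinity(0, cores)
    return os.sched_getaffinity(0)


applied = None
if hasattr(os, "sched_setaffinity"):           # Linux-only API
    allowed = sorted(os.sched_getaffinity(0))  # CPUs this process may use now
    control_plane_core = {allowed[0]}          # reserve the first one for control tasks
    applied = pin_to_cores(control_plane_core)
    print("pinned to", applied)
```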
The tool set also supports an "asymmetric" mode in which each core runs its own kernel, independently of the other processors. In this mode the chip can support either a "run to completion" (RTC) or a "pipeline processing" configuration. In RTC mode, a packet (or group of packets) is inspected, parsed, and operated on at all levels before being passed on. In the pipeline configuration, each processor handles only a certain layer, or group, of functions. In both cases a separate set of cores is dedicated to running control plane processes and handling any exceptional packets that the data plane processors "punt" over to them. According to Cavium, its early customers tend to prefer developing in RTC mode. That is most likely because it's easier to program and makes migrating "legacy" software from earlier designs in low-end equipment relatively straightforward.
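The difference between the two asymmetric configurations is easiest to see side by side. In this hypothetical sketch, each layer is a function. A packet that a layer cannot handle raises `Punt` and is queued for the control plane cores.

- `run_to_completion` pushes one packet through every layer before starting the next.
- `pipeline` runs each layer over the whole batch, standing in for a chain of dedicated cores.

The stage functions and packet fields are invented for illustration.

```python
from queue import Queue


class Punt(Exception):
    """Raised by a data-plane stage to hand a packet to the control plane."""


# Hypothetical per-layer handlers; packets are plain dicts.
def parse_l2(pkt):
    pkt["l2_ok"] = True
    return pkt


def parse_l3(pkt):
    if pkt["ttl"] <= 1:
        raise Punt("TTL expired; control plane must generate ICMP")
    pkt["ttl"] -= 1
    return pkt


def classify_l4(pkt):
    pkt["queue"] = "voice" if pkt["dport"] == 5060 else "best_effort"
    return pkt


STAGES = [parse_l2, parse_l3, classify_l4]


def run_to_completion(pkt, punt_q):
    """RTC: one core takes a packet through every layer before starting the next packet."""
    try:
        for stage in STAGES:
            pkt = stage(pkt)
        return pkt
    except Punt:
        punt_q.put(pkt)
        return None


def pipeline(packets, punt_q):
    """Pipeline: each stage (standing in for a dedicated core) handles one
    layer for every packet, then passes the survivors down the line."""
    batch = list(packets)
    for stage in STAGES:
        survivors = []
        for pkt in batch:
            try:
                survivors.append(stage(pkt))
            except Punt:
                punt_q.put(pkt)
        batch = survivors
    return batch


def make_packets():
    return [{"ttl": 64, "dport": 5060}, {"ttl": 1, "dport": 80}, {"ttl": 32, "dport": 80}]


rtc_punts, pipe_punts = Queue(), Queue()
rtc_results = [run_to_completion(pkt, rtc_punts) for pkt in make_packets()]
rtc_out = [p for p in rtc_results if p is not None]
pipe_out = pipeline(make_packets(), pipe_punts)
print([p["queue"] for p in rtc_out], rtc_punts.qsize())
print([p["queue"] for p in pipe_out], pipe_punts.qsize())
```

Both configurations produce the same result here. What differs is how work is split across cores. RTC keeps a single code path per packet, which is part of why it is easier to port legacy software to.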
This unified approach to data plane and control plane processing is promising both because of the potential cost and power savings, and because many emerging multimedia applications require tighter coupling between control and data streams. We're also seeing increasing performance requirements drive many functions previously handled in the control plane over to the data plane, where they can be handled more quickly. This is most apparent in areas like application-layer routing and application-layer provisioning, where the data plane uses wire-speed packet inspection to allocate bandwidth based on tables maintained in the control plane. Cavium says there is also significant interest in using their converged processing architecture in multi-service access gateways that make Layer-3+ decisions at line speed for converged media with varying priorities and QoS requirements.
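The provisioning split described above, with tables built in the control plane and enforced in the data plane, can be sketched with a per-class token bucket. The rates, class names and the choice of a token bucket are all assumptions made for illustration. The article does not say which policing algorithm any particular product uses.

```python
class TokenBucket:
    """Classic token-bucket policer; rates in bits/s, times in seconds."""

    def __init__(self, rate_bps, burst_bits):
        self.rate_bps = rate_bps
        self.burst_bits = burst_bits
        self.tokens = burst_bits      # start full
        self.last = 0.0

    def allow(self, size_bits, now):
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst_bits, self.tokens + (now - self.last) * self.rate_bps)
        self.last = now
        if self.tokens >= size_bits:
            self.tokens -= size_bits
            return True
        return False


# Control plane: maintains the per-class allocation table (hypothetical values).
allocation_table = {
    "voice": TokenBucket(rate_bps=1_000_000, burst_bits=12_000),
    "video": TokenBucket(rate_bps=8_000_000, burst_bits=96_000),
}


def data_plane_forward(app_class, size_bytes, now):
    """Data plane: apply the control plane's allocation to a packet whose
    application class was already determined by packet inspection."""
    bucket = allocation_table.get(app_class)
    if bucket is None:
        return "best_effort"      # unprovisioned class: no policing
    return "forward" if bucket.allow(size_bytes * 8, now) else "drop"


results = [
    data_plane_forward("voice", 1500, now=0.0),   # forward: bucket starts full
    data_plane_forward("voice", 1500, now=0.0),   # drop: bucket now empty
    data_plane_forward("voice", 1500, now=0.02),  # forward: refilled to burst
    data_plane_forward("bulk", 1500, now=0.02),   # best_effort: no table entry
]
print(results)
```

The control plane only rewrites `allocation_table` when policy changes. The per-packet path is a dictionary lookup plus a few arithmetic operations, which is the kind of work that can run at line speed.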
To be fair, there have been several other honest attempts at integrating control and data plane functions in multi-Gigabit products. One excellent example is Agere's PayloadPlus processor family (an early family member, the NP5, was reviewed here in 2002), which supports some data plane functions with hardware cores. But, unlike the Cavium chip, it really only supports control plane operations through L3 at reasonable rates, and it requires an external processor for extensive processing at higher layers. In its defense, the more specialized architecture of Agere's NPx series excels at internetworking applications; but that's another story for another time...
About the only direct competition I am aware of is Seaway's innovative content processor, which won one of analogZONE's Product of the Year Awards back in 2003. While it's intended to handle similar inspection, processing and control plane tasks all the way to Layer 7, its older design and specialized architecture seem to make it more costly and more difficult to program than Cavium's newer chip. But given that Freescale just acquired Seaway, we can expect Seaway's content processing technology to be incorporated into Freescale's venerable PowerQUICC processor families. This could change things radically, since these new computing elements would be supported by the full ecosystem of development tools currently available for the rest of Freescale's processor family. In fact, in an interview this week, Freescale indicated that it intends to begin embedding elements of Seaway's content processing technology as cores within the next generation of PowerQUICCs "some time next year." My guess is that we'll see a content-aware QUICC some time in late Q2 or early Q3 of 2006. That will help position Freescale to compete in the race to own the L4-7 market, though still substantially behind Cavium.
Of course no single architecture can solve all problems and as I noted earlier there will always be better solutions for specific tasks. But Cavium seems to hit an architectural "sweet spot" that strikes a good balance between the efficiency of specialized engines and the flexibility offered by widely-supported general-purpose processors. In doing so they have created a chip that efficiently supports L4-7 processing at much lower bucks-per-Gigabit than previously possible.
Cavium's big challenge in maintaining its lead will be to make sure that their development tools and reference code collection are up to the task of making such a complex, powerful chip relatively easy to program. I suspect that it will be some time before the tools are mature enough to allow all of the features and modes of the processor to be fully exploited. Developers will at least have a solid base to build on, though, because the EXP will re-use much of the work already invested in their earlier NSP series.
There are 5 different parts sampling in the Octeon EXP family. The CN38xx family offers 4, 8, 12 or 16 MIPS64-based cores in a footprint-compatible package. Pricing for the family ranges from $350 to $650 in 10,000-piece lots.