Bay Microsystems Says . . .
Bay Microsystems Delivers Montego, Industry's First
Single Chip OC-192c/10G Network Processor And Traffic Manager
Bay Microsystems announced first customer shipment (FCS) of its
flagship programmable packet processor device Montego, which is the world's
first single chip OC192c/10G network processor and traffic manager. Part
of Bay's Internetworking Processor (InP) family, Montego raises the industry
bar by setting a new standard for packet processing performance and functional
integration.
Montego is ideally suited to scale from access and metro-edge to metro-core
and long-haul markets for carrier class products such as access concentrators;
voice, wireless and xDSL gateways; multi-service switches and routers; cable
head ends and intelligent optical (DWDM, SONET) transport equipment. Montego
enables network systems OEMs to build next generation equipment that meets
the carriers' needs for guaranteed performance levels with support for both
legacy and emerging protocols.
Village Networks (Eatontown, NJ) is one of the many companies who have selected
Bay's solution. "We evaluated ten incumbent and start up network processor
vendors for our forthcoming 10 Gbps solutions," said Jon Anderson,
Vice President of Engineering at Village Networks. "Our evaluation
looked at many factors including processing performance, packet forwarding,
traffic management flexibility, peripheral chip count, and ease of use;
in all cases Bay's Montego processor was the clear-cut winner."
"With so many companies making so many promises, a vendor with working
10Gbps silicon really stands out. Bay is the first company to deliver a
single chip network processor and traffic manager at this speed level,"
said Linley Gwennap, principal analyst, The Linley Group. "With this
impressive combination of high performance and integration, Montego is getting
lots of attention from equipment makers."
"At Bay Microsystems, we are challenging the status quo in network
processing," said Chuck Gershman, a co-founder and Sr. Vice President
of Bay Microsystems. "With our deterministic architecture that guarantees
line rate operation, we will once again legitimize the network processor
as an enabling technology." Bay co-founders Man Trinh (Chief Architect)
and Tony Chiang (V.P. of Engineering) added, "With Montego, we've developed
and implemented a technology that not only satisfies current market requirements
but also builds a foundation for a family of products that could span a
multitude of applications that this industry segment demands."
Determinism Guarantees Performance, Enables Integration
The key to Montego's performance is its deterministic pipeline architecture.
Montego achieves guaranteed sustainable packet processing of 31.25 million
packets per second regardless of traffic patterns or network services,
while supporting data throughput of up to 16 Gbps. In order to guarantee
a line rate at minimum packet size, a deterministic processor engine is
required. Most 2.5 Gbps (OC48) class network processors available on the
market today are based on parallel, multi-threaded RISC architectures, which
are not deterministic and therefore cannot guarantee sustained line rate
performance.
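The two headline figures can be cross-checked with a little arithmetic. The sketch below assumes that "minimum packet size" means 64 bytes, which the announcement does not state, and it ignores framing overhead such as preambles and inter-packet gaps. Under those assumptions, 31.25 million packets per second works out to exactly the quoted 16 Gbps.

```python
# Back-of-the-envelope check of the quoted Montego figures.
# ASSUMPTION: minimum packet size is 64 bytes (not stated in the announcement).
# ASSUMPTION: framing overhead (preamble, inter-packet gap) is ignored.
MIN_PACKET_BYTES = 64
QUOTED_PPS = 31.25e6  # packets per second, from the text above


def throughput_gbps(pps, packet_bytes):
    """Data throughput in Gbps for a given packet rate and packet size."""
    return pps * packet_bytes * 8 / 1e9


def required_pps(line_rate_gbps, packet_bytes):
    """Packet rate needed to fill a link with packets of the given size."""
    return line_rate_gbps * 1e9 / (packet_bytes * 8)


if __name__ == "__main__":
    print(throughput_gbps(QUOTED_PPS, MIN_PACKET_BYTES))  # Gbps at 64-byte packets
    print(required_pps(10, MIN_PACKET_BYTES))             # pps to fill a raw 10 Gbps link
```

By the same reckoning, a raw 10 Gbps stream of 64-byte packets needs roughly 19.5 million packets per second, so the quoted packet rate leaves headroom above a single 10G line.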
The deterministic pipeline architecture also enables the functional integration
of an OC192c/10G network processor and traffic manager in a single chip
solution. Designed with a 0.18 micron CMOS process, Montego dramatically
reduces chip count and associated power for today's highest performance
carrier class applications from access to long haul.
For applications that require 'high touch' processing, Montego's programmable
AnyMapping capability supports line rate transformation and forwarding of
any legacy protocol such as SONET, ATM, Ethernet, IPv4 and Frame Relay and
any proprietary or emerging protocols such as MPLS and IPv6.
Complete Application Development Environment
To support its customers' design efforts, Bay has also introduced a robust
simulation and emulation design environment as well as a complete system
reference design. Called the Internetworking Development System (IDS), the
Java GUI-based design environment not only provides a complete device-level programming
tool, but also enables system emulation, simulation and debug. In addition,
the complete application development environment includes NEXTware (Network
Engine XTension software suite), a cycle/pipeline accurate simulator, performance
and functional analysis tools and application library modules.
analogZONE Says . . .
Architecture Shoot-Out - Bay's 10-Gbit Power Pipeline Promises to Punish Poly-Processor Packet Pushers by Delivering Deterministic Data
After staying up till 2 AM several nights last week to get the i/oZONE ready for press, I realized I could not write a fourth product review. But when I got hold of this story about Bay Microsystems' new chip that incorporates a network processor and a traffic manager, I was sorely tempted. Bay's novel and potentially powerful architecture provides an excellent counterpoint to the arguments made by Internet Machines' 64-CPU RISC architecture that I reviewed last week. Having these two well-conceived network processors released so close together gives us a great opportunity to closely compare the chips and the radically different architectural philosophies that they are designed around.
The first way Bay diverges from Internet Machines (IM), and for that matter, much of the NP industry, is that it chooses to not rely on the vagaries of an array of programmable RISC engines for its critical, time-bounded, packet processing functions. They say that they chose their deterministic pipeline architecture because it was the only way to guarantee a sustained line rate at 16 Gbit/s for all traffic patterns and conditions.
Bay also says that their pipeline architecture uses far fewer gates than a RISC array of similar processing power. They assert that besides the CPU taking up less real estate, the pipeline architecture eliminates the need for complex arbitration logic to juggle tasks and keep data aligned between CPUs. They say that the savings leave them enough room on the chip to implement a full-blown traffic manager.
Bay has defined the classes of network processing tasks it performs as: Classification, Transformation, and Traffic Management. Passing packets and their associated headers through a fixed number of stages provides deterministic performance - the certainty that a given task will be performed within a specific time frame. Displaying ambition that seems extreme even by Silicon Valley standards, they have incorporated five wire-speed functions on the chip - a classifier, a packet editor, a SAR, a queue manager, and a traffic manager. Now that they have some working Alpha silicon, they say that they are confident that all functions will run at wire speed - including the SAR.
Time, space, and the limits of my intellectual capacity prevent me from giving you a fully detailed account of the Bay Montego architecture, but I'll do my best to touch on a few highlights. You can refer to the simplified block diagram as we take a quick trip through this formidable chunk of packet-processing silicon.
The pipeline design breaks processing tasks up and assigns them to separate dedicated engines. Each engine has its own instruction set, and an assigned set of states that it works through while operating on a packet - Bay stresses that the execution and dwell time for a packet does not vary, regardless of the task being performed on it.
Incoming packet headers are extracted and passed on internally between engines via the control bus with no buffering. The packets themselves are only buffered once, in a bank of external SD or FC DRAM that sits between the policy engine and forwarding engine. The traffic manager handles flow IDs and queue IDs, while the actual traffic resides in the payload buffer until the forwarding engine calls for it.
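To make the "fixed dwell time" idea concrete, here is a minimal sketch of an idealized fixed-stage pipeline. The stage names and cycle counts are hypothetical, chosen only for illustration; they are not Bay's figures, and the model ignores memory stalls entirely. The point is that when every stage takes a fixed number of cycles, both the per-packet latency and the rate at which packets can enter are constants that can be computed up front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    cycles: int  # fixed dwell time, identical for every packet


# HYPOTHETICAL stages and cycle counts, for illustration only.
PIPELINE = (
    Stage("classify", 4),
    Stage("transform", 6),
    Stage("traffic_manage", 5),
)


def pipeline_latency(stages):
    """Cycles one packet spends in the pipeline, independent of its contents."""
    return sum(s.cycles for s in stages)


def issue_interval(stages):
    """Cycles between packet admissions; set by the slowest stage."""
    return max(s.cycles for s in stages)


def exit_cycle(packet_index, stages):
    """Cycle at which packet N (0-based) leaves, with packets arriving back to back."""
    return packet_index * issue_interval(stages) + pipeline_latency(stages)


if __name__ == "__main__":
    for i in range(4):
        print(i, exit_cycle(i, PIPELINE))
```

In this toy model every packet takes the same 15 cycles end to end, and a packet leaves every 6 cycles no matter what the traffic looks like. That predictability is the property the pipeline camp is selling.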
Separate external memories are used to provide CAM for the classifier, and instructions for the policy engine, forwarding engine, and traffic manager. While slightly more costly, the separate memories eliminate any possibility of bus contention between processing engines.
The resulting architecture is fast enough that it can handle the equivalent of a full-duplex 10G Ethernet connection - except it is unidirectional. An interesting side note is that Bay has a design that makes their chip into a bi-directional 10 Gbit switch - inquire with them for further details.
The result of all this is that you get some extraordinary performance from a single chip design, while retaining a fair amount of flexibility - although not as much as with Internet Machines, or other RISC-based designs.
Rather than editorialize further, I'll simply pass on the following specs and features that Bay claims for its alpha silicon:
Classification
Policy Engine
Of course, you're probably wondering how you'd program a little monster like this, which uses five separate flavors of microcode. It is interesting to note that while Bay has a superscalar architecture with multiple pipelines, they employ a single-threaded programming model that is similar to that of their rival, Internet Machines. An internal traffic flow manager coordinates the multiple pipelines automatically, allowing the developer to write code for the parallel pipelines as if it were just a single, fast processing element. They say that their software development tools allow you to program it like a router on a per-flow basis. This package has been working on simulations in their lab for months, and on their Alpha chips for weeks.
Another religious argument made by Bay, and other members of the pro-pipeline camp, is that multi-threaded architectures can be difficult to program, and that their hard performance limits are difficult to identify. I would argue that Internet Machines' efforts in developing cycle-accurate simulation and analysis tools have negated many of these objections. Nevertheless, I do admit that unlike Bay's clearly defined performance parameters, Internet Machines does leave the task of finding their processor's limits for specific applications as an exercise for the customer.
Taking yet another shot at the Motorolas, Cognigines, and Internet Machines of the NP universe, Bay also questions the scalability of most multi-CPU architectures. They point out that adding more processors represents an N² complexity issue for managing the flows, while widening their pipeline is only a linear increase in complexity. I think that there are some ingenious ways for multi-RISC architectures to get around this scaling problem, but I do think that the pipelined approach is more efficient in terms of the amount of silicon required for a given task. Of course, this assumes you're willing to give up the flexibility that a fully-programmable solution offers.
Quibble as I might with Bay, they do substantiate many of their performance claims with working silicon (Alpha silicon is shipping to a customer TODAY, after having been fully tested for 3 weeks). It's also interesting to note that as fast as it is, the chip's bottlenecks are not in the processing elements, which can actually run at 42+ million packets/s. Right now, the long pole in the tent is the memory interface at the payload buffer, which limits it to around 31.25 M packets/s. (I suggested that they look into TriCN's memory interfaces for their next design, and they refused to comment.)
Working silicon and some reasonable proof that they are making good on many of their claims earns Bay a very low Vapor Index Rating for such an ambitious and complex product.
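Bay did not spell out how they count complexity, so the following is only one plausible reading of the N² argument: if every processor may have to coordinate with every other, the number of processor pairs grows quadratically, while a wider pipeline adds one lane per unit of width. The function names below are my own, and the model is deliberately crude.

```python
# One possible reading of the "N^2 versus linear" scaling argument.
# ASSUMPTION: the coordination cost of a multi-CPU array is proportional to
# the number of distinct processor pairs; Bay's own accounting is not given.


def pairwise_paths(n_processors):
    """Distinct processor pairs that might need arbitration: n(n-1)/2."""
    return n_processors * (n_processors - 1) // 2


def pipeline_lanes(width):
    """A wider pipeline adds one lane per unit of width (linear growth)."""
    return width


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, pairwise_paths(n), pipeline_lanes(n))
```

For 4, 16, and 64 elements the pair counts come out to 6, 120, and 2016, against 4, 16, and 64 lanes. Real multi-CPU designs need not wire every pair together, which is exactly where the ingenious workarounds come in.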
Montego is sampling now in a BGA-1600 epoxy flip-chip package
and will be priced at less than $1200 in volume. Volume production will begin
in Q3 of this year.