networkZONE Products for the week of September 13, 2004
picoChip Says
Infinite Monkeys Get New Typewriters: picoChip's 200-GIPS
Multi-DSP Array Chip Gets Reference Designs For HSDPA Picocell and WiMAX
Basestation
picoChip announced the availability of complete software implementations
for HSDPA picocells and 802.16 (WiMAX) basestations, the PC8218 and PC8520
respectively. Both "off the shelf" systems run on the company's
PC102 picoArray, the world's most powerful DSP. The PC8520 and PC8218 will
speed time to market for OEMs with robust, "carrier class", highly-integrated
solutions that significantly reduce the system BOM.
Doug Pulley, co-Founder and CTO at picoChip, said: "Many OEMs are cautious about trusting reference designs, but we have demonstrated the capabilities and performance of our solutions. We believe these are leading solutions for 802.16 and HSDPA. Our partners are benefiting from the improved performance, higher integration, lower BoM and time to market savings that they enable; what is more, they can easily upgrade as standards evolve."
These WiMAX and HSDPA products were market driven, and were designed
together with industry leaders. Both the PC8520 and PC8218 are being designed
into commercial basestations, based on the PC102, which is currently shipping
in production quantities.
Carolyn Gabriel, Research Director at Rethink Research Associates, added,
"In these times there is a real necessity for speedy entry to market
combined with product differentiation to succeed. picoChip have made this
goal readily available to OEMs working with 802.16 and HSDPA. This is the
first time anyone can easily harness the full power of the picoArray with
fully characterised and interoperable solutions. Both markets value flexibility
and upgradeability - negating the fear of obsolescence will be very attractive
to operators. picoChip's growing customer base is evidence that manufacturers
are paying attention."
The picoChip PC8520 WiMAX basestation solution provides a software-defined implementation of the 802.16-2004 PHY for the 256OFDM mode, with "carrier-class" reliability and performance. The PC8520 enables basestations to be WiMAX-certified. The reference design supports subchannelization and antenna diversity. This is a flexible solution; the software upgrade to 802.16e for mobility, including scalable PHY and advanced FEC, will be available next year.
The PC8218 HSDPA picocell replaces the company's previous reference design by integrating the complex Iub interface, significantly reducing system BOM and simplifying integration. HSDPA increases the speed of a WCDMA basestation to as much as 14.4 Mbps and is increasingly seen as a critical requirement by operators. The PC8218 implements all baseband processing for a WCDMA FDD 500 m, 32-user picocell carrying voice, DCH data or HSDPA services.
Delivering 200 GIPS and 40 GMACs, the PC102 picoArray device is the highest performance DSP on the market. In a typical basestation, a single PC102 replaces a number of conventional DSPs and FPGAs, yet is easy to programme in assembler or standard ANSI C. The comprehensive toolchain allows applications to be easily and efficiently programmed, simulated, verified and tested in a single environment -- significantly reducing development time and cost.
Time to market is further accelerated through the use of picoChip's extensive
software libraries, which provide a speedy, cost-competitive and low risk
route to market for fully compliant wireless infrastructure. The system
is also well suited to any other advanced wireless technology, including
4G research and software defined radio (SDR or JTRS), smart antennas and
MIMO.
analogZONE Says . . .
Since picoChip somehow neglected to brief me on their massive configurable DSP array when it was originally announced late last year, I'm taking advantage of the release of their new wireless reference designs to do a little catch-up work. The arrival of a pair of complete reference designs and evaluation kits that support a 3G HSDPA (high-speed downlink packet access) picocell and an 802.16 WiMAX base station demonstrates the power and versatility of the chip. Perhaps more important, these "plug-and-play" solutions should help clear at least some of the formidable obstacles that an upstart technology must overcome in order to gain a critical chunk of market share. But before we get into any more evaluation of these new platforms, perhaps we should take a closer look at the chip itself.
In a manner much like the mythical room containing an infinite number of monkeys at typewriters, the PC102 can be used to assemble an almost infinite array of DSPs. The chip itself is an array of 300 fully programmable LIW DSPs on a single honking-large chip. In truth, there are actually more processors than this, but the device has some redundancy that allows it to survive at least one defect and still be 100% functional. Even as a standalone device it delivers up to 200 GIPS, but it's been designed to be ganged into multi-chip arrays with minimal scaling loss. The largest array they've assembled so far is a 16-chip cluster that puts 5000 C5x-class processors at your beck and call, but the theoretical limit is much bigger than that. And although the apps engineers are still having difficulty developing an algorithm to produce Shakespearean manuscripts, it does seem to have the raw processing power to do darned near anything else.
Drilling down another layer to look at the DSPs themselves, we find that each is a 16-bit, 160-MHz Harvard-architecture LIW machine with its own local memory. The cores are very similar (in terms of processing power and instruction set) to the sort of DSP you'd find in a cell phone (an ADI 2181 or TI C5x machine). Unlike the relatively simple state machine-based elements found in some compute arrays, each processor can actually run a complex application, language, or OS (such as C or Linux) on its own. Things get even more interesting because the cores come in three different "flavors". The "Standard" core, which makes up the bulk of the processors, has a small memory but is equipped with an additional set of CDMA-oriented instructions to handle most of the compute-intensive tasks. The array is also sprinkled with "Mem" processors that handle local control functions using the generic instruction set and a larger memory. Finally, there are a handful of "Control" processors, which have a standard instruction set and a very large (64 k) memory, that handle system-level control tasks.
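The headline numbers quoted above can be cross-checked with a little arithmetic. The sketch below is my own back-of-the-envelope check, not a picoChip calculation: it assumes the "300 processors" and "160 MHz" figures are exact, that GIPS means 10^9 operations per second, and that the whole 200 GIPS comes from the programmable cores rather than the on-chip accelerators. The function names are made up for illustration.

```python
# Back-of-the-envelope check of figures quoted in the article.
# Assumptions (not from picoChip): the quoted counts are exact, and all
# of the GIPS figure is attributed to the programmable cores.

PROCESSORS_PER_CHIP = 300   # quoted in the article
CLOCK_HZ = 160e6            # quoted in the article
CHIP_GIPS = 200             # quoted in the article
CHIPS_IN_CLUSTER = 16       # quoted in the article


def ops_per_cycle_per_core(gips: float, cores: int, clock_hz: float) -> float:
    """Average operations each core must issue per clock to reach `gips`."""
    return gips * 1e9 / (cores * clock_hz)


def cluster_cores(chips: int, cores_per_chip: int) -> int:
    """Total processor count in a multi-chip array."""
    return chips * cores_per_chip


if __name__ == "__main__":
    per_core = ops_per_cycle_per_core(CHIP_GIPS, PROCESSORS_PER_CHIP, CLOCK_HZ)
    total = cluster_cores(CHIPS_IN_CLUSTER, PROCESSORS_PER_CHIP)
    print(f"operations per cycle per core: {per_core:.2f}")
    print(f"cores in a 16-chip cluster: {total}")
```

Under those assumptions each core would need to issue a little over four operations per clock, which is at least plausible for an LIW machine, and a 16-chip cluster works out to 4800 cores, consistent with the rounded "5000" figure.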
There is also some specialized silicon tucked aboard to offload many of the most common algorithmically-intensive tasks. This includes four- and eight-bit chip rate processors (for spread/de-spread functions), a matched rate convolutional processor (for CDMA filtering functions), a Viterbi co-processor, and 14 turbo code processors that can be run separately, cascaded or run in parallel.
But even the tastiest ingredients don't make a cake unless they are mixed and baked properly. Multi-processor arrays are not new -- we've seen several generations of them in both signal and network processing, but they've had varying degrees of success at getting the processors to work together in an efficient manner. Many of the big challenges in a multiprocessor design are concerned with ensuring that the CPUs have access to the data, I/O, and memory resources they need without getting in each other's way. But silicon is only half the story -- you also need development tools that allow you to program the device efficiently without having to constantly worry about the details of the architecture. At first glance, it would appear that picoChip has succeeded on both counts here.
PicoChip's interconnect scheme is unusual because it deviates from the usual fixed
bus architecture and actually allows the developer to configure the chip
by building connections between related processors -- much the way an FPGA
builds up functions by linking logic elements. An FPGA-like crossbar interconnect
layer uses 32-bit wide bus segments and a TDM switching scheme to provide
any-to-any connection between computing elements (see Figure 1). The chip's
control registers allow it to store a sequence of up to 1024 connect patterns
that can be changed with each cycle of the 160-MHz system clock.
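To make the TDM idea concrete, here is a toy model of a stored sequence of connect patterns that advances once per clock and then repeats. It is only a sketch of the general technique: the class and method names are hypothetical and not picoChip's API, and the only conflict it checks is two sources driving the same destination in one slot, whereas the real device also has to share physical bus segments.

```python
# Toy model of a time-division-multiplexed (TDM) interconnect schedule.
# Hypothetical names throughout; this is not picoChip's tool or API.
from typing import Dict, List, Tuple

MAX_SLOTS = 1024  # the article cites up to 1024 stored connect patterns

Pattern = Dict[str, str]  # source processor -> destination processor


class TdmSchedule:
    def __init__(self, patterns: List[Pattern]) -> None:
        if not 1 <= len(patterns) <= MAX_SLOTS:
            raise ValueError("schedule must hold between 1 and 1024 patterns")
        for pattern in patterns:
            dests = list(pattern.values())
            if len(dests) != len(set(dests)):
                raise ValueError("two sources drive one destination in a slot")
        self.patterns = patterns

    def pattern_at(self, cycle: int) -> Pattern:
        """Connections active on a given clock cycle; the sequence repeats."""
        return self.patterns[cycle % len(self.patterns)]

    def transfers(self, cycles: int) -> List[Tuple[int, str, str]]:
        """List (cycle, source, destination) for the first `cycles` clocks."""
        out: List[Tuple[int, str, str]] = []
        for cycle in range(cycles):
            for src, dst in sorted(self.pattern_at(cycle).items()):
                out.append((cycle, src, dst))
        return out


if __name__ == "__main__":
    # Three-slot schedule: the third slot leaves the bus idle.
    schedule = TdmSchedule([{"fir": "fft"}, {"fft": "viterbi"}, {}])
    for cycle, src, dst in schedule.transfers(4):
        print(cycle, src, "->", dst)
```

Because the schedule is fixed ahead of time, every transfer lands on a known cycle, which is the property that makes this style of interconnect predictable.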
Programming the PC102 is equally unusual. It avoids problems with many parallel architectures by providing an efficient abstraction layer that removes many of the issues that made generating code for multi-processors bulky, labor-intensive and often inefficient. PicoChip has adopted techniques similar to those used in programming FPGAs, which allow the engineer to create either a graphic block diagram or, more usually, a text (HDL) description of the functions they want (e.g., Viterbi decode, FFT, FIR, turbo decode, Reed-Solomon, etc.) and the desired connections between those functions. The two-stage compiler first performs an automatic place-and-route function that defines functional blocks and optimizes their location in relation to other functions to create a "floorplan" before the interconnect scheme is generated. If some functions are really complex or need faster execution, they can be distributed across several nearby DSPs.
The connections to the DSPs and associated hardware accelerators are then assigned, and the functional code for each processor is generated. As I mentioned earlier, the inter-processor connections are controlled by a 1024-stage state machine which allows you to re-configure the chip once each clock cycle. A processor even has some limited run-time control over the data path it uses, as long as the requisite paths are reserved when the application is initially compiled.
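The "place" half of that flow can be illustrated with a small sketch. This is not picoChip's compiler: it is a generic greedy placement, assuming a rectangular grid of identical processors and Manhattan distance as the cost, and the block names and functions are invented for the example. It takes a list of functional blocks and the links between them and puts each block on the free cell nearest its already-placed neighbours.

```python
# Greedy placement of a dataflow netlist onto a grid of processors.
# Illustrative only: grid model, cost function and names are assumptions.
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
Link = Tuple[str, str]


def manhattan(a: Coord, b: Coord) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def place(blocks: List[str], links: List[Link],
          width: int, height: int) -> Dict[str, Coord]:
    """Place blocks in order, each on the free cell closest to its placed neighbours."""
    free: List[Coord] = [(x, y) for y in range(height) for x in range(width)]
    if len(blocks) > len(free):
        raise ValueError("more blocks than processors")
    placed: Dict[str, Coord] = {}
    for block in blocks:
        neighbours: List[Coord] = []
        for a, b in links:
            if a == block and b in placed:
                neighbours.append(placed[b])
            elif b == block and a in placed:
                neighbours.append(placed[a])
        # min() keeps the first minimal cell, so ties resolve in row-major order.
        best = min(free, key=lambda cell: sum(manhattan(cell, n) for n in neighbours))
        placed[block] = best
        free.remove(best)
    return placed


def wire_length(placed: Dict[str, Coord], links: List[Link]) -> int:
    """Total Manhattan length of all links, a rough proxy for routing cost."""
    return sum(manhattan(placed[a], placed[b]) for a, b in links)


if __name__ == "__main__":
    blocks = ["fir", "fft", "viterbi", "reed_solomon"]
    links = [("fir", "fft"), ("fft", "viterbi"), ("viterbi", "reed_solomon")]
    layout = place(blocks, links, width=2, height=2)
    print(layout, wire_length(layout, links))
```

A production tool would presumably also weigh bandwidth, memory size and core type, but even this crude version shows why keeping related functions adjacent shortens the paths that the routing stage then has to schedule.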
As with any crossbar architecture trying to provide such connections, there is some potential for blockage. PicoChip says their compiler minimizes this by making sure that tightly-related compute elements are placed close together to reduce the potential for blocking. They claim that most service models indicate that you can achieve an average of 90% resource utilization.
If it's really as practical to program as they claim, picoChip's architecture can offer some significant advantages in terms of efficiency. This is because each processor and the task that runs on it is "orthogonal" to the rest of the array -- i.e., it is well defined and there is no interaction with other processors other than via messages passed along the data paths. Besides extracting the most processing horsepower, I think that this compartmentalization allows for more deterministic timing of operations than running the whole task on a single large DSP, where tasks interact in difficult-to-define ways. It's almost as good as having a state machine, but more flexible.
But don't sell your Texas Instruments stock just yet. The PC102 is a tightly-targeted product and only challenges general-purpose DSPs in a narrow set of applications -- at least for the moment. The current product has been optimised for high performance, complex DSP-oriented operations on relatively "slow" (10-100 Mbit/s) data streams like those found in the air interface section of wireless base stations and access points. In these scenarios, I am reasonably confident in picoChip's claims of delivering 10x the MIPS/$ of a Xilinx Virtex II or an ADI TigerSHARC.
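A minimal sketch of that "orthogonal" style, assuming nothing about picoChip's actual tools: each simulated processor keeps private state and touches the outside world only through point-to-point FIFO channels, and a simple round-robin loop stands in for the clock. All class and function names here are hypothetical.

```python
# Isolated "processors" that interact only through message channels.
# Hypothetical sketch; it does not use or imitate picoChip's toolchain.
from collections import deque
from typing import Deque, List, Optional


class Channel:
    """Point-to-point FIFO standing in for a reserved data path."""

    def __init__(self) -> None:
        self._queue: Deque[int] = deque()

    def put(self, value: int) -> None:
        self._queue.append(value)

    def get(self) -> Optional[int]:
        return self._queue.popleft() if self._queue else None


class Scaler:
    """Multiplies each incoming sample by a private gain."""

    def __init__(self, gain: int, inp: Channel, out: Channel) -> None:
        self.gain, self.inp, self.out = gain, inp, out

    def step(self) -> None:
        value = self.inp.get()
        if value is not None:
            self.out.put(value * self.gain)


class Accumulator:
    """Keeps a private running sum and emits it after every sample."""

    def __init__(self, inp: Channel, out: Channel) -> None:
        self.total, self.inp, self.out = 0, inp, out

    def step(self) -> None:
        value = self.inp.get()
        if value is not None:
            self.total += value
            self.out.put(self.total)


def run(samples: List[int], gain: int = 2) -> List[int]:
    """Push samples through Scaler -> Accumulator and collect the output."""
    a, b, c = Channel(), Channel(), Channel()
    processors = [Scaler(gain, a, b), Accumulator(b, c)]
    for sample in samples:
        a.put(sample)
    results: List[int] = []
    for _ in range(len(samples) + len(processors)):
        for proc in processors:
            proc.step()
        while (value := c.get()) is not None:
            results.append(value)
    return results


if __name__ == "__main__":
    print(run([1, 2, 3]))
```

Since neither stage can see the other's state, each can be tested and timed on its own, which is the property the paragraph above is pointing at.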
PicoChip's new reference designs put their chip's strengths to good use by providing platforms that support the WCDMA (European/Japanese/AT&T) 3G system and the HSDPA (high-speed downlink packet access) that is anointed to be WCDMA's successor in most civilized parts of the world. The access point reference design is intended for smaller units, often used in hotels, airports, and other locations requiring pinpoint coverage. It can support up to 32 channels in any combination of WCDMA and HSDPA. A reference design for larger full-up base stations of up to 128 channels will also be supported using a slightly different software load that's expected some time early next year. I think they were smart to introduce the design for the smaller unit first because of the higher unit volumes I expect it will have.
The other reference design is a WiMAX 802.16 base station. It's surprisingly similar to the cellular base station, running on the same SDR development board. It delivers 35 Mbit/s (full duplex) worth of 802.16d (fixed use) traffic across tens of kilometers to fixed locations, ideal for delivery of "wireless DSL" services. PicoChip anticipates it will offer a software upgrade to provide mobility/roaming functionality as soon as the 802.16e spec that supports these services is finalized.
Another ingenious application of FPGA technology -- and another example of how FPGAs and configurable ASSPs are displacing ASICs in many markets.
While the PC102 is not a general-purpose chip, I do expect it could nibble away at other applications that have traditionally been owned by ADI, Agere, TI, and the other traditional DSP architectures. These include niche markets in image, video, and perhaps even radar processing, but I expect that it could find some excellent high volume opportunities in next-generation DSL equipment.
Despite its significant technical merits, picoChip will face several substantial hurdles before gaining any market share. I've seen quite a few excellent multi-processor designs struggle to gain market acceptance in both the network processor and signal processor arenas and, more often than not, they've failed. Some of this was due to the huge user base enjoyed by the major manufacturers, which reduced development risks and provided a large pool of skilled programmers and vendors to support a design effort. PicoChip has already addressed at least a good chunk of these barriers by offering easy-to-use programming tools with familiar user interfaces and a series of reference designs for products in key markets.
PicoChip must, however, still face the cultural inertia that makes most level-headed engineers suspicious of a "revolutionary" product with an unconventional architecture. Even an industry giant like Freescale is facing resistance in gaining acceptance for its reconfigurable Compute Fabric (a recipient of our 2003 Product of the Year award) and has had a long, hard pull making designers and their managers comfortable enough to commit to using it in new designs.
Happily, the raw power and ease of programming that the PC102 makes available have won it some major players in the 3G base station market (none of whom want to be cited publicly yet) as early adopters. PicoChip's credibility is further enhanced by the fact that the chip is shipping in significant volumes to some of these manufacturers as they begin to ramp up shipments to support the global transition to 3G. If the tool set is as easy and efficient to use as picoChip claims (I didn't have a chance to actually use it myself), this momentum from a small, but respected, user base could help propel this architecture into the mainstream.
The PC102 processor array is available now in volume. Pricing is expected to be comparable to a high-end C6x-class DSP.
Datasheets
(PC102/reference designs)