networkZONE Products for the week of April 26, 2004


Stretch, Inc. Says…
Stretch's Software-Configurable Processor Embeds Programmable Logic Within the Processor To Accelerate Performance, Cut Development Time
Developers Can Use C/C++-Only Environment To Create Ripping-Fast Code For Communications Processing And Other Compute-Intensive Applications

Stretch Inc. has introduced its S5000 family of software-configurable processors -- the first to embed powerful programmable logic within an off-the-shelf processor -- and a comprehensive suite of development tools that enable developers to automatically configure and optimize the processor using only their C/C++ code. Stretch's software-configurable processors can be tailored quickly and easily to address compute-intensive applications in markets as diverse as consumer, telecommunications, networking, medical and military.

Stretch's S5000 software-configurable processors combine the best of two previously divergent semiconductor worlds -- the ease of software development associated with GPPs (general-purpose processors) and DSPs, with the parallelism and flexibility of FPGAs. Stretch achieves this by embedding programmable logic entirely inside the processor architecture -- an industry first.

Every S5000 processor chip is powered by the Stretch S5 engine, which incorporates the widely accepted Tensilica Xtensa RISC processor core and the powerful Stretch Instruction Set Extension Fabric (ISEF). The ISEF is a software-configurable data-path based on proprietary programmable logic. Using the ISEF, system designers extend the processor instruction set and define the new instructions using only their C/C++ code. As a result, developers get the performance of logic with C/C++ development simplicity -- achieving unprecedented performance, easy and rapid development and significant cost savings.

The Need for a New Kind of Processor
Today, embedded system developers are forced to make painful compromises when addressing compute-intensive applications. Their choices include using banks of DSPs or GPPs, resulting in costly and difficult-to-program multiprocessor systems; selecting fixed-function chips, which do not allow them to address changing standards or differentiate their products; or mixing processors and FPGAs or ASICs, which requires the design of custom hardware, greatly increasing time-to-market and development costs. "With the introduction of the Stretch S5000 family of software-configurable processors, embedded system developers no longer need to trade-off performance, time-to-market and system costs," said Gary Banta, Stretch CEO. "Developers program and automatically configure our processors using pure C/C++, achieving unprecedented performance, easy and rapid development, tremendous cost savings, and the flexibility to address diverse markets and changing application needs."

Stretch: First Company to Embed Programmable Logic within a Processor
By embedding powerful programmable logic within a processor, Stretch has uniquely combined the best qualities of GPPs, DSPs, ASPs (application-specific processors), FPGAs and ASICs -- creating an off-the-shelf processor chip that can cost-effectively address virtually any compute-intensive application. With Stretch's new processors, embedded system designers can bypass painful trade-offs between flexibility, performance, cost and time-to-market.

Stretch's software-configurable processors and software development tools provide significant breakthroughs and advantages:

For conventional processors such as DSPs, optimization of hot spots is usually done by a programmer using low-level assembly code, which directly represents the sequence of processor operations one by one. Compilers automate this task, but only with a significant loss in performance. Further, because each operation is very simple, tens to hundreds of assembly instructions are needed to implement each hot spot.

On a Stretch S5000 processor, an entire hot spot -- expressed only in C/C++ -- is reduced to a single instruction. First, the software developer identifies hot spots using Stretch's profiling tool. Then the C/C++ source code from the hot spot is automatically compiled into an ISEF configuration, creating a single custom instruction that implements the entire hot spot. Not only is this process easy, but the performance gain can be huge: tasks that require tens to hundreds of instructions on conventional processors become just one instruction on the Stretch S5000.
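To make "hot spot" concrete, here is a minimal sketch in plain C of the kind of inner loop a profiler would typically flag. The function name and the choice of a saturating byte add are illustrative assumptions; nothing here comes from Stretch's tools or documentation. On a conventional processor, each iteration of the loop body compiles to several load, add, compare and store instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hot spot (illustrative only): add two byte arrays,
 * clamping each result to 255 instead of letting it wrap around. */
void saturating_add_u8(const uint8_t *a, const uint8_t *b,
                       uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Widen before adding so the sum cannot overflow 8 bits. */
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

A loop body like this is the sort of candidate the text describes being collapsed into one custom instruction.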

Improved Time-to-Market
Stretch's "home-field" is the replacement of banks of processors, or processors plus FPGAs, with one S5000 processor and a simple C-compiler design flow. Extraordinary performance is achieved without the need for a complex and time-consuming multi-chip software and hardware development process. Stretch's processor-based development model ensures that the embedded system development remains entirely in C/C++ -- slashing months from the development time.

Reduced Development and System Cost
Software-only development eliminates the high costs and problems associated with hardware/software co-development, the cost of two development teams, difficulty making bug fixes, inability to adjust to new standards and the very long development cycle.

Replacing multiple DSPs or combinations of processors and FPGAs saves chip costs, board space and development costs.

analogZONE Says . . .

The release of Stretch Inc.'s novel configurable computing architecture marks a significant milestone in the quest to bring this powerful technology into the mainstream. If they succeed in making it sufficiently easy and cost-effective to implement, it could offer communications designers a highly attractive alternative to conventional architectures, which currently implement their applications using combinations of DSPs and conventional processors.

Of course, we've seen reconfigurable computing come in and out of style several times over the past few years, with each successive generation of products promising to solve the difficulties involved with programming these devices and making them cost-effective for real-world applications. Yet despite their promise, the added development costs and coding complexity associated with reconfigurable products have kept them tantalizingly out of reach for most applications.

There has been some significant movement towards practical reconfigurable computing products of late, most notably with Motorola's compute fabric, introduced (and reviewed here) last year. Both Motorola and Stretch have gone to great lengths to create architectures and development tools that ease the pain involved with writing code on an "alien" system, and to make hardware integration as painless as possible -- although "painless" is a relative term in any development effort. I think both products stand a good chance of finding welcome homes in different sectors of the communication market. Motorola's raw power and scalability will lend itself to large infrastructure applications, while Stretch's more general-purpose architecture makes it more suitable for a wide variety of consumer, commercial, medical, and telecom applications. But enough market comparisons. Let's get down to business and see what makes the Stretch processor so special.

This architecture inverts conventional approaches to using programmable logic for processing. In most applications, FPGAs sit outside the processor and offload the compute-intensive tasks before the CPU sees them. In Stretch's case, however, the logic is actually built into the processor's data path. The Stretch processors are a direct extension of the popular Tensilica Xtensa architecture, adding an instruction set extension fabric (ISEF) that runs in parallel with, and is transparent to, the processor's normal operation.


This is accomplished by use of the Xtensa's reserved op codes. Instructions intended for the ISEF fall within the range of these op codes and are automatically recognized and executed by the extra hardware. All instructions work in concert with the Tensilica core, executing the same way whether they have an associated extension or not. In other words, every operation you want to run (including multiple inner loops) is JUST AN INSTRUCTION that is generated by the same C++ compiler. When you combine the processor with its compiler, your high-level code generates a custom instruction set that performs one-clock execution of wide and deep operations normally found only in DSPs, array processors and other specialized processing silicon.
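The reserved-op-code idea can be sketched as a simple routing decision at decode time. The encoding below is a toy invented for illustration (a 32-bit word whose top four bits select the op code, with 0xC through 0xF reserved for extensions); it is not the real Xtensa instruction format, and the names are hypothetical.

```c
#include <stdint.h>

/* Toy encoding (an assumption, not the Xtensa format): the top 4 bits
 * of a 32-bit word are the major op code, and the values 0xC..0xF are
 * treated as reserved for extension instructions. */
enum { EXT_OPCODE_FIRST = 0xC, EXT_OPCODE_LAST = 0xF };

typedef enum { UNIT_BASE_CORE, UNIT_EXTENSION } exec_unit;

/* Extract the major op code from an instruction word. */
static unsigned major_opcode(uint32_t insn)
{
    return insn >> 28;
}

/* Decide which unit executes the instruction: op codes in the reserved
 * range go to the extension hardware, all others to the base core. */
exec_unit route_instruction(uint32_t insn)
{
    unsigned op = major_opcode(insn);
    return (op >= EXT_OPCODE_FIRST && op <= EXT_OPCODE_LAST)
               ? UNIT_EXTENSION
               : UNIT_BASE_CORE;
}
```

The point of the sketch is that the program sees a single instruction stream; only the op code value determines whether the base core or the extension logic does the work.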

I got some details on how the ISEF is constructed in an NDA briefing, and I'm impressed at how efficiently the array is able to run as a part of the Tensilica machine. Among the things I'm able to talk about that account for its blazing performance is the fact that the ISEF has a 128-bit-wide data path coupling it to the Tensilica core, allowing for extremely wide load/store operations. It can handle arbitrary data alignments and lengths, which lets it make extremely efficient use of the pipeline.

As the name implies, the array's instructions can be changed quickly during operation. The 2-piece array allows one bank of logic to be configured offline via DMA (across a separate bus) from a variety of memory sources while the other is running. Switching between the two halves of the array is relatively quick (well under 1 ms), allowing you to swap between them during program execution with minimal degradation in system performance.
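The two-bank arrangement behaves like a classic ping-pong buffer. The following is a minimal software model of that idea, assuming exactly two banks and using a plain assignment as a stand-in for the DMA load; the type and function names are hypothetical and do not reflect any Stretch API.

```c
#include <stdint.h>

/* Software model of a two-bank fabric (names are hypothetical). */
typedef struct {
    uint32_t config_id[2]; /* configuration currently held by each bank */
    int active;            /* index (0 or 1) of the bank that is executing */
} fabric_banks;

/* Start with the initial configuration running in bank 0. */
void banks_init(fabric_banks *f, uint32_t initial_config)
{
    f->config_id[0] = initial_config;
    f->config_id[1] = 0;
    f->active = 0;
}

/* Load the next configuration into the idle bank. In hardware this
 * would be a DMA transfer; the active bank is left untouched. */
void banks_preload(fabric_banks *f, uint32_t next_config)
{
    f->config_id[1 - f->active] = next_config;
}

/* Make the preloaded bank the active one. */
void banks_swap(fabric_banks *f)
{
    f->active = 1 - f->active;
}

/* Report which configuration is executing right now. */
uint32_t banks_current(const fabric_banks *f)
{
    return f->config_id[f->active];
}
```

Because the preload touches only the idle bank, the running configuration is never disturbed until the swap, which is what keeps the switching cost low.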

The ISEF breaks lots of bottlenecks by being an almost perfect bit sink, soaking up instructions and data as fast as the system can shovel them in. One of the key elements here is a wide register file that can handle up to three 128-bit operands at one time. The registers are coupled to a massive compute fabric that, depending on how it's configured, can support hundreds to thousands of simultaneous pipelined operations.

While I cannot reveal all the details, my discussions showed how Stretch makes much more efficient use of silicon real estate than normal FPGAs. This last advantage gives Stretch a leg up in terms of both cost and power over external programmable arrays.

The processor's memory bus is also designed for extreme performance. It is segmented to allow simultaneous transfer of high-speed, medium-speed, and low-speed traffic, with the fastest pipeline moving 128-bit-wide words at 300 MHz.
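For a sense of scale, the figures quoted above imply a peak rate that is easy to work out. The sketch below assumes one full-width word is transferred on every clock cycle, which the source does not state; under that assumption, 128 bits at 300 MHz comes to 16 bytes x 300,000,000 = 4.8 billion bytes per second.

```c
#include <stdint.h>

/* Peak bytes per second for a bus that moves one word of
 * bus_width_bits on every clock cycle (an assumed transfer rate). */
uint64_t peak_bytes_per_second(uint32_t bus_width_bits, uint64_t clock_hz)
{
    return (uint64_t)(bus_width_bits / 8u) * clock_hz;
}
```

Calling `peak_bytes_per_second(128, 300000000)` yields 4,800,000,000. This is a theoretical ceiling, not a measured throughput.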

The other half of Stretch's power lies in its programming tools. They have got around many of the programming complexity issues by developing a 2-branch C++ compiler. It partitions tasks between the RISC processor and the ISEF by identifying and extracting "hot spots" for implementation in programmable logic. The first branch produces machine code for the RISC processor and creates new extension instructions that will be used to invoke functions on the ISEF. The second branch takes the hot spot code and feeds it into an EDA flow that synthesizes the necessary logic within the ISEF. As mentioned earlier, all programming is done using C++, allowing Stretch to hit a sweet spot that delivers GPP or FPGA-like turn-around with the performance of a custom solution.

The resulting processor is blindingly fast for many operations. Running the EEMBC benchmark at a mere 300 MHz, the Stretch even beats the Intrinsity FastMath processor (reviewed earlier here) running at 2 GHz.

Of course, there is no such thing as a universal solution, and the Stretch processor does have its limits. One significant area is in "low touch" operations such as network processing. While it can certainly do the relatively simple packet inspection and transformation that switch fabrics and network processors normally handle, it is really much better suited to the heavy-duty calculation- and manipulation-intensive tasks found in "high touch" applications such as video compression. For example, H.263/264 motion estimation is capable of producing very high-quality video from a relatively small bit stream, but requires lots (and lots) of raw processing horsepower. Fortunately, the Stretch processor is only too happy to oblige, churning out a SAD (sum of absolute differences) operation on a tile-full of pixels for H.263 video in 43 ns (H.264 takes 83 ns).
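For readers unfamiliar with the SAD operation, here is a minimal reference version in plain C. The 16x16 block size is an assumption (the article says only "a tile-full of pixels"), and the function name is hypothetical. This is the straightforward scalar form, not Stretch's implementation.

```c
#include <stdint.h>
#include <stdlib.h>

enum { BLOCK_SIZE = 16 }; /* assumed 16x16 tile; not stated in the source */

/* Sum of absolute differences between a current block and a reference
 * block. 'stride' is the number of bytes between the starts of
 * consecutive rows in both buffers. */
uint32_t sad_16x16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < BLOCK_SIZE; ++y) {
        for (int x = 0; x < BLOCK_SIZE; ++x) {
            int diff = (int)cur[y * stride + x] - (int)ref[y * stride + x];
            sad += (uint32_t)abs(diff);
        }
    }
    return sad;
}
```

With this block size a single SAD takes 256 subtract, absolute-value and accumulate steps, and motion estimation repeats it for many candidate positions per block, which is why it is such a natural target for wide parallel hardware.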

If Stretch can deliver on its promises of easy-to-use development tools, and manage to pry companies away from their old habits, I expect its chips will displace a good number of DSPs and even some RISC units in a variety of applications. These will most likely start with high-ticket, low-volume products in the military, medical, wireless base station, and pro video markets, and eventually work their way down towards consumer applications in cell phones, video cameras, and networking gear that demands high-performance security and encryption.

Product Specifications, Pricing and Availability
Stretch's off-the-shelf S5000 software-configurable processor family debuts with three members, all based on the S5 engine. The products differ only in their I/O and packaging, allowing the products to even more precisely match the needs of specific markets.


Lee's Saltshaker Rating

   






analogZONE
(c) 2004. All rights reserved.