SECTION II
SYSTEM LEVEL DESIGN
3 Tools and Methodologies for System-Level Design
Shuvra Bhattacharyya
University of Maryland, College Park, Maryland
Wayne Wolf
Princeton University, Princeton, New Jersey
3.1 Introduction
3.2 Characteristics of Video Applications
3.3 Other Application Domains
3.4 Platform Characteristics
Custom System-on-Chip Architectures • Platform Field-Programmable Gate Arrays
3.5 Models of Computation and Tools for Model-Based Design
Dataflow Models • Dataflow Modeling for Video Processing • Control Flow • Ptolemy • Compaan • CoWare • Cocentric System Studio • Handel-C • Simulink • Prospects for Future Development of Tools
3.6 Simulation
3.7 Hardware/Software Cosynthesis
3.8 Summary
3.1 Introduction
System-level design has long been the province of board designers, but levels of integration have increased to the point that chip designers must concern themselves with system-level design issues. Because chip design is a less forgiving medium (design cycles are longer and mistakes are harder to correct), system-on-chip (SoC) designers need a more extensive tool suite than board designers may use.
System-level design is less amenable to synthesis than logic or physical design. As a result, system-level tools concentrate on modeling, simulation, design space exploration, and design verification. The goal of modeling is to correctly capture the system’s operational semantics, which helps with both implementation and verification. The study of models of computation provides a framework for the description of digital systems. Not only do we need to understand a particular style of computation, such as dataflow, but we also need to understand how different models of computation can reliably communicate with each other. Design space exploration tools, such as hardware/software codesign tools, develop candidate designs to understand trade-offs. Simulation can be used not only to verify functional correctness but also to supply performance and power/energy information for design analysis.
We will use video applications as examples in this chapter. Video is a leading-edge application that illustrates many important aspects of system-level design. Although some of this information is clearly specific to video, many of the lessons translate to other domains.
The next two sections briefly introduce video applications and some SoC architectures that may be the targets of system-level design tools. We will then study models of computation and languages for system-level modeling. Following this, we will survey simulation techniques. We will close with a discussion of hardware/software codesign.
3.2 Characteristics of Video Applications
The primary use of SoCs for multimedia today is for video coding, both compression and decompression. In this section, we review the basic characteristics of video compression algorithms and the implications for video SoC design.
Video compression standards enable video devices to interoperate. The two major lines of video compression standards are MPEG and H.26x. The MPEG standards concentrate on broadcast applications, which allow for a more expensive compressor on the transmitter side in exchange for a simpler receiver. The H.26x standards were developed with videoconferencing in mind, in which both sides must encode and decode. The Advanced Video Coding (AVC) standard, also known as H.264, was formed by the confluence of the H.26x and MPEG efforts.
Modern video compression systems combine lossy and lossless encoding methods to reduce the size of a video stream. Lossy methods discard information, so the decompressed video stream is not a perfect reconstruction of the original; lossless methods allow the information supplied to them to be reconstructed exactly. Most modern standards use three major mechanisms:
• The discrete cosine transform (DCT) together with quantization
• Motion estimation and compensation
• Huffman-style encoding
The first two are lossy while the third is lossless. These three methods leverage different aspects of the video stream’s characteristics to encode it more efficiently.
The combination of DCT and quantization was originally developed for still images and is used in video to compress single frames. The DCT is a frequency transform that turns a set of pixels into a set of coefficients for the spatial frequencies that form the components of the image represented by the pixels. The DCT is preferred over other transforms because a two-dimensional (2D) DCT can be computed using two one-dimensional (1D) DCTs, making it more efficient to implement. In most standards, the DCT is performed on an 8 × 8 block of pixels. The DCT does not by itself lossily compress the image; rather, it exposes structure that the quantization phase can exploit to decide which information to discard. Quantization throws out fine detail in the block of pixels, which corresponds to the high-frequency coefficients of the DCT. The number of coefficients set to zero is determined by the level of compression desired.
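To make the separability concrete, the following C sketch computes the 2D DCT of an 8 × 8 block as row-wise and then column-wise 1D DCTs, followed by a simple quantization step. The function names and the quantization rule are invented for illustration; real codecs use carefully designed quantization tables and fast DCT algorithms rather than this direct form.

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define N 8

    /* Illustrative 1D DCT-II of one length-N vector (not optimized). */
    static void dct_1d(const double in[N], double out[N])
    {
        for (int k = 0; k < N; k++) {
            double s = (k == 0) ? sqrt(1.0 / N) : sqrt(2.0 / N);
            double sum = 0.0;
            for (int n = 0; n < N; n++)
                sum += in[n] * cos(M_PI * (2 * n + 1) * k / (2.0 * N));
            out[k] = s * sum;
        }
    }

    /* 2D DCT of an 8 x 8 block computed separably: a 1D DCT on each row,
       then a 1D DCT on each column of the intermediate result. */
    void dct_2d(const double block[N][N], double coeff[N][N])
    {
        double tmp[N][N], col_in[N], col_out[N];

        for (int r = 0; r < N; r++)
            dct_1d(block[r], tmp[r]);

        for (int c = 0; c < N; c++) {
            for (int r = 0; r < N; r++) col_in[r] = tmp[r][c];
            dct_1d(col_in, col_out);
            for (int r = 0; r < N; r++) coeff[r][c] = col_out[r];
        }
    }

    /* Quantization sketch: dividing by larger step sizes at higher spatial
       frequencies drives most high-frequency coefficients to zero. The
       step-size rule here is made up for illustration only. */
    void quantize(const double coeff[N][N], int q[N][N], int quality)
    {
        for (int r = 0; r < N; r++)
            for (int c = 0; c < N; c++) {
                int step = 1 + quality * (r + c);  /* coarser at high frequencies */
                q[r][c] = (int)lround(coeff[r][c] / step);
            }
    }

Raising the hypothetical quality parameter widens the quantization steps, zeroing more high-frequency coefficients and trading image detail for compression, which is exactly the knob described above.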
Motion estimation and compensation exploit the relationships between frames created by moving objects. A reference frame is used to encode later frames through a motion vector, which describes the motion of a macroblock of pixels (16 × 16 in many standards). The block is copied from the reference frame into the new position described by the motion vector. The motion vector is much smaller than the block it represents. Two-dimensional correlation is used to determine the macroblock’s position in the new frame: several candidate positions in a search area are tested and the best match is chosen. An error signal encodes the difference between the predicted and the actual frames; the receiver uses that signal to improve the predicted picture.
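The C sketch below illustrates exhaustive block matching for one 16 × 16 macroblock. It uses the sum of absolute differences (SAD) as the matching metric, a common practical substitute for the correlation step described above; the function name, frame layout, and search strategy are assumptions made for this example.

    #include <limits.h>
    #include <stdlib.h>

    #define MB 16   /* macroblock size assumed by many standards */

    /* Exhaustive block-matching motion estimation sketch. Frames are stored
       row-major; (mb_x, mb_y) is the top-left corner of the macroblock in
       the current frame, and the search covers +/-range pixels. */
    void estimate_motion(const unsigned char *ref, const unsigned char *cur,
                         int width, int height,
                         int mb_x, int mb_y, int range,
                         int *best_dx, int *best_dy)
    {
        long best_sad = LONG_MAX;
        *best_dx = 0;
        *best_dy = 0;

        for (int dy = -range; dy <= range; dy++) {
            for (int dx = -range; dx <= range; dx++) {
                int rx = mb_x + dx, ry = mb_y + dy;
                if (rx < 0 || ry < 0 || rx + MB > width || ry + MB > height)
                    continue;   /* candidate block falls outside the frame */

                long sad = 0;
                for (int y = 0; y < MB; y++)
                    for (int x = 0; x < MB; x++)
                        sad += abs(cur[(mb_y + y) * width + (mb_x + x)] -
                                   ref[(ry + y) * width + (rx + x)]);

                if (sad < best_sad) {   /* keep the best-matching position */
                    best_sad = sad;
                    *best_dx = dx;
                    *best_dy = dy;
                }
            }
        }
    }

An exhaustive search over a ±range window requires (2·range + 1)² block comparisons per macroblock, which is why practical encoders often substitute faster, suboptimal search patterns for the full search shown here.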
MPEG distinguishes several types of frames: I (intra) frames, which are not motion-compensated; P (predicted) frames, which have been predicted from earlier frames; and B (bidirectional) frames, which have been predicted from both earlier and later frames.
The results of these lossy compression phases are assembled into a bit stream and further reduced with lossless methods such as Huffman encoding. This step shrinks the representation without further compromising image quality.
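As a simple illustration of the lossless stage, the sketch below maps symbols onto a toy prefix-free variable-length code, assigning shorter codes to more frequent symbols. The table is invented for illustration and is far simpler than the run-length and Huffman tables defined by the actual standards.

    #include <stdio.h>

    /* Toy prefix-free code table, invented for illustration: more frequent
       symbols receive shorter codes, so the average code length falls below
       the 2 bits a fixed-length code would need for four symbols. */
    static const char *toy_code[4] = {
        "0",     /* most frequent symbol  */
        "10",
        "110",
        "111"    /* least frequent symbol */
    };

    /* Emit the code for each symbol; a real encoder would pack the bits
       into the output stream rather than printing characters. */
    void encode_symbols(const int *symbols, int count)
    {
        for (int i = 0; i < count; i++)
            fputs(toy_code[symbols[i]], stdout);
    }

Because no codeword is a prefix of another, the decoder can recover the symbol boundaries from the bit stream alone, which is what makes this stage lossless.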
It should be clear that video compression systems are actually heterogeneous collections of algorithms. We believe that this is true of other applications of SoCs as well. A video platform must run several algorithms; those algorithms perform very different types of operations, imposing very different requirements on the architecture.
This has two implications for tools: first, we need a wide variety of tools to support the design of these applications; second, the various models of computation and algorithmic styles used in different parts of an application must at some point be made to communicate to create the complete system.
Several studies of multimedia performance on programmable processors have remarked on the significant number of branches in multimedia code. These observations contradict the popular notion of video as regular operations on streaming data. Fritts and Wolf [1] measured the characteristics of the MediaBench benchmarks.
They used path ratio to measure the percentage of instructions in a loop body that were actually executed. They found that the average path ratio of the MediaBench suite was 78%, which indicates that a significant number of loops exercise data-dependent behavior. Talla et al. [2] found that most of the available parallelism in multimedia benchmarks came from inter-iteration parallelism.
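The fragment below, written for this chapter as a hypothetical example, shows the kind of data-dependent control flow that lowers the path ratio: every iteration executes the loop test, but only one of the three arms of the conditional.

    #include <stddef.h>

    /* Saturating copy of a pixel row. The branch outcome depends on the
       data, so each iteration executes only part of the loop body. */
    void clip_pixels(unsigned char *dst, const short *src, size_t n, short max)
    {
        for (size_t i = 0; i < n; i++) {
            if (src[i] > max)                     /* data-dependent branch */
                dst[i] = (unsigned char)max;      /* saturate high         */
            else if (src[i] < 0)
                dst[i] = 0;                       /* saturate low          */
            else
                dst[i] = (unsigned char)src[i];   /* common case: copy     */
        }
    }

If the three arms compile to comparable numbers of instructions, a typical iteration executes roughly a third of the arm instructions plus the shared loop overhead, so the fraction of the loop body exercised falls well below 100%; this is the behavior that the 78% average path ratio reflects.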
3.3 Other Application Domains
Video and multimedia are not the only application domains for SoCs. Communications and networking are other areas in which SoCs provide cost/performance benefits. In all these domains, the SoC must be able to handle multiple simultaneous processes. However, the characteristics of those processes vary. Networking, for example, involves a great deal of packet-independent work: while some networking tasks do require correlating multiple packets, the basic processing is packet independent. The abundant parallelism in packet-level processing can be exploited in the microarchitecture. In the communications world, SoCs are used today primarily for baseband processing, but we should expect SoCs to take over more traditional high-frequency radio functions over time. Since radio functions can operate at very high frequencies, the platform must be carefully designed to support these high rates while providing adequate programmability of radio functions. We should expect highly heterogeneous architectures for high-frequency radio operations.
3.4 Platform Characteristics
Many SoCs are heterogeneous multiprocessors, and the architectures designed for multimedia applications are no exception. In this section, we review several SoCs, including some general-purpose SoC architectures as well as several designed specifically for multimedia applications.
Two very different types of hardware platforms have emerged for large-scale applications. On the one hand, many custom SoCs have been designed for various applications; these chips are customized by loading software onto them for execution. On the other hand, platform field-programmable gate arrays (FPGAs) provide FPGA fabrics along with CPUs and other components; the design can be customized by programming the FPGA as well as the processor(s). These two styles represent different approaches to SoC architecture, and they require very different sorts of tools: custom SoCs require large-scale software support, while platform FPGAs are well suited to hardware/software codesign.
3.4.1 Custom System-on-Chip Architectures
The Viper chip [3], shown in Figure 3.1, was designed for high-definition video decoding and set-top box support. Viper is an instance of the Philips Nexperia™ architecture, which is a platform fo...