II
System-Level Design
3 Tools and Methodologies for System-Level Design
Shuvra Bhattacharyya and Marilyn Wolf
CONTENTS
3.1 Introduction
3.2 Characteristics of Video Applications
3.3 Platform Characteristics
3.3.1 Custom System-on-Chip Architectures
3.3.2 Graphics Processing Units
3.3.3 Platform FPGAs
3.4 Abstract Design Methodologies
3.5 Model-Based Design Methodologies
3.5.1 Dataflow Models
3.5.2 Dataflow Modeling for Video Processing
3.5.3 Multidimensional Dataflow Models
3.5.4 Control Flow
3.5.5 Integration with Finite-State Machine and Mode-Based Modeling Methods
3.5.6 Video Processing Examples
3.6 Languages and Tools for Model-Based Design
3.6.1 CAL
3.6.2 Compaan
3.6.3 PREESM
3.6.4 Ptolemy
3.6.5 SysteMoC
3.7 Simulation
3.8 Hardware/Software Cosynthesis
3.9 Summary
References
3.1 INTRODUCTION
System-level design, once the province of board designers, has now become a central concern for chip designers. Because chip design is a less forgiving medium than board design—design cycles are longer and mistakes are harder to correct—system-on-chip (SoC) designers need a more extensive tool suite than may be used by board designers, and a variety of tools and methodologies have been developed for system-level design of SoCs.
System-level design is less amenable to synthesis than are logic or physical design. As a result, system-level tools concentrate on modeling, simulation, design space exploration, and design verification. The goal of modeling is to correctly capture the system’s operational semantics, which helps with both implementation and verification. The study of models of computation provides a framework for the description of digital systems. Not only do we need to understand a particular style of computation, such as dataflow, but we also need to understand how different models of computation can reliably communicate with each other. Design space exploration tools, such as hardware/software codesign, develop candidate designs to understand trade-offs. Simulation can be used not only to verify functional correctness but also to supply performance and power/energy information for design analysis.
We will use video applications as examples in this chapter. Video is a leading-edge application that illustrates many important aspects of system-level design. Although some of this information is clearly specific to video, many of the lessons translate to other domains.
The next two sections briefly introduce video applications and some SoC architectures that may be the targets of system-level design tools. We then study models of computation and languages for system-level modeling, survey simulation techniques, and close with a discussion of hardware/software codesign.
3.2 CHARACTERISTICS OF VIDEO APPLICATIONS
One of the primary uses of SoCs for multimedia today is video coding—both compression and decompression. In this section, we review the basic characteristics of video compression algorithms and the implications for video SoC design.
Video compression standards enable video devices to interoperate. The two major lines of video compression standards are MPEG and H.26x. The MPEG standards concentrate on broadcast applications, which allow for a more expensive compressor on the transmitter side in exchange for a simpler receiver. The H.26x standards were developed with videoconferencing in mind, in which both sides must encode and decode. The Advanced Video Coding (AVC) standard, also known as H.264, was formed by the confluence of the H.26x and MPEG efforts. H.264 is widely used in consumer video systems.
Modern video compression systems combine lossy and lossless encoding methods to reduce the size of a video stream. Lossy methods throw away information, so the decompressed video stream is not a perfect reconstruction of the original; lossless methods allow the information they encode to be reconstructed exactly. Most modern standards use three major mechanisms:
1. The discrete cosine transform (DCT) together with quantization
2. Motion estimation and compensation
3. Huffman-style encoding
Quantized DCT and motion estimation are lossy, while Huffman encoding is lossless. These three methods exploit different characteristics of the video stream to encode it more efficiently.
The combination of DCT and quantization was originally developed for still images and is used in video to compress individual frames. The DCT is a frequency transform: it turns a block of pixels into a set of coefficients for the spatial frequencies that make up the image in that block. The DCT is preferred over other transforms because it is separable: a 2D DCT can be computed as 1D DCTs along the rows followed by 1D DCTs along the columns, which is much cheaper than a direct 2D transform. In most standards, the DCT is performed on an 8 × 8 block of pixels. The DCT does not itself lossily compress the image; rather, its structure makes it easy for the quantization phase to pick out information to throw away. Quantization discards fine detail in the block of pixels, which corresponds to the high-frequency coefficients of the DCT. The number of coefficients set to zero depends on the level of compression desired.
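The separability property is easy to see in code. The following sketch is ours, not drawn from any standard: it computes an 8 × 8 DCT as 1D DCTs over the rows and then over the columns, and quantizes the coefficients against a caller-supplied step-size matrix (a stand-in for the standard-specific quantization tables).

#include <math.h>

#define N  8
#define PI 3.14159265358979323846

/* Orthonormal 1D DCT-II of an N-point vector. */
static void dct_1d(const double in[N], double out[N])
{
    for (int k = 0; k < N; k++) {
        double sum = 0.0;
        for (int n = 0; n < N; n++)
            sum += in[n] * cos(PI * (2 * n + 1) * k / (2.0 * N));
        double scale = (k == 0) ? sqrt(1.0 / N) : sqrt(2.0 / N);
        out[k] = scale * sum;
    }
}

/* Separable 2D DCT: 1D DCTs along the rows, then along the columns. */
static void dct_2d(const double in[N][N], double out[N][N])
{
    double tmp[N][N];
    for (int i = 0; i < N; i++)
        dct_1d(in[i], tmp[i]);                /* row pass */
    for (int j = 0; j < N; j++) {             /* column pass */
        double col[N], f[N];
        for (int i = 0; i < N; i++)
            col[i] = tmp[i][j];
        dct_1d(col, f);
        for (int i = 0; i < N; i++)
            out[i][j] = f[i];
    }
}

/* Quantization: divide each coefficient by its step size and round.
   Large steps for high frequencies drive those coefficients to zero. */
static void quantize(const double coef[N][N], const int q[N][N],
                     int out[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            out[i][j] = (int)lround(coef[i][j] / q[i][j]);
}

Note that dct_2d itself is exact; all of the information loss happens in quantize, where coefficients smaller than their step size round to zero.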
Motion estimation and compensation exploit the frame-to-frame similarity created by moving objects. A reference frame is used to encode later frames through motion vectors; a motion vector describes the motion of a macroblock of pixels (16 × 16 in many standards). The block is copied from the reference frame into the position described by the motion vector, and the motion vector is much smaller than the block it represents. To find a macroblock’s position in the new frame, several candidate positions in a search area are compared using two-dimensional correlation. An error signal encodes the difference between the predicted and the actual frames; the receiver uses that signal to improve the predicted picture.
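A sketch of block matching makes the computational load of motion estimation concrete. This is an illustration rather than production code: it uses the sum of absolute differences (SAD) as the matching criterion, a common practical substitute for full 2D correlation, and it assumes the search window lies entirely inside the reference frame.

#include <stdlib.h>
#include <limits.h>

#define MB 16   /* macroblock size used by many standards */

/* Sum of absolute differences between the current macroblock at
   (cx, cy) and a candidate block at (rx, ry) in the reference frame. */
static long sad(const unsigned char *ref, const unsigned char *cur,
                int stride, int rx, int ry, int cx, int cy)
{
    long s = 0;
    for (int y = 0; y < MB; y++)
        for (int x = 0; x < MB; x++)
            s += labs((long)ref[(ry + y) * stride + (rx + x)] -
                      (long)cur[(cy + y) * stride + (cx + x)]);
    return s;
}

/* Full search over a +/-range window; writes the motion vector of the
   best-matching candidate. Real encoders use faster search patterns,
   but the inner computation has the same structure. */
static void full_search(const unsigned char *ref, const unsigned char *cur,
                        int stride, int cx, int cy, int range,
                        int *mvx, int *mvy)
{
    long best = LONG_MAX;
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            long s = sad(ref, cur, stride, cx + dx, cy + dy, cx, cy);
            if (s < best) { best = s; *mvx = dx; *mvy = dy; }
        }
}

Even this naive version shows why motion estimation dominates encoder cost: each candidate position requires 256 pixel comparisons, and a ±16 search window has over a thousand candidates per macroblock.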
MPEG distinguishes several types of frames: I (intra) frames are coded without motion compensation, P (predicted) frames are predicted from earlier frames, and B (bidirectional) frames are predicted from both earlier and later frames.
The results of these lossy compression phases are assembled into a bit stream and compressed using lossless compression such as Huffman encoding. This process reduces the size of the representation without further compromising image quality.
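A minimal sketch of this table-driven, Huffman-style stage shows why it is lossless: it only substitutes shorter bit patterns for more frequent symbols. The code table here is hypothetical; real standards define their own variable-length-code tables.

#include <stdint.h>
#include <stddef.h>

/* One codeword of a static variable-length code: the bits are held
   right-aligned in 'bits', with 'len' giving the codeword length. */
struct vlc { uint32_t bits; int len; };

/* Simple MSB-first bit writer; 'buf' must be zero-initialized. */
struct bitwriter { uint8_t *buf; size_t pos; /* next bit index */ };

static void put_bits(struct bitwriter *bw, uint32_t bits, int len)
{
    for (int i = len - 1; i >= 0; i--) {
        if (bits & (1u << i))
            bw->buf[bw->pos >> 3] |= (uint8_t)(1u << (7 - (bw->pos & 7)));
        bw->pos++;
    }
}

/* Encode a symbol stream: frequent symbols get short codewords, so the
   bit stream shrinks, yet every symbol remains exactly recoverable. */
static void vlc_encode(struct bitwriter *bw, const struct vlc *table,
                       const int *syms, size_t n)
{
    for (size_t i = 0; i < n; i++)
        put_bits(bw, table[syms[i]].bits, table[syms[i]].len);
}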
It should be clear that video compression systems are actually heterogeneous collections of algorithms; this is also true of many other applications, including communications and security. A video computing platform must run several algorithms; those algorithms may perform very different types of operations, imposing very different requirements on the architecture.
This has two implications for tools: first, we need a wide variety of tools to support the design of these applications; second, the various models of computation and algorithmic styles used in different parts of an application must at some point be made to communicate to create the complete system. For example, DCT can be formulated as a dataflow problem, thanks to its butterfly computational structure, while Huffman encoding is often described in a control-oriented style.
Several studies of multimedia performance on programmable processors have remarked on the significant number of branches in multimedia code. These observations contradict the popular notion of video as regular operations on streaming data. Fritts and Wolf [1] measured the characteristics of the MediaBench benchmarks.
They used path ratio to measure the percentage of instructions in a loop body that were actually executed on a given pass. They found that the average path ratio across the MediaBench suite was 78%, which indicates that a significant number of loops exhibit data-dependent behavior. Talla et al. [2] found that most of the available parallelism in multimedia benchmarks comes from inter-iteration parallelism. Exploiting the complex parallelism found in modern multimedia algorithms requires synthesis algorithms that can handle more complex computations than simple, ideally nested loops.
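As a rough illustration of how data-dependent branches lower the path ratio (the example is ours, not from [1]), consider a saturating add of the kind that appears throughout pixel arithmetic:

/* Saturating add over a row of pixels. The clamp branch executes
   only when the sum overflows 255, so on most iterations part of the
   loop body is skipped and the path ratio falls below 100%. */
void saturate_add(unsigned char *dst, const unsigned char *src, int n)
{
    for (int i = 0; i < n; i++) {
        int v = dst[i] + src[i];
        if (v > 255)
            v = 255;      /* data-dependent: taken only on overflow */
        dst[i] = (unsigned char)v;
    }
}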
3.3 PLATFORM CHARACTERISTICS
Many SoCs are heterogeneous multiprocessors and the architectures designed for multimedia applications are no exception. In this section, we review several SoCs, including some general-purpose SoC architectures as well as several designed specifically for multimedia applications.
Two very different types of hardware platforms have emerged for large-scale applications. On the one hand, many custom SoCs have been designed for various applications; these are customized by the software loaded onto them. On the other hand, platform field-programmable gate arrays (FPGAs) provide FPGA fabrics along with CPUs and other components; the design can be customized by programming the FPGA as well as the processor(s). The two styles represent different approaches to SoC architecture, and they require very different sorts of tools: custom SoCs require large-scale software support, while platform FPGAs are well suited to hardware/software codesign.
3.3.1 CUSTOM SYSTEM-ON-CHIP ARCHITECTURES
The TI OMAP family of processors [3] is designed for audio, industrial automation, and portable medical equipment. All members of the family include a C674x DSP; some members also include an ARM9 CPU.
The Freescale MPC574xP [4] includes 2 e200z4...