9512.net

甜梦文库

甜梦文库

当前位置：首页 >> >> # EMBEDDED SYSTEM DESIGN AND POWER-RATE-DISTORTION OPTIMIZATION FOR VIDEO ENCODING UNDER ENER

EMBEDDED SYSTEM DESIGN AND POWER-RATE-DISTORTION OPTIMIZATION FOR VIDEO ENCODING UNDER ENERGY CONSTRAINTS

A Thesis presented to the Faculty of the Graduate School University of Missouri-Columbia

In Partial Ful?llment Of the Requirement for the Degree Master of Science

by WENYE CHENG Prof. Zhihai He, Thesis Supervisor AUGUST 2007

The undersigned, appointed by the Dean of the Graduate School, have examined the thesis entitled EMBEDDED SYSTEM DESIGN AND POWER-RATE-DISTORTION OPTIMIZATION FOR VIDEO ENCODING UNDER ENERGY CONSTRAINTS Presented by Wenye Cheng A candidate for the degree of Master of Science And hereby certify that in their opinion it is worthy of acceptance.

Professor Zhihai He

Professor Justin Legarsky

Professor Ye Duan

To my wife Ming Qian and daughter Xi Cheng.

—–Without their love, support and sacri?ces, this work could not have been accomplished.

ACKNOWLEGEMENTS

I would hereby like to whole-heartedly thank my advisor, Prof. Zhihai He, for providing excellent guidance throughout the course of this work. I would like thank members of my thesis committee, Prof. Justin Legarsky and Prof. Ye Duan, for their thorough review of this thesis. I express my deepest and most sincere gratitude to my committee members. I would like also thank Mr. Jim Fischer for his assistance in setting up the experiment environment of this thesis. He was very patient and cooperative every time I asked for his help. To my colleagues, graduate students Xi Chen, Jay Eggert, and Xiwen Zhao, I would like to express my thanks for their help in experiments. Specially, I express my thanks to Xi Chen for his great help in our collaboration on the energy-aware video encoder design. I would like also thank York Chung for his help during my thesis writing. I would have taken a much longer time in typewriting the thesis without his help.

ii

Abstract

Wireless video communication over portable devices has become the driving technology of many important applications, experiencing dramatic market growth and promising revolutionary experiences in personal communication, gaming, entertainment, military, security, environment monitoring, and more. Portable devices are powered by batteries. Video encoding schemes are often computationally intensive and energy-demanding, even after being fully optimized with existing software and hardware energy-minimization techniques. As a result, the operational lifetime of current portable video systems, such as handheld video devices, is still very short, mostly in the range of a few hours. Therefore, one of the central challenging issues in portable video communication system design is to minimize the energy consumption of video encoding so as to extend the operational lifetime of devices. In this work, we develop an operational power-rate-distortion (P-R-D) approach to minimizing the video encoding energy under rate-distortion constraints. We will demonstrate that extending the traditional rate-distortion analysis to P-R-D analysis will give us another dimension of ?exibility in resource allocation and performance optimization for wireless video communication over portable devices. Theoretically, we will analyze the energy saving gain of P-R-D optimization. Practically, we will develop an adaptive scheme to estimate the P-R-D model parameters and perform on-theiii

?y energy optimization for real-time video compression. Our results show that, for typical videos with non-stationary statistics, using the proposed P-R-D optimization technology, the encoder energy consumption can be signi?cantly reduced. This has many important applications in energy-e?cient portable video communication system design.

iv

Contents

ACKNOWLEDGEMENTS Abstract List of Figures List of Tables 1 Introduction 1.1 Major Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Background and Relate Work 2.1 Video Coding Complexity . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Dynamic Scalable Voltage . . . . . . . . . . . . . . . . . . . . . . . . ii iii viii xi 1 3 3 5 5 8

2.3 Joint Hardware and Application Adaptation . . . . . . . . . . . . . . 10 2.4 Importance of This Work . . . . . . . . . . . . . . . . . . . . . . . . . 12 3 Energy-Scalable Video Encoder Design and Operational P-R-D Analysis 14

3.1 An Operational Approach to P-R-D Analysis . . . . . . . . . . . . . . 14 v

3.2 Complexity Control Parameters . . . . . . . . . . . . . . . . . . . . . 17 3.2.1 3.2.2 The Complexity Control Parameter νX . . . . . . . . . . . . . 17 The Complexity Control Parameter νY . . . . . . . . . . . . . 19

3.3 P-R-D Modeling for Energy-Aware Video Encoding . . . . . . . . . . 21 3.4 Analytical P-R-D Models . . . . . . . . . . . . . . . . . . . . . . . . . 24 4 Energy Saving Analysis 33

4.1 Theoretical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.2 Traing-Classi?cation Approach to P-R-D Video Encoder Control . . . 39 5 Power-Rate-Distortion Control for Energy-Aware Video Encoding 45 5.1 A Training-Classi?cation Approach to P-R-D Video Encoder Control 45

5.2 Lagrangian Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 46 5.3 P-R-D Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 5.4 Summary of Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 60 6 Joint Hardware and Video Encoder Adaptation 67

6.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 6.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 6.2.1 6.2.2 CPU Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 Task Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

6.3 Application-Layer Video Encoder Adaptation . . . . . . . . . . . . . 71 6.3.1 6.3.2 Practical Dynamic Voltage Scaling (PDVS) . . . . . . . . . . 72 . . 76 80

Cross-layer Adaptation for Single-Stream Video Encoding

7 Energy-Aware Embedded Video Encoding System Design

7.1 Tier architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 vi

7.2 System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 7.2.1 7.2.2 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

7.3 Power Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 7.3.1 7.3.2 Signal Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 State Control . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 90

8 Conclusion and Future Work

8.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 8.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 References 92

vii

List of Figures

2.1 Generalized block diagram of a hybrid video encoder . . . . . . . . . 6

2.2 The framework of the GRACE . . . . . . . . . . . . . . . . . . . . . . 11 3.1 Power consumption model with DVS. . . . . . . . . . . . . . . . . . . 16 3.2 (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates. . . . . 23 3.3 Akiyos (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates. 25

3.4 News (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates. . 26 3.5 Salesman (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates. 27 3.6 Car (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates. . . 28 3.7 Foreman (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates. 29 3.8 Coastguard (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.9 Football (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates. 31 4.1 The relationship of λ and λF . . . . . . . . . . . . . . . . . . . . . . . 41 4.2 The common video scene estimation parameter λF . . . . . . . . . . . 42 4.3 A parts of common video scene estimation parameter λF . . . . . . . 42 4.4 A parts of common video scene estimation parameter λF . . . . . . . 43 4.5 The distribution of scene activity parameters of all video segments. . 44

viii

5.1 The discrete Lagrangian Optimization principle . . . . . . . . . . . . 47 5.2 The multiplier λ, rate constraints and complexity at D = 38db . . . . 52 5.3 The multiplier λ, rate constraints and complexity at D = 37db . . . . 53 5.4 The multiplier λ, rate constraints and complexity at D = 36db . . . . 53 5.5 The multiplier λ, rate constraints and complexity at D = 35db . . . . 54 5.6 The multiplier λ, rate constraints and complexity at D = 34db . . . . 54 5.7 The multiplier λ, rate constraints and complexity at D = 33db . . . . 55 5.8 The multiplier λ, rate constraints and complexity at D = 32db . . . . 55 5.9 The multiplier λ, rate constraints and complexity at D = 31db . . . . 56 5.10 The multiplier λ, rate constraints and complexity at D = 30db . . . . 56 5.11 The multiplier λ, rate constraints and complexity at D = 29db . . . . 57 5.12 Comparison of the non scaling coding and scaling coding at D = 37db, λ = 300 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 5.13 Comparison of the non scaling coding and scaling coding at D = 37db, λ = 2000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.14 Comparison of the non scaling coding and scaling coding at D = 34db, λ = 300 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.15 Comparison of the non scaling coding and scaling coding at D = 34db, λ = 2000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 5.16 Comparison of the non scaling coding and scaling coding at D = 30db, λ = 300 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 5.17 Comparison of the non scaling coding and scaling coding at D = 30db, λ = 2000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 7.1 Tier Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

ix

7.2 The Video Sensor Tiers . . . . . . . . . . . . . . . . . . . . . . . . . . 85 7.3 Software constructure . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 7.4 De?nition of the Connecter Pins . . . . . . . . . . . . . . . . . . . . . 88

x

List of Tables

2.1 CPU Occupancy (In Percentage) of the Major Encoding Function . . 8

4.1 The point value for categorizing Video Segment . . . . . . . . . . . . 41 5.1 Sample choices of of multiplier λ. . . . . . . . . . . . . . . . . . . . . 52 5.2 The Probability of the Fi . . . . . . . . . . . . . . . . . . . . . . . . . 58 5.3 The Lagragian Multiplier λ and correspoding QoS and rate constaint 5.4 The Lagragian Multiplier λ and corresponding P SNR ≈ 37db and rate constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 5.5 The Lagragian Multiplier λ and corresponding P SNR ≈ 34db and rate constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 5.6 The Lagragian Multiplier λ and corresponding P SNR ≈ 34db and rate constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 5.7 The Energy Saving Ratio . . . . . . . . . . . . . . . . . . . . . . . . . 65 6.1 Stargate Power consumption. . . . . . . . . . . . . . . . . . . . . . . 70 6.2 The Energy Saving Ratio1 . . . . . . . . . . . . . . . . . . . . . . . . 79 6.3 The Energy Saving Ratio 2 . . . . . . . . . . . . . . . . . . . . . . . . 79 6.4 The Energy Saving Ratio 3 . . . . . . . . . . . . . . . . . . . . . . . . 79 59

xi

Chapter 1 Introduction

Wireless video communication over portable devices has become the driving technology of many important applications, experiencing dramatic market growth and promising revolutionary experiences in personal communication, gaming, entertainment, military, security, environment monitoring, and more [37,38]. Portable devices are powered by batteries. Video encoding schemes are often computationally intensive and energy-demanding, even after being fully optimized with existing software and hardware energy-minimization techniques [14, 63]. As a result, the operational lifetime of current portable video systems, such as handheld video devices, is still short, mostly in the range of a few hours. This has become a bottleneck for technological progress in portable video electronics. Video encoding is computationally intensive and energy-consuming. However, a mobile video application system, powered by batteries, has limited energy supply for the video data processing. One of the central challenging issues in such system design is to minimize the energy consumption of video data processing so as to extend the operational lifetime of the system while supporting multimedia Quality

1

of Service (QoS) requirements. In energy-aware video encoding system design, the energy consumption of video data compression should be minimized while satisfying the same QoS requirements. In the age of desktop computing and wired communication, people worried about bits, the storage space or transmission bandwidth. Therefore, the ultimate goal in this type of video communication system design is to optimize video quality under rate constraints. To analyze, model, control, and optimize the performance of a signal processing and communication system under rate constraints, rate-distortion (R-D) theories and algorithms have been developed [5, 36, 50, 65]. With recent technological advances in circuit design and wireless communication, storage space and network bandwidth have experienced dramatic growth, having been improved by hundreds of times during the past decade. Currently, in many portable communication applications, energy has become a much more scarce and critical resource than bandwidth or storage space. Therefore, how to incorporate the energy consumption into the existing R-D performance analysis framework so as to optimize the video communication system performance under rate and energy constraints emerges as a new research task. In this work, we study the energy consumption of video encoders and incorporate the third dimension of power consumption into existing rate-distortion (RD) analysis framework so as to establish a power-rate-distortion (P-R-D) analysis framework for energy-aware video encoding. More speci?cally, we ?rst develop an energy-scalable video encoder which is fully scalable in energy consumption. We then develop an operational approach to study its P-R-D behavior. Both theoretically and theoretically, we demonstrate that with the proposed energy-scalable video encoder design and P-R-D optimization, we are able to signi?cantly reduce 2

the energy consumption of video encoding over portable video devices.

1.1

Major Contributions

The major contributions of this work include: 1. Developing an energy-aware video encoding system for wildlife behavior monitoring. Studying power consumption of video encoding systems and develop a video encoding scheme which is fully scalable in energy consumption. 2. Developing an operational approach to modeling the P-R-D behavior of an energy-scalable video encoder. Developing an analytical P-R-D model. Developing a scheme to estimate P-R-D model parameters from video encoding statistics. 3. Studying the characteristics of P-R-D functions of various test video segments. Developing a training-classi?cation approach for real-time P-R-D control using P-R-D clustering.

1.2

Thesis Organization

The rest of the thesis is organized as follows: Chapter 2 reviews the existing research work related to energy-aware video encoding over portable communication devices. Chapter 3 introduces a fully energy-scalable video encoder with the P-R-D framework and presents our operational approach to P-R-D modeling. An analytical PR-D model is then established. We also develop a scheme to estimate the P-R-D model parameter from video encoder statistics. 3

Chapter 4 studies the problem of energy-saving using P-R-D optimization. Theoretically, we demonstrate that the energy consumption of a video encoder can be saved if the scene activity of input video is non-stationary. We also study how the complexity control parameters of the video encoder can be con?gured to achieve this energy saving. Chapter 5 optimizes the P-R-D computational complexity scaling parameters with the discrete version Lagrangian optimization algorithm. Chapter 6 discusses how the P-R-D optimal scalable encoder can be used in practical energy-aware embedded video system design. Joint hardware and video encoder adaptation technique is discussed. Chapter 7 discusses energy-aware video encoding in practical system design and explained how the proposed technologies can be used in a practical setting. Chapter 8 concludes the thesis. Future work is also discussed in this chapter.

4

Chapter 2 Background and Relate Work

In this chapter, we review existing research work related to energy-aware video coding and communication system design. We then justify the importance of this work.

2.1

Video Coding Complexity

There are two types of portable video devices: encoder (e.g., video cell phones, wireless video cameras, etc) and player(e.g., iPod video). In this research, we focus on energy minimization for portable video encoding devices. This is because, on portable video devices, the fraction of energy consumption by video encoding (typically 60-85%) is much higher than that of video decoding [31]. For video encoding, video data is voluminous. It has to be very e?ciently compressed. Otherwise, the amount of transmission energy or required storage space will be tremendous. During the past decades, many video compression algorithms and international standards, such as MPEG-2, H.263, MPEG-4, and H.264 [54, 55],

5

have been developed for e?cient video compression. An e?cient video compression system is often computationally intensive and energy-consuming, since it involves many sophisticated operations in spatiotemporal prediction, transform, quantization, mode selection, and entropy coding [66]. Recent studies [39, 63] and our experimental analysis show that, in typical scenarios of video communication over portable devices, video encoding consumes a signi?cant portion (up to 40-60%) of the total energy. Standardized video encoding techniques like H.263, H.264/AVC, MPEG-1, 2, 4 are based on hybrid video coding [31], which shows in the Fig. 2.1.

Figure 2.1: Generalized block diagram of a hybrid video encoder The input image is divided into macroblocks. Each macroblock consists of the three components Y, Cr and Cb. Y is the luminance component which represents the rightness information. Cr and Cb represent the color information. A macroblock consists of one block of 16 by 16 picture elements for the luminance component and of two blocks of 8 by 8 picture elements for the color components. The encoder compresses the video with each macroblock. In this view, the macroblock is the basic 6

coding unit [5]. The behavior of the video encoder has been theoretically analyzed in [66]. Typical video encoders, including all the standard video encoding systems, such as MPEG-2 [1], H.263 [26], MPEG-4 [54], and H.264/AVC [31], employ a hybrid motion compensated DCT encoding scheme. Speci?cally, as shown in Fig. 2.1, they have the following major encoding modules: motion estimation (ME) and motion compensation (COMP), DCT, quantization (QUANT), entropy encoding (ENC) of the quantized DCT coe?cients, inverse quantization (DQUANT), inverse DCT (IDCT), picture reconstruction (RECON), and interpolation (INTERP) [54]. For the ease of exposition, the DCT, IDCT, QUANT, DQUANT and RECON modules are collectively referred to as PRECODING. In this way, the video encoder has only three major modules: ME, PRECODING, and ENC. The PRECODING can be considered as the data representation module. The run-time complexity of major encoding modules is shown in Tab.2.1, where the percentages of CPU occupancy for the major encoding modules are listed. It can be seen that ME is the most computation-intensive module, consuming about one-third of the processor cycles. The PRECODING modules collectively consume about 50% of the total processor cycles. The ENC module uses a relative small amount of the total CPU time, especially at low coding bit rates. During the past decades, many algorithms, software and hardware techniques have been developed to reduce the computational complexity of video encoding, speed up the video encoder, and reduce its energy consumption. For example, a statistical modeling approach is proposed in [21] to predict the zero DCT coe?cients after quantization. Based on the prediction, the DCT computation for those zero coe?cients can be saved. Fast and low-power motion estimation algorithms have been developed to reduce the computational complexity of motion estimation [64]. 7

Component ME COMP DCT QUANT ENC DQUANT IDCT RECOD INTERP RC other

Akiyo News Carphone 30.4% 32.6% 33.1% 9.1% 8.4% 8.7% 10.5% 9.2% 9.2% 4.9% 4.6% 5.1% 4.7% 5.4% 3% 1.9% 1.5% 2.0% 2.3% 2.9% 2.6% 7.5% 6.9% 7.2% 14.3% 12.8% 13.2% 7.4% 7.9% 7.6% 6.5% 7.3% 6.7%

Table 2.1: CPU Occupancy (In Percentage) of the Major Encoding Function Since there is no motion estimation for INTRA macroblocks (MB’s), the INTRA ratio parameter, which is the fraction of INTRA MB’s in the video frame, can be us to control the motion estimation complexity in the video encoder [50]. A parametric scheme for scalable motion estimation and DCT has been proposed in [60]. Hardware implementation technologies have also been developed to improve the video encoding speed [47] [5]. Recently, researchers at University of Illinois at Urbana-Champaign and IBM have realized the importance of power aware computing for video data compression and are investigating software and hardware techniques to reduce the energy consumption of video compression [14]. However, little research has been done to establish a theoretical framework for modeling and minimizing the energy consumption of video compression for battery-powered communication devices.

2.2

Dynamic Scalable Voltage

Many algorithms developed in the literature are able to control or optimize the computational complexity of the video encoder. To translate the complexity control 8

and reduction into energy control and saving, we need to consider energy-scaling technologies in hardware design. To dynamically control the energy consumption of microprocessors on the portable device, a CMOS circuits design technology, named dynamic voltage scaling (DVS), has been recently developed [29, 42]. In CMOS circuits, the power consumption P is given by P = V 2 · fCLK · CEF F ,

(2.1)

where V , fCLK , and CEF F are the supply voltage, clock frequency, and e?ective switch capacitance of the circuits, respectively [51]. Since the energy is power multiplied by time, and the time to ?nish an operation is inversely proportional to the clock frequency. Therefore, the energy per operation Eop is proportional to V 2 (Eop ∝ V 2 ). This implies that lowering the supply voltage will reduce the energy consumption of the system in a quadratic fashion. However, lowering the supply voltage also decreases the maximum achievable clock speed. More speci?cally, it has been observed that fCLK is approximately linearly proportional to V [65]. Therefore, the result is

3 P ∝ fCLK , 2 and Eo p ∝ fCLK .

(2.2)

It can be seen that the CPU can reduce its energy consumption substantially by running more slowly. However, it is not so slow for the real-time operation of video coding. This is the key idea behind the DVS technology. Variable chip makers, including AMD [22] and Intel [23], have recently announced and sold processors with this energy-scaling feature. In conventional system design with ?xed supply voltage and clock frequency, clock cycles, and hence energy, are wasted when the CPU workload 9

is light and the processor becomes idle. Reducing the supply voltage in conjunction with the clock frequency eliminates the idle cycles and saves the energy signi?cantly.

2.3

Joint Hardware and Application Adaptation

Achieving high QoS requirements with low energy consumption is challenging. Cross-layer adaptation provides an e?cient way to address such issues. First, system resources are being designed with the ability to trade o? performance for energy. For example, mobile processors on the market today (such as Intel XScale PAX255 [24], Intel Pentium-M [25] and AMDAthlon [3]) can already change the speed and power at runtime using DVS. Second, multimedia applications can gracefully adapt to resource changes while maintaining acceptable service quality. That is, multimedia applications allow a tradeo? between output quality and resource demands. Finally, the operating system can also provide ?exible resource management to support the tradeo? between QoS and resource demands or to balance the demands on di?erent resources (e.g., CPU time and network bandwidth). Researchers have proposed adaptation approaches to address the high QoS and low energy challenge in mobile devices. Adaptation can happen in di?erent layers from hardware to operating system to applications. The hardware adaptation dynamically recon?gures hardware resources such as the processor to save energy while providing the requested resource service and performance [6,10,34,45,46]. The operating system adaptation changes the policies of allocation and scheduling in response to application and resource variations [18, 20, 28, 44, 48, 61]. The application layer adaptation, possibly with the support of the operating system or middleware, changes the QoS parameters such as rate to trade o? output quality for resource

10

usage or to balance usage of di?erent resources [8, 13, 17, 35, 57, 66]. The above adaptation approaches have been shown to be e?ective for both QoS provisioning and energy saving. However, most of them adapt only a single layer or two joint layers (e.g., the operating system and applications [9, 43] or the operating system and hardware [30, 41, 59]).

Figure 2.2: The framework of the GRACE The Global Resource Adaptation through Cooperation (GRACE) project [49] is to develop and demonstrate an integrated cross-layer adaptive system where hardware and all software layers cooperatively adapt to changing system resources and application demands, seeking to maximize user satisfaction while meeting resource constraints of the energy, time, and bandwidth. The centerpiece of the GRACE is a cross-layer adaptation framework that enables coordination of the adaptations of the di?erent system layers for the best QoS possible. Fig. 2.2 shows the framework of GRACE. The key characteristics of the coordinator are: 1. Individual application components adapt locally, without knowledge of the internals of other parts of the system. 2. Each software con?guration is characterized by its cost (to represent its resource usage) and its utility (to represent user satisfaction). An application software 11

component uses these metrics to drive its local adaptations (with the goal of minimizing cost for maximal utility). 3. A software con?guration determines its cost using dynamic feedback from the hardware and possibly other application components (e.g., the network). 4. Since all resources are capable of adaptation, each resource o?ers multiple operating points and hence multiple possible costs (e.g., multiple combinations of execution time and energy) for a given software con?guration. 5. The resource manager receives requests from multiple applications with multiple associated costs and utilities. The resource manager selects the software and hardware con?gurations that will maximize overall system utility and meet the system constraints. Thus, the resource manager is not concerned with how an optimal con?guration is reached within a layer, but is simply a mediator for ensuring that the selected con?gurations maximize overall system performance within the given constraints. 6. Once the resource manager allocates a reservation, di?erent system components are free to adapt locally without going through the resource manager, as long as they do not exceed the provided reservations. The utility of GRACE can bene?t in the real-time and dynamic nature of the applications, which often result in some computational slack. It also provides a possibility to tradeo? between requirement of the quality and resource usage.

2.4

Importance of This Work

Recently, there are a lot of algorithms, software and hardware energy-minimization techniques, including low-complexity encoder design [16, 21, 33], low-power embed-

12

ded video encoding [27, 32], adaptive power control [39, 60, 63], and joint encoder and hardware adaptation [13, 14, 56] to have been developed. These algorithms focus on encoder complexity (and power consumption) reduction through heuristic adaptation or control instead of systematic energy optimization. This is because they lack an analytic model to characterize the optimum trade-o? between energy saving and encoding performance [66]. In addition, even with existing energy saving technologies, the operational lifetime of portable video electronics is still very short, which has become one of the biggest impediments to our technology future. Therefore, developing new energy optimization methods has become an urgent research task. In this work, we propose to develop a new P-R-D analysis framework to characterize the inherent relationship between energy consumption and encoder R-D performance. This will enable us to perform systematic energy minimization. More importantly, given a video encoder, which has already been fully optimized using existing software and hardware energy optimization technologies, the P-R-D analysis framework will enable us to achieve additional signi?cant energy saving.

13

Chapter 3 Energy-Scalable Video Encoder Design and Operational P-R-D Analysis

In this chapter, we introduce our scheme for energy-scalable video encoder design, present our operational P-R-D analysis framework, introduce the analytical P-R-D model, and explain how the model parameter can be estimated from video encoding statistics.

3.1

An Operational Approach to P-R-D Analysis

In the original work of P-R-D analysis [66], we have introduced a set of complexity control parameters into an MPEG-4 encoder, studied the R-D behavior of each parameter, and obtained the P-R-D function for a simple MPEG-4 encoder. We observe that this analytical approach is not easily extendable to other video encoders,

14

such as H.264 video coding [55], and the direct R-D analysis of complexity control parameters becomes very di?cult when the video encoding mechanism becomes more sophisticated. In this work, we propose an operational approach for o?ine P-R-D analysis and modeling which can be applied to generic video encoders. The operational P-R-D modeling has the following three major steps. In the ?rst step, we group the encoding operations into several modules, such as motion prediction, pre-coding (transform and quantization), mode decision, and entropy coding, and then introduce a set of control parameters Γ = [γ1 , γ2, · · ·, γL] to control the power consumption of these modules. Therefore, the encoder complexity C is then a function of these control parameters, denoted by C (γ1, γ2 , · · ·, γL). Within the DVS (dynamic voltage scaling) design framework [40], the microprocessor power consumption, denoted by P , is a function of computational complexity C , therefore, also a function of Γ, denoted by P=Φ(C ) = P (γ1, γ2 , · · ·, γL), whereΦ(·) is the power consumption model of the microprocessor. For example, according to our measurement, the power consumption model of the Intel PXA255 XScale processor used in our DeerCam system is depicted in Fig. 3.1 (solid line). It can be well approximated by the following expression P = Φ(C ) = β × C γ , γ = 2.5,

(3.1)

where β is a constant. In the second step, we execute the video encoder using different con?gurations of complexity control parameters and obtain the corresponding R-D data, denoted by D (R; γ1 , γ2 , · · ·, γL ). Note that this step is computationally intensive and is intended for o?ine analysis to obtain the P-R-D model only. Once the model is established, in Section 3.4, we will discuss how the model parameter

15

can be estimated online during video encoding.

1.4 Actual Approximation

1.2

1

Computing Energy (W)

0.8

0.6

0.4

0.2

0 0

50

100

150

200

250

300

350

400

Work Load (M cycles)

Figure 3.1: Power consumption model with DVS. In the third step, we perform optimum con?guration of the power control parameters to maximize the video quality (or minimize the video distortion) under the power constraints. This optimization problem can be mathematically formulated as follows:

{γ1 ,γ2 ,···,γL }

min

D = D (R ; γ 1 , γ 2 , · · · , γ L ),

s.t. P (γ1 , γ2, · · ·, γL) ≤ P,

(3.2)

where P is the available power consumption for video encoding. Given the R-D data set {D (R; γ1 , γ2, · · ·, γL)}, The optimum solution, denoted by D (R, P ), describes the P-R-D behavior of the video encoder. The corresponding optimum complexity control parameters are denoted by {γi? (R, P }, 1 ≤ i ≤ L. In the following, we explain how to de?ne the complexity control parameters, 16

design an energy-scalable video encoder, and obtain the P-R-D function using the above operational P-R-D analysis approach.

3.2

Complexity Control Parameters

In this work, two complexity scalable parameters, νX and νY , are introduced in the video encoder. The complexity control parameter for the motion compensation module is the number of SAD (sum of absolute di?erence) computations per frame, denoted by νX . The νY presents the complexity control parameter of pre-coding module, which is the number of the non-zero MB’s in the video frame. Here, “nonzero” means the MB has non-zero DCT coe?cients after quantization.

3.2.1

The Complexity Control Parameter νX

The complexity control parameter for the motion prediction and compensation module is νX . This is based on the observation that the ME process is simply a sequence of SAD (sum of absolute di?erence) computations to ?nd the MB position of the minimum SAD. Therefore, the computational complexity of ME, denoted by CM E , is simply given by CM E = νX · CSAD . (3.3)

The existing video encoder uses a block-based motion prediction scheme. The objective of motion estimation is to ?nd the best match in the reference frame for every MB in the current frame. The search for the SAD-optimal motion vector problem can be formulated as (x0 , y0 ) = arg min SAD (x, y ), (3.4)

17

where SAD (x, y ) represents the sum of absolute di?erence (SAD) between the current MB and the reference MB at a relative position of (x, y ). We can see that the ME process is simply a sequence of SAD computations to ?nd the motion vector which has the minimum SAD. It is assumed that the computational complexity of each MB SAD is a constant. Therefore, the overall computational complexity of the ME module is linearly proportional to the number of SAD computations νX . At the frame-level, the νX · SAD computations are allocated among the MB’s in the video frame to optimize the picture quality. The dynamic allocation of SAD computations is used in the complexity scalable of motion estimation. It is well known that moving objects in the video scene contribute most to the overall visual quality. This suggests that in motion estimation under energy constraints, we need to allocate the available νX · SAD computations among the MB’s according to their motion characteristics to optimize the overall picture quality. Let (mvx , mvy ) be the motion vector of the MB. The block motion activity (BMA) factor of the MB, denoted by ma is de?ned as

ma = |mvx | + |mvy |.

(3.5)

At the frame level, we introduce a motion history matrix (MHM), denoted by M = [mij ]M R×M C , where MR and MC are the numbers of MB’s per row and per column, respectively. Initially, we set mij = 1. After a frame id coded, each entry is updated as follows: ? ? ? mij + 1 if ma = 0; mij = ? ? 0 else.

(3.6)

Here, ma is the BMA factor of the (i, j )-th MB in the coded frame. The larger

18

the value of mij , it is of higher probability that this MB is a static block, and less SAD computations can be allocated to this MB. Note that each entry of the MHM is linearly scaled and represented by the gray level of a MB, ranging from 0 to 255. We can see that the MHM captures not only the motion history but also the locations of the object motion. Most importantly, this MHM approach has very low computation overhead and is very cost-e?ective in practice. Using the MHM, we can allocate the νX · SAD computations among the MB’s. The number of SAD computations allocated to the (i, j )-th MB, denoted by nsadij , is determined by 1 1? N ?1 mij

(k,l)≥(i,j )

nsadij =

mkl

· Nsad,

(3.7)

where N is the number of MB’s left so far that need to perform the motion estimation, and Nsad is the available number of SAD computations. Initially, Nsad is set to be νX . Suppose the motion search range is SR. If nsadij ≥ (2 · SR + 1)2 , it means the computational power is enough to perform a full search for this block. Otherwise, the diamond motion search algorithm in [4] is used to ?nd the motion vector, whose complexity, indicated by the number of search layers, is controlled by nsadij .

3.2.2

The Complexity Control Parameter νY

By analyzing the encoding architecture of the video encoding system, we ?nd that it is possible to control the computational complexity of all the pre-coding modules using one single parameter , which is the number of the non-zero MB’s in the video frame. Here, ”non-zero” means the MB has non-zero DCT coe?cients after quantization. Let CN ZM B and CP RE be the pre-coding computational complexity 19

of one non-zero MB (NZMB) and the whole video frame, respectively. We will see that, CP RE = νY · CN ZM B . (3.8)

A parametric complexity scalable parameter νY is given, which is to collectively control the computational complexity of the pre-coding modules, namely, the DCT, QUANT, DQUANT, IDCT, and RECON modules. In typical video encoding as illustrated in Chapter 1 Fig.2.1, DCT is applied to the di?erence MB after motion estimation and compensation, or the original MB if its coding mode is INTRA. After the DCT coe?cients are quantized, DQUANT, IDCT, and RECON are performed to reconstruct the MB for motion prediction of the next frame. In transform coding of videos, especially at low coding bit rates, the DCT coe?cients in the MB might become all zeros after quantization. We refer to this MB as an all-zero MB (AZMB). Otherwise, it is called a non-zero MB (NZMB). In international standards for video encoding, such as MPEG-2, H.263, and MPEG-4, ”non-zeros” also means the CBP (coded block pattern) value of the MB is non-zero. If we can predict an MB to be AZMB, all the above pre-coding operations can be skipped, because the output of DQUANT and IDCT of an AZMB is still an AZMB, and the reconstructed MB is exactly the reference MB used in motion estimation and compensation. Therefore, the encoder can simply copy over the reference MB to reconstruct the current MB. This is a unique property of the AZMB, which can be used to reduce the computational complexity of the video encoder [7]. The unique property of the AZMB is used to design a complexity scalability scheme for the pre-coding modules. Let {xnk |0 ≤ n, k ≤ 7} be the coe?cients in

20

the di?erent MB after motion estimation. For INTRA MB’s, {xnk } are the original pixels in the video frame. Let {yij |0 ≤ i, j ≤ 7} be the DCT coe?cients. According to the de?nition of DCT, we have 1 2n + 1 yij = Ci Cj xnk cos iπ 4 16 n=0 k =0 where Ci = We can see that ? ? ? ? ? ?

7 7

cos jπ

2k + 1 16

,

(3.9)

1 √ 2

ifi = 0, 1;

Cj =

1 √ 2

ifj = 0, 1;

(3.10)

? ? else, |yij | ≤

2

7

7

? ? else. |xnk |2 . (3.11)

n=0 k =0

Note that the right-hand side is the SSD of the di?erence MB, which is already computed during the motion estimation. This suggests us that the SSD could be an e?cient and low-cost measure to predict the AZMB. After motion estimation and compensation, let {SSDi |1 ≤ i ≤ M } be the SSD values of the M MB’s in the video frame sorted in an ascending order. In the proposed complexity scalability scheme for pre-coding, we force the ?rst M ? νY MB’s to be AZMB’s, and treat the remaining νY MB ’s as NZMB’s to which the pre-coding operations are applied.

3.3

P-R-D Modeling for Energy-Aware Video Encoding

As discussed in the above, we start from a conventional video encoder, denoted by ΠA , introduce a set of complexity control parameters to control its computational complexity and in turn its power consumption. We refer to this complexity-scalable 21

or power-scalable video encoder as C-R-D encoder, denoted by ΠB . It should be noted that introducing the control parameters Γ to the existing video encoder ΠA , as ΠB will only scale down its computational complexity and power consumption. Let P A and P be the power consumption of encoders ΠA and ΠB , respectively. Obviously, 0 ≤ P ≤ P A . Therefore, we can normalize the power consumption P of the C-R-D encoder by P A . After this normalization, the value of P is between 0 and 1. Fig.3.2(a) shows the P-R-D curve D(R;P)with normalized power P. Fig.3.2(b) shows the D-P curves at di?erent bit rates R. The speci?c procedure to insert the complexity control parameters into the encoder may vary from one encoder to another. For example, in an MPEG-4 video encoder, we can use the number of SAD (sum of absolute di?erence) computations to control the complexity of its motion estimation module, and use the number of non-zero blocks to control the complexity of DCT and quantization modules. Similar procedures can be applied to MPEG-2 or H.263 encoders. For H.264 video encoders, we can use the number of SAD computations and reference frames, and the number of coding modes to control its complexity. It is not important how the complexity control parameters Γ = [ν1 , ν2 , . . . , νL ] are inserted into the video encoding modules. Our C-R-D video encoder design only requires that: ? Compatible. The existing non-energy-scalable video encoder ΠA is a special case of the P-R-D encoder ΠB when νi = 1 or P = 1. From the R-D analysis perspective

D (R, P )|P =1 = D (R),

(3.12)

where D (R, P ) is the P-R-D function of encoder ΠB , and D (R) is the R-D 22

(a)

(b)

Figure 3.2: (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates.

23

function of encoder ΠA . In other words, when the power supply is su?cient and a full encoding power is used, the energy-scalable P-R-D video encoder ΠB becomes the traditional video encoder ΠA . ? Scalable. When we reduce the values of complexity control parameters, the power consumption P of the P-R-D encoder decreases

3.4

Analytical P-R-D Models

In order to obtain an analytical P-R-D model for energy-aware video encoding, we perform the operational P-R-D analysis procedure over a wide range of test video sequences and ?nd determine the common characteristics of their P-R-D functions. The test video sequences used in this work include Akiyos, News, Salesman, Carphone, Foreman, Coastguard, and Football QCIF (176×144) video sequences. The C-R-D curves and the D-P curves for these test video sequences at di?erent bit rates are shown in Fig.3.3-Fig.3.9. It should be noted that these C-R-D curves at di?erent complexity control schemes share a similar pattern: as the bit rate and power consumption increase, the distortion decreases exponentially. More speci?cally, we can draw the following observations about the C-R-D functions: 1. When P = 0 and R = 0, the coding distortion should be the variance of the input video. This is because the encoder does not have any bit and energy resources to compress the video data, and the decoder has no choice but display blank pictures. In this case, the coding distortion, measured by the MSE (mean squared error) between the input video and the reconstructed one, is the variance of the input video. 24

Akiyos P?R?D

2.5

2

1.5

D(*10 )

6

1

0.5

0 0 0.2 0.4 0.6 Power 0.8 1 1

Akiyos P?R?D 2.5

0 0.2 0.4 0.6 0.8 Rate

2

1.5

D(*10 )

6

1

0.5

0

0

0.1

0.2

0.3

0.4

0.5 Power

0.6

0.7

0.8

0.9

1

Figure 3.3: Akiyos (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates.

25

News P?R?D

5.5 5 4.5 4 3.5

D(*10 )

6

3 2.5 2 1.5 1 0.5 0 0 0.2 0.4 0.6 Power 0.8 1

5

0 0.5 1 1.5

News P?R?D

Rate

4.5

4

3.5

3

D(*10 )

6

2.5

2

1.5

1

0.5

0

0

0.1

0.2

0.3

0.4

0.5 Power

0.6

0.7

0.8

0.9

1

Figure 3.4: News (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates.

26

Salesman P?R?D

6

5

4 D(*106)

3

2

1

0 0 0.2 0.4 0.6 Power 0.8 1

2

0 0.5 1 1.5

Salesman P?R?D

Rate

1.8

1.6

1.4

1.2

D(*10 )

6

1

0.8

0.6

0.4

0.2

0

0

0.1

0.2

0.3

0.4

0.5 Power

0.6

0.7

0.8

0.9

1

Figure 3.5: Salesman (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates.

27

Car P?D?R 5.5 5 4.5 4 3.5

D(*10 )

6

3 2.5 2 1.5 1 0.5 0 0 0.2 0.4 0.6 Power 0.8 1 2.5 2 1.5 Rate

Car P?R?D 3

1

0.5

0

2.5

2

D(*10 )

6

1.5

1

0.5

0

0

0.1

0.2

0.3

0.4

0.5 Power

0.6

0.7

0.8

0.9

1

Figure 3.6: Car (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates.

28

Foreman P?R?D

20 18 16 14 12

D(*106)

10 8 6 4 2 0 0 0.2 0.4 0.6 Power 0.8 1

20

3

2.5

Foreman P?R?D

2

1.5 Rate

1

0.5

0

18

16

14

12

D(*106)

10

8

6

4

2

0

0

0.1

0.2

0.3

0.4

0.5 Power

0.6

0.7

0.8

0.9

1

Figure 3.7: Foreman (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates.

29

Coastguard P?R?D

20 18 16 14 12

D(*10 )

6

10 8 6 4 2 0 0 0.2 0.4 0.6 Power 0.8 1 4 3.5 3 2.5 2 Rate 1.5 1 0.5 0

Coastguard P?R?D 20

18

16

14

12

D(*106)

10

8

6

4

2

0

0

0.1

0.2

0.3

0.4

0.5 Power

0.6

0.7

0.8

0.9

1

Figure 3.8: Coastguard (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates.

30

Fb P?R?D 40

35

30

25

D(*106)

20

15

10

5

0

0

0.1

0.2

0.3

0.4

0.5 Power

0.6

0.7

0.8

0.9

1

Figure 3.9: Football (a)The P-R-D curve; (b)The D-P curves at di?erent bit rates.

31

2. As we can see from Fig.3.2, the relationship between the distortion D and power consumption P is approximately exponential. 3. As suggested by the classical R-D models, the relationship between the coding bit rate R and distortion D is also exponential. Based on the above observations, we propose the following analytic expression for the P-R-D model: D (R, P ) = σ 2 2?λR·g(P ) , (3.13)

where, σ 2 represents the picture variance of the video segment after motion compensation and λ represents the resource utilization e?ciency of the video encoder. g (P ) is the normalized power consumption model of the microprocessor with g (P ) = 0 and g (P ) = 1. In general, g (P ) is a monotonically increasing function. The analysis in [6] suggests that g (P ) = P γ .

1

(3.14)

The model 3.13 describes the relationship of the P , R, and D . It is convenient to analyze the power consumption behavior of the video encoder with this model.

32

Chapter 4 Energy Saving Analysis

In this chapter, we will study how the energy consumption of the video encoder can be minimized using P-R-D optimization. We will then present a trainingclassi?cation approach to con?gure the complexity control parameters of the video encoder so as to achieve the energy saving.

4.1

Theoretical Analysis

Let DA (R) be the R-D function of encoder ΠA , and DB (R, P ) the P-R-D function of encoder ΠB . As explained Chapter3, we have

DB (R, P )|P = 1 = DA (R),

and

DB (R, P ) ≤ DA (R),

(4.1)

where 0 ≤ P ≤ 1. Let S be the video sequence to be encoded. The time duration of S is denoted by T , which corresponds to the operational lifetime of device. We partition S into a number of segments, denoted by {Sn |1 ≤ n ≤ N }. Suppose the picture variance and encoder e?ciency in the P-R-D model 3.13 for each video 33

segment are σ 2 and λn , respectively. Let Rn and Pn be the number of bits and power used by video encoder ΠB to compress Sn . Let RT be the total number bits that can be generated by the video encoder. We assume that the video sequence is encoded at a constant quality D, In general, this assumption is reasonable because most applications require that the videos are encoded at a target quality. Rewriting model 3.13 as the equation 4.2, for video segment Sn , we have an γ, Rn

Pn = where

(4.2)

an =

1 σ2 log2 n λn D

γ

,

(4.3)

is called scene activity level. We can see that within the P-R-D analysis framework, the encoder power consumption Pn depends on the encoding bit rate RN . Our objective is to minimize the total power consumption of encoder ΠB in compressing all video segments. The energy minimization problem can be formulated as follows

N {Rn } N

min P =

n=1

Pn =

n=1

an γ, Rn

N

s.t.

n=1

Rn = RT .

(4.4)

Lemma 1 The solution to the minimization problem is given by (γan ) γ+1

N 1 i=1 (γai ) γ +1

1

Rn = and the minimum power is given by (

· RT ,

(4.5)

P =

N γ +1 γ +1 n=1 an ) . γ RT

1

(4.6)

34

Proof. We solve the energy minimization problem using a Lagrange multiplier approach. Let

N

J=

n=1

an Rn ? RT ), γ + β( Rn n=1

N

(4.7)

where β is the Lagrange multiplier. For the minimum solution, the following condition holds: ?J an = ?γ γ +1 + β = 0. ?Rn Rn This is Rn = Since

N n=1

(4.8)

γ · an β

1 γ +1

.

(4.9)

Rn = RT , we have

N n=1 (γ

β From equation 4.9, we have

1 γ +1

=

· an ) γ+1

1

RT

.

(4.10)

Rn =

(γ · an ) γ+1

N i=1 (γ

1

· ai ) γ+1

1

· RT ,

(4.11)

which is the optimum bit allocation. The minimum power consumption is then giver by

N

N

P =

i=1

Pn =

n=1

an γ Rn

35

=

1 an γ RT n=1

N

N

N i=1 (γ

· ai ) γ+1

γ

1

γ

(γ · an ) γ+1 ai

1 γ +1

1 = γ RT 1 = γ RT

γ N

an

n=1 γ N

γ γ+1 (γ · an ) γ+1

1 γ

γ

i=1 N

ai

i=1 N i=1

1 γ +1

γ +1 an

n=1

1 γ +1

γ +1

ai

=

γ RT

,

(4.12)

which proves Lemma 1. Lemma 2. The P-R-D video encoder ΠB consumes less energy than the standard non-energy-scalable video ΠB , and the energy saving ratio Λe is

N i=1 N i=1

1

γ +1

Λe =

B PT = A PT

aiγ+1 ai

1 γ

γ

≤ 1, ·N

(4.13)

A B where PT and PT are the power consumption of encoders ΠA and ΠB , respectively.

The equality holds if the scene activity level ai for each video segment is constant (stationary). Proof. In the P-R-D video encoder design, encoder ΠA is a special case of the P-RD video encoder ΠB when a full encoder power is used. In other words, the power consumption of encoder ΠA on video segment Si , denoted by PiA , is equal to 1. If D is the target video encoding quality. According to equation 4.2, the corresponding

1

A A number of encoding bits of ΠA , denoted by Ri , is given by Ri = aiγ . Therefore, A the total number of bits generated by encoder Ri is

36

N

1

RT =

i=1

aiγ ,

(4.14)

and the total power consumption of encoder ΠA is given by

N N

PT =

i=1

PiA

=

i=1

1 = N.

(4.15)

For the same number of encoding bits RT , according to equation 4.6 and 4.14, the minimum power consumption of the P-R-D encoder ΠB is

N i=1

1 γ +1

γ +1 N i=1

ai

ai

1 γ +1

γ +1

B PT

=

γ RT

=

N i=1

ai

1 γ

γ

.

(4.16)

The energy saving ratio of the P-R-D video encoder ΠB over the existing video encoder ΠA is given by

N i=1 N i=1

1 γ +1

γ +1

PB Λe = T = A PT

ai

ai

1 γ

γ

. ·N

(4.17)

To prove that Λ ≤ 1, we need to use the Holder’s inequality.

Given two N-

dimensional vector (x1 , x2 , . . . , xN ) and (y1 , y2 , . . . , yN ), according to Holder’s inequality, we have

N N

1 p

N q yi i=1

1 q

|xi · yi | ≤

i=1 i=1

xp i

,

(4.18)

where p, q > 1 and

37

1 1 + = 1. p q The equality holds when

(4.19)

(x1 , x2 , · · · , xN ) = α · (y1 , y2, · · · , yN ) , where α is a constant. We let

1

(4.20)

xi = aiγ+1 , p= γ+1 , γ

y i = 1, q = γ + 1.

(4.21) (4.22)

Using the Holder’s inequality, we have

N

|ai

i=1

1 γ +1

N

· 1| ≤

i=1 N

ai

1

1 γ +1

γ +1 γ

γ γ +1

N

1 γ +1

·

i=1

1

(1)γ +1

γ γ +1

=

i=1

aiγ

· N γ+1 ,

(4.23)

that is

N γ +1

1 γ +1

N

γ

ai

i=1

≤

i=1

ai

1 γ

· N.

(4.24)

Therefore

N i=1 N i=1

1

γ +1

Λe =

B PT = A PT

aiγ+1 ai

1 γ

γ

≤ 1. ·N

(4.25)

38

According to equation 4.20, the equality holds when

(a1 , a2 , · · · , aN ) = α (1, 1, · · · , 1) ,

(4.26)

which implies that the scene activity level ai of each video segment is constant. So far, Lemma 2 is proved. From Lemma 2, we can see that, incorporating another dimension, the power consumption P, into the traditional R-D analysis gives us another dimension of freedom in resource allocation and performance optimization. Accordingly, the PR-D video encoder is able to save energy by intelligently allocating its bit and energy resources. Furthermore, the result from Lemma 2 shows that the energy saving is possible if the scene activity of the input video scene is non-stationary, i.e., αi is time-varying. In practice, typical video application system will be able to operate for hours. Within this long operational time period, the video content captured by the system often exhibits a large variation in its scene activity levels. In this case, the P-R-D video encoder is able to save energy signi?cantly.

4.2

Traing-Classi?cation Approach to P-R-D Video Encoder Control

The activity parameter αi is very important. It includes the resource utilization e?ciency parameter λ of the P-R-D video encoder, the picture variance σ 2 which represents the amount of scene activities, and the target video quality D set by the application requirement. In real-time video encoding, the picture variance σ 2 can be obtained directly from the video encoder. Therefore, we only need to estimate

39

λ. The P-R-D model is rewritten as: D (R, P ) = σ 2 2?λR·C ,

γ

1 ≤ γ ≤ 3.

1

(4.27)

3 Considering P ∝ fCLK and C ∝ fCLK , it is observable that C ∝ P 3 . Hence, from

the P-R-D model

1 γ

D (R, P ) = σ 2 2?λR·P , the C-R-D model 4.27 is obtained.

1 ≤ γ ≤ 3,

(4.28)

With the C-R-D model, the λ can be estimated. In Chapter 3, we have presented an operational approach to obtain the P-R-D curve. Based on these P-R-D curves, we can determine the value of λ using statistical ?tting. This approach is computationally intensive and only suitable for o?ine C-R-D analysis. In real-time video compression, it is desirable to develop a low-complexity scheme which is able to estimate λ from statistics of current or previous video frames, which are directly available from the video encoder. We de?ne the following encoder statistics λF

2 1 σi , γ log2 R · CF Di

λF =

(4.29)

where CF = C (ν1 = 1, ν2 = 1, . . . , νL = 1) represents the encoder complexity where no complexity control is applied. Fig.4.1 shows that the P-R-D model parameter λ is highly correlated with the existing R-D model λF . So far, the video scene activity can be identi?ed with parameter λF . It dose not add further complexity for calculating parameter λF . In this work, we assume that the scene activity parameter does not change sig-

40

8 data 1 linear 7

6

5

4

λ

3

2

1

0

0

0.005

0.01

0.015

0.02

0.025 λF

0.03

0.035

0.04

0.045

0.05

Figure 4.1: The relationship of λ and λF ni?cantly within the video segment. In practice, with proper scene change detection, a long non-stationary video input can be often partitioned into multiple video segments with relatively stationary scene activities. Fig. 4.2 illustrates the scene activity estimation parameter λF in the standard test video sequences. It shows that the time-varying video sequence can be partitioned into video segments, each with a relatively stationary scene activity level.

Lavel λF Range 1 0.0472 0.045 2 0.0331 0.045-0.030 3 0.0246 0.030-0.022 4 0.0132 0.022-0.012 5 0.0097 0.012-0.009 6 0.0070 0.009-0.006 7 0.0048 0.006

Table 4.1: The point value for categorizing Video Segment We implement and test the proposed P-R-D analysis procedure over an input video sequence which has 3590 frames and partitioned into 57 segments. The results are shown in Fig. 4.5. In our experiments, we classify the input video sequence into the seven clusters, with each of them sharing similar P-R-D behaviors. The 41

The activity of a video sequence

300

250

Model Parameter λ

F

200 150 100 50 0

0

500

1000

1500 2000 Number of frame

2500

3000

3500

Figure 4.2: The common video scene estimation parameter λF

The activity of a video sequence (1) 35 30 25 20

Model Parameter λ

F

15 10 5 0 ?5

?10

600

650

700

750 Number of frame

800

850

900

Figure 4.3: A parts of common video scene estimation parameter λF

42

The activity of a video sequence (2)

300

250

Model Parameter λF

200

150

100

50

0

1500

1600

1700

1800 Number of frame

1900

2000

Figure 4.4: A parts of common video scene estimation parameter λF classi?cation feature λF are listed in the Table 4.1. The result shows that the P-RD model parameters of di?erent video segment exhibit a strong clustering behavior. Based on this property, we will develop a training-classi?cation approach to P-R-D video encoder control in the next chapter.

43

The frequency of the activity level 16

14

12

10

Frequency

8

6

4

2

0

1

2

3

4 Activity level

5

6

7

Figure 4.5: The distribution of scene activity parameters of all video segments.

44

Chapter 5 Power-Rate-Distortion Control for Energy-Aware Video Encoding

In this chapter, we study how the constrained optimization for P-R-D analysis in (3.2) can be solved and how the complexity control parameters of video encoders can be con?gured during real-time video encoding to achieve the optimized P-R-D performance.

5.1

A Training-Classi?cation Approach to P-R-D Video Encoder Control

In Chapter 3, we introduce two complexity control parameters, νX and νY , to control the motion prediction and pre-coding modules of the video encoder so as to establish a complexity-scalable or equivalently energy-scalable video encoder. The encoding bit rate R, video distortion D , and power consumption P are all functions of these two complexity control parameters. In energy-aware video encoding, these 45

two parameters should be optimally selected so as to minimize the total power consumption under rate and distortion constraints or minimize the total video coding distortion under rate and power constraints, as shown in (3.2). As discussed in Chapter 4, the input video segments exhibit a strong clustering behavior in their P-R-D model parameters. Based on this observation, we have propose a training-classi?cation approach to P-R-D control and optimization during real-time video encoding. First, in the training stage, we cluster the training video segments according to their scene activity parameter. During the training stage, we collect a set of training video segments with a wide range of scene activities. We partition them into a number of clusters according to their scene activity parameter λF . According to our simulation experience, 5 to 7 clusters will be su?cient. For each cluster of video segments, we ?nd their average P-R-D function and optimum encoder complexity control parameters by solving the constrained optimization problem. In this chapter, we use Lagrangian optimization to obtain the optimum control parameters νX and νY of each cluster. These optimum encoder complexity control parameters for all clusters are then stored in a database. During real-time video encoding, we compute the scene activity parameter λF of the video segment,

2 determine its cluster based on the value of σm , and then use the average optimum

encoder complexity control parameters of that cluster to control the video encoder.

5.2

Lagrangian Optimization

In Chapter 4, we have proposed to partition a non-stationary input video sequence into multiple video segments which have relatively stationary scene activities. These video segments are then classi?ed into several clusters and the distribution of these

46

video segments in the cluster seems to follow a Gaussian distribution, as shown in Fig. 4.5. For video segments in each cluster, we use a discrete Lagrangian optimization approach [5] to obtain the optimum complexity control parameters, since during our operational P-R-D analysis, what we have obtained are discrete PR-D points, D = D (R, P |Γ(ν1, ν2 , . . . , νL )) at di?erent con?gurations of complexity control parameters.

Figure 5.1: The discrete Lagrangian Optimization principle The basic idea of discrete Lagrangian optimization is as follow. We start with a simple rate-distortion optimization without considering the third dimension of power consumption to demonstrate the basic procedure of discrete Lagrangian optimization . We ?rst introduce a Lagrange multiplier λ ≥ 0, a non-negative real number and consider the Lagrangian cost Jij (λ) = dij + λ · rij for our P-R-D analysis. Refer to Fig. 47

5.1 for a graphical interpretation of the Lagrangian cost. Here, i is the index of video segment and j is the index of the complexity control parameter. As the parameter j increases, the coding bit rate rij decreases and the coding distortion dij increases. The Lagrange multiplier allows us to select speci?c trade-o? points. Minimizing the Lagrangian cost Jij = dij + λ · rij when λ = 0, is equivalent to minimizing the coding distortion. In other words, it selects the point closer to the x-axis in Fig. 5.1. Conversely, minimizing the Lagrangian cost function when λ becomes arbitrarily large is equivalent to minimizing the coding bit rate, and thus ?nding the point closest to the y -axis in Fig. 5.1. Intermediate values of the multiplier λ determine intermediate operating points. Please see [5] for detail. Consider the following R-D optimization problem:

N N

min J =

j ∈? i=1

dij + λ · rij ,

s.t.

i=1

rij ≤ RT ,

(5.1)

where j is a parameter for coding control. If the mapping j = x? (i) for i = 1, 2, . . . , N , minimizes

N

dix(i) + λ · rix(i) .

i=1

(5.2)

Then it is also the optimal solution to the problem of the 5.1, for the particular case where the total budget is:

N

RT = R(λ) =

i=1

rix(i) ,

(5.3)

so that

48

N

N

D (λ ) =

i=1

d

ix? (i)

≤

i=1

dix(i) ,

(5.4)

for any x satisfying the equation 5.4 with R given by (5.3). Since the budget constraint of (5.3) has been removed, for a given multiplier λ, (5.1) can be rewritten as

N N

min(

j ∈? i=1

dij + λ · rij ) =

i=1

min(dij + λ · rij ).

j ∈?

(5.5)

In this case, the minimum point can be computed independently for each coding unit. Note that for each coding unit i, the point on the R-D characteristic that minimizes dij + λ · rij is that point at which the line of absolute slope λ is tangient to the convex hull of the R-D characteristic. For this reason we normally refer to λ as the slope, and since λ is the same for every coding unit on the sequence, we can refer to this algorithm as a constant slope optimization [5].

5.3

P-R-D Optimization

In the following, we apply discrete Lagrangian optimization to solve the P-R-D optimization problem. (5.1) can be rewritten

[ν1 ,ν2 ,...,νL ]∈?

min

J = D + λ1 · R + λ2 · C,

(5.6)

where the J is the Lagrangian cost, the λ1 and λ2 are multipliers for bit rate constraint R and complexity constraint C , respectively. For (5.6), the optimal objective is to obtain best video quality of service under rate and complexity constraints. Note that the power consumption P is the function of the computational complexity C 49

which is related to DVS. In this chapter, for ease of discussion, we use C-R-D model instead of the P-R-D model, since they are equivalent under DVS. Within a cluster of video segment, as mentioned in 4, the Lagrangian cost function can be written as

N

N

N

J=

i=1

(Di + λ1i · Ri + λ2i · Ci ),

and

i=1

? Ri ≤ R,

i=1

? Ci ≤ C,

(5.7)

where N is a number of video segments. There are two multipliers in the Lagrangian cost function. So, it is more di?cult to determine the optimum values of both. In order to simple the optimization procedure, we assume that the video quality is constant. This assumption is reasonable, since in many practical applications, users often specify a target video quality for the video encoding and communication task. ? , where D ? is the target video quality set by users. Now, In this case, we set Di = D the cost function becomes

N N

J=

i=1

(Ci + λi · Ri ),

and

i=1

? Di ≈ D. ? Ri ≤ R,

(5.8)

As mention in the 4, the input video segments are classi?ed into seven clusters. In this case, the cost function can be re-written as

N

7

Ni

min(

k =1

(Ci + λk · Rk )) =

i=1 j =1 N

Cij (i) + λi · Rij (i) , ? Ri ≤ R,

i=1

(5.9)

under

? Di ≈ D.

50

Note that the each video segment in the cluster is independent so that the optimization procedure can be performed on each cluster. Using the constant slope optimization algorithm [5], Eq. (5.10) is rewritten as

7

Ni

7

Cij (i) + λ · Rij (i) ,

i=1 j =1 N

N=

i=1

Ni ,

(5.10)

under

i=1

? Ri ≤ R,

? Di ≈ D.

Based on this formulation, the optimization problem can be solved using the following two steps. Step 1: Calculating the minimum Lagrangian cost Ji with di?erent λ ∈ [λ1 , λ2 , . . . , λM ] for each cluster. We run the P-R-D scalable encoder mentioned 4 with at di?erent con?guration of encoding parameters [X, Y, Q] and the corresponding encoding results [C, R, D] are recorded. Here, [X, Y, Q] are scalable parameter of motion estimation search, pre-coding, and quantization step size. [C, R, D] are encoding complexity, coding bit rate, and video distortion, respectively. Note that we assume the video coding distortion is constant. We group the results [C, R, D] based their distortion values. For each group, we apply the optimization procedure outlined in (5.11):

[X,Y,Q]

min ji = Ci + λ · Ri ,

N

i = 1, 2 , . . . , 7 , (5.11)

under

i=1

? Ri ≤ R,

? Di ≈ D.

51

The multiplier λ controls the optimization to match the rate budget. Some sample choices of λ are listed in Table 5.1. λ λ 300 400 500 600 700 800 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 12000 14000 Table 5.1: Sample choices of of multiplier λ. The Fig. 5.2 to Fig. 5.11 show the minimum Lagrangian cost at di?erent values of λ for each of the seven clusters. From these results, we can see that each optimal pair of rate and complexity corresponds a multiplier λ. This pair of rate and complexity corresponds a group of the scalable parameters [X, Y, Q]. With the parameters [X, Y, Q], the optimal coding can be performed.

1.28 1.26 1.24 x 10

6

PSNR=38

Rate

1.22 1.2 1.18 1.16 1.14 0 2000 4000 6000 8000 10000 12000 14000

λ

9.6 9.4 9.2

x 10

8

Complexity

9 8.8 8.6 8.4 8.2 0 2000 4000 6000 λ 8000 10000 12000 14000

Figure 5.2: The multiplier λ, rate constraints and complexity at D = 38db Step 2: Predicting the scene activity parameter of the input video segment. In Step 1, each cluster’s minimum Lagrangian cost is calculated at di?erent values of λ. This implies that the minimum computational complexity is obtained under a certain rate constrain for a speci?c value of λ. As discussed in Chapter 4, each 52

1.35 1.3 1.25

x 10

6

PSNR=37

Rate

1.2 1.15 1.1 1.05 0 2000 4000 6000 λ 10 9.5 x 10

8

8000

10000

12000

14000

Complexity

9 8.5 8 7.5 7 0 2000 4000 6000 λ 8000 10000 12000 14000

Figure 5.3: The multiplier λ, rate constraints and complexity at D = 37db

1.15 1.1 1.05

x 10

6

PSNR=36

Rate

1 0.95 0.9

0

2000

4000

6000 λ

8000

10000

12000

14000

9.5 9

x 10

8

Complexity

8.5 8 7.5 7

0

2000

4000

6000 λ

8000

10000

12000

14000

Figure 5.4: The multiplier λ, rate constraints and complexity at D = 36db

53

10 9.5 9

x 10

5

PSNR=35

Rate

8.5 8 7.5 7 0 2000 4000 6000 λ 9.5 9 x 10

8

8000

10000

12000

14000

Complexity

8.5 8 7.5 7 6.5 0 2000 4000 6000 λ 8000 10000 12000 14000

Figure 5.5: The multiplier λ, rate constraints and complexity at D = 35db

8.5 8 7.5

x 10

5

PSNR=34

Rate

7 6.5 6 5.5 0 2000 4000 6000 λ 9.5 9 x 10

8

8000

10000

12000

14000

Complexity

8.5 8 7.5 7 6.5 0 2000 4000 6000 λ 8000 10000 12000 14000

Figure 5.6: The multiplier λ, rate constraints and complexity at D = 34db

54

7

x 10

5

PSNR=33

6.5

Rate

6

5.5

5

0

2000

4000

6000 λ

8000

10000

12000

14000

9.5 9

x 10

8

Complexity

8.5 8 7.5 7 6.5 0 2000 4000 6000 λ 8000 10000 12000 14000

Figure 5.7: The multiplier λ, rate constraints and complexity at D = 33db

5.5

x 10

5

PSNR=32

5

Rate

4.5

4

3.5

0

2000

4000

6000 λ

8000

10000

12000

14000

9.5 9

x 10

8

Complexity

8.5 8 7.5 7 6.5 6 0 2000 4000 6000 λ 8000 10000 12000 14000

Figure 5.8: The multiplier λ, rate constraints and complexity at D = 32db

55

5

x 10

5

PSNR=31

4.5

Rate

4

3.5

3

0

2000

4000

6000 λ

8000

10000

12000

14000

9 8.5 8

x 10

8

Complexity

7.5 7 6.5 6 5.5 0 2000 4000 6000 λ 8000 10000 12000 14000

Figure 5.9: The multiplier λ, rate constraints and complexity at D = 31db

4

x 10

5

PSNR=30

3.5

Rate

3 2.5

0

2000

4000

6000 λ

8000

10000

12000

14000

8 7.5

x 10

8

Complexity

7 6.5 6 5.5

0

2000

4000

6000 λ

8000

10000

12000

14000

Figure 5.10: The multiplier λ, rate constraints and complexity at D = 30db

56

3.2 3

x 10

5

PSNR=29

Rate

2.8 2.6 2.4 2.2 2 0 2000 4000 6000 λ 7.5 x 10

8

8000

10000

12000

14000

7

Complexity

6.5

6

5.5

0

2000

4000

6000 λ

8000

10000

12000

14000

Figure 5.11: The multiplier λ, rate constraints and complexity at D = 29db video segment is classi?ed into clusters according to their scene activity parameters. Hence, if the distribution of the clusters is known, the Lagrangrian multiplier λ can be ?xed for a given rate budget. Fortunately, the distribution of the clusters of the video segments in term activity scene can be often considered to be a Gaussian distribution. Let x be the scene activity parameter. Let f (x) be the probability density function of x among these video clusters. Let r (x) be the corresponding bit rate obtained from optimization. Therefore, the expected value of output bit rate is given by

∞ ?∞

f (x) · r (x)dx.

(5.12)

Considering a discrete-time case of this problem. Let xi be the scene activity parameter of the i-th cluster. Let ? = [xi |i = 1, 2, . . . , 7]. Let Fi be the probability for sample i. The probability of the sample i, Fi , is list in Table 5.2. Let r (ji ) be the rate of the j -th video segment that is classi?ed into i-th cluster. In this case,

57

the expected bit rate of j -th video segment is predicted by

7

Fi · r (i|λ).

i=1

(5.13)

The overall coding rate Rp is predicted by

N 7 7 7

Rp =

j =1 i=1

Fi · r (i|λ) =

i=1

(N · Fi ) · r (i|λ) =

i=1

Ni · r (i|λ),

(5.14)

where Ni is the number of the video segments that are classi?ed into the i-th cluster.

Cluster P rob. Cluster1 0.053 Cluster2 0.105 Cluster3 0.140 Cluster4 0.281 Cluster5 0.211 Cluster6 0.140 Cluster7 0.070

Table 5.2: The Probability of the Fi Step 3: Fix the Lagrangian multiplier λ and obtain the optimal encoder control parameters [X, Y, Q]. In Step 1, the minimum Lagrangian cost of a video segment in the i-th cluster is calculated at di?erent values of λ. In other words, the minimum computational complexity is obtained for a given rate constraint and a speci?c value of λ. As discussed in Chapter 4, an input video segment is classi?ed into a cluster based on its scene activity parameter. The distribution of the clusters in the video sequence can be predicted by Step 2. Accordingly, the Lagrangrian multiplier λ can be ?xed with the certain rate budget. For example, we can consider the video quality levels of 37db, 34db, and 30db. In order to obtain the λ, ?rst, we list the relationship between the expectation of the rate and the λ in Table . Then, the average rate budget for the video segment compression is calculated with the total rate budget with

58

r ?=

? R . N

7000 1153689.24 940163519.19 1.1130e+006 1.1161e+009 7000 593037.21 900865197.85 6.1253e+005 1.0609e+009 7000 259710.16 738226926.15 3.0192e+005 878845555

(5.15)

37db

34db

30db

λ RP CP Rr Cr λ RP CP Rr Cr λ RP CP Rr Cr

300 1264347.14 826501723.62 1.2381e+006 972837842 300 802745.57 663130614.32 8.0704e+005 788791942 300 390217.42 572981357.51 4.1154e+005 680497548

600 1209230.67 844671507.38 1.2112e+006 990056977 600 705773.85 706261434.62 7.3882e+005 850711799 600 331753.54 597513566.14 3.6938e+005 701901814

2000 1162620.71 902107565.82 1.1217e+006 1.0679e+009 2000 602751.87 855651058.05 6.1917e+005 1.0121e+009 2000 266147.48 709343178.71 3.0327e+005 854763964

14000 1153165.68 946846809.28 1.1127e+006 1.1217e+009 14000 592451.51 908359867.79 6.1233e+005 1.0750e+009 14000 255222.04 783359359.26 2.9966e+005 935804923

Table 5.3: The Lagragian Multiplier λ and correspoding QoS and rate constaint The encoder control parameters [X, Y, Q] are obtained at the minimum Lagrangian cost. In this case, (5.11) can be rewritten as

Ni [X,Y,Q]

7

min J =

j =1 i=1

min(Cij + λ · Rij ),

7

Ni = N · Fi , ? D ≈ D.

(5.16)

under R = N ·

i=1

? Fi · ri ≤ R,

Here, Fi is the probability of the i-th cluster. The overall computational complexity of a video sequence for a given rate constraint is

7 Ni 7

Cp =

i=1 j =1

Cij ,

and R = N ·

i=1

? D ≈ D, ? Fi · ri ≤ R,

(5.17)

? is the total rate budget. where Cp is the total minimum computational complexity, R 59

5.4

Summary of Algorithm

We implement the optimization on common video sequence which has 3590 frames and partitioned into 57 segments and the following steps are performed: 1. Assuming the video QoS and rate evaluation list in the Table 5.3. The N is 57. 2. Calculating corresponding expectation rate rre with equation 5.13. Note that the probability of the Fi is in the Table 5.2. 3. Obtaining the each corresponding scaling parameters with Table 5.4 PSNR: 37db λ Clust1 X 0.20 300 Y 0.20 Q 5 X 0.20 600 Y 0.20 Q 5 X 0.20 2000 Y 0.20 Q 5 X 0.20 7000 Y 0.20 Q 5 X 1.00 14000 Y 0.20 Q 5 Clust2 0.10 0.40 4 0.10 0.40 4 0.10 0.40 4 0.10 0.40 4 0.10 0.40 4 Clust3 0.10 0.20 3 0.10 0.20 3 0.10 0.20 3 1.00 0.20 3 1.00 0.20 3 Clust4 1.00 0.40 3 1.00 0.40 3 0.50 0.80 4 0.60 0.85 4 0.60 0.85 4 Clust5 1.00 0.60 3 1.00 0.60 3 1.00 0.60 3 1.00 0.60 3 1.00 0.60 3 Clust6 0.50 0.80 3 1.00 0.80 3 1.00 0.80 3 1.00 0.80 3 1.00 0.80 3 Clust7 0.10 0.80 3 0.10 0.80 3 1.00 0.80 3 1.00 0.80 3 1.00 0.80 3

Table 5.4: The Lagragian Multiplier λ and corresponding P SNR ≈ 37db and rate constraint The multiplier λ can be obtained with certain QoS and rate constrains. For example, if the requirement of the QoS is 37db and average bit rate is less than 1162620.71/57 = 20396.85 bit, the λ is chosen as 2000. With the argument QoS being 37db and being 2000, the scaling parameters are obtained from the Table 5.4: 60

PSNR: 34db λ Clust1 X 0.10 300 Y 0.10 Q 8 X 0.10 600 Y 0.10 Q 8 X 0.10 2000 Y 0.10 Q 8 X 0.10 8000 Y 0.10 Q 8 X 1.00 14000 Y 0.10 Q 8

Clust2 0.10 0.10 5 0.10 0.10 5 0.10 0.10 5 0.50 0.20 6 0.50 0.20 6

Clust3 0.10 0.10 3 0.10 0.20 6 0.10 0.20 6 1.00 0.20 6 1.00 0.20 6

Clust4 0.10 0.40 6 0.10 0.40 6 0.10 0.40 6 1.00 0.40 6 1.00 0.40 6

Clust5 0.80 0.40 4 0.80 0.40 4 1.00 0.80 6 1.00 0.80 6 1.00 0.80 6

Clust6 0.10 0.70 5 0.80 0.70 5 1.00 0.75 5 1.00 0.75 5 1.00 0.75 5

Clust7 0.10 0.70 4 0.10 0.90 5 0.80 1.00 5 0.80 1.00 5 0.80 1.00 5

Table 5.5: The Lagragian Multiplier λ and corresponding P SNR ≈ 34db and rate constraint PSNR: 30db λ Clust1 X 0.10 300 Y 0.10 Q 18 X 0.10 500 Y 0.10 Q 18 X 0.10 3000 Y 0.10 Q 18 X 0.10 8000 Y 0.10 Q 18 X 0.10 12000 Y 0.10 Q 18

Clust2 0.10 0.10 12 0.10 0.10 12 0.10 0.20 14 0.10 0.20 14 0.10 0.20 14

Clust3 0.10 0.10 10 0.10 0.10 10 0.10 0.10 10 0.10 0.20 12 0.90 0.20 12

Clust4 0.10 0.20 10 0.10 0.20 10 0.90 0.20 12 1.00 0.20 12 1.00 0.20 12

Clust5 0.80 0.20 8 1.00 0.20 8 1.00 0.20 8 1.00 0.20 8 1.00 0.40 10

Clust6 0.10 0.40 8 0.10 0.40 8 0.80 0.70 10 1.00 0.80 10 1.00 0.80 10

Clust7 0.10 0.90 8 0.10 0.90 8 0.30 0.95 8 0.30 0.95 8 0.30 0.95 8

Table 5.6: The Lagragian Multiplier λ and corresponding P SNR ≈ 34db and rate constraint 61

[0.20, 0.20, 5.00], [0.10, 0.40, 4.00], [0.10, 0.20, 3.00], [0.50, 0.80, 4.00], [1.00, 0.60, 3.00], [1.00, 0.80, 3.00], [1.00, 0.80, 3.00]. They are correspond to Cluster1, Cluster2, Cluster3, Cluster4, Cluster5, Cluster6, Cluster7, respectively. Hence, the coding is performed with these scaling parameters. In this way, the ?fteen individual video encoding are performed and the real results of them are list in Table 5.3. The Rp presents the rate constrains corresponding complexity denoted Cp . In order to compare with the non-scalable encoder, we encode same video sequence with the no complexity control under the certain QoS requirement. Fig. 5.12 to Fig. 5.17 show the results of the reconstruction of the video sequence with both of the scalable and non-scalable encoder at QoS =37db, 34db, and 30db.

λ =300 50 no scaling scaling 45

40

PSNR

35

30

25

20

0

10

20

30 No.GoP

40

50

60

Figure 5.12: Comparison of the non scaling coding and scaling coding at D = 37db, λ = 300 In Fig. 5.12 and Fig. 5.13, the QoS is 37dB. Fig. 5.14, Fig. 5.15 and Fig. 5.16, Fig. 5.17 correspond to QoS is 34db and 30db, respectively. The results are listed in Table 5.7. 62

λ=2000 46 no scaling scaling 44

42

40

38

PSNR

36

34

32

30

28

26

0

10

20

30 No.GoP

40

50

60

Figure 5.13: Comparison of the non scaling coding and scaling coding at D = 37db, λ = 2000

λ=300 45 no scaling scaling

40

35

PSNR

30

25

20

15

0

10

20

30 No.GoP

40

50

60

Figure 5.14: Comparison of the non scaling coding and scaling coding at D = 34db, λ = 300

63

λ=2000 42 no scaling scaling 40

38

36

34

PSNR

32 30 28 26 24 0

10

20

30 No.GoP

40

50

60

Figure 5.15: Comparison of the non scaling coding and scaling coding at D = 34db, λ = 2000

λ=300 40 no scaling scaling

35

30

PSNR

25 20 15 0

10

20

30 No.GoP

40

50

60

Figure 5.16: Comparison of the non scaling coding and scaling coding at D = 30db, λ = 300

64

λ=2000 40 no scaling scaling

35

30

PSNR

25 20 15 0

10

20

30 No.GoP

40

50

60

Figure 5.17: Comparison of the non scaling coding and scaling coding at D = 30db, λ = 2000

PSNR 37db 34db 30db

Non-scalable R C 8.7077 × 105 1.3049 × 109 5 4.7797 × 10 1.3003 × 109 2.1925 × 105 1.2944 × 109

R 1.2381 × 106 8.0704 × 105 4.1154 × 105

Scaling1 C 9.7284 × 108 7.8879 × 108 6.8049 × 108

Ratio 0.41 0.22 0.15

R 1.1217 × 106 6.1917 × 105 3.0327 × 105

Scaling2 C 1.0676 × 109 1.0121 × 109 8.5476 × 108

Ratio 0.55 0.47 0.29

Table 5.7: The Energy Saving Ratio

65

The energy saving ratio is calculated:

Ratio11 = Ratio21 = Ratio31 =

0.9728×109 1.3049×109 0.7888×10 1.3003×109 0.6805×10 1.2944×109

9 9

3

≈ 0.41

3

Ratio12 = Ratio22 = Ratio32 =

1.0679×109 1.3049×109 1.0121×10 1.3003×109 0.8548×10 1.2944×109

9

3

≈ 0.55

3

P N SR = 37db P N SR = 34db P N SR = 30db

≈ 0.22

3

≈ 0.47

3

≈ 0.15

9

≈ 0.29

Note that the results in the table 5.7 are obtained with the theoretical evaluation. The achievable energy saving, in a practical setting, is discussed in Chapter 6.

66

Chapter 6 Joint Hardware and Video Encoder Adaptation

In this chapter, we study how the hardware-layer system scheduling and applicationlayer P-R-D control can be jointly designed to achieve energy saving in practical portable video communication systems.

6.1

Background

In Chapters 2 to 4, we have developed a P-R-D analysis framework, which extends the traditional R-D analysis by considering the third dimension of power consumption. We have demonstrated that the video encoding energy can be signi?cantly saved using P-R-D optimization. To realize this energy saving in practical system design, we need to consider the power consumption behavior of the hardware computing platform and study joint hardware-application adaptation. The DVS provides a mechanism to adapt the CPU operating voltage and clock frequency

67

according to the work load of the P-R-D video encoder. It translates complexityscalability and encoder complexity reduction into energy-scalability and encoder energy reduction. The encoder complexity minimization in previous chapters is performed at the application layer and does not consider the hardware-layer system scheduling. In cross-layer adaptation, the CPU clock frequency, circuit voltage supply in DVS, encoding bit rate, video encoding distortion and power consumption, are jointly optimized.

6.2

System Model

In order to perform cross-layer adaptation for the complexity encoder to save power, system models at the hardware and application layers are needed.

6.2.1

CPU Model

In the hardware layer, we consider the video application system with a single adaptive CPU that supports multiple speeds, f1 , f2 , . . . , fk , trading o? performance for energy. At a lower speed, the CPU consumes less power. However, a lower speed may increase the power consumption of other resources such as memory. Two major observations regarding CPU energy can be made here. First, the CPU is one of the most energy-consuming components in a portable video device. Depending on the application workload, For example, the CPU might consume up to 52% of the total energy in the Stargate [58]. Second, many processors used in embedded system design today, for example, Intel XScale PAX255 [24], Intel Pentium-M [25], and AMD Athlon [3], allow software to change the speed through the Advanced Con?guration and Power Interface (ACPI) standard [26], thereby enabling a cross-layer system 68

control. In general, there are two approaches to reduce CPU energy consumption. The ?rst one is dynamic power management (DPM) [34], which puts the idle processor into the lower-power sleep state. The second approach is dynamic frequency / voltage scaling (DVS) [53], which reduces the operating speed and voltage of the active processor. DPM will be used in Chapter 6 for more energy saving in our embedded system design. In this chapter, we focus on hardware adaptation through DVS (dynamic voltage scaling). The CPU power consumption typically consists of three major parts: the dynamic power, short circuit power, and leakage power, as shown in the following: Ccp × f × V 2 +

dynamic power

V × Isc

short circuit power

+ V × Ileak ,

leakage power

(6.1)

where Ccp is the loading capacitance, f is the speed, V is the voltage, Isc is the short circuit current, and Ileak is the leakage current [2]. When the speed decreases, the CPU can operate at a lower voltage and thus reduce its power consumption. Furthermore, the CPU power consumption model is generally a convex function of the speed. Consequently, the CPU energy (i.e., the product of the power and time) also decreases as the speed decreases even at the cost of longer execution time. In particular, for ideal processors, we assume that their power is dominated by the dynamic power and the voltage is proportional to the speed; that is, the CPU power is proportional to the cubic of the speed on which the result of power saving is calculated. Unfortunately, such assumption does not hold for real embedded systems such as Stargate which uses Intel XScale PXA255, whose power is not proportional to the cubic of the speed (see Table 6.1). Furthermore, a lower speed may increase the 69

energy consumption of other resources, such as memory and display. For example, when the CPU operates at a lower speed, an application needs to run for a longer time; consequently, it needs to use the memory for a longer time, often increasing the memory energy consumption. Since our goal is to save the total energy of the whole device, we are more interested in the total power consumption of the device. In general, the relationship between the clock speed f and the total system power P (f ) can be obtained via measurements like the data in the Table 6.1. Therefore, it is without loss of generality to assume that the total power decreases as the CPU speed decreases. This assumption holds for the system Stargate used in our project. Freq.(MHz) Power (W) 400 300 200 100 1.89 1.67 1.62 1.45

Table 6.1: Stargate Power consumption. Besides the fact that system power consumption is not proportional to the cubic of the clock speed, in practice, the clock speed of a real CPU cannot be adjusted continuously. Instead, the CPU often has several candidate clock speeds for us to choose. In this case, the CPU clock speed cannot be adjusted to the frequency that exactly matches how computational complexity the video application job demands.

6.2.2

Task Model

The nature of the video application task is periodic that release a job every period. For example, the new computational complexity control encoder sends the video segment coding jobs periodically. Each job has a soft deadline, typically de?ned as the end of the period. Each job consumes a certain amount of CPU cycles. Di?erent jobs of the same task, like the video segments mentioned in the previous chapter, will need di?erent amount of CPU cycles. In practice, the instantaneous 70

cycle demand for individual video segments may change signi?cantly over the task period. However, the probability distribution of cycle demand of the video segment coding is often stable because of both of the periodic nature of its jobs and the probability distribution of a video segment classi?ed into a cluster. In the other words, the probability distribution of cycle demand of each video segment coding can be estimated based on previous encoding statistics. The probability that a job demands no more than a certain amount of cycles in a period is denoted by

Pr (X ≤ x),

(6.2)

where X is the random variable associated with the number of cycles demanded by each job. In Chapter 4, an input video segment is classi?ed into one of those seven cluster {Fi }, i = 1, 2, . . . , 7 and the video segments in the i-th cluster require Ci operation cycals. Assuming that the time interval of each video segment is approximately TP . The probability Pr becomes

j

P r (X ≤

i=1

·

Ci )= TP

j

Fi .

i=1

(6.3)

Table 5.2 list the probability distribution of these clusters.

6.3

Application-Layer Video Encoder Adaptation

In system scheduling, we need to make sure that the performance and CPU requirements of each task are met. In particular, the scheduler used a so-called soft real-time algorithm to periodically allocate cycles to each task based on its statistical cycle demand, e.g., the 95th percentile of cycle demand of all of its jobs. The statistical 71

demand can be speci?ed based on the task’s characteristics. The purpose of this statistical, rather than the worst-case based, allocation is to improve CPU utilization while providing soft (statistical) performance guarantees. The scheduler then enforces the allocation through an earliest deadline ?rst (EDF) based algorithm. This algorithm makes admission control to ensure that the total CPU utilization (at the highest CPU speed fk ) of all concurrent tasks is no more than one, i.e.,

n

i=1

Ci /fk ≤ 1. Tpi

(6.4)

Here, the system has n tasks and the i-th task demands Ci cycles per period Tp . Among all tasks, the scheduler ?rst executes the task with the earliest deadline and positive allocation. As the task is executed, its cycle allocation is decreased by the number of cycles it consumes. When its allocation is exhausted, the task is preempted to run in best-e?ort mode for overrun protection. This algorithm only protects the system overrun in a best-e?ort mode rather than provides a mechanism to adjusting the CPU frequency for minimum power consumption. An improved algorithm called practical voltage scaling (PDVS) is used to minimize the total energy of the such multimedia system [62].

6.3.1

Practical Dynamic Voltage Scaling (PDVS)

The goal of PDVS is to minimize the total energy of the mobile device while providing soft performance guarantees to each multimedia task. For this, PDVS extends the soft real-time scheduling algorithm by changing the CPU speed of task execution. The purpose is to save energy since the CPU may run slower without a?ecting application performance. Assume that there are n tasks and each task is allocated 72

Ci cycles per period Tp and all concurrent tasks run at a uniform CPU frequency. The total CPU demand of all tasks is 1 fk

n

i=1

Ci ≤ 1. Tpi

(6.5)

The uniform frequency can be the lowest frequency that is no less than the total demand. It is min f : f ∈ f1 , f2 , . . . , fk , and f ≥

i=1 n

Ci . Tpi

(6.6)

If each task used its allocated cycles exactly, this uniform speed would minimize CPU energy due to the convex nature of the frequency-power relationship [15, 52]. There is also some energy waste although the frequency of CPU is adapted because the cycle allocation of a task is based on the some percentile of the number of cycles demanded by its jobs. For example, if this percentile is 95% then about 95% of its jobs will complete earlier. Early completion may eventually cause the CPU to be idle, which, in turn, results in energy waste since the device still consumes energy during the idle time. Note that the more energy can be saved by avoiding or reducing the idle time. Therefore, PDVS dynamically adapts the CPU speed of each job execution in a way that minimizes the total energy consumed during the job execution while bounding the job’s execution time. The reason for bounding the execution time is not to miss the deadline of the job or other jobs executed after the job. In other words, each job should ?nish within a certain amount of time. Therefore the time budget is allocated to each job. Speci?cally, if there are n concurrent tasks and each task is

73

allocated Ci cycles per period Tpi , then the i-th task is allocated Ci n Ci i=1 Tpi

Ti =

,

(6.7)

time units per period Tpi (i.e. for each of its jobs). That is, we distribute the time among all tasks based on their cycle demands. Note that if there is only a single task, then its time budget equals to its period. Using joint allocation of cycles and time, we can adapt the execution speed for each job as long as the job can use its allocated cycles within its allocated time. How is the execution speed dynamically adapted for jobs of each invitational task? Without loss of generality, we consider a speci?c task with cycle allocation C and time allocation T per period Tp . The basic idea behind PDVS is to minimize the total energy consumed during each job execution while limiting the job’s execution time within its time budget. To do this, PDVS sets a speed for each of the cycles allocated to the job. If a cycles x, 1 ≤ x ≤ C is executed at speed f (x), its execution time is

1 . f (x)

The energy consumed by the system during this time interval is p(f (x)) 1 × p(f (x)) = , f (x) f (x)

(6.8)

where p(f (x)) is the total power consumed by the whole device at speed f (x). Because multimedia tasks demand cycles statistically, the cycle x is executed with a probability and its expected energy is p(f (x)) , f (x)

F (x)

(6.9)

74

where F (x) is the tail of the distribution function:

F (x) = 1 ? Pr (X ≤ x).

(6.10)

By setting a speed for each of the cycles allocated the job, the minimization of the total energy consumed during the job execution while bounding its execution time is, more formally, with the speed adaptation problem as p(f (x)) F (x) min + f (x) x=1

C C

T?

x=1

F (x) s.t.

1 f (x)

C

· Pidle ,

(6.11)

x=1

1 ≤ T, f (x)

f (x) ∈ f1 , f2 , . . . , fk , 1 ≤ x ≤ k,

where T is the time budget allocated to the job and Pidle is the device power when the CPU is idle at the lowest speed. In practice, using the constrained optimization (6.12) is often not feasible because of two major reasons. First, the number of needed cycles, C , may be very large for multimedia tasks (e.g., in millions). Consequently, the computational overhead for solving the optimization problem may be unacceptably large. Second, we cannot set a speed for individual cycles since each speed change may take tens of microseconds [62]. To reduce the cost, a piece-wise approximation technique is applied that groups the allocated cycles and sets a same speed within each group. Speci?cally, we ?rst use a set of points, b1 , b2 , . . . , bk+1 , to divide the allocated cycles into k groups, each with size gi , 1 ≤ x ≤ k . We then ?nd a speed f (bi ) for each group [bi , bi+1 ). In this

75

way, we rewrite the above constrained optimization as

k

min

i=1

F (bi )p(f (bi )) gi + f (bi )

k

T?

i=1

gi

k

F (bi ) f (bi ) gi

· Pidle ,

(6.12)

s.t.

i=1

1 ≤ T, f (bi )

f (bi ) ∈ f1 , f2 , . . . , fk , 1 ≤ i ≤ k.

This leads to an optimization problem over a discrete set, which can be solved at least by brute-force search. Certainly, some fast algorithms, such as gradient search may be applied.

6.3.2

Cross-layer Adaptation for Single-Stream Video Encoding

In the following, we consider PDVS adaptation for a single video encoding task. In Section 6.3.1, the PDVS adaptation algorithm is considered for more generic adaptation problem with multiple CPU tasks. Within the context of single-stream video encoding, this PDVS adaptation problem can be further simpli?ed. First, in our system, video encoding is the dominant CPU task while other system management tasks require a very small amount of cycles and they can be considered as constant over the system running time. Second, the time budget for this one task is not changed considering no delay allowed and the adaptation is in coarse time granularity that is same as the time of a video segment encoding, denoted as TGoP . In this way, Eq. (6.13) can be simpli?ed as p(f (bi )) + f (bi ) 1 f (bi )

gj

TGoP ? gj

· Pidle ,

(6.13)

76

s.t.

gj

1 ≤ TGoP , f (bi )

j = 1, 2 , . . . , 7 ,

f (bi ) ∈ f1 , f2 , . . . , fk , 1 ≤ i ≤ k.

Here the TGoP is the time budget which equals to the total encoding time fo the video segment. gj is average computational complexity the j -th cluster of video segments. This is solution for the one task and no delay video coding system. Note that the function 6.14 represents the average energy consumption for encoding the video segments that are classi?ed into the j -th cluster. The total energy consumption of the whole video sequence is given by

7

PeB

=

j =1

Pj g j gj

1 p(f (bi )) + TGoP ? gj f (bi ) f (bi ) j = 1, 2 , . . . , 7 ,

· Pidle ,

(6.14)

s.t.

1 ≤ TGoP , f (bi )

f (bi ) ∈ f1 , f2 , . . . , fk , 1 ≤ i ≤ k. This is the power consumption PeB with cross-layer adaptation. If cross-layer adaptation is not used, the expected power consumption, PeA is given by

7

PeA

=

j =1

m P j (g k · p(fk )) = TGoP p(fk ), m gk ≈ TGoP . fk

j = 1, 2 , . . . , 7 ,

(6.15)

s.t.

m Let gi denote the maximum cycles in the period TGoP under clock speed fi . The

energy saving ratio between these two cases with and without cross-layer adaptation

77

is PeB = PeA =

7 j =1

Pj gj p(ff((bbii))) + TGoP ? gj f (1 · pidle bi ) TGoP · p(fk )

7

1 TGoP

7

Pj

j =1

p(f (bi )) gj gj · + TGoP ? f (bi ) p(fk ) f (bi ) · pidle p(fk ) ,

·

pidle p(fk ) (6.16)

=

j =1

Pj

m gi

gj p(f (bi )) gj · + 1? m m gi p(fk ) gj 1 ≈ TGoP , f (bi )

s.t.

j = 1, 2 , . . . , 7 ,

f (bi ) ∈ f1 , f2 , . . . , fk , 1 ≤ i ≤ k.

Considering the test video sequence used in the Chapter 5, which has 3585 frams partitioned into 57 video segments. It is compressed with P-R-D optimal scaling encoder which runs on a Stargate microprocessor. The Stargate’s power consumption model is listed in Table 6.1. The probability of cluster is shown in Table 5.2. With the encoding results in the Chapter 5 Table 5.7, the real energy saving ratio can be computed with the (6.17). Assuming that the power consumption levels of Stargate at running and idle modes are the same. The energy saving results are shown in Table 6.2. From these results, we can see that the real energy saving ratio is not as good as theoretic evaluation in the Chapter 5. This is because the system power consumption not only includes CPU energy consumption, but also energy consumption by other system components which cannot be scaled by DVS.

78

Cluster gj m 37db gj /gi (1) fk pi /pk gj m 37db gj /gi (2) fk pi /pk

Cluster1 9.1741 × 106 0.856 200 0.857 9.1741 × 106 0.856 200 0.857

Cluster2 1.1255 × 107 0.689 300 0.884 1.1255 × 107 0.689 300 0.884

Cluster3 9.0274 × 106 0.844 200 0.857 9.0274 × 106 0.844 200 0.8571

Cluster4 1.5083 × 107 0.923 300 0.884 1.7524 × 107 0.797 400 1

Cluster5 1.7511 × 107 0.796 400 1 1.7511 × 107 0.796 400 1

Cluster6 1.7847 × 107 0.812 400 1 2.0118 × 107 0.915 400 1

Cluster7 1.6248 × 107 0.995 300 0.884 2.0844 × 107 0.948 400 1

R-Ratio 0.9193

0.9602

Table 6.2: The Energy Saving Ratio1

Cluster gj m 34db gj /gi (1) fk pi /pk gj m 34db gj /gi 2 fk pi /pk Cluster1 7.7220 × 106 0.721 200 0.857 7.7220 × 106 0.721 200 0.857 Cluster2 7.7990 × 106 0.729 200 0.857 7.7990 × 106 0.729 200 0.857 Cluster3 7.8556 × 106 0.735 200 0.857 8.8660 × 106 0.828 200 0.857 Cluster4 1.1267 × 107 0.690 300 0.884 1.4609 × 107 0.894 300 0.884 Cluster5 1.4371 × 107 0.880 300 0.884 1.9672 × 107 0.895 400 1 Cluster6 1.4754 × 107 0.903 300 0.884 1.9632 × 107 0.893 400 1 Cluster7 1.4893 × 107 0.911 300 0.884 2.1972 × 107 0.999 400 1 R-Ratio 0.8760

0.9248

79

Table 6.3: The Energy Saving Ratio 2

Cluster gj m 30db gj /gi 1 fk pi /pk gj m 30db gj /gi 2 fk pi /pk Cluster1 7.8002 × 106 0.729 200 0.857 7.8002 × 106 0.729 200 0.857 Cluster2 8.3190 × 107 0.778 200 0.857 8.3190 × 107 0.778 200 0.857 Cluster3 7.8587 × 106 0.791 200 0.857 7.8587 × 106 0.791 200 0.857 Cluster4 8.9538 × 106 0.836 200 0.857 1.0655 × 107 0.996 200 0.857 Cluster5 1.2106 × 107 0.742 300 0.884 1.3056 × 107 0.799 300 0.884 Cluster6 1.1623 × 107 0.712 300 0.884 1.4888 × 107 0.911 300 0.884 Cluster7 1.3818 × 107 0.847 300 0.884 1.7642 × 107 0.802 400 1 R-Ratio 0.8684

0.8764

Table 6.4: The Energy Saving Ratio 3

Chapter 7 Energy-Aware Embedded Video Encoding System Design

In this chapter, we describe energy-aware embedded video encoding system design. An energy-aware embedded system, named DeerNet video sensor system, has been designed. In embedded video communication system design for wildlife activity monitoring, the system is expected to operate over an extended period of time, say a few weeks or even months. Therefore, energy minimization of video encoding is very critical. In this chapter, we introduce the tier architecture in our design. Based on this tier architecture, the non-reducible power can be saved. In Chapter 6, we have explained how the video encoding energy can be saved through optimum complexity control of video encoders and cross-layer adaptation. It should be noted that the results obtained in Chapter 5 and Chapter 6 are a little di?erent. This is because, former results are obtained in an ideal case which assumes that the circuit voltage and clock speed can be adjusted continuously for the whole system. It is assumed that the power consumed by circuit is proportional to the

80

cubic of the running speed. In fact, practical systems, e.g., Stargate, consist of many modules which implement a variety of system functions. All of these modules cannot be adjusted in the voltage and/or speed. The other reason is the module that can be adapted has some static power consumption. Those two reasons cause the power consumption of the real system is not proportional to the cubic of the running speed. So that, the system organization method provides other way that results in more e?cient power consuming in the system, speci?cally for embedded system. The tier system architecture is one possible approach to saving power even more.

7.1

Tier architecture

We observe that the power assumption in the real system can be categorized into two classes: reducible power and non-reducible power. The reducible power is the amount of power that can be eliminated from a running system while maintaining the ability to do computation. On the other hand, the non-reducible power of system can not be eliminated from a running system, which dominates the lifetime of the battery, see results in Chapter 5. Common sources of non-reducible power include the power supply, on-board oscillators, memory and I/O buses, and the limited range of frequency and voltage scaling [19]. The amount of non-reducible power, however, varies on di?erent platforms. Typically, platforms are carefully optimized to provide their promised functionality at the lowest possible energy cost, and platforms that provide less functionality have smaller non-reducible power. For example, the laptop provides more functions than PADs and power consumption of the laptop is also more than that of the PDA. For-

81

tunately, there is signi?cant overlap in the functionality provided by high-power and low-power platforms. Therefore, if the tasks running on the system have di?erent computational complexity, the tasks can be assigned to di?erent platforms which have di?erent functionalities and non-reducible power. The system is organized in tiers, such system is composed of a set of tiers, each with a set of capabilities and a power modes. The system as a whole executes tasks by waking the tier that has the capabilities to execute the task in the most e?cient manner. Fig. 7.1 shows the tier system architecture.

Figure 7.1: Tier Architecture The tiers can be more than two and in our system two di?erent platforms, Stargate and MICA , are constructed in a whole system. For such two tiers system, the nature of the power consumption of the system is analyzed here. Considering 82

the system works like that the Stargate is waked up by the Mote to take video, then it goes back to sleep and the MICA runs all time to determine whether it needs to wake up Stargate.

S S If fA is the fraction of time the tier Stargate spends awake, PA is the power it S expends while awake, 1 ? fA is the fraction of time the system spends asleep, and S M PS is the power it expends while suspended, and fA is the fraction of time the tier M M MICA spends awake, PA is the power it expends while awake, 1 ? fA is the fraction M of time the system spends asleep, and PS is the power it expends while suspended,

the power consumption of the tier system, is

S S S S M M PT = fA · PA + (1 ? fA ) · PS + fA · PA .

(7.1)

Assuming a system consist of only Stargate, The power consumption of the system is

S S PS = fA · PA .

(7.2)

The power saving ratio is 1600 × 1/3 + 107 × 2/3 + 45 × 1 PT = = 0.409. PS 1600 × 1

Raio =

(7.3)

The average power consumption of Stargate is about 1600 mW during active modes and 107 mW during sleep. The lower tier MIAC’s average power consumption is about 45 mW. We assume that the video device captures 8 hours of video data per day.

83

7.2

System Design

The design of a power-aware embedded system, e.g., DeerNet video sensor, is composed of three parts: the hardware, the underlying system architecture, and the model for distributing applications across the tiers. In general, the design is similar to many distributed systems; each tier is under autonomous control while decisions are made in a distributed manner. Client applications reside at the most powerful tier, and tasks that support those applications are distributed among the various tiers.

7.2.1

Hardware

The DeerNet video sensor is designed in a strictly hierarchical manner, and the higher tier is more powerful than the lower tier. Two tiers can communicate each other. Communication occurs via a local port and the tiers are connected to a common power source. Moreover, lower tier has the ability to draw its superior tier out of a suspended mode. An overview diagram of our design is shown in Fig. 7.2. The higher tier is based on the Crossbow’s Stargate platform, which has an XScale PXA255 CPU (400 MHz) with 32MB ?ash memory and 64MB SDRAM. PCMCIA and Compact Flash connectors are available on the main board. The Stargate also has a daughter board with Ethernet, USB and serial connectors. A Logitech QuickCam Pro 4000 webcam is connected through a USB connection for video capture [11]. The MICA plays the lower tier rule which works with a powerful Atmega128L micro-controller. The data, measurements, and other user-de?ned information are stored in a 4-Mbit serial ?ash (Atmel AT45DB041). It is connected to one of the

84

Figure 7.2: The Video Sensor Tiers USART on the ATMega128L [12]. This is very low-power platform. The average power consumption is about 45 mW. The two tiers are connected through a 51-pin port which includes a local link and a wakeup control (describing later).

7.2.2

Software

In the higher tier Stargate, the linux operating system manages all the tier resources including the power management and the video capture. The video capture module performances the new P-R-D optimization encoder which compresses the video sequence into CF card. The power management can put the Stargate into sleep if it is necessary. Once the Stargate is in sleep state, it will wait for the waking up signal from MICA at the lower tier. MICA at lower tier performs such tasks: determining the motion signal that re?ects the deer motion state, recording the deer motion state, communicating with the higher tier, and sending wakeup signal to higher tier Stargate. All tasks running 85

on MIAC are controlled by operating system called TinyOS. The software diagram is shown in Fig. 7.3.

Figure 7.3: Software constructure

7.3

7.3.1

Power Management

Signal Design

The goal of power management is to put the higher tier Starget to sleep at a proper time or situation. What is proper time or situation? DeerNet video sensor is used to track deer’s action and some situations do not need to record such as after sleep, repeating same action, and at night. In these situations and time, the sensor needs to go sleep. On Stargate, a timer is designed. It tracks what time it is and how long it will last. Therefore, once the timer shows night coming, the Stargate is put into sleep. Another signal comes from MICA which determines motion state. If it 86

determines that the deer is not moving, MICA sends a signal to Stargete that tells Stargate to stop capturing video and go to sleep. The clock signal can be obtained by the Stargate. It can be implemented by a read clock command. The motion wakeup signal comes MIAC through local link which is a UART included a 51 pin connecter that connects both Stargate and MIAC. The signal can transmit to Stargate as a massage. The video capture signal is triggered starting capture. So far, the signal is used to makes Stargate to sleep. How is the waking up signal generated? It is must come from MICA because the Stargate is sleeping. Waking up Stargate needs two steps. The ?rst step is to set a Stargate GPIO port into an interrupt allow state. Next, we need to let MICA triggers this GPIO port. Once the GPIO port is triggered by MICA, an interrupt routine is performed to wake up stargate. Fig. 7.4 shows the 51 pin connecter de?nition. The local link is provided by a UART port on pin 19 and pin 20 that communicates with MICA. Pin 5 connects to PXA 255 general input/output port 1 (GPIO1) which can generate an interrupt signal to wake up the Stargate.

7.3.2

State Control

Controlling Stargate state needs to initialize corresponding PAX 255 register e.g., interrupt control registers (ICMR, ICLR, ICCR, ICIP, ICPR) and power manager control registers (PMCR, PSSR, PWER, PRER, etc.) [24]. It is di?cult to read or write such registers directly. Fortunately, the operating system provides such control commands though the /proc ?le directory. In this way, the sleep and wakeup controlling become convenient. Making Stargate sleeping: The command is 87

Figure 7.4: De?nition of the Connecter Pins

88

echo 1 > /proc/sys/pm/suspend.

This command actually puts the Stargate to sleep and wait to awake up. In order to wake up a sleeping Stargate, the wakeup signal needs to set before Stargate in asleep. Because we set the pin5 GP1 M-INT1 as the waking up signal, the correlation register has to set with the command

echo w1 > /proc/platx/gpio/GPCTL.

We also put a timer to prevent the wakeup signal failure. If the wakeup signal fails, the timer can wake up Stargate also. A command to set the timer is

echo CONFIG_WAKEUPTIME=$time > /proc/platx/config.

89

Chapter 8 Conclusion and Future Work

8.1 Conclusion

In this work, based on power-rate-distortion (P-R-D) optimization, we develop a new approach for energy minimization for delay-tolerant video communication over portable devices. We have developed a parametric video encoding architecture which is fully scalable in power consumption. We have successfully extended the traditional R-D analysis by considering another dimension, the power consumption, and established the P-R-D analysis framework for embedded video encoding and communication under energy constraints. Using the P-R-D model, given a power supply level and a bit rate, the power-scalable video encoder is able to ?nd the best con?guration of complexity control parameters to maximize the video quality. The P-R-D analysis establishes a theoretical basis and provides a practical guideline in system design and performance optimization for wireless video communication under energy constraints. Theoretically, we demonstrated that extending the traditional rate-distortion (R-

90

D) analysis to P-R-D analysis gives us another dimension of ?exibility in resource allocation and performance optimization. We have analyzed the energy saving performance of P-R-D optimization. We have also developed an adaptive scheme to estimate the P-R-D model parameters and perform online energy optimization and control for real-time video compression. Our simulation results show that, for typical videos with non-stationary scene statistics, using the proposed P-R-D optimization technology, the video encoding energy can be signi?cantly reduced, especially for delay-tolerant video communication applications where the per-bit energy cost of wireless transmission is relatively low. This has a signi?cant impact on energye?cient portable video communication system design.

8.2

Future Work

In our future work, we shall study and evaluate resource allocation and energy minimization for real-time video encoding and communication on embedded computing platforms with dynamic voltage control. We shall also study how the P-R-D analysis could be used for joint hardware and encoder energy minimization. In addition, the follow issues need to be further investigated: (1) The training-classi?cation approach does not work well with some video segments with signi?cant change in scene activities. (2) We should exploit the communication delay as a system resource to further reduce the overall energy consumption. With such delay, the multi video encoding tasks are in the system that might cause the system work on the more e?cient state because the system can be optimized with function 6.13.

91

References

[1] Mpeg-2 video test model 5. ISO/IEC JTC1/SC29/WG11 MPEG93/457 (Apr 1993). [2] A.Chandrakasan, S.Sheng, and R.W.Brodersen. Low-power cmos digital design. IEEE Journal of Solid-State Circuits Vol.27 (Apr 1992), 473–484. [3] AMD. Mobile amd athlon 4 processor model 6 cpga data sheet. http://www.amd.com (Nov 2001). [4] A.M.Tourapis, O.C.Au, and M.L.Liou. Predictive motion vector ?eld adaptive search technique (pmvfast) - enhancing block based motion estimation. proceedings of Visual Communications and Image Processing 2001 (Jan 2001). [5] A.Ortega, and K.Ramchandran. Rate-distortion methods for image and video compression: An overview. IEEE Signal Processing Magazine Vol.15 No.6 (Nov 1998), 23 – 50. [6] A.R.Lebeck, X.Fan, H.Zeng, and C.S.Ellis. Power aware page allocation. In Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems Cambridge MA (Nov 2000). [7] B.Erol, F.Kossentini, and H.Alnuweiri. E?cient coding and mapping algorithms for software-only real-time video coding at low bit rates. IEEE Transactions on Circuits and Systems for Video Technology Vol.10 (Sep 2000), 843–856. [8] B.Li, and K.Nahrstedt. A control-based middleware framework for quality of service adaptations. IEEE J. Select. Areas Commun. vol.17 (Sep 1999), 1632–1650. [9] C.Efstratiou, A.Friday, N.Davies, and K.Cheverst. A platform supporting coordinated adaptation in mobile systems. In Proceedings of 4th IEEE Workshop on Mobile Computing Systems and Applications Callicoon NY (Jun 2003). 92

[10] C.Hughes, J.Srinivasan, and S.Adve. Saving energy with architectural and frequency adaptations for multimedia applications. [11] Crossbow Technology, I. Stargate developer’s guide. [12] Crossbow Technology, I. Mpr-mib users manual. [13] D.G.Sachs, S.Adve, and D.L.Jones. Cross-layer adaptive video coding to reduce energy on general-purpose processors. Proceedings of the International Conference on Image Processing ICIP ′03 Barcelona Spain (Sep 2003). [14] D.G.Sachs, W.Yuan, C.J.Hughes, A.F.Harris, S.V.Adve, D.L.Jones, R.H.Kravets, K.Nahrstedt, and Sidebar. Grace: A cross-layer adaptation framework for saving energy. IEEE Computer, special issue on PowerAware Computing vol.10 (Dec 2003), 50–51. [15] D.Mosse, H. R., and P.Alvarez. Dynamic and aggressive scheduling techniques for power-aware real-time systems. In Proc. of 22nd IEEE Real-Time Systems Symposium (Dec. 2001). [16] D.S.Turaga, der Schaar, M., and B.Pesquet-Popescu. Complexity scalable motion compensated wavelet video encoding. IEEE Transactions on Circuits and Systems for Video Technology Vol.15 (Aug 2005), 982–993. [17] et al, B. Agile application-aware adaptation for mobility. In Proceedings of 16th Symposium on Operating Systems Principles Saint Malo France (Dec 1997). [18] et al, P. The emergence of networking abstractions and techniques in tinyos. In Proceedings of First Symposium on Networked System Designe and Implementation San Francisco CA (Mar 2004). [19] G.CHINN, S.DESAI, E.DISTEFANO, K.RAVICHANDRAN, and S.THAKKAR. Mobile pc platforms enabled with intel centrino mobile technology. Intel Technology Journal Vol.7 (May 2003). [20] H.Zeng, X.Fan, C.Ellis, A.Lebeck, and A.Vahdat. Ecosystem: Managing energy as a ?rst class operating system resource. In Proceedings of 10th Intl. Conf. on ASPLOS San Jose CA (Oct 2002). [21] I.M.Pao, and M.T.Sun. Statistical computation of discrete cosine transform in video encoders. Journal of Visual Communication and Image Representation Vol.9, no.2 (Jun 1998), 163–170.

93

[22] Inc, A. Amd powernow!tm technology platform design guide for embedded processors. http://www.amd.com/epd/processors . [23] Inc, I. Intel xscale technology. [24] Intel. Intel pxa255 processor: Developer manual. [25] Intel. Pentium m processor. http://developer.intel.com/design/mobile/ datashts/261203.pdf (Apr 2004). [26] ITU-T. Video coding for low bit rate communications. ITU-T Recommendation H.263 version 1 and version 2 (Jan 1998). [27] J.Chen, and K.J.R.Liu. Low-power architectures for compressed domain video coding co-processor. IEEE Transactions on Multimedia Vol.2 (Jun 2000), 111–128. [28] J.Flinn, Lara, E., M.Satyanarayanan, D.S.Wallach, and W.Zwaenepoel. Reducing the energy usage of o?ce applications. In Proceedings of Middleware 2001 Heidelberg Germany (Nov 2001). [29] J.Lorch, and A.Smith. Improving dynamic voltage scaling algorithms with pace. Proceedings of the ACM SIGMETRICS 2001 Conference (Jun 2001). [30] J.Lorch, and A.Smith. Operating system modi?cations for task-based speed and voltage scheduling. In Proceedings of the 1st International Conference on Mobile Systems Applications and Services San Francisco CA (May 2003). [31] J.Ostermann, J.Bormans, P.List, D.Marpe, M.Narroschke, F.Pereira, T.Stockhammer, and T.Wesi. Video coding with h.264/avc: Tools, performance, and complexity. IEEE CIRCUITS AND SYSTEMS MAGAZINE (FIRST QUARTER 2004). [32] J.Villasenor, C.Jones, and B.Schoner. Video communications using rapidly recon?gurable hardware. [33] K.Hyungjoon, N.Kamaci, and Y.Altunbasak. Low-complexity ratedistortion optimal macroblock mode selection and motion estimation for mpeglike video coders. IEEE Transactions on Circuits and Systems for Video Technology Vol.15 (Jul 2005), 823–834. [34] L.Benini, A.Bogliolo, and G.D.Micheli. A survey of design techniques for system-level dynamic power management. IEEE Transactions on VLSI Systems Vol.8 (Jun 2000.).

94

[35] M.Mesarina, and Y.Turner. Reduced energy decoding of mpeg streams. In Proceedings of SPIE Multimedia Computing and Networking Conference San Jose CA (Jan 2002). [36] P.A.Chou, and A.Sehgal. Rate-distortion optimized receiver-driven streaming over best-e?ort networks. Packet Video Workshop Pittsburg PA. (Apr 2002). [37] page, W. http://www.abiresearch.com / products / market research / mobile broadcast video. [38] page, W. Panel report of nsf workshop on sensors for environmental observatories. Seattle, WA. Nov.30 (Dec.2 2004), http://www.wtec.org/seo. [39] P.Agrawal, J-C.Chen, S.Kishore, P.Ramanathan, and K.Sivalingam. Battery power sensitive video processing in wireless networks. Proceedings IEEE PIMRC’98 Boston (Sep 1998). [40] P.Pillai, and K.G.Shin. Real-time dynamic voltage scaling for low-power embedded operating system. Proceedings of 18th Symposium on Operating System Principles, Ban?, Canada (Oct 2001). [41] P.Pillai, and K.G.Shin. Real-time dynamic voltage scaling for low-power embedded operating systems. In Proceedings of 18th Symposium on Operating Systems Principles Ban? Canada (Oct 2001). [42] R.Min, T.Furrer, and A.Chandrakasan. Dynamic voltage scaling techniques for distributed microsensor networks. IEEE Computer Society Workshop on VLSI (Apr 2000), 43–46. [43] R.Rajkumar, C.Lee, J.Lehoczky, and D.Siewiorek. A resource allocation model for qos management. In Proceedings of 18th IEEE Real-Time Systems Symposium San Francisco CA (Dec 1997). [44] S.Banachowski, and S.Brandt. The best scheduler for integrated processing of beste?ort and soft real-time processes. [45] S.Gurumurthi, A.Sivasubramaniam, and M.Kandemir. Drpm: Dynamic speed control for power management in server class disks. In Proceedings of 30th Annual International Symposium on Computer Architecture San Diego CA (Jun 2003). [46] S.Iyer, L.Luo, R.Mayo, and P.Ranganathan. Energy-adaptive display system designs for future mobile environments. In Proceedings of International 95

Conference on Mobile Systems Applications and Services San Francisco CA (May 2003). [47] S.M.Akramullah, I.Ahmad, and M.L.Liou. Optimization of h.263 video encoding using a single processor computer: performance tradeo?s and benchmarking. IEEE Transaction on Circuits and System for Video Technology Vol.11 (Aug 2001), 901 – 915. [48] S.Mohapatra, and N.Venkatasubtramanian. Power-aware recon?gure middleware. In Proceedings of IEEE 23nd International Conference on Distributed Computing Systems Providence RI (May 2003). [49] S.V.Adve, A.F.Harris, C.J.Hughes, D.L.Jones, R.H.Kravets, K.Nahrstedt, D.G.Sachs, R.Sasanka, J.Srinivasan, and W.Yuan. The illinois grace project: Global resource adaptation through cooperation. Proceedings of the Workshop on self-Healin ,Adaptive and self-managed System (June 2002). [50] T.Berger. Rate Distortion Theory. Prentice Hall, Englewood Cli?s, NJ, 1984. [51] T.Burd, and R.Broderson. Processor design for portable systems. Journal of VLSI Signal Processing Vol.13 No.2 (Aug 1996), 203–222. [52] T.Ishihara, and H.Yasuura. Voltage scheduling problem for dynamically variable voltage processors. In Proc. of Intl. Symp. on Low-Power Electronics and Design (1998). [53] T.Pering, T.Burd, and R.Brodersen. Voltage scheduling in the lparm microprocessor system. [54] T.Sikora. The mpeg-4 video standard veri?cation model. IEEE Transaction on Circuits and System for Video Technology Vol.7 (Feb 1997), 19 – 31. [55] T.Wiegand. Text of committee draft of joint video speci?cation (itu-t rec. h.264 — iso/iec 14496-10 avc). Document JVTC167 3rd JVT Meeting Fairfax Virginia USA (May 2002). [56] V.Akella, der Schaar, M., and Kao, W.-F. Proactive energy optimization algorithms for wavelet-based video codecs on power-aware processors. Proceedings of of IEEE International Conference on Multimedia and Expo (Jul 2005), 566– 569. [57] V.Grassi, and R.Mirandola. Derivation of markov models for e?ectiveness analysis of adaptable software architectures for mobile computing. IEEE Transactions on Mobile Computing vol.2 (Jun 2003). 96

[58] V.Raghunathan, M.Srivastava, T.Pering, and R.Want. Stargate: Energy management techniques. Network and Embedded System Lab (NESL) and Ubiquirt SRP, Intel Reasearch . [59] V.Raghunathan, P.Spanos, and M.Srivastava. Adaptive power-?delity in energy aware wireless embedded systems. In Proceedings of IEEE Real Time Systems Symposium London UK (Dec 2001). [60] W.P.Burleson, P.Jain, and S.Venkatraman. Dynamically parameterized architecture for power-aware video coding: Motion estimation and dct. Proceedings of the Second USF International Workshop on Digital and Computational Video (2001). [61] W.Yuan, and K.Nahrstedt. Integration of dynamic voltage scaling and soft real-time scheduling for open mobile systems. In Proceedings of 12th International Workshop on Network and OS Support for Digital Audio and Video Miami Beach FL (May 2002). [62] W.Yuan, and K.Nahrstedt. Practical voltage scaling for mobile multimedia device. MM’04 (Oct 2004). [63] X.Lu, Y.Wang, and E.Erkip. Power e?cient h.263 video transmission over wireless channels. Proceedings of 2002 International Conference on Image Processing (Sep 2002). [64] Z.-L.He, C.-Y.Tsui, K.-K.Chan, and M.Lion. Low-power vlsi design for motion estimation using adaptive pixel truncation. IEEE Trans. On Circuits and System for Video Technology vol.10 (Aug 2000). [65] Z.He, and S.K.Mitra. A uni?ed rate-distortion analysis framework for transform coding. IEEE Transactions on Circuits and System on Video Technology vol.11 (Dec 2001), 1221–1236. [66] Z.He, Y.Liang, L.Chen, I.Ahmad, and D.Wu. Power-rate-distortion analysis for wireless video communication under energy constraint. IEEE Transaction on Circuits and System for Video Technology (May 2005).

97

- EMBEDDED SYSTEM DESIGN
- Design and Optimization on Dynamic Power System for Self-Powered Integrated Wireless Sensin
- Power-rate-distortion analysis for wireless video communication under energy constraint
- MultiFrequency Wrapper Design and Optimization for Embedded Cores Under Average Power Const
- A Low-Power Design of Motion Estimation Blocks for Low Bit-Rate Wireless Video Communicatio
- Design and Optimization of a Class-E Amplifier for a Loosely Coupled Planar Wireless Power System
- Rate-Distortion Models for FGS-encoded Video Sequences
- Rate_Distortion Optimized Motion_Compensated Prediction for Packet Loss Resilient Video Coding
- RATE DISTORTION ANALYSIS OF LEAKY PREDICTION LAYERED VIDEO CODING USING QUANTIZATION NOISE
- Code compression for low power embedded system design
- Embedded System Architecture Design Based on Real-Time Emulation
- Virtual ARM Platform for Embedded System Developers
- Introduction to Bootloader and Embedded System Development
- EMBEDDED SYSTEM DESIGN
- System-Level Power Management An Overview

更多相关文章：
更多相关标签：