Introduction
Short battery life is one of the most serious issues facing mobile devices in general, and Ultrabook™ devices and tablets in particular. Users have become accustomed to streaming multimedia content to their mobile devices on demand from content servers in the cloud. Because these devices have limited battery capacity, energy efficiency is critical. Cyberlink PowerDVD 10* (PowerDVD*) is one of the industry's leading applications for HD and 3D movie playback, and OEMs often pre-bundle it on their devices. In this case study, we show how Intel and Cyberlink collaborated to optimize PowerDVD* to deliver a best-in-class experience on Intel devices.
First, we describe the challenges Cyberlink encountered when adding content streaming features to PowerDVD, and the tools and techniques Intel used to reduce PowerDVD's power consumption.
Then, we discuss the power consumption profile of PowerDVD during streaming media playback and its impact on battery life for mobile devices. We also analyze PowerDVD's behavior to identify the issues that increase power consumption, such as decoding on the CPU, large numbers of context switches, and high interrupt rates. Finally, we present data showing the reduced power consumption after optimization.
The optimization was a huge success. The Intel team was able to make the following improvements to PowerDVD:
Definitions
Acronym | Definition |
---|---|
BLA | Battery Life Analyzer |
GPU | Graphics processing unit |
WPA | Windows Performance Analyzer |
DLNA Server | Digital Living Network Alliance Server |
HD | High definition |
SoC | System on Chip |
FPS | Frames per second |
SDK | Software development kit |
SKU | Stock Keeping Unit |
The Challenges of Optimizing Battery Life
PowerDVD offers new features for organizing and streaming media and for connecting with mobile devices and social media. In addition to functioning as a client, the latest software can turn a device into a DLNA server that streams multimedia content from a PC across a network to other devices. It can also stream content from external content servers. Adding content streaming came at a price, however. New capabilities such as HD streaming required running more processes, which consumed more memory and CPU cycles and took a toll on battery life. We needed to answer the following questions:
After two months and three iterations of analysis and validation, the engineering teams improved battery life by making the following changes:
The following describes the process and tools that resulted in the optimized version of PowerDVD.
Optimization of Cyberlink PowerDVD for Power Consumption
Test System Configuration:
Validation and analysis showed:
First Step – Validation
To understand and address PowerDVD's impact on battery life, we used Intel Power Gadget and Battery Life Analyzer (BLA) to validate the application's SoC power usage. Figure 1 shows the Intel Power Gadget's UI on a Windows platform.
Figure 1. Intel® Power Gadget UI on Windows* Platform
As part of our validation of PowerDVD, we used Intel Power Gadget to measure power during playback. Figure 2 shows the processor power that Intel Power Gadget recorded.
PowerDVD consumed ~6 W of SoC power during playback. Intel recommends a maximum of ~2.0 W on 4th generation Intel processors (the low-power processors typically used in Ultrabook devices).
Figure 2. Processor Power Usage during PowerDVD* Playback
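When comparing a measurement against a target such as ~2.0 W, it helps to average over the whole playback window rather than read one instantaneous sample. The sketch below assumes a log of hypothetical (elapsed time, cumulative package energy) readings; `EnergySample` and `average_power_w` are illustrative names and do not reflect Intel Power Gadget's actual log format.

```python
from dataclasses import dataclass


@dataclass
class EnergySample:
    """One HYPOTHETICAL reading: elapsed time (s) and cumulative package energy (J)."""
    t_s: float
    energy_j: float


def average_power_w(samples: list[EnergySample]) -> float:
    """Average power over the window = energy consumed / elapsed time."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first, last = samples[0], samples[-1]
    elapsed = last.t_s - first.t_s
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return (last.energy_j - first.energy_j) / elapsed


# Hypothetical readings over a 10 s playback window (not measured data).
playback = [EnergySample(0.0, 100.0), EnergySample(5.0, 130.0), EnergySample(10.0, 160.0)]
TARGET_W = 2.0  # recommended ceiling quoted in the text

avg = average_power_w(playback)
print(f"average SoC power: {avg:.1f} W, within target: {avg <= TARGET_W}")
```

Using cumulative energy rather than averaging power samples keeps the result correct even if samples are unevenly spaced.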
To gain deeper insight into what other activities were affecting power, we used BLA to understand the impact of media playback on package C-state residencies. Understanding residency is important because changing the SoC SKU can affect power.
BLA is a power management analysis tool developed by Intel to identify issues that impact battery life. BLA helps to identify a wide range of issues during software analysis such as:
Figure 3 shows package residency during 1080p HD video playback using Cyberlink PowerDVD.
Figure 3. Package Residency during 1080p HD Video Playback using PowerDVD*
Package residency reflects CPU, graphics, and uncore activity. More time in package C0 (the active state) results in higher SoC power. Expected package C0 residency for Cyberlink PowerDVD 1080p playback is ~20% on a 4th generation U-processor. As Figure 3 shows, package C0 residency is far higher than it should be.
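One way to see why C0 residency drives SoC power is a residency-weighted average: each package C-state's power is weighted by the share of time spent in it. The per-state power values and the "mostly idle" mix below are HYPOTHETICAL placeholders chosen only to illustrate the trend; they are not measurements from this study.

```python
def average_package_power_w(residency_pct: dict[str, float],
                            state_power_w: dict[str, float]) -> float:
    """Residency-weighted average: sum over states of power(state) x time share."""
    total = sum(residency_pct.values())
    if abs(total - 100.0) > 0.5:
        raise ValueError(f"residencies should sum to ~100%, got {total:.2f}%")
    return sum(state_power_w[state] * pct / 100.0
               for state, pct in residency_pct.items())


# HYPOTHETICAL per-state package power values, for illustration only.
STATE_POWER_W = {"C0": 6.0, "C2": 1.5, "C3": 1.0, "C6": 0.5, "C7": 0.3}

pegged = {"C0": 100.0}                                         # package never idles
mostly_idle = {"C0": 20.0, "C2": 10.0, "C6": 5.0, "C7": 65.0}  # hypothetical mix

print(f"pegged: {average_package_power_w(pegged, STATE_POWER_W):.2f} W")
print(f"~20% C0: {average_package_power_w(mostly_idle, STATE_POWER_W):.2f} W")
```

Whatever the real per-state values are, a package pegged at 100% C0 can never benefit from the much lower power of the deep idle states.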
Both Intel Power Gadget and BLA confirmed the elevated power usage, which gave ~4 hours of battery life from a 42 Whr (watt-hour) battery, with ~6 W for the SoC, ~3 W for the display, and 2+ W for other components.
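The ~4 hour figure follows from dividing battery capacity by total platform power. The sketch below uses only the numbers quoted above; `battery_life_h` is an illustrative helper name.

```python
def battery_life_h(capacity_wh: float, *loads_w: float) -> float:
    """Estimated runtime = battery capacity / total platform power draw."""
    total_w = sum(loads_w)
    if total_w <= 0:
        raise ValueError("total power must be positive")
    return capacity_wh / total_w


# Figures quoted in the text: 42 Wh battery, ~6 W SoC, ~3 W display, ~2 W other.
hours = battery_life_h(42.0, 6.0, 3.0, 2.0)
print(f"{hours:.1f} h")  # 42 / 11 ≈ 3.8 h
```

Because "other components" is 2+ W, 11 W is a lower bound on total draw, so ~3.8 h is roughly the best case at this SoC power. Substituting a lower SoC figure shows how SoC savings translate directly into runtime.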
Our next step was to analyze the application for power optimization.
Second Step – Analysis
For the analysis phase, we used two tools: Intel® VTune™ Amplifier and Windows Performance Analyzer (WPA).
The following tables summarize the results of the analysis, which showed definite room for improvement.
Table 1. Intel® Power Gadget and BLA Results
Actual Results | Expected Results |
---|---|
Package C0 is pegged at 100% during media playback | Package C0 should be ~20% during media playback |
SoC power using Intel® Power Gadget is ~6 W | SoC power should be ~1.7 W on 4th generation Intel processors |
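Table 2. Intel® VTune™ Amplifier and WPA Results
Analysis Tool | Observations |
---|---|
Intel VTune Amplifier | Significant time spent in spin loops; no GPU codec usage (decode runs on the CPU); inconsistent frame rate (0-60 FPS for 30 FPS content) |
Windows Performance Analyzer | Frequent wakeups (5 msec interval); expected interval is 10 msec with audio playback |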
The next figures provide a walkthrough of some of the important screenshots from our analysis.
We used Intel VTune Amplifier to check PowerDVD for spin waits, hardware acceleration usage, and hotspots (micro-architectural issues). Figure 4 shows the steps for collecting graphics call stacks.
Figure 4. VTune™ UI for Analyzing DirectX* Pipeline Events
Figure 5 shows the VTune summary, with significant time spent in spin loops. The GPU Usage view shows no codec usage; most of the GPU time goes to display and other pre-processing algorithms during playback.
Figure 5. VTune™ Summary showing Spin Loop time
Digging deeper, Intel VTune shows high CPU utilization during media playback, and instances where VSync (the red highlights in Figure 5) and the GPU software queue do not occur every ~33 msec, as 30 FPS playback requires. These irregularities point to software glitches during media playback.
Figure 6. VTune™ Summary Report
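The ~33 msec expectation comes from 1000 ms / 30 FPS. A minimal sketch of the check, assuming a HYPOTHETICAL list of VSync or present timestamps exported from a trace, flags every interval that deviates from the nominal frame time by more than a chosen tolerance. `find_pacing_glitches` and the 25% tolerance are illustrative choices, not VTune features.

```python
def find_pacing_glitches(timestamps_ms: list[float], fps: float = 30.0,
                         tolerance: float = 0.25) -> list[tuple[int, float]]:
    """Return (index, interval_ms) for every gap that deviates from the
    nominal frame interval (1000 / fps) by more than `tolerance` (a fraction)."""
    nominal = 1000.0 / fps  # ~33.3 ms at 30 FPS
    glitches = []
    for i in range(1, len(timestamps_ms)):
        interval = timestamps_ms[i] - timestamps_ms[i - 1]
        if abs(interval - nominal) > tolerance * nominal:
            glitches.append((i, interval))
    return glitches


# Hypothetical timestamps: one late frame, followed by a catch-up frame.
ts = [0.0, 33.3, 66.7, 100.0, 150.0, 166.7, 200.0]
print(find_pacing_glitches(ts))
```

A late frame is usually followed by a short "catch-up" interval, so both show up as glitches even though the average rate still looks like 30 FPS.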
Looking at Figure 7, the summary report confirms an inconsistent frame rate over time: for 30 FPS movie playback, the frame rate varies between 0 and 60 FPS. The chart shows the total number of frames rendered at each frame rate. A high number of slow or fast frames signals a performance bottleneck. The goal is to optimize the code to keep the frame rate constant at the content's native rate (30 FPS in this case).
Figure 7. VTune™ analysis of Frame Rates
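A frame-rate chart of this kind can be approximated by converting each frame time to an instantaneous rate (1000 / frame time) and counting frames per rate bucket. The sketch below uses HYPOTHETICAL frame times and 10 FPS buckets; it illustrates the idea and is not how VTune computes its chart.

```python
from collections import Counter


def fps_histogram(frame_times_ms: list[float], bucket: int = 10) -> dict[int, int]:
    """Count frames by instantaneous frame rate (1000 / frame time),
    rounded to the nearest `bucket`-FPS bin."""
    counts = Counter()
    for ft in frame_times_ms:
        if ft <= 0:
            continue  # skip invalid or duplicate timestamps
        fps = 1000.0 / ft
        counts[int(round(fps / bucket)) * bucket] += 1
    return dict(sorted(counts.items()))


# Hypothetical frame times for "30 FPS" content with uneven pacing.
frame_times = [33.3, 33.4, 16.7, 50.0, 33.3, 100.0, 33.3]
print(fps_histogram(frame_times))
```

For well-paced 30 FPS playback, every frame lands in the 30 bucket; frames in the 10, 20, or 60 buckets correspond to the slow and fast frames described above.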
Next, we used the Windows Performance Analyzer (WPA) tool to analyze the application for wakeup activities, interrupts, and context switches. Figure 8 shows that H.264 decode runs on the CPU using Intel® SSE instructions. Offloading this work to the GPU is more power-efficient than running it on the CPU.
Figure 8. WPA Analysis of Wakeup Activities, Interrupts, and Context Switches
WPA also shows wakeup activities from PowerDVD during playback. Figure 9 shows two PowerDVD threads, each waking every 10 msec. Because the two threads are not coalesced, the overall system wakes up at a 5 msec interval. Figure 10 shows the call stack with a sleep loop (Win32* API) being called at a 10 msec interval.
Figure 9. WPA thread analysis
Figure 10. WPA call stack with sleep loop analysis
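Why two 10 msec threads produce 5 msec system wakeups can be shown with a small simulation: if the threads' timers are out of phase, their wakeup instants never overlap, so the system wakes twice per 10 msec. The 5 msec phase offset below is an assumption consistent with the observed 5 msec interval; `system_wakeups` is an illustrative helper.

```python
def system_wakeups(periods_ms: list[int], offsets_ms: list[int],
                   window_ms: int = 1000) -> int:
    """Count distinct instants in [0, window_ms) at which at least one
    periodic thread wakes the system. Wakeups at the same instant coalesce."""
    instants = set()
    for period, offset in zip(periods_ms, offsets_ms):
        instants.update(range(offset, window_ms, period))
    return len(instants)


# Two threads, each sleeping 10 ms, assumed to be 5 ms out of phase.
uncoalesced = system_wakeups([10, 10], [0, 5])
# The same two threads aligned to a common 10 ms boundary.
coalesced = system_wakeups([10, 10], [0, 0])
print(uncoalesced, coalesced)  # 200 100
```

On Windows, APIs such as `SetWaitableTimerEx` accept a tolerable delay that lets the OS coalesce timers. In this case study, however, the team went further and removed the sleep threads entirely.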
Table 3 shows a significant reduction in package C0 residency after optimization.
Table 3. Validating Package Residency after Optimization
C-state Counters | Average (%) Before Optimization | Average (%) After Optimization |
---|---|---|
Package C0-C1 | 100% | 20.18% |
Package C2 | 0% | 8.29% |
Package C3 | 0% | 0.19% |
Package C6 | 0% | 1.91% |
Package C7 | 0% | 69.43% |
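A quick sanity check on residency data is that the states account for all of the time. The sketch below copies the Table 3 values, confirms that each column sums to ~100%, and computes the share of time spent in the deepest package states (C6 and C7). `deep_idle_share` is an illustrative helper name.

```python
# Package C-state residencies from Table 3 (percent of time).
before = {"C0-C1": 100.0, "C2": 0.0, "C3": 0.0, "C6": 0.0, "C7": 0.0}
after = {"C0-C1": 20.18, "C2": 8.29, "C3": 0.19, "C6": 1.91, "C7": 69.43}


def deep_idle_share(residency: dict[str, float], deep=("C6", "C7")) -> float:
    """Percent of time spent in the deepest package C-states."""
    return sum(residency[s] for s in deep)


for name, res in (("before", before), ("after", after)):
    total = sum(res.values())
    print(f"{name}: total={total:.2f}%  deep idle (C6+C7)={deep_idle_share(res):.2f}%")
```

After optimization, roughly 71% of the time is spent in C6 or C7, compared with none before.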
Optimization Results/Validation
The following tables show the “before” and “after” results:
Table 4. Intel® Power Gadget and BLA: Before and After¹
Before Optimization | After Optimization |
---|---|
Package C0 is pegged at 100% during media playback | Package C0 is reduced to 20% |
SoC power is ~6 W | SoC power reduced to ~1.8 W on test system |
Table 5. Intel® VTune™ Amplifier and WPA Results: Before and After¹
Analysis Tool | Before | After |
---|---|---|
Intel® VTune™ Amplifier | Decode on the CPU (no GPU codec usage); significant spin-loop time | Decode offloaded to the GPU with Intel Media SDK; CPU utilization reduced by ~25% |
Windows Performance Analyzer | Frequent wakeups (5 msec interval); expected interval is 10 msec with audio playback | Sleep threads removed; wakeups reduced 2x (5 msec to 10 msec interval) |
Battery Life Analyzer | Package C0 residency 100% | Package C0 residency ~20% |
¹ Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance
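We optimized by offloading video decode to the GPU with the Intel Media SDK, removing the periodic sleep-loop wakeups, and reducing extra memory copies with an overlay.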
The first task was to use the Intel Media SDK to offload video decode to the graphics hardware, which gives better performance per watt on Intel HD Graphics. The pseudo code in Figure 11 shows a simple use of the Intel Media SDK to offload a stream of frames to graphics.
Figure 11. Intel® Media SDK code snippet – offloading a frame to graphics.
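The Intel Media SDK itself is a C API; its decode loop is built around asynchronous calls such as `MFXVideoDECODE_DecodeFrameAsync`, which returns a sync point when a frame has been submitted (or `MFX_ERR_MORE_DATA` when it needs more bitstream), and `MFXVideoCORE_SyncOperation`, which waits for the hardware to finish. The Python model below only mirrors the shape of that loop: feed bitstream, sync on completed frames, then drain at end of stream. `FakeGpuDecoder` and `Status` are HYPOTHETICAL stand-ins, not SDK bindings, and real code also has to handle surfaces, device-busy warnings, and errors.

```python
from collections import deque
from enum import Enum, auto


class Status(Enum):
    """Simplified stand-ins for MFX_ERR_NONE / MFX_ERR_MORE_DATA (hypothetical)."""
    NONE = auto()       # a frame was submitted; a sync point is returned
    MORE_DATA = auto()  # decoder needs more bitstream before it can output


class FakeGpuDecoder:
    """HYPOTHETICAL decoder that buffers `latency` chunks before emitting
    frames, mimicking an asynchronous hardware decoder."""

    def __init__(self, latency: int = 2):
        self.latency = latency
        self.pending = deque()

    def decode_frame_async(self, chunk):
        # `chunk` is None while draining at end of stream.
        if chunk is not None:
            self.pending.append(chunk)
            if len(self.pending) <= self.latency:
                return Status.MORE_DATA, None
        if not self.pending:
            return Status.MORE_DATA, None
        return Status.NONE, self.pending.popleft()  # "sync point" = frame id

    def sync_operation(self, sync_point):
        # Real code would block here until the GPU finishes this frame.
        return f"frame-{sync_point}"


def decode_stream(decoder, chunks):
    frames = []
    # Main loop: feed bitstream chunks; collect frames whenever one is ready.
    for chunk in chunks:
        status, sync = decoder.decode_frame_async(chunk)
        if status is Status.NONE:
            frames.append(decoder.sync_operation(sync))
    # Drain: pass None until the decoder has no buffered frames left.
    while True:
        status, sync = decoder.decode_frame_async(None)
        if status is Status.MORE_DATA:
            break
        frames.append(decoder.sync_operation(sync))
    return frames


print(decode_stream(FakeGpuDecoder(), range(5)))
```

The drain step matters: an asynchronous hardware decoder holds frames internally, so skipping it would silently drop the last few frames of the stream.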
Once we offloaded to graphics using the Intel Media SDK, we ran PowerDVD and measured the results using Intel VTune Amplifier. Whereas Figure 5 showed no codec usage, the summary now shows Video Enhancement activity (Figure 12).
Figure 12. Intel® VTune™ Amplifier Summary result
Examining other Intel VTune graphics views, we verified that, with the Intel Media SDK, frames were now decoded on the GPU rather than on the CPU. Figure 13 shows a batch of frames being decoded on the GPU after ~20 msec. Offloading the decode work to the GPU reduced CPU utilization by ~25% on the test system.
Figure 13. Frame decoding after ~20 msec on the GPU
To verify the graphics offload optimization, we ran Intel Power Gadget again. Compared to the baseline result in Figure 2, the graphics offload alone saved ~2 W (Figure 14).
Figure 14. Power Savings resulting from Graphics Offload
We made good progress, but ~4 W was still not low enough. As stated earlier, the goal for 1080p streaming media playback is ~1.7 W of SoC/package power.
The next step was to find other CPU-side optimizations. Initial analysis had shown sleep loop calls from two non-coalesced threads waking the CPU every 5 msec. CyberLink engineers needed to remove these sleep threads from the application. This was one of the most difficult changes, since it required modifying the structure of the application. Figure 15 shows the wakeup interval increasing to 10 msec after the periodic activities were removed.
Figure 15. Optimized Cyberlink PowerDVD* after removing periodic activities
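The source does not say exactly how the sleep threads were restructured. A common pattern, sketched here in Python as an assumption rather than Cyberlink's actual change, is to replace a polling loop that sleeps on a fixed timer with a thread that blocks until work arrives. On Windows the analogous approach is typically waiting on an event object (for example with `WaitForSingleObject`) instead of calling `Sleep` in a loop.

```python
import queue
import threading
import time


def consume_polling(q, stop, interval_s=0.010):
    """Anti-pattern (for contrast only, not run below): wakes every 10 ms
    even when the queue is empty, which keeps the CPU out of deep C-states."""
    items = []
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            time.sleep(interval_s)  # periodic timer wakeup
            continue
        if item is stop:
            return items
        items.append(item)


def consume_blocking(q, stop):
    """Event-driven alternative: the thread blocks inside q.get() and wakes
    only when an item (or the stop sentinel) arrives."""
    items, wakeups = [], 0
    while True:
        item = q.get()
        wakeups += 1
        if item is stop:
            return items, wakeups
        items.append(item)


STOP = object()
work = queue.Queue()
result = {}


def worker():
    result["items"], result["wakeups"] = consume_blocking(work, STOP)


t = threading.Thread(target=worker)
t.start()
for frame_id in range(5):  # hypothetical units of work
    work.put(frame_id)
work.put(STOP)
t.join()
print(result)  # {'items': [0, 1, 2, 3, 4], 'wakeups': 6}
```

The blocking consumer wakes once per item, while the polling version wakes about 100 times per second regardless of load, which is the kind of periodic activity WPA exposed.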
Removing the periodic activities yielded a ~800 mW saving. With the optimizations so far, 1080p HD streaming playback SoC power went from ~6 W to ~2.8 W, but additional optimizations were still needed to reach the 1.7 W goal seen in best-in-class applications.
Figure 16. Power Optimizations down to ~2.8 W
The next step was to reduce extra memory copies by using an overlay. With the overlay, overall package power dropped by a further ~400 mW. Figure 17 shows power reduced to ~1.8 W from ~6 W.
Figure 17. Cyberlink PowerDVD* at final Power Consumption (1.8 W)
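To get a feel for why redundant copies cost power, consider the memory traffic they generate. The source does not say how many copies the overlay removed or which pixel format was used, so the sketch assumes 1080p NV12 frames (1.5 bytes per pixel) at 30 FPS and one extra copy per frame; `copy_traffic_mb_per_s` is an illustrative helper.

```python
def copy_traffic_mb_per_s(width: int, height: int, bytes_per_pixel: float,
                          fps: float, extra_copies: int) -> float:
    """Memory traffic from redundant frame copies: each copy reads and
    writes the whole frame once, so it moves 2 x frame_size bytes."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes * 2 * extra_copies * fps / 1e6


# Assumptions: 1080p NV12 frames (1.5 bytes/pixel), 30 FPS, one extra copy.
print(f"{copy_traffic_mb_per_s(1920, 1080, 1.5, 30, 1):.1f} MB/s")  # 186.6 MB/s
```

Every such copy also keeps the CPU or GPU and the memory subsystem active, which reduces the time the package can spend in deep C-states.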
With that, the most important optimization goals had been achieved, and Intel and Cyberlink engineers deemed the project a success.
Close collaboration between Cyberlink and Intel helped to complete the optimization in two months with full validation. The final product with all optimizations was released to OEMs six months from when we started.
Conclusion
The Intel and Cyberlink engineers used several tools, including Intel VTune Amplifier and Microsoft Windows Performance Analyzer, to achieve optimal low-power playback. The collaboration included knowledge sharing on tools, along with weekly analysis meetings, to meet the battery life goal before the release deadline.
Several iterations were completed before the team was satisfied with the results (PowerDVD now consumes ~1.8 W, down from ~6 W). Intel and Cyberlink engineers faced the challenge of keeping playback quality the same before and after optimization. Each optimization required a validation and analysis process before it could pass the Cyberlink team's internal quality tests, so every change was tracked and user experience metrics (power and performance) were evaluated.
The optimizations that worked best for achieving the goals, accomplished over several iterations as noted above, were offloading video decode to the GPU with the Intel Media SDK, removing the periodic sleep-loop wakeups, and reducing extra memory copies with an overlay.
The combined efforts of the Intel and CyberLink PowerDVD teams brought the streaming media playback application to the best-in-class power goal.
Source: https://software.intel.com/en-us/articles/optimizing-cyberlink-powerdvd-10-improves-battery-life