
GPU Accelerated Compositing in Chrome

January 24, 2013

Vangelis Kokkevis


Summary

This document provides background and details on the implementation of hardware accelerated compositing in Chrome.

Introduction

Traditionally, web browsers relied entirely on the CPU to render web page content. With capable GPUs becoming an integral part of even the smallest of devices, and with rich media such as video and 3D graphics playing an increasingly important role in the web experience, attention has turned to finding ways to make more effective use of the underlying hardware to achieve better performance and power savings. There's clear indication that getting the GPU directly involved with compositing the contents of a web page can result in very significant speedups. The largest gains come from eliminating unnecessary (and very slow) copies of large data, especially copies from video memory to system memory. The most obvious candidates for such optimizations are the <video> element and the WebGL canvas, both of which can generate their results in areas of memory that the CPU doesn't have fast access to.

Delegating compositing of the page layers to the GPU provides other benefits as well.  In most cases, the GPU can achieve far better efficiency than the CPU (both in terms
of speed and power draw) in drawing and compositing operations that involve large numbers of pixels as the hardware is designed specifically for these types of workloads.


WebKit Rendering Basics

The source code for the WebKit rendering engine is vast and complex (and somewhat scarcely documented!). For the purposes of this document I've extracted some of the important basic blocks.

Nodes and the DOM tree
In WebKit, the contents of a web page are internally stored as a tree of Node objects called the DOM tree. Each HTML element on a page, as well as text that occurs between elements, is associated with a Node. The top-level Node of the DOM tree is always a Document Node.
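The tree structure just described can be sketched in a few lines of C++ (a minimal illustration with hypothetical names; WebKit's real Node class is far richer):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// A toy DOM node: a name plus owned children in document order.
struct Node {
    std::string name;                            // e.g. "Document", "div", "#text"
    std::vector<std::unique_ptr<Node>> children; // child Nodes in document order

    Node* appendChild(std::string childName) {
        auto child = std::make_unique<Node>();
        child->name = std::move(childName);
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Count every Node in the tree, root included.
int countNodes(const Node& n) {
    int total = 1;
    for (const auto& c : n.children)
        total += countNodes(*c);
    return total;
}
```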

RenderObjects, the Render Tree and the GraphicsContext
Each node in the DOM tree that produces visual output has a corresponding RenderObject. RenderObjects are stored in a parallel tree structure, called the Render Tree. A RenderObject knows how to present (paint) the contents of its Node on a display surface. It does so by issuing the necessary draw calls to the GraphicsContext associated with the page renderer. The GraphicsContext is ultimately responsible for writing the pixels into the bitmap that gets displayed to the screen.

RenderLayers

Each RenderObject is associated with a RenderLayer, either directly or indirectly via an ancestor RenderObject. RenderObjects that share the same coordinate space (e.g. are affected by the same CSS transform) typically belong to the same RenderLayer. RenderLayers exist so that the elements of the page are composited in the correct order to properly display overlapping content, semi-transparent elements, etc. There are a number of conditions that will trigger the creation of a new RenderLayer for a particular RenderObject, as defined in RenderBoxModelObject::requiresLayer() and overridden for some derived classes. In general, a RenderObject warrants the creation of a RenderLayer if:

  • It's the root object for the page
  • It has explicit CSS position properties (relative, absolute or a transform)
  • It is transparent
  • It has overflow, an alpha mask or a reflection
  • It corresponds to a <canvas> element that has a 3D (WebGL) context
  • It corresponds to a <video> element

Notice that there isn't a one-to-one correspondence between RenderObjects and RenderLayers. A particular RenderObject is associated either with the RenderLayer that was created for it, if there is one, or with the RenderLayer of the first ancestor that has one.
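The decision logic above can be condensed into a single predicate (a simplified sketch; the real test is RenderBoxModelObject::requiresLayer() plus subclass overrides, and the field names here are purely illustrative):

```cpp
#include <cassert>

// Illustrative summary of the properties that trigger RenderLayer creation.
struct RenderObjectTraits {
    bool isRoot = false;          // root object for the page
    bool isPositioned = false;    // explicit position: relative/absolute, or a transform
    bool isTransparent = false;   // opacity < 1
    bool hasOverflowClip = false; // overflow
    bool hasMask = false;         // alpha mask
    bool hasReflection = false;
    bool isWebGLCanvas = false;   // <canvas> with a 3D (WebGL) context
    bool isVideo = false;         // <video> element
};

// Returns true when a RenderObject with these traits warrants its own RenderLayer.
bool requiresLayer(const RenderObjectTraits& t) {
    return t.isRoot || t.isPositioned || t.isTransparent ||
           t.hasOverflowClip || t.hasMask || t.hasReflection ||
           t.isWebGLCanvas || t.isVideo;
}
```

A plain, statically positioned element with none of these properties shares an ancestor's layer instead of getting its own.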

RenderLayers form a tree hierarchy as well. The root node is the RenderLayer corresponding to the root element in the page, and the descendants of every node are layers visually contained within the parent layer. The children of each RenderLayer are kept in two sorted lists, both in ascending order: the negZOrderList containing child layers with negative z-indices (and hence layers that go below the current layer), and the posZOrderList containing child layers with positive z-indices (layers that go above the current layer).

The Rendering paths

WebKit fundamentally renders a web page by traversing the RenderLayer hierarchy starting from the root layer. The WebKit codebase contains two distinct code paths for rendering the contents of a page, the software path and the hardware accelerated path. As the name suggests, the hardware accelerated path is there to make use of GPU acceleration for compositing some of the RenderLayer contents, and code for it lives behind the ACCELERATED_COMPOSITING compile-time flag. Currently Chrome uses the software path exclusively. Safari on the Mac (and most likely iOS) follows the hardware accelerated path, which makes heavy use of Apple's proprietary CoreAnimation API. It's also worth noting that 3D CSS transforms are only available with the hardware accelerated path, as a pure software implementation would be prohibitively slow.

The Software Path
In the software path, the page is rendered by sequentially painting all the RenderLayers, from back to front, directly into a single destination bitmap.  The RenderLayer
hierarchy is traversed recursively starting from the root and the bulk of the work is done in RenderLayer::paintLayer() which performs the following basic steps (the list of steps is simplified here for clarity):
  1. Determines whether the layer intersects the damage rect for an early out.
  2. Recursively paints the layers below this one by calling paintLayer() for the layers in the negZOrderList.
  3. Asks RenderObjects associated with this RenderLayer to paint themselves. This is done by recursing down the RenderTree starting with the RenderObject which created the
    layer.  Traversal stops whenever a RenderObject associated with a different RenderLayer is found.
  4. Recursively paints the layers above this one by calling paintLayer() for the layers in the posZOrderList.
RenderObjects paint themselves into the destination bitmap by issuing draw calls into the shared GraphicsContext (implemented in Chrome via Skia for Windows/Linux). Note that the GraphicsContext itself has no concept of layers, with the exception of the case where a layer is semi-transparent. In that case the RenderLayer calls GraphicsContext::beginTransparencyLayer() before asking the RenderObjects to draw. In the Skia implementation, the call to beginTransparencyLayer() causes all subsequent draw calls to render into a separate bitmap, which gets composited with the original one when the layer drawing is complete and a matching call to endTransparencyLayer() is made by the RenderLayer.
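The back-to-front traversal in steps 1-4 above can be sketched as follows (layer names stand in for actual painting; the real RenderLayer::paintLayer() paints RenderObjects into the shared GraphicsContext and also performs damage-rect culling):

```cpp
#include <cassert>
#include <string>
#include <vector>

// A toy RenderLayer with the two sorted child lists described earlier.
struct Layer {
    std::string name;
    std::vector<Layer*> negZOrderList; // negative z-index children (painted below)
    std::vector<Layer*> posZOrderList; // positive z-index children (painted above)
};

// Appends layer names in the order the software path would paint them:
// layers below first, then this layer's own RenderObjects, then layers above.
void paintLayer(const Layer& layer, std::vector<std::string>& paintOrder) {
    for (const Layer* below : layer.negZOrderList)
        paintLayer(*below, paintOrder);
    paintOrder.push_back(layer.name); // step 3: paint this layer's RenderObjects
    for (const Layer* above : layer.posZOrderList)
        paintLayer(*above, paintOrder);
}
```

Running this on a root layer with one negative-z and one positive-z child yields the painter's-algorithm order: the layer below, then the root, then the layer above.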


The Hardware Accelerated Path

The difference between the hardware accelerated path and the software path is that, when hardware acceleration is enabled, some (but not all) of the RenderLayers get their own backing surface (compositing layer) into which they paint, instead of drawing directly into the common bitmap for the page. A subsequent compositing pass composites all the backing surfaces onto the destination bitmap. The compositor is responsible for applying the necessary transformations (as specified by the layer's CSS transform properties) to each bitmap before compositing it. Since painting of the layers is decoupled from compositing, invalidating one of these layers only results in repainting the contents of that layer alone and recompositing. In contrast, with the software path, invalidating any layer requires repainting all layers (at least the overlapping portions of them) below and above it, which unnecessarily taxes the CPU.


While in theory every single RenderLayer could paint itself into a separate backing surface to avoid unnecessary repaints, in practice this could be quite wasteful in terms of memory (VRAM especially). In the current WebKit implementation, one of the following conditions must be met for a RenderLayer to get its own compositing layer (see RenderLayerCompositor::requiresCompositingLayer()):
  1. Layer has 3D or perspective transform CSS properties
  2. Layer is used by <video> element using accelerated video decoding
  3. Layer is used by a <canvas> element with a 3D context
  4. Layer uses a CSS animation for its opacity or uses an animated webkit transform
  5. Layer has a descendant that has a compositing layer
  6. Layer has a sibling with a lower z-index which has a compositing layer (in other words the layer is rendered on top of a composited layer)
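Conditions 1-4 are direct properties of a layer, while 5 and 6 propagate compositing through the tree and across siblings. A simplified sketch of that logic (illustrative names; the real checks live in RenderLayerCompositor):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct CompLayer {
    bool has3DTransform = false;              // condition 1
    bool isAcceleratedVideo = false;          // condition 2
    bool isWebGLCanvas = false;               // condition 3
    bool hasAnimatedOpacityOrTransform = false; // condition 4
    std::vector<CompLayer*> children;
};

// Conditions 1-5: a layer composites for its own reasons, or because a
// descendant does.
bool requiresCompositingLayer(const CompLayer& l) {
    if (l.has3DTransform || l.isAcceleratedVideo ||
        l.isWebGLCanvas || l.hasAnimatedOpacityOrTransform)
        return true;
    for (const CompLayer* child : l.children)
        if (requiresCompositingLayer(*child))
            return true; // condition 5
    return false;
}

// Condition 6: among siblings sorted by ascending z-index, every layer that
// renders on top of a composited sibling must itself be composited.
std::vector<bool> compositedSiblings(const std::vector<CompLayer>& siblings) {
    std::vector<bool> composited(siblings.size(), false);
    bool seenComposited = false;
    for (std::size_t i = 0; i < siblings.size(); ++i) {
        composited[i] = seenComposited || requiresCompositingLayer(siblings[i]);
        seenComposited = seenComposited || composited[i];
    }
    return composited;
}
```

Note how one 3D-transformed layer in the middle of a sibling list forces everything above it into compositing layers as well, which is part of why indiscriminate use of compositing triggers can be memory-hungry.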


Two significant implications of WebKit's implementation of accelerated compositing are:
  • Even with hardware acceleration enabled, pages that don't contain <video> or WebGL elements and don't make use of 3D CSS transformations/animations use the software path.
  • Pages with composited RenderLayers will always render via the compositor.


H/W Accelerated Compositing

Code related to the compositor lives inside WebCore, behind the USE(ACCELERATED_COMPOSITING) guards. Part of the code is shared among all platforms and part of it is Chrome-specific. Thankfully, the WebKit code is structured such that implementing the compositor for Chrome required no changes to the core WebKit codebase, and all the Chrome-specific code is provided in platform-specific source files that live in platform/graphics/chromium, much the same way we've done with the GraphicsContext and GraphicsContextSkia.

With the addition of the accelerated compositor, in order to eliminate costly memory transfers, the final rendering of the browser's tab area is handled directly by the GPU. This is a significant departure from the
current model in which the Renderer process passes (via IPC and shared memory) over a bitmap with the page's contents to the Browser process for display: 


Software Rendering Architecture


With the current un-accelerated implementation, compositing of the RenderLayers takes place in the WebKit code (via Skia or CG) and runs on the CPU. In the h/w accelerated architecture, compositing of the h/w accelerated layers with the rest of the page contents happens on the GPU via calls to the platform-specific 3D APIs (GL / D3D). The code ultimately responsible for making these calls is encapsulated in a library running inside the Renderer process, the Compositor. The Compositor library essentially uses the GPU to composite rectangular areas of the page into a single bitmap.

The GPU Process


Restricted by its sandbox, the Renderer process (where WebKit and the compositor live) cannot directly issue calls to the 3D APIs provided by the OS (GL/D3D). For that reason we use a separate process to do the rendering. We call this process the GPU Process. The GPU process is specifically designed to provide access to the system's 3D APIs from within the Renderer sandbox or the even more restrictive Native Client "jail". It works via a client-server model, with the client being the code running in the restricted environment and the server being the code that actually makes the calls into the graphics APIs, as follows:

  • The client (code running in the Renderer or within a NaCl module),
    instead of issuing calls directly to the system APIs, serializes them and puts them in a ring buffer (Command Buffer) residing in memory shared between itself and the server process.
  • The server (GPU process running in a less restrictive sandbox
    that allows access to the platform's 3D APIs) picks up the serialized commands from shared memory, parses them and executes the appropriate graphics calls, outputting directly to a window.
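The client/server data flow above can be modeled with a toy command buffer (Chromium's real implementation adds wrap-around, synchronization, and a full GL ES command encoding; the types here are illustrative only):

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <vector>

// A couple of toy commands, standing in for the serialized GL ES 2.0 command set.
enum class Cmd : std::uint32_t { Clear, DrawArrays };

struct CommandRecord {
    Cmd cmd;
    std::uint32_t arg; // e.g. vertex count for DrawArrays
};

struct CommandBuffer {
    std::vector<CommandRecord> ring; // stands in for the shared-memory ring buffer
    std::size_t get = 0;             // server-side read offset ("get pointer")

    // Client side: serialize a command instead of calling the 3D API directly.
    void put(Cmd c, std::uint32_t arg) { ring.push_back({c, arg}); }

    // Server side: parse pending commands and execute them via the callback,
    // returning how many were executed this pass.
    template <typename Executor>
    std::size_t drain(Executor exec) {
        std::size_t executed = 0;
        for (; get < ring.size(); ++get, ++executed)
            exec(ring[get]);
        return executed;
    }
};
```

The client keeps appending while the server drains asynchronously; in the real system the two sides live in different processes and only the put/get offsets and the shared memory are exchanged.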

The GPU Process

The commands accepted by the GPU process are patterned closely after the GL ES 2.0 API (for example, there's a command corresponding to glClear, one to glDrawArrays, etc). Since most GL calls don't have return values, the client and server can work mostly asynchronously, which keeps the performance overhead fairly low. Any necessary synchronization between the client and the server, such as the client notifying the server that there's additional work to be done, is handled via an IPC mechanism. It's also worth noting that, in addition to providing storage for the command buffer, shared memory is used for passing larger resources such as bitmaps for textures, vertex arrays, etc. between the client and the server. From the client's perspective, an application has the option to either write commands directly into the command buffer or use the GL ES 2.0 API via a client-side library that we provide, which handles the serialization behind the scenes. Both the compositor and WebGL currently use the GL ES client-side library for convenience. On the server side, commands received via the command buffer are converted to calls into either desktop GL (on Mac and Linux) or D3D (on Windows) via ANGLE.

Currently Chrome uses a single GPU process per browser instance, serving requests from all the renderer processes and any plugin processes. The GPU process, while single threaded, can multiplex between multiple command buffers, each one of which is associated
with its own rendering context.

The GPU process architecture offers several benefits including:

  • Security: The bulk of the rendering logic remains in the sandboxed Renderer process.
  • Robustness: A GPU process crash (e.g. due to faulty drivers) doesn't bring down the browser.
  • Uniformity: Standardizing on OpenGL ES 2.0 as the rendering API for the browser regardless of the platform allows for a single, easier to maintain codebase across all OS
    ports of Chrome.

The Compositor

The code
The bulk of the Chromium implementation for the compositor lives in WebCore's platform/graphics/chromium directory. The compositing logic is mostly in LayerRendererChromium.cpp, and the implementations of the various composited layer types are in the {Content|Video|Image}LayerChromium.cpp files. The compositor is implemented on top of the GL ES 2.0 client library, which proxies the graphics calls to the GPU process.

When a page renders via the compositor, all its pixels are drawn directly onto the window via the GPU process. The compositor maintains a hierarchy of GraphicsLayers which is constructed by traversing the RenderLayer tree and updated as the page changes. With
the exception of WebGL and video layers, the contents of each of the layers are first drawn into a system memory bitmap and then uploaded to a texture. The compositor keeps track of which layers have changed since the last time they were drawn and only updates
the textures as needed. Rendering the contents of a page is simply a matter of doing a depth first traversal of the GraphicsLayer hierarchy and drawing a texture quad for each layer with the appropriate transformation.  
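The per-frame work just described, uploading only dirty textures and then drawing one quad per layer in depth-first order, can be sketched like this (illustrative names; the real logic lives in LayerRendererChromium and issues actual GL calls):

```cpp
#include <cassert>
#include <string>
#include <vector>

// A toy GraphicsLayer: contents backed by a texture, plus child layers.
struct GraphicsLayer {
    std::string name;
    bool dirty = false;                   // contents changed since last frame?
    std::vector<GraphicsLayer*> children;
};

struct FrameStats {
    int texturesUploaded = 0;            // paint-to-bitmap + texture upload count
    std::vector<std::string> quadsDrawn; // layer names in draw order
};

// Depth-first traversal: refresh the layer's texture only if it changed,
// then draw a transformed texture quad for it, then recurse into children.
void composite(GraphicsLayer& layer, FrameStats& stats) {
    if (layer.dirty) {
        ++stats.texturesUploaded; // paint into a system-memory bitmap, upload
        layer.dirty = false;
    }
    stats.quadsDrawn.push_back(layer.name); // draw the layer's texture quad
    for (GraphicsLayer* child : layer.children)
        composite(*child, stats);
}
```

Compositing a second, unchanged frame performs zero uploads but still draws every quad, which is exactly why invalidating one layer is so much cheaper than in the software path.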


Compositing with the GPU process



At a minimum, when doing h/w accelerated compositing, the GPU process handles a single graphics context, the one used by the compositor.  However, in the presence of GPU-accelerated content in the page (such as WebGL or Pepper3D plugin instances), the GPU process
needs to be able to juggle multiple graphics contexts, each associated with its own Command Buffer, Shared Memory, IPC channel and a GL context.  The way composition of GraphicsLayers whose contents are created directly on GPU works is that instead of them
rendering straight into the backbuffer, they render into a texture (using a Frame Buffer Object) that the compositor context grabs and uses when rendering the layer.  It's important to note that in order for the compositor's GL context to have access to a
texture generated by an offscreen GL context, all GL contexts used by the GPU process are created such that they share resources. The resulting architecture looks like:

Handling multiple contexts

The flags

Use the --enable-accelerated-compositing command line flag to enable the compositor on any of the three platforms, and head to a page like this one or this one to see it in action. If you are curious about the structure of the composited layers, use the --show-composited-layer-borders flag.


As mentioned earlier, accelerated compositing in WebKit (and Chromium) kicks in only if certain types of content appear on the page. An easy trick to force a page to switch over to the compositor is to supply a -webkit-transform: translateZ(0) on an element in the page.
