现在的位置: 首页 > 综合 > 正文

/LGC图形渲染/Memory management for graphics processors

2014年02月20日 ⁄ 综合 ⁄ 共 10328字 ⁄ 字号 评论关闭

Memory management for graphics processors

By Jonathan Corbet

November 6, 2007

The management of video hardware has long been an area of weakness in the
Linux system (and free operating systems in general). The X Window System
tends to get a lot of the blame for problems in this area, but the truth of
the matter is that the problems are more widespread and the kernel has
never made it easy for X to do this job properly. Graphics processors (GPUs) have
gotten steadily more powerful, to the point that, by some measures, they
are the fastest processor on most systems, but kernel support for the
programming of GPUs has lagged behind. A lot of work is being done to remedy
this situation, though, and an important component of that work has just
been put forward for inclusion into the mainline kernel.

Once upon a time, video memory comprised a simple frame buffer from which
pixels were sent to the display; it was up to the system's CPU to put
useful data into that frame buffer. With contemporary GPUs, the memory
situation has gotten more complex; a typical GPU can work with a few
different types of memory:

 

  • Video RAM (VRAM) is high-speed memory installed directly on the video
    card. It is usually visible on the system's PCI bus, but that need
    not be the case. There is likely to be a frame buffer in this memory,
    but many other kinds of data live there as well.

     

  • In many lower-end systems, the "video RAM" is actually a dedicated
    section of general-purpose system memory. That RAM is set aside for
    the use of the GPU and is not available for other purposes. Even
    adapters with their own VRAM may have a dedicated RAM region as well.

     

  • Video adapters contain a simple memory management unit (the graphics
    address remapping table or GART) which can be used to map various
    pages of system memory into the GPU's address space. The result is
    that, at any time, an arbitrary (scattered) subset of the system's RAM
    pages are accessible to the GPU.

Each type of video memory has different characteristics and constraints.
Some are faster to work with (for the CPU or the GPU) than others. Some types
of VRAM might not be directly addressable by the CPU. Memory may or may not
be cache coherent - a distinction which requires careful programming to
avoid data corruption and performance problems. And graphical applications
may want to work with much larger amounts of video memory than can be made
visible to the GPU at any given time.

All of this presents a memory management problem which, while being similar
to the management needs of system RAM, has its own special constraints. So
the graphics developers have been saying for years that Linux needs a
proper manager for GPU-accessible memory. But, for years, we have done
without that memory manager, with the result that this management task has
been performed by an ugly combination of code in the X server, the kernel,
and, often, in proprietary drivers.

Happily, it would appear that those days are coming to an end, thanks to
the creation of the translation-table maps (TTM) module written primarily
by Thomas Hellstrom, Eric Anholt, and Dave Airlie. The TTM code provides a
general-purpose memory manager aimed at the needs of GPUs and graphical
clients.

The core object managed by TTM, from the point of view of user space, is
the "buffer object." A buffer object is a chunk of memory allocated by an
application, and possibly shared among a number of different applications.
It contains a region of memory which, at some point, may be operated on by
the GPU. A buffer object is guaranteed not to vanish as long as some
application maintains a reference to it, but the location of that buffer is
subject to change.

Once an application creates a buffer object, it can map that object into
its address space. Depending on where the buffer is currently located,
this mapping may require relocating the buffer into a type of memory which
is addressable by the CPU (more accurately, a page fault when the
application tries to access the mapped buffer would force that move).
Cache coherency issues must be handled as well, of course.

There will come a time when this buffer must be made available to the GPU
for some sort of operation. The TTM layer provides a special "validate"
ioctl()
to prepare buffers for processing; validating a buffer
could, again, involve moving it or setting up a GART mapping for it. The
address by which the GPU will access the buffer will not be known until it
is validated; after validation, the buffer will not be moved out of the
GPU's address space until it is no longer being operated on.

That means that the kernel has to know when processing on a given buffer
has completed; applications, too, need to know that.
To this end, the TTM layer provides "fence" objects. A fence is
a special operation which is placed into the GPU's command FIFO. When the
fence is executed, it raises a signal to indicate that all instructions
enqueued before the fence have now been executed, and that the GPU will no
longer be accessing any associated buffers. How the signaling works is
very much dependent on the GPU; it could raise an interrupt or simply write
a value to a special memory location. When a fence signals, any associated
buffers are marked as no longer being referenced by the GPU, and any
interested user-space processes are notified.

A busy system might feature a number of graphical applications, each of
which is trying to feed buffers to the GPU at any given time. It is not at
all unlikely that the demands for GPU-addressable buffers will exceed the
amount of memory which the GPU can actually reach. So the TTM layer will
have to move buffers around in response to incoming requests. For
GART-mapped buffers, it may be a simple matter of unmapping pages from
buffers which are not currently validated for GPU operations. In other
cases, the contents of the buffers may have to be explicitly copied to
another type of memory, possibly using the GPU's hardware to do so. In such cases,
the buffers must first be invalidated in the page tables of any user-space
process which has mapped it to ensure that the buffer will not be written
to during the move. In other words, the TTM really does become an
extension of the system's memory management code.

The next question which is bound to come up is: what happens when graphical
applications want to use more video memory than the system as a whole can
provide? Normal system RAM pages which are used as video memory are locked
in place (and unavailable for other uses), so there must be a clear limit
on the number of such pages which can be created. The current solution to
this problem is to cap the number of such pages at a fraction of the available low memory
- up to 1GB on a 4GB, 32-bit system. It would be nice to be able to extend
this memory by writing unused pages to swap, but the Linux swap implementation is
not meant to work with pages owned by the kernel. The long-term plan would
appear to be to let the X server create a large virtual range with
mmap()
, which would then be swappable. That functionality has not
yet been implemented, though.

There is a lot more to the TTM code than has been described here; some more
information can be found in this TTM overview [PDF]
.
For the time being, this code works with a patched version of the Intel
i915 driver, with other drivers to be added in the future. TTM has been
proposed

for inclusion into -mm now and merger into the mainline for
2.6.25. The main issue between now and then will be the evaluation of the
user-space API, which will be hard to change once this code is merged.
Unfortunately, documentation for this API appears to be scarce at the
moment.


(Log in
to post comments)

 

Memory management for graphics processors

Posted Nov 8, 2007 23:53 UTC (Thu) by jwb
(guest, #15467)
[Link
]

This article touches on something I've always wondered.  Is there anything on these GPUs which
prevents a user from addressing random bits of the video memory? Suppose I use "fast user
switching" in Mac OS to give the system over to some other user. Does the OS flush the VRAM?
Could the other user program the GPU to display random memory addresses as textures? If there
are remnants of other people's windows in texture memory, that would be Bad(TM).

 

Memory management for graphics processors

Posted Nov 9, 2007 1:11 UTC (Fri) by zlynx
(subscriber, #2285)
[Link
]

I believe that the Linux kernel DRM (direct rendering) modules are supposed to check the
command stream to ensure that cannot happen.

I suppose that with a very optimized DMA design, this might not be possible. In that case
you'd have to trust the X server and OpenGL driver to take care of it.

 

Memory management for graphics processors

Posted Nov 9, 2007 19:05 UTC (Fri) by nix
(subscriber, #2304)
[Link
]

Unless the commands are so simple they aren't Turing-complete, or their 
form is extremely restricted and stereotyped, I suspect this reduces to
solving Rice's theorem, which is of course impossible in the general case
for just the same reason solving the halting problem is, and is
ridiculously difficult even in most useful special cases.

 

Memory management for graphics processors

Posted Nov 9, 2007 8:59 UTC (Fri) by airlied
(subscriber, #9104)
[Link
]

Later GPUs such as Intel 965 and up, and ATI r500, and Nvidia g80, have page table support
that can be leverage at a hw level...

However we can mostly protect things using the kernel with the TTM layer described above. It
won't let you access video ram directly everything must go through a buffer object which has
basic sharing permissions.

In the future we hope to implement better permission models and maybe even some sort of
SELinux integration.

 

Security Implications: Use cases

Posted Nov 9, 2007 18:19 UTC (Fri) by AnswerGuy
(subscriber, #1256)
[Link
]

The "fast user switching" feature that's being added to systems like
Ubuntu might have some security downsides. However, we should consider
the use case for which "fast user switching" is intended an keep in mind
that there are other user switching methods available.

Perusing the Wikipedia article
on fast user switching
will cover some of the security considerations. Since the "virtual
console" features of Linux, FreeBSD and other PC UNIX clones fit roughly
into the category of "fast user switching" then the fact that the
contents of all those additional terminal (text) screens is visible is
roughly the same risk as you express concern about.

The obvious suggestions are: employ screen blankers which are
(optionally) hooked into the fast user switching, don't use fast user
switching in security sensitive situations (such as Internet cafes).

This latter point is the most obvious. In any case where you're worried
about the next user potentially trying to snoop or exploit your session
you should be using a "secure attention key" to force the system to
kill all processes associated with your terminal and revoke all
credentials from it.

Clearly the intended use cases for "fast user switching" are the common
situations where folks want to share a computer while maintaining their
own desktop configurations, etc. For example I have a couple of
computers in the living room, one at the end of the couch and another by
the easy chair. Everyone in the household uses whichever of these is
closest to do a little web browsing, re-attach to his or her screen
session, read e-mail or whatever. Since my wife and I both have root on
all of machines around the house we're clearly not worried about the
other being able to bypass security, etc.

Given that "fast user switching" implies console access ... there are
far greater security risk associated with PC console access in general.

 

JimD

 

Memory management for graphics processors

Posted Nov 15, 2007 11:21 UTC (Thu) by jfj
(guest, #37917)
[Link
]

That's nice and all, but can you *please* export simple 2D acceleration (blit/fillrect) to
userspace?

The code is already there. It just needs an ioctly.

Imho, OpenGL and that stuff is mostly useful for Games and the Demo-scene. Video players might
also need YUV and scalling. But with simple 2D acceleration and the framebuffer you can bring
up a full windowing environment with the quality of WinXP. Yes, i am anachronistic.

 

Memory management for graphics processors

Posted Nov 21, 2007 16:14 UTC (Wed) by pharm
(guest, #22305)
[Link
]

While you're waiting for the kernel to get its act together, directfb does all that already.
It does require ownership of the hardware though.

 


抱歉!评论已关闭.