Anatomy of the Linux slab allocator

Learn how Linux manages memory

M. Tim Jones, Consultant Engineer, Emulex Corp.

M. Tim Jones is an embedded software architect and the author of GNU/Linux Application Programming, AI Application Programming, and BSD Sockets Programming from a Multilanguage Perspective. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and networking protocols development. Tim is a Consultant Engineer for Emulex Corp. in Longmont, Colorado.

Summary: 
Good operating system performance depends in part on the operating system's ability to efficiently manage resources. In the old days, heap memory managers were the norm, but performance suffered due to fragmentation and the need for memory reclamation. Today,
the Linux® kernel uses a method that originated in Solaris but has been used in embedded systems for quite some time, allocating memory as objects based on their size. This article explores the ideas behind the slab allocator and examines its interfaces and
their use.

Date: 15 May 2007
Level: Intermediate

Dynamic memory management

The goal of memory management is to provide a method
by which memory can be dynamically shared amongst a variety of users for a variety of purposes. The memory management method should do both of the following:

  • Minimize the amount of time required to manage the memory
  • Maximize the available memory for general usage (minimize management
    overhead)

Memory management is ultimately a zero-sum game of tradeoffs. You can develop an algorithm that uses little memory for management but takes more time to manage the available memory. You can also develop an algorithm that manages memory quickly but uses a bit more memory for its own bookkeeping. In the end, the requirements for the particular application drive the balance of the tradeoffs.

Early memory managers used a heap-based allocation strategy. In this method, a large block of memory (called the heap) is used to provide memory for user-defined purposes. When users need a block of memory, they make a request for a given size. The heap manager looks at the available memory (using a particular algorithm) and returns the block. Some of the algorithms used in this search are the first-fit (the first block encountered in the heap that satisfies the request) and the best-fit (the best block in the heap that satisfies the request). When the users are finished with the memory, they return it to the heap.
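As an illustration of the search step, the following user-space sketch walks a singly linked free list and returns the first block large enough for the request. The free_block type and free_list_head variable are hypothetical names used only for this example.

#include <stddef.h>

/* Hypothetical free-list node: each free block records its usable size
 * and a pointer to the next free block in the heap. */
struct free_block {
    size_t size;
    struct free_block *next;
};

static struct free_block *free_list_head;

/* First-fit search: return the first block that satisfies the request,
 * or NULL if no free block is large enough. */
static struct free_block *first_fit( size_t request )
{
    struct free_block *b;

    for (b = free_list_head; b != NULL; b = b->next) {
        if (b->size >= request)
            return b;
    }
    return NULL;
}

A best-fit search differs only in that it continues walking the list and remembers the smallest block that still satisfies the request.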

The fundamental problem with this heap-based allocation
strategy is fragmentation. As blocks of memory are allocated, they are returned in different orders and at different times. This tends to leave holes in the heap requiring more time to efficiently manage the free memory. This algorithm tends to be
memory efficient (allocating what's necessary) but requires more time to manage the heap.

Another approach, called buddy memory allocation, is a faster memory-management technique that divides memory into power-of-2 partitions and attempts to satisfy memory requests using a best-fit approach. When memory is freed by the user, the block's buddy is checked to see whether it has also been freed. If so, the blocks are combined to minimize fragmentation. This algorithm tends to be a bit more time efficient but can waste memory due to the best-fit approach.
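The core of the merge check can be sketched as follows. It assumes power-of-2 block sizes and offsets measured from the start of the managed region; the function name buddy_offset is chosen only for illustration.

#include <stddef.h>

/* The buddy of a power-of-2 sized block is found by toggling the bit
 * that distinguishes the two halves of their common parent block. */
static size_t buddy_offset( size_t offset, size_t size )
{
    return offset ^ size;
}

/* Example: with 4KB blocks, the block at offset 0x3000 has its buddy at
 * buddy_offset(0x3000, 0x1000) == 0x2000; if both are free, they can be
 * merged into a single 8KB block at offset 0x2000. */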

This article focuses on Linux kernel memory management
and, in particular, the mechanisms provided through slab allocation.

The slab cache

The slab allocator used in Linux is based on an
algorithm first introduced by Jeff Bonwick for the SunOS operating system. Jeff's allocator revolves around object caching. Within a kernel, a considerable amount of memory is allocated for a finite set of objects such as file descriptors and other common
structures. Jeff found that the amount of time required to initialize a regular object in the kernel exceeded the amount of time required to allocate and deallocate it. His conclusion was that instead of freeing the memory back to a global pool, he would have
the memory remain initialized for its intended purpose. For example, if memory is being allocated for a mutex, the mutex initialization function (mutex_init) need only be performed once when the memory is first allocated for the mutex. Subsequent allocations
of the memory need not perform the initialization because the object is already in the desired state from the previous deallocation and call to the destructor.

The Linux slab allocator uses these ideas and others
to build a memory allocator that is efficient in both space and time.

Figure 1 illustrates the high-level organization
of the slab structures. At the highest level is the cache_chain, which is a linked list of the slab caches. This is useful for best-fit algorithms that look for a cache that most closely fits the size of the desired allocation (iterating the list). Each element
of the cache_chain is a kmem_cache structure reference (called a cache). This defines a pool of objects of a given size to manage.


Figure 1. The major structures of the slab allocator

Each cache contains a list of slabs, which are contiguous blocks of memory (typically pages). Three slab lists exist:

slabs_full

Slabs that are fully allocated.

slabs_partial

Slabs that are partially allocated.

slabs_empty

Slabs that are empty (no objects allocated).

Note that the slabs on the slabs_empty list are
prime candidates for reaping. This is the process by which the memory used by slabs is provided back to the operating system for other uses.

Each slab in the slab list is a contiguous block of memory (one or more contiguous pages) that is divided into objects. These objects are the fundamental elements that are allocated and deallocated from the particular cache. Note that a slab is the smallest unit by which the slab allocator grows a cache, so if a cache needs to grow, a full slab is the minimum amount of memory added. Typically, numerous objects are allocated per slab.

As objects are allocated and deallocated from a
slab, the individual slab can move between the slab lists. For example, when all objects are consumed in a slab, it moves from the slabs_partial list to the slabs_full list. When a slab is full and an object is deallocated, it moves from the slabs_full list
to the slabs_partial list. When all objects are deallocated, it moves from the slabs_partial list to the slabs_empty list.
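The relationships described above can be summarized in a simplified sketch. The structures below (my_slab, my_kmem_cache) are conceptual stand-ins, not the actual definitions from ./linux/mm/slab.c, which carry many more fields (per-CPU arrays, tunables, statistics, and so on).

#include <linux/list.h>
#include <linux/types.h>

/* Conceptual model of one slab: a contiguous block of pages carved into
 * objects and linked into one of the cache's three slab lists. */
struct my_slab {
    struct list_head list;     /* position on slabs_full/partial/empty */
    void *s_mem;               /* address of the first object */
    unsigned int inuse;        /* number of objects currently allocated */
};

/* Conceptual model of one cache: a pool of fixed-size objects. */
struct my_kmem_cache {
    struct list_head next;           /* link in the global cache_chain */
    size_t object_size;              /* size of each object in the pool */
    struct list_head slabs_full;     /* slabs with every object in use */
    struct list_head slabs_partial;  /* slabs with some objects in use */
    struct list_head slabs_empty;    /* slabs with no objects in use */
};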

Motivation behind the slab

The slab cache allocator provides a number of benefits over traditional memory management schemes. First, kernels commonly rely on small objects that are allocated numerous times over the lifetime of the system. The slab cache allocator provides this through the caching of similarly sized objects, thus avoiding the fragmentation problems that commonly occur. The slab allocator also supports the initialization of common objects, thus avoiding the need to repeatedly initialize an object for the same purpose. Finally, the slab allocator supports hardware cache alignment and coloring, which spreads objects in different slabs across different cache lines for increased cache utilization and better performance.



API functions

Now on to the application program interface (API)
for creating new slab caches, adding memory to the caches, destroying the caches, as well as the functions to allocate and deallocate objects from them.

The first step is to create your slab cache reference, which you can declare statically as:

static struct kmem_cache *my_cachep;

 

Linux source for the slab cache

You can find the slab cache source in ./linux/mm/slab.c.
The kmem_cache structure is also defined in ./linux/mm/slab.c. The discussion in this article focuses on the current implementation in the 2.6.21 Linux kernel.

This reference is then used by the other slab cache
functions for creation, deletion, allocation, and so on. The kmem_cache structure contains the per-central processing unit (CPU) data, a set of tunables (accessible through the proc file system), statistics, and necessary elements to manage the slab cache.

kmem_cache_create

Kernel function kmem_cache_create is used to create
a new cache. This is commonly performed at kernel init time or when a kernel module is first loaded. Its prototype is defined as:

struct kmem_cache *
kmem_cache_create( const char *name, size_t size, size_t align,
                   unsigned long flags,
                   void (*ctor)(void*, struct kmem_cache *, unsigned long),
                   void (*dtor)(void*, struct kmem_cache *, unsigned long) );

 

The name argument defines the name of the cache,
which is used by the proc file system (in /proc/slabinfo) to identify this cache. The size argument specifies the size of the objects that should be created for this cache, with the align argument defining the required alignment for each object. The flags
argument specifies options to enable for the cache. These flags are shown in Table 1.


Table 1. Partial list of options for kmem_cache_create (as specified in flags)

SLAB_RED_ZONE

Insert markers at the head and tail of the object to support checking for buffer overruns.

SLAB_POISON

Fill the slab with a known pattern to allow monitoring of objects in the cache (objects owned by the cache but modified externally).

SLAB_HWCACHE_ALIGN

Specify that the objects in this cache must be aligned to a hardware cache line.

 

The ctor and dtor arguments define an optional object constructor and destructor. The constructor and destructor are callback functions provided by the user. When a new object is allocated from the cache, it can be initialized through the constructor.

After the cache is created, its reference is returned by the kmem_cache_create function. Note that this function allocates no memory to the cache. Instead, when attempts are made to allocate objects from the cache (when it's initially empty), memory is allocated to it through a refill operation. This same operation is used to add memory to the cache when all its objects are consumed.
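As an illustration of the constructor hook, the following hedged sketch creates a cache of mutex-protected objects using the 2.6.21-era prototype shown above. The names my_object, my_object_ctor, and my_object_cachep are hypothetical.

#include <linux/init.h>
#include <linux/slab.h>
#include <linux/mutex.h>
#include <linux/errno.h>

/* Hypothetical object type managed by the cache. */
struct my_object {
    struct mutex lock;
    int state;
};

static struct kmem_cache *my_object_cachep;

/* Constructor: runs when a new slab is populated with objects, not on
 * every allocation, so each object is handed out already initialized. */
static void my_object_ctor( void *obj, struct kmem_cache *cachep,
                            unsigned long flags )
{
    struct my_object *o = obj;

    mutex_init( &o->lock );
    o->state = 0;
}

static int __init my_object_cache_init( void )
{
    my_object_cachep = kmem_cache_create( "my_object_cache",
                                          sizeof(struct my_object), 0,
                                          SLAB_HWCACHE_ALIGN,
                                          my_object_ctor,   /* ctor */
                                          NULL );           /* dtor */
    return my_object_cachep ? 0 : -ENOMEM;
}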

kmem_cache_destroy

Kernel function kmem_cache_destroy is used to destroy
a cache. This call is performed by kernel modules when they are unloaded. The cache must be empty before this function is called.

void kmem_cache_destroy( struct kmem_cache *cachep );

 

kmem_cache_alloc

To allocate an object from a named cache, the kmem_cache_alloc
function is used. The caller provides the cache from which to allocate an object and a set of flags:

void *kmem_cache_alloc( struct kmem_cache *cachep, gfp_t flags );

 

This function returns an object from the cache.
Note that if the cache is currently empty, this function may invoke cache_alloc_refill to add memory to the cache. The flag options for kmem_cache_alloc are the same as those for kmalloc. Table 2 provides a partial list of the flag options.


Table 2. Flag options for the kmem_cache_alloc and kmalloc kernel functions

GFP_USER

Allocate memory for a user (this call may sleep).

GFP_KERNEL

Allocate memory from kernel RAM (this call may sleep).

GFP_ATOMIC

Force the call not to sleep (useful for interrupt handlers).

GFP_HIGHUSER

Allocate from high memory.

 

Slab allocation for NUMA

For Non-Uniform Memory Access (NUMA) architectures, the allocation function for a specified node is kmem_cache_alloc_node.
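A minimal sketch of a node-aware allocation is shown below, assuming the cache created earlier; numa_node_id() identifies the node of the executing CPU, and the helper name alloc_on_local_node is hypothetical.

#include <linux/slab.h>
#include <linux/topology.h>

/* Allocate an object from the given cache on the local NUMA node. */
static void *alloc_on_local_node( struct kmem_cache *cachep )
{
    return kmem_cache_alloc_node( cachep, GFP_KERNEL, numa_node_id() );
}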

kmem_cache_zalloc

The kernel function kmem_cache_zalloc is similar
to kmem_cache_alloc, except that it performs a memset of the object to clear it out prior to returning the object to the caller.
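A brief sketch of that equivalence follows; the helper name alloc_cleared is hypothetical.

#include <linux/slab.h>
#include <linux/string.h>

/* Return a zero-filled object from the cache; roughly equivalent to
 * kmem_cache_alloc followed by a memset over the whole object. */
static void *alloc_cleared( struct kmem_cache *cachep )
{
    void *obj = kmem_cache_zalloc( cachep, GFP_KERNEL );

    /* Equivalent by hand:
     *   obj = kmem_cache_alloc( cachep, GFP_KERNEL );
     *   if (obj)
     *       memset( obj, 0, kmem_cache_size( cachep ) );
     */
    return obj;
}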

kmem_cache_free

To free an object back to its slab, the kmem_cache_free function is used. The caller provides the cache reference and the object to be freed.

void kmem_cache_free( struct kmem_cache *cachep, void *objp );

 

kmalloc and kfree

The most common memory management functions in the
kernel are the kmalloc and kfree functions. The prototypes for these functions are defined as:

void *kmalloc( size_t size, gfp_t flags );

void kfree( const void *objp );

 

Note that in kmalloc, the only arguments are the requested size of the object to allocate and a set of flags (see the partial list in Table 2). But kmalloc and kfree use the slab cache just like the previously defined functions. Instead of naming a specific slab cache from which to allocate an object, the kmalloc function iterates through the available caches looking for one that can satisfy its size constraint. When found, an object is allocated (using __kmem_cache_alloc). To free an object with kfree, the cache from which the object was allocated is determined with a call to virt_to_cache. This function returns the cache reference that is then used in a call to __cache_free to release the object.
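A minimal sketch of this pair in use appears below; the 128-byte size and the my_buffer and kmalloc_demo names are arbitrary choices for illustration.

#include <linux/slab.h>
#include <linux/errno.h>

static int kmalloc_demo( void )
{
    /* Request 128 bytes; the allocation is served from the general
     * cache whose object size best fits the request. */
    char *my_buffer = kmalloc( 128, GFP_KERNEL );

    if (!my_buffer)
        return -ENOMEM;

    /* ... use the buffer ... */

    kfree( my_buffer );
    return 0;
}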

Generic object allocation

Internally in the slab source, a function named
kmem_find_general_cachep is provided that performs a cache search looking for a slab cache that best fits the necessary object size.

Other miscellaneous functions

The slab cache API provides a number of other useful
functions. The kmem_cache_size function returns the size of the objects that are managed by this cache. You can also retrieve the name of a given cache (as defined at cache creation time) through a call to kmem_cache_name. A cache can be shrunk by releasing
free slabs in the cache. This can be accomplished with a call to kmem_cache_shrink. Note that this action (called reaping) is performed automatically on a periodic basis by the kernel (through kswapd).

unsigned int kmem_cache_size( struct kmem_cache *cachep );

const char *kmem_cache_name( struct kmem_cache *cachep );

int kmem_cache_shrink( struct kmem_cache *cachep );
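A small sketch of an on-demand shrink, reusing the my_cachep reference declared earlier (the helper name shrink_my_cache is hypothetical):

#include <linux/slab.h>

/* Release the cache's empty slabs immediately instead of waiting for
 * the kernel's periodic reaping. */
static void shrink_my_cache( void )
{
    if (my_cachep)
        kmem_cache_shrink( my_cachep );
}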

 



Example use of the slab cache

The following code snippets demonstrate the creation
of a new slab cache, allocating and deallocating objects from the cache, and then destroying the cache. To begin, a kmem_cache object must be defined and then initialized (see Listing 1). This particular cache contains 32-byte objects and is hardware-cache
aligned (as defined by the flags argument SLAB_HWCACHE_ALIGN).


Listing 1. Creating a new slab cache

static struct kmem_cache *my_cachep;

static void init_my_cache( void )
{

  my_cachep = kmem_cache_create(
                 "my_cache",            /* Name */
                 32,                    /* Object Size */
                 0,                     /* Alignment */
                 SLAB_HWCACHE_ALIGN,    /* Flags */
                 NULL, NULL );          /* Constructor/Destructor */

  return;
}

 

With your slab cache allocated, you can now allocate
an object from it. Listing 2 provides an example of allocating and deallocating an object from the cache. It also demonstrates two of the miscellaneous functions.


Listing 2. Allocating and deallocating objects

int slab_test( void )
{
  void *object;

  printk( "Cache name is %s\n", kmem_cache_name( my_cachep ) );
  printk( "Cache object size is %d\n", kmem_cache_size( my_cachep ) );

  object = kmem_cache_alloc( my_cachep, GFP_KERNEL );

  if (object) {

    kmem_cache_free( my_cachep, object );

  }

  return 0;
}

 

Finally, Listing 3 is an example of destroying a
slab cache. The caller must ensure that no attempts to allocate objects from the cache are performed during the destroy operation.


Listing 3. Destroying a slab cache

static void remove_my_cache( void )
{

  if (my_cachep)
    kmem_cache_destroy( my_cachep );

  return;
}

 



Proc interface to the slab

The proc file system provides a simple way to monitor all slab caches that are active in the system. This file, called /proc/slabinfo, provides detailed information about all slab caches in addition to a few tunables that are accessible from user space. The current version of slabinfo provides a header so that the output is a bit more readable. For each slab cache in the system, the number of objects, the number of active objects, and the object size are provided (in addition to the objects and pages per slab). A set of per-cache tunables is also provided.
