现在的位置: 首页 > 综合 > 正文

动态内存管理

2013年12月09日 ⁄ 综合 ⁄ 共 8033字 ⁄ 字号 评论关闭

http://ialloc.org/2013/memory-allocator-notes/

memory-allocator-notes

一直对内存管理相关的内容很感兴趣,毕竟自认为自己在 “高性能服务端开发” 界里混过些 时日。要不跟内存打打交道,通晓几种内存管理机制,怎么能好意思说自己吃过这口饭呢。

言服务器开发,必提 内存管理、并发模型、日志、磁盘、网络读写。这些基础设施打造好 了,业务逻辑实现起来、程序跟踪调试起来才能得心应手。

另外,在整理 Nginx 源代码笔记的过程中,重新意识到 Nginx 在内存管理方面做得相当威 武不屈。比起处处使用 malloc 来说,总是让人觉得那么踏实和惬意。

以前研读过几个内存管理机制的代码,但是总觉得对这一块缺少大局观。一个东西归不到某 一类里,总觉得有几分怆然若失。这就网上到处扒拉文章 (主要是 Wikipedia) 试图将内存管理相关的话题都大体归纳一下。

参考里的 Inside memory managment 也是类似功能的文章。

类别划分

Dynamic memory allocation

动态内存 (Heap-based) 分配有以下几种基本实现方式:

  • 变长内存管理机制 /* TODO */
  • Fixed-sized-blocks allocation, also called Memory Pool. It uses a free list of fixed-size blocks of memory. This works well for simple embedded systems where no large objects need to be allocated, but suffers from fragmentation.
    However, due to the significantly reduced overhead this method can substantially improve performance for objects that need frequent allocation/de-allocation.

    • 基本思想:预先申请、内存块(block)大小固定、缓存(free list)
    • 逻辑简单,运行开销少。一次性释放,不需要存储内存块维护信息,利用率高。
  • Buddy memory allocation. In this system, memory is allocated into several pools of memory instead just one, where each pool represents blocks of memory of a certain power of two in size. All blocks of a particular size are kept in a sorted
    linked list or tree and all new blocks that are formed during allocation are added to their respective memory pools for later use. If a smaller size is requested that is available, the smallest available size is selected and halved. One of the resulting halves
    is selected, and the process repeats until the request is complete. When a block is allocated, the allocator will start with the smallest sufficiently large block to avoid needlessly breaking blocks. When a block is freed, it is compared to its buddy. If they
    are both free, they are combined and placed in the next-largest size buddy-block list. BEST FIT

    • Free memory is used to maintain in linked lists, each of similar sized blocks. Every block is of size 2^k. When some memory is needed by a thread, the block part of next higher order is selected, and divided into two. Note that the two such pieces different
      in address only in their kth bit. Such pieces are known as buddies. When any used part is freed, the OS examines to see if its buddy is also free. If so, it is reconnected, and sends into the original free-block linked-list.
    • 基本思想:按2幂次方分级、高级别的内存分块 (一般2分) 交给低级别使用、内存块 释放后会尝试和相邻空闲内存块合并成更大空闲内存块并释放给上级
    • 较少 external fragmentation (如果不是2分的情况下可能发生),但是依然存在 internal fragmentation,分配的内存块需要附带管理信息,内存释放较快。
  • Slab allocation. It is a memory management mechanism intended for the efficient memory allocation of kernel objects which displays the desirable property of eliminating fragmentation caused by allocations and deallocations. The technique is
    used to retain allocated memory that contains a data object of a certain type for reuse upon subsequent allocations of objects of the same type.

    • Slab allocation can be layered on top of the more coarse buddy allocator to provide a more fine-grained allocation. – 根据需要划分出更小的 chunk。
    • With slab allocation, memory chunks suitable to fit data objects of certain type or size are preallocated. The slab allocator keeps track of these chunks, known as caches, so that when a request to allocate memory for a data object of a certain type is
      received it can instantly satisfy the request with an already allocated slot. Destruction of the object, however, does not free up the memory, but only opens a slot which is put in the list of free slots by the slap allocator. The next call to allocate memory
      of the same size will return the now unused memory slot. This process eliminates the need to search for suitable memory space and greatly alleviates memory fragmentation. In this context a slab is one or more contiguous pages in the memory containing pre-allocated
      memory chunks.
    • A slab is the amount that a cache can grow or shrink by. It represents on memory allocation to the cache from the machine, and whose size is customarily a multiple of the page size. A slab must contain a list of free buffers (or bufctls), as well as a list
      of the bufctls that have been allocated.
    • A buffer is the memory that the user of a slab allocator would use.
    • 核心思想:缓存、根据需求预先切块以减少 internal fragmentation、从系统中整 块申请或释放一个或多个 pagesize 大小的内存块以减少系统内存的 fragmentation。
    • 从基本形式上貌似和 memory pool 没有区别? /* TODO */
  • Region-based memory management is a type of memory management in which each allocated object is assigned to a region. A region is a collection of allocated objects that can be efficiently deallocated all at once. Like stack allocation, regions
    facilitate allocation and deallocation of memory with low overhead; but they are more flexible, allowing objects to live longer than the activation record in which they were allocated. In typical implementations, all objects in a region are allocated in a
    single contiguous range of memory addresses.

    • 核心思想:支持变长内存申请、从系统申请一个或多个 pagesize 大小的内存块、整 个内存块统一释放。
    • 申请释放简单,支持变长内存申请,但是适用场景有局限性。

Nginx的内存池

Memory pool

Some systems, like the web server Nginx, use the term memory pool to refer to a group of variable-size allocations which can be later deallocated all at once. This is also known as a region.

Nginx 根据 HTTP 的业务特性,将内存池和具有生命周期的对象挂接起来,在对象销毁时, 一次性的将对象对应的内存池都进行释放。这样一来,对象运行期间,不论是定长还是变长 的内存请求,都能使用内存池进行满足,在提高性能的同时也避免了内存块管理的复杂 性。

Nginx的共享内存

Slab Allocator

Nginx 的共享内存使用加强版的 Slab allocator (至少从上面的概念中严格定义的 话)进行管理。它将 slab allocator 和 buddy memory system 里的分级相结合,基本上实现了一个 使用自定义内存空间 (Nginx 的共享内存) 的通用内存管理器。

另外,glibapr_pool 和 stl 大致也是这样的实现方式。

Implementations

  • obstack – region memory allocator.

  • dlmalloc – buddy memory allocator.

    • It’s the default native version of malloc in some versions of Linux.
    • Allocated chunk contains an 8 or 16 byte overhead for the size of the chunk and usage flags. Applied 8-byte aligned, so 24 bytes minimum chunk size.
    • For requests below 256 bytes, a simple two power best fit allocator is used. If there are no free blocks in that bin, a block from the next highest bin is split in two.
    • For requests of 256 bytes or above but below the mmap threshold, recent versions of dlmalloc use an in-place bitwise trie algorithm. If there is no free space left to satisfy the request, dlmalloc tries to increase the size of the heap, usually via thebrk system
      call.
    • For requests above the mmap threshold, the memory is always allocated using the mmap system call. The threadhold is usually 256 KB.
  • ptmalloc – buddy memory allocator.

    • Default glibc’s malloc implementation currently.
    • Based on dlmalloc and was extended for use with multithread (especially on SMP systems).
  • FreeBSD’s malloc – buddy memory allocator

    • jemalloc
    • In order to avoid lock contention, jemalloc uses separate arenas for each CPU.
    • Experiments measuring number of allocations per second in multithreading application have shown that this makes it scale linearly with the number of threads, while for both phkmalloc (FreeBSD’s old implementation) and dlmalloc performance was inversely
      proportional to the number of threads.
  • OpenBSD’s malloc

    • For requests greater in size than one page, the entire allocation is retrieved usingmmap; smaller sizes are assigned from memory pools maintained by malloc within a number of “bucket pages”, also allocated with mmap.
    • On a call to free, memory is released and unmapped from the process address space using munmap.
    • This system is designed to improve security by taking advantage of the address space layout randomization and gap page features implemented as part of OpenBSD’s mmap system call, and to detect use-after-free bugs—as a large memory allocation is completely
      unmapped after it is freed, further use causes a segmentation fault and termination of the program.
  • hoard – buddy memory allocator

    • Hoard’s goal is scalable memory allocation performance.
    • Hoard uses mmap exclusively, but manages memory in chunks of 64K called superblocks.
    • By allocating only from superblocks on the local per-thread or per-CPU heap, and moving mostly-empty superblocks to the global heap so they can be reused by other processors.
    • Hoard keeps fragmentation low while achieving near linear scalability with the number of threads.
  • tcmalloc – buddy memory allocator.

    • Thread-Cache malloc.
    • Every thread has local storage for small allocations. For large allocations mmap orsbrk can be used.
    • GC for local storage of dead threads.
    • More than twice as fast as glibc’s ptmalloc for multithreaded programs.
  • nedmalloc – buddy memory allocator

    • scalable, multithreaded memory allocator with little memory fragmentation.
    • It is more than 125 times faster than the standard Windows XP memory allocator, 4-10 times faster than the standard FreeBSD 6 memory allocator and up to twice as fast as ptmalloc2, the standard Linux memory allocator.
  • kmalloc – in kernel malloc

  • apr_pool – slab memory allocator

  • glib – slab memory allocator

  • stl’s allocator – slab memory allocator

常用技术

  • 将可以提供的内存块按2的次幂划分成几个等级,申请时按照 best-fit 的测策略选取可 用内存块。减少 external fragmentation, 但是增加了internal fragmentation.
  • 按2的此幂划分后,在分配较小的块时,先从较大的块中申请内存,划分成小块后,再交 付给用户。
  • 缓存。内存块被释放后,并不真正返回给操作系统,而先链到 free list 中,供下次使 用。另一方面,利用CPU Cache, Thread-Specific Data特性,提高性能和并发性能。
  • 对齐。按一定的内存边界对齐,提高内存读取速度。
  • 合并。相邻的内存块都被释放后,将它们合并成较大的空闲内存块。
  • 预先申请较大的内存块,一次性释放。

结语

其实,总结到最后,我不但没有更清晰,相反觉得上面这几类管理机制,从上面列出的概念 上都不太容易区分清楚。像memory pool,如果实现了空间自增长和自缩小的话,和 slab allocator 又有什么区别;像 slab allocator 在常见的实现里又都实现了按2的冪次方进 行分级,那如果它实现了连续块释放和小块由大块切分后生成的话,又和 buddy memory allocator 有什么区别。

也许没什么区别,也许这就是区别。

抱歉!评论已关闭.