Changes to the Linux kernel workqueue API

May 17, 2019 ⁄ General

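The workqueue mechanism lets kernel code defer work to a later time. Workqueues are served by one or more dedicated kernel processes that run the queued work. Because the work runs in process context, it may sleep if it needs to. Workqueues can also delay work for a specified time, so they are used in many places in the kernel.

David Howells recently looked at workqueues and found that struct work_struct (which describes one piece of work to run) is quite large: 96 bytes on 64-bit machines. That is a big structure given how many places embed it. He set out to shrink it and succeeded, but at the cost of changing the workqueue API.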

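struct work_struct is bloated for three reasons:
1. It contains a timer structure. Many workqueue users never use the delay feature, yet every work_struct carries a timer_list.
2. It contains a private data pointer, which is passed to the work function as its argument. Many functions do use this pointer, but it can usually be computed from the work_struct pointer with container_of().
3. It spends an entire word on a single bit, the pending flag, which indicates that the work_struct is currently queued and waiting to run.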
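
To see how these three items add up, here is a minimal userspace sketch. The structure layouts are mock-ups that only approximate the kernel definitions before and after the change; field names such as wq_data, base and management are assumptions written for illustration, and the point is the relative size rather than the exact layout.

```c
#include <stddef.h>

/* Mock of the kernel's doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Mock timer: only the rough shape matters here. */
struct mock_timer_list {
    struct list_head entry;
    unsigned long expires;
    void (*function)(unsigned long);
    unsigned long data;
    void *base;
};

/* Approximation of the old, "fat" work item. */
struct old_work {
    unsigned long pending;          /* a whole word for one bit */
    struct list_head entry;
    void (*func)(void *);
    void *data;                     /* private data pointer */
    void *wq_data;
    struct mock_timer_list timer;   /* present even if never delayed */
};

/* Approximation of the slimmed-down work item. */
struct new_work {
    unsigned long management;       /* queue pointer and flag bits share a word */
    struct list_head entry;
    void (*func)(struct new_work *);
};

/* Delayed work pays for the timer only when it is needed. */
struct new_delayed_work {
    struct new_work work;
    struct mock_timer_list timer;
};

/* Bytes saved per non-delayed work item in this mock layout. */
static size_t bytes_saved(void)
{
    return sizeof(struct old_work) - sizeof(struct new_work);
}
```

On a typical LP64 system the mock old_work comes to 96 bytes, in line with the figure quoted above, and new_work to 32 bytes; other ABIs will give different numbers.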

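David addressed all three. A new structure, struct delayed_work, is used only for work that needs a delay, and the timer was removed from struct work_struct. The private data pointer is gone; the work function now takes a pointer to the work_struct itself:

    typedef void (*work_func_t)(struct work_struct *work);

The pending word was eliminated with some tricks. As a result, the workqueue API has changed. There are now two ways to declare a workqueue entry:

    DECLARE_WORK(name, func);
    DECLARE_DELAYED_WORK(name, func);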
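
The article does not spell out the tricks used to remove the pending word. One common way to do it, and the assumption made in this sketch, is to keep flag bits in the low bits of a word that otherwise holds an aligned pointer. All names below (mock_queue, FLAG_PENDING, the mgmt_* helpers) are hypothetical and are not kernel identifiers.

```c
#include <stdint.h>

#define FLAG_PENDING   ((uintptr_t)1 << 0)
#define FLAG_MASK      ((uintptr_t)3)   /* two low bits reserved for flags */

/*
 * Stand-in for a queue object. It is assumed to be at least 4-byte
 * aligned, so the two low bits of its address are always zero.
 */
struct mock_queue {
    long id;
};

/* Store a queue pointer while preserving the current flag bits. */
static uintptr_t mgmt_set_queue(uintptr_t word, struct mock_queue *q)
{
    return (uintptr_t)q | (word & FLAG_MASK);
}

/* Recover the queue pointer by masking the flag bits off. */
static struct mock_queue *mgmt_get_queue(uintptr_t word)
{
    return (struct mock_queue *)(word & ~FLAG_MASK);
}

/* Set the pending flag without disturbing the pointer bits. */
static uintptr_t mgmt_set_pending(uintptr_t word)
{
    return word | FLAG_PENDING;
}

/* Clear the pending flag without disturbing the pointer bits. */
static uintptr_t mgmt_clear_pending(uintptr_t word)
{
    return word & ~FLAG_PENDING;
}

/* Test the pending flag. */
static int mgmt_is_pending(uintptr_t word)
{
    return (word & FLAG_PENDING) != 0;
}
```

The pointer and the flag then travel in one word, which is how a separate pending word can be dropped without losing the information it carried.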
 
For work structures set up at run time, the initialization macros now look like this (each takes a pointer to the structure):

    INIT_WORK(struct work_struct *work, work_func_t func);
    PREPARE_WORK(struct work_struct *work, work_func_t func);
    INIT_DELAYED_WORK(struct delayed_work *work, work_func_t func);
    PREPARE_DELAYED_WORK(struct delayed_work *work, work_func_t func);
 
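The INIT_* versions of the macros initialize the entire structure and must be used the first time the structure is set up; the PREPARE_* versions run slightly faster.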

The functions for adding entries to workqueues (and canceling them) now look like this:

    int queue_work(struct workqueue_struct *queue,
                   struct work_struct *work);
    int queue_delayed_work(struct workqueue_struct *queue,
                           struct delayed_work *work,
                           unsigned long delay);
    int queue_delayed_work_on(int cpu,
                              struct workqueue_struct *queue,
                              struct delayed_work *work,
                              unsigned long delay);
    int cancel_delayed_work(struct delayed_work *work);
    int cancel_rearming_delayed_work(struct delayed_work *work);

Interestingly, David has added a variant on the workqueue declaration and initialization macros:

    DECLARE_WORK_NAR(name, func);
    DECLARE_DELAYED_WORK_NAR(name, func);
    INIT_WORK_NAR(name, func);
    INIT_DELAYED_WORK_NAR(name, func);
    PREPARE_WORK_NAR(name, func);
    PREPARE_DELAYED_WORK_NAR(name, func);

The "NAR" stands for "non-auto-release." Normally, the workqueue subsystem resets a work entry's pending flag prior to calling the work function; that action, among other things, allows the function to resubmit itself if need be. If the entry is initialized with one of the above macros, however, this reset will not happen, and the work function is expected to reset the flag itself (with a call to work_release()). The stated purpose is to prevent the workqueue entry from being released before the work function is done with it - but there is nothing in the clearing of the pending bit which would cause that release to happen. Perhaps that is why there are no users of the _NAR variants in David's patch. It may be that somebody is thinking about implementing reference-counted workqueue structures in the future.
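
The difference between the normal and the _NAR behaviour can be modelled in a few lines of userspace C. This is only a toy model of the description above, not the kernel's implementation: every identifier (mock_work, mock_run_work, mock_work_release and so on) is invented for the sketch, and there is no locking or real queue.

```c
#include <stdbool.h>

struct mock_work;
typedef void (*mock_func_t)(struct mock_work *work);

struct mock_work {
    bool pending;                 /* set while the item is queued */
    bool no_auto_release;         /* models the "_NAR" variants */
    mock_func_t func;
    bool pending_seen_in_handler; /* recorded only for demonstration */
};

/* Mark the item queued; refuse if it is already pending. */
static bool mock_queue_work(struct mock_work *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    return true;
}

/* What a _NAR handler would call once it is done with the item. */
static void mock_work_release(struct mock_work *w)
{
    w->pending = false;
}

/* Simplified worker step: clear pending first unless the item is _NAR. */
static void mock_run_work(struct mock_work *w)
{
    if (!w->no_auto_release)
        w->pending = false;
    w->func(w);
}

/* Ordinary handler: just records the flag state it observes. */
static void record_handler(struct mock_work *w)
{
    w->pending_seen_in_handler = w->pending;
}

/* _NAR handler: records the flag state, then releases the item itself. */
static void nar_handler(struct mock_work *w)
{
    w->pending_seen_in_handler = w->pending;
    mock_work_release(w);
}
```

In this model an ordinary handler sees the pending flag already cleared, so it could requeue itself, while a _NAR handler sees the flag still set until it calls the release helper.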

Meanwhile, these changes require a lot of fixes throughout the kernel tree; that drew a complaint from Andrew Morton, who was unable to make those changes mesh with all of the other patches queued up for the opening of the 2.6.20 merge window. Andrew suggested that the workqueue patches could be merged after 2.6.20-rc1 comes out, as was done with the interrupt handler function prototype in 2.6.19. But Linus, who likes the workqueue patches, would rather get them in sooner:

    I'd actually prefer to take it before -rc1, because I think the previous time we did something after -rc1 was a failure (the whole irq argument handling thing). It just exposed too many problems too late in the dev cycle. I'd rather have the problems be exposed by the time -rc1 rolls out, and keep the whole "we've done all major nasty ops by -rc1" thing.

So it seems that, somehow, all of the pieces will be made to fit and the workqueue API will change in 2.6.20.

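To update the workqueue usage in your own code:

1. Any work_struct that has one of the following functions called on it:

    queue_delayed_work()
    queue_delayed_work_on()
    schedule_delayed_work()
    schedule_delayed_work_on()
    cancel_rearming_delayed_work()
    cancel_rearming_delayed_workqueue()
    cancel_delayed_work()

needs to be changed into a delayed_work. Note that cancel_delayed_work() is often called in places where it has no effect (I think people misunderstand what it does).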
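
2. A delayed_work struct must be initialized with one of:

    __DELAYED_WORK_INITIALIZER
    DECLARE_DELAYED_WORK
    INIT_DELAYED_WORK

rather than:

    __WORK_INITIALIZER
    DECLARE_WORK
    INIT_WORK

(those apply only to work_struct, that is, non-delayable work).

3. The initialization macros no longer take a data pointer argument, so it must be deleted.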
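
4. Any call to one of these functions on a delayed_work:

    queue_work()
    queue_work_on()
    schedule_work()
    schedule_work_on()

must be converted to the corresponding delayed function:

    queue_delayed_work()
    queue_delayed_work_on()
    schedule_delayed_work()
    schedule_delayed_work_on()

with a timeout of 0 passed as an additional argument. This just queues the work item without setting the timer.

5. Any code that examines the work item's pending flag directly, such as:

    test_bit(0, &work->pending)

should use the appropriate function instead:

    work_pending(work)
    delayed_work_pending(work)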
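
6. The work function must be changed to match this prototype:

    void foo_work_func(struct work_struct *work)
    {
        ...
    }

This applies to both work_struct and delayed_work handlers:

a) If the datum that used to be passed in was NULL, the work argument can simply be ignored.

b) If the datum was a pointer to the structure that contains the work_struct, use something like:

    struct foo {
        struct work_struct worker;
        ...
    };

    void foo_work_func(struct work_struct *work)
    {
        struct foo *foo = container_of(work, struct foo, worker);
        ...
    }

If the work_struct is placed at the start of the containing structure, container_of() costs nothing because the offset is zero; otherwise it has to perform a subtraction.

c) If the datum was the address of a structure that contains a delayed_work, use something like:

    struct foo {
        struct delayed_work worker;
        ...
    };

    void foo_work_func(struct work_struct *work)
    {
        struct foo *foo = container_of(work, struct foo, worker.work);
        ...
    }

Note the extra ".work" in container_of(): the work_struct that the handler receives is wrapped inside the delayed_work.

d) If the datum is not a pointer to the container, but the container is guaranteed to exist while the work handler runs, the datum can be stored in an extra variable in the container. The handler is then written as in (b) or (c), and the extra variable is accessed after container_of(). Often there is a pair of linked structures (work_struct <==> other struct), net_device being one example.

e) If the datum is completely unrelated and cannot be stored in the container, because the container may no longer be accessible when the handler runs, the work_struct or delayed_work should be initialized with one of these macros:

    DECLARE_WORK_NAR
    DECLARE_DELAYED_WORK_NAR
    INIT_WORK_NAR
    INIT_DELAYED_WORK_NAR
    __WORK_INITIALIZER_NAR
    __DELAYED_WORK_INITIALIZER_NAR

These macros take the same arguments as the normal initializers, but set a flag in the work_struct meaning that the pending flag is not cleared before the work function is called.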
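
Cases (b) and (d) of step 6 can be tried out in userspace. The sketch below defines its own container_of() with offsetof() and uses a mock work_struct that holds nothing but a handler; struct foo, its id field and its counter field are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct work_struct;
typedef void (*work_func_t)(struct work_struct *work);

/* Mock work item: just enough to carry a handler. */
struct work_struct {
    work_func_t func;
};

struct foo {
    int id;
    struct work_struct worker;   /* not first, so the offset is non-zero */
    int *counter;                /* case (d): former datum kept in the container */
};

/* Cases (b) and (d): recover the container, then reach the extra variable. */
static void foo_work_func(struct work_struct *work)
{
    struct foo *foo = container_of(work, struct foo, worker);

    *foo->counter += foo->id;
}
```

The handler receives only the work_struct pointer, yet it reaches both the enclosing struct foo and the value that would previously have been passed as the data argument.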

References:
    1. http://lwn.net/Articles/211279/
    2. http://bugboy.ycool.com/post.2926602.html
    3. http://bugboy.ycool.com/post.2927176.html
    4. David Howells <dhowells-AT-redhat.com>


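Below is jiq.c from the LDD3 sample code, modified according to the steps above (this example does not use container_of() to recover private data from the work_struct; it uses a global variable directly):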

/*
 * jiq.c -- the just-in-queue module
 *
 * Copyright (C) 2001 Alessandro Rubini and Jonathan Corbet
 * Copyright (C) 2001 O'Reilly & Associates
 *
 * The source code in this file can be freely used, adapted,
 * and redistributed in source or binary form, so long as an
 * acknowledgment appears in derived source files. The citation
 * should list that the code comes from the book "Linux Device
 * Drivers" by Alessandro Rubini and Jonathan Corbet, published
 * by O'Reilly & Associates. No warranty is attached;
 * we cannot take responsibility for errors or fitness for use.
 *
 * $Id: jiq.c,v 1.7 2004/09/26 07:02:43 gregkh Exp $
 */

 
#include <linux/config.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/init.h>

#include <linux/sched.h>
#include <linux/kernel.h>
#include <linux/fs.h>        /* everything... */

#include <linux/proc_fs.h>
#include <linux/errno.h>     /* error codes */

#include <linux/workqueue.h>
#include <linux/preempt.h>
#include <linux/interrupt.h> /* tasklets */

MODULE_LICENSE("Dual BSD/GPL");

/*
 * The delay for the delayed workqueue timer file.
 */
static long delay = 1;
module_param(delay, long, 0);

/*
 * This module is a silly one: it only embeds short code fragments
 * that show how enqueued tasks `feel' the environment
 */

#define LIMIT    (PAGE_SIZE-128)    /* don't print any more after this size */

/*
 * Print information about the current environment. This is called from
 * within the task queues. If the limit is reached, awake the reading
 * process.
 */

/* was: static struct work_struct jiq_work; */
struct delayed_work jiq_work;  /* changed from struct work_struct to struct delayed_work */
static DECLARE_WAIT_QUEUE_HEAD (jiq_wait);

/*
 * Keep track of info we need between task queue runs.
 */
static struct clientdata {
    int len;
    char *buf;
    unsigned long jiffies;
    long delay;
} jiq_data;

#define SCHEDULER_QUEUE ((task_queue *) 1)

static void jiq_print_tasklet(unsigned long);
static DECLARE_TASKLET(jiq_tasklet, jiq_print_tasklet, (unsigned long)&jiq_data);

/*
 * Do the printing; return non-zero if the task should be rescheduled.
 */

static int jiq_print(void *ptr)
{
    struct clientdata *data = ptr;
    int len = data->len;
    char *buf = data->buf;
    unsigned long j = jiffies;

    if (len > LIMIT) {

        wake_up_interruptible(&jiq_wait);
        return 0;
    }

    if (len == 0)
        len = sprintf(buf, " time delta preempt pid cpu command\n");
    else
        len = 0;

    /* intr_count is only exported since 1.3.5, but 1.99.4 is needed anyways */
    len += sprintf(buf+len, "%9li %4li %3i %5i %3i %s\n",
            j, j - data->jiffies,
            preempt_count(), current->pid, smp_processor_id(),
            current->comm);

    data->len += len;
    data->buf += len;
    data->jiffies = j;
    return 1;
}

/*
 * Call jiq_print from a work queue
 */
static void jiq_print_wq(struct work_struct *ptr)
{
    /* was: struct clientdata *data = (struct clientdata *) ptr; */

    if (!jiq_print(&jiq_data))
        return;

    if (jiq_data.delay)
        schedule_delayed_work(&jiq_work, jiq_data.delay);
    else
        schedule_work(&jiq_work.work);  /* use jiq_work.work */
}

static int jiq_read_wq(char *buf, char **start, off_t offset,
                   int len, int *eof, void *data)
{
    DEFINE_WAIT(wait);

    jiq_data.len = 0;                /* nothing printed, yet */
    jiq_data.buf = buf;              /* print in this place */
    jiq_data.jiffies = jiffies;      /* initial time */
    jiq_data.delay = 0;

    prepare_to_wait(&jiq_wait, &wait, TASK_INTERRUPTIBLE);
    schedule_work(&jiq_work.work);   /* use jiq_work.work */
    schedule();
    finish_wait(&jiq_wait, &wait);

    *eof = 1;
    return jiq_data.len;
}

static int jiq_read_wq_delayed(char *buf, char **start, off_t offset,
                   int len, int *eof, void *data)
{
    DEFINE_WAIT(wait);

    jiq_data.len = 0;                /* nothing printed, yet */
    jiq_data.buf = buf;              /* print in this place */
    jiq_data.jiffies = (unsigned long)jiffies; /* initial time */
    jiq_data.delay = delay;

    prepare_to_wait(&jiq_wait, &wait, TASK_INTERRUPTIBLE);
    schedule_delayed_work(&jiq_work, delay);
    schedule();
    finish_wait(&jiq_wait, &wait);

    *eof = 1;
    return jiq_data.len;
}

/*
 * Call jiq_print from a tasklet
 */
static void jiq_print_tasklet(unsigned long ptr)
{
    if (jiq_print((void *) ptr))
        tasklet_schedule(&jiq_tasklet);
}

static int jiq_read_tasklet(char *buf, char **start, off_t offset, int len,
                int *eof, void *data)
{
    jiq_data.len = 0;                /* nothing printed, yet */
    jiq_data.buf = buf;              /* print in this place */
    jiq_data.jiffies = jiffies;      /* initial time */

    tasklet_schedule(&jiq_tasklet);
    interruptible_sleep_on(&jiq_wait);    /* sleep till completion */

    *eof = 1;
    return jiq_data.len;
}

/*
 * This one, instead, tests out the timers.
 */
static struct timer_list jiq_timer;

static void jiq_timedout(unsigned long ptr)
{
    jiq_print((void *)ptr);            /* print a line */
    wake_up_interruptible(&jiq_wait);  /* awake the process */
}

static int jiq_read_run_timer(char *buf, char **start, off_t offset,
                   int len, int *eof, void *data)
{
    jiq_data.len = 0;           /* prepare the argument for jiq_print() */
    jiq_data.buf = buf;
    jiq_data.jiffies = jiffies;

    init_timer(&jiq_timer);     /* init the timer structure */
    jiq_timer.function = jiq_timedout;
    jiq_timer.data = (unsigned long)&jiq_data;
    jiq_timer.expires = jiffies + HZ; /* one second */

    jiq_print(&jiq_data);       /* print and go to sleep */
    add_timer(&jiq_timer);
    interruptible_sleep_on(&jiq_wait);  /* RACE */
    del_timer_sync(&jiq_timer); /* in case a signal woke us up */

    *eof = 1;
    return jiq_data.len;
}

/*
 * the init/clean material
 */
static int jiq_init(void)
{
    /* this line is in jiq_init() */
    INIT_DELAYED_WORK(&jiq_work, jiq_print_wq); /* use INIT_DELAYED_WORK instead of INIT_WORK */

    create_proc_read_entry("jiqwq", 0, NULL, jiq_read_wq, NULL);
    create_proc_read_entry("jiqwqdelay", 0, NULL, jiq_read_wq_delayed, NULL);
    create_proc_read_entry("jitimer", 0, NULL, jiq_read_run_timer, NULL);
    create_proc_read_entry("jiqtasklet", 0, NULL, jiq_read_tasklet, NULL);

    return 0; /* succeed */
}

static void jiq_cleanup(void)
{
    remove_proc_entry("jiqwq", NULL);
    remove_proc_entry("jiqwqdelay", NULL);
    remove_proc_entry("jitimer", NULL);
    remove_proc_entry("jiqtasklet", NULL);
}

module_init(jiq_init);
module_exit(jiq_cleanup);
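
The listing above reads its state from the global jiq_data. As an alternative, and purely as a sketch that is not part of the LDD3 sources, the delayed_work could be embedded in struct clientdata so that the handler recovers its data with container_of(). The userspace mock below shows only the pointer arithmetic: work_struct, delayed_work and mock_timer are stand-ins for the kernel types, and work_to_clientdata() is a hypothetical helper.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Mock stand-ins for the kernel types. */
struct work_struct {
    void (*func)(struct work_struct *work);
};

struct mock_timer {
    unsigned long expires;
};

struct delayed_work {
    struct work_struct work;
    struct mock_timer timer;
};

/* clientdata as in jiq.c, with the work item moved inside it. */
struct clientdata {
    int len;
    char *buf;
    unsigned long jiffies;
    long delay;
    struct delayed_work jiq_work;
};

/* Recover the clientdata from the work pointer; note the extra ".work". */
static struct clientdata *work_to_clientdata(struct work_struct *work)
{
    return container_of(work, struct clientdata, jiq_work.work);
}
```

With this layout a handler like jiq_print_wq() would call the helper on its argument instead of touching a global, at the cost of having to keep the clientdata alive for as long as the work can run.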
