http://blog.chinaunix.net/uid-24774106-id-3372932.html
http://blog.chinaunix.net/uid-24774106-id-3379478.html
chrt -p <pid>
SCHED_OTHER,SCHED_FIFO,SCHED_RR
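Over the past few days I have read a lot of articles on Linux process scheduling alongside the source code. I now have a rough grasp of it, but the more I read, the more details turn up, and the less confident I feel about writing this post. There was once a series titled 鼠眼看linux进程调度 ("A Mouse's-Eye View of Linux Process Scheduling"), and that title fits my state of mind exactly. Like the blind men feeling the elephant, I keep learning things that delight me, yet I always have an uneasy sense that the subject is beyond me. Still, I have not written a blog post in a long time, so here is one. Please point out anything I got wrong.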
...locks and many other mechanisms introduce too much uncertainty, which makes hard real-time very difficult to achieve.
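The default values are 1 s and 0.95 s. In other words, within each 1-second period, the total running time of all real-time processes may not exceed 0.95 s, and the remaining 0.05 s is left for normal processes.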
...the affinity relationships between them are different.
...is implemented.
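I have gone through the kernel code for Linux process scheduling about twice and read a number of articles on the topic. I kept wanting to write a series that dissects process scheduling from end to end, but I always felt it was beyond my ability, to the point where I no longer dared to start writing. Never mind; I will stop forcing myself and simply write down what I have learned.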
```c
#define _GNU_SOURCE   /* must precede every #include to get CPU_SET() and sched_setaffinity() */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <sched.h>

#define COUNT   300000
#define MILLION 1000000L

/* Pure CPU burner; volatile stops the compiler from optimizing the loop away. */
void test_func(void)
{
    int i;
    volatile unsigned long long result = 0;
    for (i = 0; i < 8000; i++)
    {
        result += 2;
    }
}

int main(int argc, char *argv[])
{
    int i;
    long interval;
    struct timeval tend, tstart;
    struct sched_param param = {0};
    int ret = 0;

    if (argc != 3)
    {
        fprintf(stderr, "usage:./test sched_method sched_priority\n");
        return -1;
    }

    /* Pin every instance to CPU 1 so that they all compete for the same CPU. */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(1, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
    {
        printf("warning: could not set CPU affinity, continuing...\n");
    }

    int sched_method = atoi(argv[1]);
    int sched_priority = atoi(argv[2]);

    /* The valid range depends on the policy: [1,99] for FIFO/RR, [0,0] for OTHER. */
    int prio_min = sched_get_priority_min(sched_method);
    int prio_max = sched_get_priority_max(sched_method);
    if (prio_min == -1 || prio_max == -1)
    {
        fprintf(stderr, "unknown sched_method %d\n", sched_method);
        return -2;
    }
    if (sched_priority < prio_min || sched_priority > prio_max)
    {
        fprintf(stderr, "sched_priority scope [%d,%d]\n", prio_min, prio_max);
        return -3;
    }

    param.sched_priority = sched_priority;
    ret = sched_setscheduler(getpid(), sched_method, &param);
    if (ret)
    {
        /* %m is a glibc extension that prints strerror(errno) */
        fprintf(stderr, "set scheduler to %d %d failed %m\n", sched_method, sched_priority);
        return -4;
    }

    int scheduler = sched_getscheduler(getpid());
    fprintf(stderr, "the scheduler of PID(%ld) is %d, priority (%d),BEGIN time is :%ld\n",
            (long)getpid(), scheduler, sched_priority, (long)time(NULL));

    sleep(2);   /* lets comp.sh start the other instances before any of them burns CPU */

    gettimeofday(&tstart, NULL);
    for (i = 0; i < COUNT; i++)
    {
        test_func();
    }
    gettimeofday(&tend, NULL);

    interval = MILLION * (tend.tv_sec - tstart.tv_sec)
             + (tend.tv_usec - tstart.tv_usec);
    fprintf(stderr, " PID = %ld\t priority: %d\tEND TIME is %ld\telapsed: %ld us\n",
            (long)getpid(), sched_priority, (long)time(NULL), interval);
    return 0;
}
```
A few points about the program above need explaining.
```c
struct sched_param {
    /* ... */
    int sched_priority;
    /* ... */
};

#define SCHED_OTHER 0
#define SCHED_FIFO  1
#define SCHED_RR    2
#ifdef __USE_GNU
# define SCHED_BATCH 3
#endif
```
SCHED_OTHER denotes a normal process. For a normal process, the sched_priority field of the third argument (sp->sched_priority) must be 0:
```c
/*
 * Valid priorities for SCHED_FIFO and SCHED_RR are
 * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL,
 * SCHED_BATCH and SCHED_IDLE is 0.
 */
```
Linux provides other system calls that return the priority range of each policy:
```c
#include <sched.h>
int sched_get_priority_min(int policy);
int sched_get_priority_max(int policy);

int sched_setparam(pid_t pid, const struct sched_param *sp);
```
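In user space (the application layer), 1 is the lowest priority and 99 the highest. Inside the kernel, however, [0,99] is the real-time priority range, with 0 the highest and 99 the lowest, and [100,139] is the range used by normal processes. The application layer is simple and direct: it just compares magnitudes, so a bigger number means a higher priority. The priority shown by ps follows the same rule. Interestingly, for a real-time process at the highest user-space priority, 99, ps reports a priority of 139: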
```
 PID PRI CMD             TIME PSR
6303 139 ./test 1 99 00:00:04   1
```
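Although this article is mainly about real-time processes, a word about normal processes is in order. The priority of a normal process is adjusted with the nice system call. From the kernel's point of view, [100,139] is the priority range of normal processes, with 100 the highest, 139 the lowest, and 120 the default. Priority plays a different role for normal processes than for real-time ones: it determines how much CPU time a process receives. As Professional Linux Kernel Architecture (深入linux内核架构) points out, the higher a normal process's priority (100 highest, 139 lowest), the more CPU time it gets. Between two adjacent priority levels, the higher one gets roughly 10% more CPU than the lower one. For example, a process at kernel priority 120 gets about 10% more CPU than one at 121. The numbers come from the weight table below: adjacent weights differ by a factor of about 1.25, so two such processes competing for one CPU split it roughly 55% to 45%.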
```c
static const int prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};
```
```
[root@localhost sched]# cat comp.sh
#!/bin/sh
./test $1 99 &
usleep 1000;
./test $1 70 &
usleep 1000;
./test $1 70 &
usleep 1000;
./test $1 70 &
usleep 1000;
./test $1 50 &
usleep 1000;
./test $1 30 &
usleep 1000;
./test $1 10 &
```
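Because each test process sleeps for 2 seconds before doing any work, comp.sh has time to start the other test processes. The script launches one real-time process at level 99 (the highest priority), three at level 70, and one each at levels 50, 30 and 10.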
```c
#define DEF_TIMESLICE (100 * HZ / 1000)
```
```
[root@localhost sched]# cat getpsinfo.sh
#!/bin/bash
for ((i = 0; i < 40; i++))
do
    ps -C test -o pid,pri,cmd,time,psr >>psinfo.log 2>&1
    sleep 2;
done
```
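The second script is more involved. It is a SystemTap script that watches the context switches involving processes named test, showing which process switched test out and which process test switched out. It also records when each test process exits: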
```
[root@localhost sched]# cat cswmon_spec.stp
global time_offset
probe begin { time_offset = gettimeofday_us() }
probe scheduler.ctxswitch {
    if (next_task_name == "test" || prev_task_name == "test")
    {
        t = gettimeofday_us()
        printf(" time_off (%8d )%20s(%6d)(pri=%4d)(state=%d)->%20s(%6d)(pri=%4d)(state=%d)\n",
               t - time_offset,
               prev_task_name,
               prev_pid,
               prev_priority,
               (prevtsk_state),
               next_task_name,
               next_pid,
               next_priority,
               (nexttsk_state))
    }
}
probe scheduler.process_exit
{
    if (execname() == "test")
        printf("task :%s PID(%d) PRI(%d) EXIT\n", execname(), pid, priority);
}
probe timer.s($1) {
    printf("--------------------------------------------------------------\n")
    exit();
}
```
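Run the three scripts in three terminals. The SystemTap script exits after 70 seconds (probe timer.s($1)), and the argument 1 passed to comp.sh selects SCHED_FIFO:

```
Terminal 1:
stap ./cswmon_spec.stp 70
Terminal 2:
./getpsinfo.sh
Terminal 3:
./comp.sh 1
```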
The output is as follows:
```
time_off ( 689546 ) test( 6305)(pri= 120)(state=0)-> migration/2( 11)(pri= 0)(state=0)
time_off ( 689977 ) stap( 5895)(pri= 120)(state=0)-> test( 6305)(pri= 120)(state=0)
time_off ( 690067 ) test( 6305)(pri= 29)(state=1)-> stap( 5895)(pri= 120)(state=0)
time_off ( 697899 ) test( 6303)(pri= 120)(state=0)-> migration/2( 11)(pri= 0)(state=0)
time_off ( 698042 ) test( 6307)(pri= 120)(state=0)-> migration/0( 3)(pri= 0)(state=0)
time_off ( 699114 ) stap( 5895)(pri= 120)(state=0)-> test( 6303)(pri= 120)(state=0)
time_off ( 699307 ) test( 6303)(pri= 0)(state=1)-> test( 6307)(pri= 120)(state=0)
time_off ( 699371 ) test( 6307)(pri= 29)(state=1)-> stap( 5895)(pri= 120)(state=0)
time_off ( 699392 ) test( 6309)(pri= 120)(state=0)-> migration/3( 15)(pri= 0)(state=0)
time_off ( 699966 ) events/1( 20)(pri= 120)(state=1)-> test( 6309)(pri= 120)(state=0)
time_off ( 700034 ) test( 6309)(pri= 29)(state=1)-> stap( 5895)(pri= 120)(state=0)
time_off ( 707379 ) test( 6311)(pri= 120)(state=0)-> migration/3( 15)(pri= 0)(state=0)
time_off ( 707587 ) test( 6313)(pri= 120)(state=0)-> migration/0( 3)(pri= 0)(state=0)
time_off ( 712021 ) stap( 5895)(pri= 120)(state=0)-> test( 6311)(pri= 120)(state=0)
time_off ( 712145 ) test( 6311)(pri= 49)(state=1)-> test( 6313)(pri= 120)(state=0)
time_off ( 712252 ) test( 6313)(pri= 69)(state=1)-> stap( 5895)(pri= 120)(state=0)
time_off ( 727057 ) test( 6315)(pri= 120)(state=0)-> migration/0( 3)(pri= 0)(state=0)
time_off ( 727952 ) stap( 5895)(pri= 120)(state=0)-> test( 6315)(pri= 120)(state=0)
time_off ( 728047 ) test( 6315)(pri= 89)(state=1)-> stap( 5895)(pri= 120)(state=0)
time_off ( 2690181 ) stap( 5895)(pri= 120)(state=0)-> test( 6305)(pri= 29)(state=0)
time_off ( 2699316 ) test( 6305)(pri= 29)(state=0)-> test( 6303)(pri= 0)(state=0)
task :test PID(6303) PRI(0) EXIT
time_off (13057854 ) test( 6303)(pri= 0)(state=64)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (13057864 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6305)(pri= 29)(state=0)
time_off (15333340 ) test( 6305)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (15333354 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6305)(pri= 29)(state=0)
time_off (18743409 ) test( 6305)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (18743422 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6305)(pri= 29)(state=0)
time_off (22154757 ) test( 6305)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (22154771 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6305)(pri= 29)(state=0)
task :test PID(6305) PRI(29) EXIT
time_off (22466855 ) test( 6305)(pri= 29)(state=64)-> test( 6307)(pri= 29)(state=0)
time_off (25563548 ) test( 6307)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (25563566 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6307)(pri= 29)(state=0)
time_off (28973602 ) test( 6307)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (28973616 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6307)(pri= 29)(state=0)
task :test PID(6307) PRI(29) EXIT
time_off (31846121 ) test( 6307)(pri= 29)(state=64)-> test( 6309)(pri= 29)(state=0)
time_off (32383671 ) test( 6309)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (32383683 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6309)(pri= 29)(state=0)
time_off (35793735 ) test( 6309)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (35793747 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6309)(pri= 29)(state=0)
time_off (39203797 ) test( 6309)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (39203809 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6309)(pri= 29)(state=0)
task :test PID(6309) PRI(29) EXIT
time_off (41200440 ) test( 6309)(pri= 29)(state=64)-> test( 6311)(pri= 49)(state=0)
time_off (42613866 ) test( 6311)(pri= 49)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (42613898 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6311)(pri= 49)(state=0)
time_off (46024070 ) test( 6311)(pri= 49)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (46024082 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6311)(pri= 49)(state=0)
time_off (49434004 ) test( 6311)(pri= 49)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (49434017 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6311)(pri= 49)(state=0)
task :test PID(6311) PRI(49) EXIT
```
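The log makes it clear that 6305, 6307 and 6309 all have priority 70 (29 in the kernel), yet 6307 does not get to run at all until 6305 exits. Likewise, 6309 does not get to run until 6307 exits. That is FIFO.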
The same experiment, this time with SCHED_RR (policy 2):

```
Terminal 1:
stap ./cswmon_spec.stp 70
Terminal 2:
./getpsinfo.sh
Terminal 3:
./comp.sh 2
```
```
time_off ( 4188015 ) test( 6428)(pri= 0)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off ( 4188025 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6428)(pri= 0)(state=0)
time_off ( 7612014 ) test( 6428)(pri= 0)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off ( 7612024 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6428)(pri= 0)(state=0)
task :test PID(6428) PRI(0) EXIT
time_off (10679062 ) test( 6428)(pri= 0)(state=64)-> test( 6430)(pri= 29)(state=0)
time_off (10964413 ) test( 6430)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (10964422 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6430)(pri= 29)(state=0)
time_off (11709024 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (12736030 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (13779022 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (13879021 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (13984075 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (14084020 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (14184023 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (14284024 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (14374486 ) test( 6434)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (14374502 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6434)(pri= 29)(state=0)
time_off (14384097 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (14484066 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (14584023 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (14684020 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (14786032 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (14886020 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (14986026 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (15089023 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (15192030 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (15292026 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (15396085 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (15496022 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (15596027 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (15696153 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (15796022 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
```
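Process 6428 has user-space real-time priority 99 (kernel priority 0). After it exits, the three processes with user-space real-time priority 70 (6430, 6432 and 6434) take turns on the stage, one stepping off as the next steps on. How long does each one get to "sing"? Look at the time difference between adjacent records. After the first round, in which each process ran for roughly 1 s, the difference is almost always about 100 ms. That is the time slice.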