现在的位置: 首页 > 综合 > 正文

关于strace的一点东西

2018年03月20日 ⁄ 综合 ⁄ 共 12423字 ⁄ 字号 评论关闭

       好久没写博客了,感觉有点惭愧,觉得自己也应该静下心来利用自己可分配的时间去提升自己.
       虽然最近在看一些Python的东西,可是觉得自己还是不能忘记本行啊,Linux C的一些东西必须一直好好温习.
       不废话,今天打算写一个简单的东西, strace , 应该都对它或多或少的知道一些.很多时候除了core dump, backtrace的方式去排查程序的问题外, strace 跟踪调试也是不错的选择.

  看一小段测试code:

  /*
======================================================================
   *
   *       Filename:  strace_use_test.c
   *        Version:  1.0
   *        Created:  2014年07月19日 09时40分49秒
   *       Revision:  none
   *       Compiler:  clang
   *         Author:  sim szm, xianszm007@gmail.com
   *        Company:  Class 1107 of Computer Science and Technology
   *
   * ======================================================================
   */
  
  #include <unistd.h>
  #include <assert.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <fcntl.h>
  #include <sys/stat.h>
  
  int main(int argc, char *argv[])
  {
          int fd,i=0;
          fd=open("/tmp/non_file",O_RDONLY);
          if(fd<0)
                i=2;
          else
                i=1;
          return i;
  }

编译: gcc strace_use_test.c -o strace_use
  生成跟踪文件: strace -o strace_use.txt ./strace_use (其实就是跟踪结果重定向到文件)
  我们来看下对应代码生成的strace_dump信息:

  szm@szm-Lenovo:~/some_code$ cat strace_use.txt 
  (1) execve("./strace_use", ["./strace_use"], [/* 74 vars */]) = 0
  (2) brk(0)                                  = 0x8a92000
  (3) uname({sys="Linux", node="szm-Lenovo", ...}) = 0
  (4) access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
  (5) mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb775d000
  (6) access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
  (7) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
  (8) fstat64(3, {st_mode=S_IFREG|0644, st_size=131876, ...}) = 0
  (9)mmap2(NULL, 131876, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb773c000
  (10) close(3)                                = 0
  (11) access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
  (12) open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
  (13) read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\226\1\0004\0\0\0"..., 512) = 512
  (14) fstat64(3, {st_mode=S_IFREG|0755, st_size=1730024, ...}) = 0
  (15) mmap2(NULL, 1743580, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7592000
  (16) mprotect(0xb7735000, 4096, PROT_NONE)   = 0
  (17) mmap2(0xb7736000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a3) = 0xb7736000
  (18) mmap2(0xb7739000, 10972, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7739000
  (19) close(3)                                = 0
  (20) mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7591000
  (21) set_thread_area({entry_number:-1 -> 6, base_addr:0xb75916c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
  (22) mprotect(0xb7736000, 8192, PROT_READ)   = 0
  (23) mprotect(0x8049000, 4096, PROT_READ)    = 0
  (24) mprotect(0xb777f000, 4096, PROT_READ)   = 0
  (25) munmap(0xb773c000, 131876)              = 0
  (26) open("/tmp/non_file", O_RDONLY)         = -1 ENOENT (No such file or directory)
  (27) exit_group(2)                           = ?

      (当然,标号是我自己编辑上去的,哈哈)

      看到这一堆,也许你会看到一些熟悉的函数调用,但是每行干嘛的估计是不是有点迷惑.所以我们来说说每一行都在干什么.其实更多的就是每一行对应了一个系统调用:(下面的一些系统地址是我自己机子,换台机子肯定不同),差不多格式就是[系统调用的名( 参数... ) = 返回值 错误标志和描述] 这种样子,还是看看啥意思吧.
   1: 对于命令行下执行的程序,execve(或exec系列调用中的某一个)均为strace输出系统调用中的第一个。strace首先调用fork或clone函数新建一个子进程,然后在子进程中调用exec载入需要执行的程序(这里为./strace_use)
   2: 以0作为参数调用brk,返回值为内存管理的起始地址(若在子进程中调用malloc,则从0x8a92000地址开始分配空间)
   3: 调用access函数检验/etc/ld.so.nohwcap是否存在
   5: 使用mmap2函数进行匿名内存映射,以此来获取8192bytes内存空间,该空间起始地址为0xb775d000,匿名内存映射就是不涉及具体的文件名,避免了文件的创建及打开,很显然只能用于具有亲缘关系的进程间通信(这个解释也许有点简单,google一下吧)
   7: 调用open函数尝试打开/etc/ld.so.cache文件,返回文件描述符为3
   8: fstat64函数获取/etc/ld.so.cache文件信息
   9: 调用mmap2函数将/etc/ld.so.cache文件映射至内存
   10: close关闭文件描述符为3指向的/etc/ld.so.cache文件
   13: 调用read,从/lib/i386-linux-gnu/libc.so.6该libc库文件中读取512bytes,即读取ELF头信息
   16: 使用mprotect函数对0xb7735000,起始的4096bytes空间进行保护(PROT_NONE参数就是不能访问,对应还有PROT_READ表示可以读取)
   25: 调用munmap函数,将/etc/ld.so.cache文件从内存中去映射,与9行的mmap2函数对应
   26: 对应源码中使用到的唯一的系统调用——open函数,使用其打开/tmp/non_file文件
   27: 子进程结束,退出码为2(源码返回值)
  从之前的源码看来,真正能与源码对应上的只有open这一个系统调用(Line 26),其他系统调用几乎都用于进行进程初始化工作:装载被执行程序、载入libc函数库、设置内存映射等。而且源码中的if语句或其他代码在相应strace输出中并没有体现,因为它们并没有唤起系统调用。所以strace只关心程序与系统之间产生的交互,进而strace不适用于程序逻辑代码的排错和分析。当然对于Linux中几百个系统调用,上面strace输出的几个只是很小一块啦.
  
  下面说说strace一般常用的几点吧:
  首先那就是跟踪进程喽, 默认情况下,strace只跟踪指定的进程,而不对指定进程中新建的子进程进行跟踪。使用-f选项,可对进程中新建的子进程进行跟踪,并在输出结果中打印相应进程PID:
  我们吧strace_use_test.c的代码改为如下:      

 

  /*
======================================================================
   *
   *       Filename:  strace_use_test.c
   *        Version:  1.0
   *        Created:  2014年07月19日 09时40分49秒
   *       Revision:  none
   *       Compiler:  clang
   *         Author:  sim szm, xianszm007@gmail.com
   *        Company:  Class 1107 of Computer Science and Technology
   *
   * ======================================================================
   */
  
  #include <unistd.h>
  #include <assert.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <fcntl.h>
  #include <sys/stat.h>
  
  int main(int argc, char *argv[])
  {
          pid_t pid;
          pid=fork();
          if(pid>0){
                  printf("child do_something here !\n");
          }else if(pid==0){
                  printf("parent area print!\n");
          }else{
                  perror("fork error !");
                  _exit(0);
          }
          return 0;
  }
  

对应加上 -f 选项跟踪父子进程, dump 信息如下:

  szm@szm-Lenovo:~/some_code$ strace -o strace_use.txt -f ./strace_use 
  parent area print!
  child do_something here !
  szm@szm-Lenovo:~/some_code$ cat strace_use.txt 
  9256  execve("./strace_use", ["./strace_use"], [/* 74 vars */]) = 0
  9256  brk(0)                            = 0x8d8d000
  9256  uname({sys="Linux", node="szm-Lenovo", ...}) = 0
  9256  access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
  9256  mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7799000
  9256  access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
  9256  open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
  9256  fstat64(3, {st_mode=S_IFREG|0644, st_size=131876, ...}) = 0
  9256  mmap2(NULL, 131876, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7778000
  9256  close(3)                          = 0
  9256  access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
  9256  open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
  9256  read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\226\1\0004\0\0\0"..., 512) = 512
  9256  fstat64(3, {st_mode=S_IFREG|0755, st_size=1730024, ...}) = 0
  9256  mmap2(NULL, 1743580, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb75ce000
  9256  mprotect(0xb7771000, 4096, PROT_NONE) = 0
  9256  mmap2(0xb7772000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a3) = 0xb7772000
  9256  mmap2(0xb7775000, 10972, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7775000
  9256  close(3)                          = 0
  9256  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb75cd000
  9256  set_thread_area({entry_number:-1 -> 6, base_addr:0xb75cd6c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
  9256  mprotect(0xb7772000, 8192, PROT_READ) = 0
  9256  mprotect(0x8049000, 4096, PROT_READ) = 0
  9256  mprotect(0xb77bb000, 4096, PROT_READ) = 0
  9256  munmap(0xb7778000, 131876)        = 0
  9256  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb75cd728) = 9257
  9257  fstat64(1,  <unfinished ...>
  9256  fstat64(1,  <unfinished ...>
  9257  <... fstat64 resumed> {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 0), ...}) = 0
  9256  <... fstat64 resumed> {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 0), ...}) = 0
  9257  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
  9256  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
  9257  <... mmap2 resumed> )             = 0xb7798000
  9256  <... mmap2 resumed> )             = 0xb7798000
  9257  write(1, "parent area print!\n", 19 <unfinished ...>
  9256  write(1, "child do_something here !\n", 26 <unfinished ...>
  9257  <... write resumed> )             = 19
  9256  <... write resumed> )             = 26
  9257  exit_group(0)                     = ?
  9256  exit_group(0)                     = ?

       9256   clone(child_stack=0,flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb75cd728) = 9257
即9256这个进程内核进行clone()产生9527.

       所以很多时候多进程的一些程序strace跟踪时,一般打开 -f 选项, 使用strace对运行中的程序进行跟踪,-p 参数即可,命令执行之后,被跟踪的进程照常执行,strace的其他选项也适用于运行中的进程跟踪。有时间要求的时候,strace还可以记录程序与系统交互时,各个系统调用发生时的时间信息,有r、t、tt、ttt、T等几个选项,它们记录时间的方式为:
  -T: 记录各个系统调用花费的时间,精确到微秒
  -r: 以第一个系统调用(通常为execve)计时,精确到微秒
  -t: 时:分:秒
  -tt: 时:分:秒. 微秒
  -ttt: 计算机纪元以来的秒数. 微秒
  
        Strace 还可以用来判断程序挂起的问题,继续修改程序如下:

/*
======================================================================
 *
 *       Filename:  strace_use_test.c
 *        Version:  1.0
 *        Created:  2014年07月19日 09时40分49秒
 *       Revision:  none
 *       Compiler:  clang
 *         Author:  sim szm, xianszm007@gmail.com
 *        Company:  Class 1107 of Computer Science and Technology
 *
 * ======================================================================
 */

#include <unistd.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
        printf("flag see !\n");
        if(strcmp(argv[1],"flag1")==0){
                while(1);
        }else if(strcmp(argv[1],"flag2")==0){
                sleep(400);
        }
        return 0;
}

        flag1 可以用来模拟用户态死循环挂死状态,flag2 可以用来模拟内核态挂死状态.因为sleep()是陷入内核的.
看下对应的 strace dump结果:
      首先对应的是flag1

szm@szm-Lenovo:~/some_code$ strace ./strace_use flag1
execve("./strace_use", ["./strace_use", "flag1"], [/* 74 vars */]) = 0
brk(0)                                  = 0x8cff000
uname({sys="Linux", node="szm-Lenovo", ...}) = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77a8000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=131876, ...}) = 0
mmap2(NULL, 131876, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7787000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\226\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1730024, ...}) = 0
mmap2(NULL, 1743580, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb75dd000
mprotect(0xb7780000, 4096, PROT_NONE)   = 0
mmap2(0xb7781000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a3) = 0xb7781000
mmap2(0xb7784000, 10972, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7784000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb75dc000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb75dc6c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7781000, 8192, PROT_READ)   = 0
mprotect(0x8049000, 4096, PROT_READ)    = 0
mprotect(0xb77ca000, 4096, PROT_READ)   = 0
munmap(0xb7787000, 131876)              = 0
fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 0), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77a7000
write(1, "flag see !\n", 11flag see !
)            = 11

      一直卡在这里循环

      再看看flag2:

szm@szm-Lenovo:~/some_code$ strace ./strace_use flag2
execve("./strace_use", ["./strace_use", "flag2"], [/* 74 vars */]) = 0
brk(0)                                  = 0x83c0000
uname({sys="Linux", node="szm-Lenovo", ...}) = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb777a000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=131876, ...}) = 0
mmap2(NULL, 131876, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7759000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\226\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1730024, ...}) = 0
mmap2(NULL, 1743580, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb75af000
mprotect(0xb7752000, 4096, PROT_NONE)   = 0
mmap2(0xb7753000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a3) = 0xb7753000
mmap2(0xb7756000, 10972, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7756000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb75ae000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb75ae6c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7753000, 8192, PROT_READ)   = 0
mprotect(0x8049000, 4096, PROT_READ)    = 0
mprotect(0xb779c000, 4096, PROT_READ)   = 0
munmap(0xb7759000, 131876)              = 0
fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 0), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7779000
write(1, "flag see !\n", 11flag see !
)            = 11
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({400, 0},

  陷入sleep

  从输出的跟踪信息可以看出两中情况的不同, 当用户态挂死情况下,strace在write(由调用的printf产生write进描述符1,也就是标准输出) 一行输出之后没有其他系统调用输出;进程在内核态挂死,最后一行的系统调用nanosleep不能完整显示 (这里nanosleep没有返回值表示该调用尚未完成 ) 所以我们就可以知道:使用strace跟踪挂死程序,如果最后一行系统调用显示完整,程序在逻辑代码处挂死;如果最后一行系统调用显示不完整,程序在该系统调用处挂死。

  当程序挂死在系统调用处,我们可以查看相应系统调用的man手册,了解在什么情况下该系统调用会出现挂死情况。
  
  当然strace 还有其他一堆参数,就不说了,其实我也没看,哈哈.常用跟踪进程信息用法的就是strace -o output.txt -T -tt -e trace=all -f -p pid (-e trace=all 意思是跟踪所有系统调用)

       Ok, 周末一上午写篇Blog 总比睡觉好.Over .

抱歉!评论已关闭.