现在的位置: 首页 > 综合 > 正文

Detection of a buffer overrun

2013年09月07日 ⁄ 综合 ⁄ 共 3891字 ⁄ 字号 评论关闭

Value of one of the thread-argument variable has changed at some place, which is only detected when the program crashes.

(gdb) p *a
$7 = {
  host = 0x0, 
  port = 0, 
  h_ent = 0x7fff5fbf6f10, 
  request = 0x100005620 "UH??H??? ", 
  reqlen = 140734799803600, 
  timeout = 0x100004e56, 
  s = 0x100281000, 
  cnt = 1606381568, 
  sargs = 0x1001000e0
}

Look at the garbage.

So, watchpoint.

watch a

isn't helpful at all, since there is no write to the variable a. Have to use memory location watch, which is

(gdb) watch -location a
Hardware watchpoint 7: *(struct args **) 4297592504

(GDB shipped with OS X doesn't recognize watch -l (as specified in the help message).

Then the debugger kindly pauses my program at

(gdb) c
Continuing.
Hardware watchpoint 6 deleted because the program has left the block 
in which its expression is valid.
READ: 8192, TOTAL: 8192
Hardware watchpoint 7: *(struct args **) 4297592504

Old value = (struct args *) 0x7fff5fbf6f40
New value = (struct args *) 0x7fff5fbf6f00
0x0000000100002641 in parse_response (buf=0x10027eeb8 "HTTP/1.1 200 OK\r\nVary: Accept-Encoding\r\nContent-Type: image/x-icon\r\nLast-Modified: Tue, 14 Aug 2012 15:19:23 GMT\r\nDate: Wed, 17 Apr 2013 14:07:54 GMT\r\nExpires: Thu, 25 Apr 2013 14:07:54 GMT\r\nX-Content"..., len=8192, a=0x7fff5fbf7000) at md.c:512
512	    nextproc[len] = 0;
(gdb) bt
#0  0x0000000100002641 in parse_response (buf=0x10027eeb8 "HTTP/1.1 200 OK\r\nVary: Accept-Encoding\r\nContent-Type: image/x-icon\r\nLast-Modified: Tue, 14 Aug 2012 15:19:23 GMT\r\nDate: Wed, 17 Apr 2013 14:07:54 GMT\r\nExpires: Thu, 25 Apr 2013 14:07:54 GMT\r\nX-Content"..., len=8192, a=0x7fff5fbf7000) at md.c:512
#1  0x0000000100005c10 in recv_engine (p=0x7fff5fbf6f40) at engine.c:394
#2  0x00007fff88c637a2 in _pthread_start ()
#3  0x00007fff88c501e1 in thread_start ()

(gdb) info locals 
readlen = 8192
sfd = 3
i = 0
n = 3
quit = 0
_i = 0
a = (struct args *) 0x7fff5fbf6f00
buf = "HTTP/1.1 200 OK\r\nVary: Accept-Encoding\r\nContent-Type: image/x-icon\r\nLast-Modified: Tue, 14 Aug 2012 15:19:23 GMT\r\nDate: Wed, 17 Apr 2013 14:07:54 GMT\r\nExpires: Thu, 25 Apr 2013 14:07:54 GMT\r\nX-Content"...
go_loop = 1
total = 8192
error_exit = 0
out_rwx.86 = (int (*)[1]) 0x10027ee20
(gdb) l parse_response 
502	    
503	    return bytes_processed - bufsz_orig;
504	}
505	
506	int // return 0 for normal, 1 for all done, -1 for error
507	parse_response (char *buf, long len, struct session_args *a) {
508	    // responsible for parsing the status line 
509	    // and handle the rest to different parse sub routines
510	    char *nextinput = buf;
511	    char *nextproc = buf;
(gdb) l
512	    nextproc[len] = 0;
(gdb) l engine.c:366
361	                    || a->sargs[i].rtt_timer[0].tv_usec) {
362	                    // we only pause for the first time
363	                    timer_pause (a->sargs[i].rtt_timer);
364	                }
365	                //printf ("Read %d\n", sfd);
366	                int readlen = recv (sfd, buf, sizeof(buf) - 1, 0);
367	                if (readlen == 0) {
368	                    dprintf ("READ 0 <%d>\n", a->s[i]);
369	                    close (a->s[i]);
370	                    if (a->sargs[i].repeat_recv > 0) {

Everything is just crystal clear now. I did an recv using the whole buffer, at line 366, then passed to parse_response, which expected one more byte at end of buffer so I could append a '\0' for easier string operations (line 512), of which location overruns
into &a, the variable that shouldn't have been changed.

It's weird that valgrind didn't detect this. Maybe there is a stack checking tool in experimental stage I remember but this definitely is not in the memcheck tool.

Anyway, looks like it's working again now.

A few more insights:

1. this bug didn't reveal until I cleared out some recursion code, tests the code with multiple threads, on a line that has delays and drops.

2. GCC built binary on LInux didn't gave me trouble. LLVM on OS X shows up problems 8/10 times.

3. Read the code, make asserts base on assumptions. I did this in the parsing module, not in the boundary of networking and parsing modules.

4. watch -location is useful.

5. Still didn't figure out what is "-fstack-protector" for in gcc / clang

Update 0424:

Using valgrind exp-sgcheck

==25634== Invalid read of size 1
==25634==    at 0x4C2C008: index (h_intercepts.c:89)
==25634==    by 0x402BB6: parse_response (md.c:552)
==25634==    by 0x403A5F: test_parse (md.c:896)
==25634==    by 0x403CAF: main (md.c:951)
==25634==  Address 0x7ff000430 expected vs actual:
==25634==  Expected: stack array "buf" of size 8,192 in frame 2 back from here
==25634==  Actual:   unknown
==25634==  Actual:   is 0 after Expected

Valgrind sgcheck found one occurrence, which is a few accesses after the one bug I found using GDB as shown above. But it could still be useful sometimes maybe. 

抱歉!评论已关闭.