The PCS (Procedure Call Standard for the Arm Architecture) defines the special roles that registers play in a procedure call.

Register  Synonym  Role in the procedure call standard
r15       PC       The Program Counter.
r14       LR       The Link Register.
r13       SP       The Stack Pointer.
r12       IP       The Intra-Procedure-call scratch register. (Loosely, it can be thought of as temporarily holding SP.)

In addition, r11 is optionally used as FP, the frame pointer.

1. Stack frame


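Put simply, a stack frame is the part of the stack used by one function; the stack frames of all active functions, chained together, make up the complete stack. The two boundaries of a stack frame are delimited by FP and SP.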
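As a rough illustration of each function owning its own frame, the sketch below asks the compiler for the current frame address in two nested functions. It assumes a GCC-compatible compiler (__builtin_frame_address and the noinline attribute are compiler extensions); callee_frame and frames_differ are hypothetical names chosen for this example.

```c
/* Assumes GCC or Clang: __builtin_frame_address(0) yields the frame
 * address of the function that calls it. noinline keeps the two
 * functions from being merged into a single frame. */
__attribute__((noinline))
void *callee_frame(void)
{
    return __builtin_frame_address(0);
}

__attribute__((noinline))
int frames_differ(void)
{
    void *mine = __builtin_frame_address(0);  /* this function's frame */
    void *inner = callee_frame();             /* the callee's frame */
    return mine != inner;
}
```

Calling frames_differ() should return 1, since the caller and the callee each get a frame of their own on the same stack.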



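While a program is running (typically when something has gone wrong and debugging is needed), the stack frame delimited by SP and FP gives access to the caller's SP and FP, and therefore to the caller's stack frame (PC, LR, SP and FP are pushed onto the stack at the very start of a function call). Repeating this step frame by frame recovers the order of all function calls.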
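The chain walk can be sketched in C. This is only a model under stated assumptions: struct frame_record and walk_frames are hypothetical names, and each record stands in for the saved FP and return address that a real prologue pushes. It does not read the real machine stack.

```c
#include <stddef.h>

/* Hypothetical model of what a prologue saves: the caller's frame
 * pointer, plus a label standing in for the saved return address. */
struct frame_record {
    const struct frame_record *caller_fp; /* NULL at the outermost frame */
    const char *function;                 /* stands in for the saved LR/PC */
};

/* Follow the saved-FP links from the innermost frame outwards and
 * store the function names innermost first.
 * Returns the number of frames visited, at most max. */
size_t walk_frames(const struct frame_record *fp,
                   const char **names, size_t max)
{
    size_t depth = 0;
    while (fp != NULL && depth < max) {
        names[depth++] = fp->function;
        fp = fp->caller_fp;   /* step to the caller's frame */
    }
    return depth;
}
```

Given three records linked as a -> b -> main, walk_frames fills names with "a", "b", "main", which is the call sequence read from the innermost frame outwards.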

3. gcc's optimization option for the stack frame


In fact, gcc itself has an optimization option that concerns the stack frame, -fomit-frame-pointer:



Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.

On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro "FRAME_POINTER_REQUIRED" controls whether a target machine supports this flag.







Environment: x86 + Red Hat 9.0, gcc 3.2.2


$ cat test.c
void a(unsigned long a, unsigned int b)
{
        unsigned long i;
        unsigned int j;

        i = a;
        j = b;

        i++;
        j += 2;
}

$ gcc -c test.c -o with_SFP.o

$ objdump -D with_SFP.o

with_SFP.o: file format elf32-i386

Disassembly of section .text:

00000000 <a>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 08 sub $0x8,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax
9: 89 45 fc mov %eax,0xfffffffc(%ebp)
c: 8b 45 0c mov 0xc(%ebp),%eax
f: 89 45 f8 mov %eax,0xfffffff8(%ebp)
12: 8d 45 fc lea 0xfffffffc(%ebp),%eax
15: ff 00 incl (%eax)
17: 8d 45 f8 lea 0xfffffff8(%ebp),%eax
1a: 83 00 02 addl $0x2,(%eax)
1d: c9 leave
1e: c3 ret
Disassembly of section .data:

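On entry, the function first pushes the caller's EBP onto the stack and sets up its own EBP, then adjusts ESP according to the number of local variables and the alignment requirements. This is what creates the function's stack frame.
Now look at the return: the "leave" instruction is equivalent to "mov %ebp,%esp; pop %ebp", that is, it undoes the two instructions executed on entry. The "ret" that follows then pairs with the caller's "call" instruction.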
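To make the entry and exit sequence concrete, here is a small model of it in C. It is a sketch under stated assumptions, not real machine code: struct cpu, push, pop, prologue and leave are hypothetical names, the stack is an array of 32-bit words that grows downwards, and esp/ebp are array indices rather than addresses.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Hypothetical miniature machine: a downward-growing stack and the
 * two registers that the prologue and epilogue touch. */
struct cpu {
    uint32_t stack[STACK_WORDS];
    uint32_t esp;   /* index of the current top of stack */
    uint32_t ebp;   /* index of the current frame base */
};

static void push(struct cpu *c, uint32_t v) { c->stack[--c->esp] = v; }
static uint32_t pop(struct cpu *c)          { return c->stack[c->esp++]; }

/* Models: push %ebp; mov %esp,%ebp; sub $locals,%esp */
void prologue(struct cpu *c, uint32_t local_words)
{
    push(c, c->ebp);        /* save the caller's frame base */
    c->ebp = c->esp;        /* start a new frame */
    c->esp -= local_words;  /* reserve room for locals */
}

/* Models: leave, i.e. mov %ebp,%esp; pop %ebp */
void leave(struct cpu *c)
{
    c->esp = c->ebp;        /* drop the locals */
    c->ebp = pop(c);        /* restore the caller's frame base */
}
```

In this model, "sub $0x8,%esp" from the listing corresponds to prologue(&c, 2), since 8 bytes are two 32-bit words. After leave(&c), esp and ebp hold the values they had before the prologue ran.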


$ gcc -fomit-frame-pointer -c test.c -o no_SFP.o

$ objdump -D no_SFP.o

no_SFP.o: file format elf32-i386

Disassembly of section .text:

00000000 <a>:
0: 83 ec 08 sub $0x8,%esp
3: 8b 44 24 0c mov 0xc(%esp,1),%eax
7: 89 44 24 04 mov %eax,0x4(%esp,1)
b: 8b 44 24 10 mov 0x10(%esp,1),%eax
f: 89 04 24 mov %eax,(%esp,1)
12: 8d 44 24 04 lea 0x4(%esp,1),%eax
16: ff 00 incl (%eax)
18: 89 e0 mov %esp,%eax
1a: 83 00 02 addl $0x2,(%eax)
1d: 83 c4 08 add $0x8,%esp
20: c3 ret
Disassembly of section .data:

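Obviously, the code is harder to read ;-P. The function now executes fewer instructions (11 instead of 13, although the encoding grows from 0x1f to 0x21 bytes), which should improve efficiency. The annoying part is that backtrace-based debugging, which relies on the frame pointer chain, no longer works.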

The same function compiled for ARM, first with and then without the frame pointer:

$ arm-linux-objdump -D SFP_arm.o

SFP_arm.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <a>:
0: e1a0c00d mov ip, sp
4: e92dd800 stmdb sp!, {fp, ip, lr, pc}
8: e24cb004 sub fp, ip, #4 ; 0x4
c: e24dd010 sub sp, sp, #16 ; 0x10
10: e50b0010 str r0, [fp, -#16]
14: e50b1014 str r1, [fp, -#20]
18: e51b3010 ldr r3, [fp, -#16]
1c: e50b3018 str r3, [fp, -#24]
20: e51b3014 ldr r3, [fp, -#20]
24: e50b301c str r3, [fp, -#28]
28: e51b3018 ldr r3, [fp, -#24]
2c: e2833001 add r3, r3, #1 ; 0x1
30: e50b3018 str r3, [fp, -#24]
34: e51b301c ldr r3, [fp, -#28]
38: e2833002 add r3, r3, #2 ; 0x2
3c: e50b301c str r3, [fp, -#28]
40: e91ba800 ldmdb fp, {fp, sp, pc}
Disassembly of section .data:
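The prologue in the listing above (mov ip, sp; stmdb sp!, {fp, ip, lr, pc}; sub fp, ip, #4) leaves fp pointing at the saved pc, with lr, the old sp and the old fp in the three words below it. The following C sketch walks frames laid out that way. It is a model only: apcs_backtrace and the SLOT_* names are hypothetical, memory is an array of 32-bit words, fp is a word index into it, and a saved fp of 0 is assumed to mark the outermost frame.

```c
#include <stddef.h>
#include <stdint.h>

/* Word offsets relative to fp. stmdb stores the lowest-numbered
 * register at the lowest address, so fp (r11) ends up lowest and
 * pc (r15) highest. */
enum {
    SLOT_PC = 0,   /* [fp]       saved pc */
    SLOT_LR = -1,  /* [fp, #-4]  return address */
    SLOT_SP = -2,  /* [fp, #-8]  caller's sp (copied through ip) */
    SLOT_FP = -3   /* [fp, #-12] caller's fp */
};

/* Collect return addresses, innermost first, while following the
 * saved-fp links. Returns the number of frames visited, at most max. */
size_t apcs_backtrace(const uint32_t *mem, uint32_t fp,
                      uint32_t *return_addrs, size_t max)
{
    size_t n = 0;
    while (fp != 0 && n < max) {
        int32_t base = (int32_t)fp;
        return_addrs[n++] = mem[base + SLOT_LR];
        fp = mem[base + SLOT_FP];   /* step to the caller's frame */
    }
    return n;
}
```

The same slots explain the epilogue: ldmdb fp, {fp, sp, pc} reloads fp from SLOT_FP, sp from SLOT_SP, and pc from SLOT_LR, which restores the caller's frame and returns in one instruction.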

$ arm-linux-objdump -D no_SFP_arm.o

no_SFP_arm.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <a>:
0: e24dd010 sub sp, sp, #16 ; 0x10
4: e58d000c str r0, [sp, #12]
8: e58d1008 str r1, [sp, #8]
c: e59d300c ldr r3, [sp, #12]
10: e58d3004 str r3, [sp, #4]
14: e59d3008 ldr r3, [sp, #8]
18: e58d3000 str r3, [sp]
1c: e59d3004 ldr r3, [sp, #4]
20: e2833001 add r3, r3, #1 ; 0x1
24: e58d3004 str r3, [sp, #4]
28: e59d3000 ldr r3, [sp]
2c: e2833002 add r3, r3, #2 ; 0x2
30: e58d3000 str r3, [sp]
34: e28dd010 add sp, sp, #16 ; 0x10
38: e1a0f00e mov pc, lr
Disassembly of section .data:

