Some Performance Testing Terms

August 20, 2013

Operands
i = immediate constant, r = any general purpose register, r32 = 32-bit register, etc., mm = 64
bit mmx register, x or xmm = 128 bit xmm register, y = 256 bit ymm register, sr = segment
register, m = any memory operand including indirect operands, m64 means 64-bit memory
operand, etc.

 

Reciprocal throughput (the reciprocal of throughput = clock cycles / instruction; lower is better)

(Reciprocal throughput is simply the reciprocal of the maximum throughput of a particular instruction. Throughput is measured in instructions/cycle, so reciprocal throughput is cycles/instruction.

The throughput is the maximum number of instructions of the same kind that can be
executed per clock cycle when the operands of each instruction are independent of the
preceding instructions. The values listed are the reciprocals of the throughputs, i.e. the
average number of clock cycles per instruction when the instructions are not part of a
limiting dependency chain. For example, a reciprocal throughput of 2 for FMUL means that
a new FMUL instruction can start executing 2 clock cycles after a previous FMUL. A
reciprocal throughput of 0.33 for ADD means that the execution units can handle 3 integer
additions per clock cycle.)
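
The two figures quoted above can be checked with a little arithmetic. The following is a minimal sketch, not a measurement tool: the function names are made up for illustration, and the value 0.33 is treated as the rounded form of 1/3.

```python
from fractions import Fraction


def cycles_for_independent(n_instructions, reciprocal_throughput):
    """Estimated clock cycles for n independent instructions of the same kind."""
    return n_instructions * reciprocal_throughput


def instructions_per_cycle(reciprocal_throughput):
    """Throughput is the reciprocal of the reciprocal throughput."""
    return 1 / reciprocal_throughput


# Values taken from the text: FMUL = 2, ADD = 0.33 (assumed to mean exactly 1/3).
FMUL = Fraction(2)
ADD = Fraction(1, 3)

if __name__ == "__main__":
    print(cycles_for_independent(300, FMUL))  # 300 independent FMULs
    print(cycles_for_independent(300, ADD))   # 300 independent ADDs
    print(instructions_per_cycle(ADD))        # ADDs handled per clock cycle
```

The estimate only holds when the instructions are independent of each other, as the definition above requires.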

 

Latency (the number of clock cycles needed to completely execute an instruction; lower is better)

(The latency of an instruction is the delay that the instruction generates in a dependency
chain. The measurement unit is clock cycles. Where the clock frequency is varied
dynamically, the figures refer to the core clock frequency. The numbers listed are minimum
values. Cache misses, misalignment, and exceptions may increase the clock counts
considerably. Floating point operands are presumed to be normal numbers. Denormal
numbers, NaNs and infinity may increase the latencies by possibly more than 100 clock
cycles on many processors, except in move, shuffle and Boolean instructions. Floating point
overflow, underflow, denormal or NaN results may give a similar delay. A missing value in
the table means that the value has not been measured or that it cannot be measured in a
meaningful way.)
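
To see how latency and reciprocal throughput interact, consider a loop that performs the same operation n times. A rough sketch, assuming a simplified model in which the work is split over a number of independent dependency chains: the latency of 4 and reciprocal throughput of 1 below are hypothetical values chosen for illustration, not figures for any real instruction.

```python
def loop_cycles(n, latency, reciprocal_throughput, chains=1):
    """Rough lower bound on clock cycles for n operations of one kind.

    The operations are assumed to be spread evenly over `chains`
    independent dependency chains (for example, several accumulators).
    """
    latency_bound = n * latency / chains          # each chain waits on its own results
    throughput_bound = n * reciprocal_throughput  # execution units limit the total
    return max(latency_bound, throughput_bound)


if __name__ == "__main__":
    # Hypothetical instruction: latency 4, reciprocal throughput 1.
    for chains in (1, 2, 4, 8):
        print(chains, loop_cycles(1000, latency=4, reciprocal_throughput=1, chains=chains))
```

With a single chain the latency dominates; once there are enough independent chains, the throughput becomes the limit and adding more chains no longer helps in this model.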

 

µops (micro-operations)
Uop or µop is an abbreviation for micro-operation. Processors with out-of-order cores are
capable of splitting complex instructions into µops. For example, a read-modify instruction
may be split into a read-µop and a modify-µop. The number of µops that an instruction
generates is important when certain bottlenecks in the pipeline limit the number of µops per
clock cycle.
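
The effect of a µop-per-cycle limit can be sketched as follows. This is only an illustration: the pipeline width of 4 µops per clock cycle and the instruction mix are assumptions, and the only figure taken from the text is that a read-modify instruction may produce two µops.

```python
import math


def min_cycles_for_uops(uop_counts, uops_per_cycle):
    """Lower bound on clock cycles when the pipeline handles a fixed number of uops per cycle."""
    total_uops = sum(uop_counts)
    return math.ceil(total_uops / uops_per_cycle)


# Hypothetical loop body: two read-modify instructions (2 uops each)
# and three simple instructions (1 uop each).
LOOP_BODY = [2, 1, 1, 1, 2]

if __name__ == "__main__":
    # Assumed pipeline width of 4 uops per clock cycle.
    print(min_cycles_for_uops(LOOP_BODY, uops_per_cycle=4))
```

Counting instructions alone would suggest five instructions fit in two cycles with room to spare; counting µops shows how close the loop body is to the limit.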

 

Execution unit (each execution unit can handle a particular category of µops)
The execution core of a microprocessor has several execution units. Each execution unit
can handle a particular category of µops, for example floating point additions. The
information about which execution unit a particular µop goes to can be useful for two
purposes. Firstly, two µops cannot execute simultaneously if they need the same execution
unit. And secondly, some processors have a latency of an extra clock cycle when the result
of a µop executing in one execution unit is needed as input for a µop in another execution
unit.
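
The second point, the extra clock cycle when a result crosses between execution units, can be illustrated with a small model. This is a sketch under stated assumptions: the unit names and latencies are hypothetical, and the one-cycle penalty applies only to the processors that have it.

```python
def chain_latency(chain, bypass_delay=1):
    """Total latency of a dependency chain of (unit, latency) pairs.

    An extra `bypass_delay` is added whenever a uop takes its input
    from a uop that executed in a different execution unit.
    """
    total = 0
    previous_unit = None
    for unit, latency in chain:
        if previous_unit is not None and unit != previous_unit:
            total += bypass_delay
        total += latency
        previous_unit = unit
    return total


# Hypothetical chain: integer op -> two floating point ops -> integer op.
CHAIN = [("int", 1), ("fp", 3), ("fp", 3), ("int", 1)]

if __name__ == "__main__":
    print(chain_latency(CHAIN, bypass_delay=0))  # no cross-unit penalty
    print(chain_latency(CHAIN, bypass_delay=1))  # one extra cycle per unit change
```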

 

Execution port (each execution port can pass only one µop at a time to an execution unit)
The execution units are clustered around a few execution ports on most Intel processors.
Each µop passes through an execution port to get to the right execution unit. An execution
port can be a bottleneck because it can handle only one µop at a time. Two µops cannot
execute simultaneously if they need the same execution port, even if they are going to
different execution units.
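
A port bottleneck can be estimated by counting how many µops need each port. The sketch below assumes a simplified model in which every µop is tied to exactly one port; the port numbers and the µop mix are hypothetical.

```python
from collections import Counter


def port_bound_cycles(uop_ports):
    """Lower bound on clock cycles when each port handles one uop at a time.

    `uop_ports` lists, for each uop, the single port it must pass through.
    """
    counts = Counter(uop_ports)
    return max(counts.values(), default=0)


# Hypothetical loop body: three uops need port 0, one needs port 1, one needs port 5.
UOP_PORTS = [0, 0, 1, 5, 0]

if __name__ == "__main__":
    print(Counter(UOP_PORTS))            # uops per port
    print(port_bound_cycles(UOP_PORTS))  # busiest port sets the lower bound
```

In this model the busiest port sets the minimum number of cycles, regardless of which execution units sit behind it.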
