Understanding PHP References in Depth (References in PHP)

August 7, 2013 / General

  huangguisu

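To understand PHP references in depth, I found an English-language article on the subject: http://derickrethans.nl/talks/phparch-php-variables-article

Much of it is best read in the original English, because a translation sometimes fails to convey the meaning.

Basics

Inside the Zend engine, every PHP variable has a corresponding zval. The zval struct is defined in Zend/zend.h and looks like this: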

typedef struct _zval_struct zval;
struct _zval_struct {
    /* Variable information */
    zvalue_value value;     /* the variable's value */
    zend_uint refcount__gc; /* reference count */
    zend_uchar type;        /* the variable's concrete type */
    zend_uchar is_ref__gc;  /* reference flag: 1 if a reference, 0 if not */
};

The refcount mentioned throughout the rest of this article is the refcount__gc field (the garbage collection mechanism was introduced in PHP 5.3).

PHP's handling of variables can be non-obvious at times. Have you ever wondered what happens at the engine level when a variable is copied to another? How about when a function returns a variable "by reference"? If so, read on.

Every computer language needs some form of container to hold data: variables. In some languages, those variables have a specific type attached to them. They can be a string, a number, an array, an object or something else. Examples of such statically-typed languages are C and Pascal. Variables in PHP do not have this restriction. They can be a string in one line, but a number in the next. Converting between types is also easy, and often even automatic. These loosely-typed variables are one of the properties that make PHP such an easy and powerful language, although they can sometimes also cause interesting problems.

Internally, PHP stores all those variables in a similar container, called a zval container (also called a "variable container"). This container keeps track of several things related to a specific value. The most important are the value of the variable and its type. Python is similar to PHP in this regard, as it also labels each variable with a type. The variable container has a few more fields that the PHP engine uses to keep track of whether a value is a reference or not. It also keeps a reference count for its value.

Variables are stored in a symbol table, which is quite analogous to an associative array. This array has keys that represent the names of the variables, and those keys point to the variable containers that hold the value (and type) of the variables. See Figure 1 for an example of this.

In short, variables are stored in a symbol table that resembles an associative array.

1. Reference Counting

PHP tries to be smart when it deals with copying variables, as in $a = $b. Using the = operator is also called an "assign-by-value" operation. While assigning by value, the PHP engine does not actually create a copy of the variable container; it merely increases the refcount__gc field in the variable container. As you can imagine, this saves a lot of memory when you have a large string of text or a large array. Figure 2 shows how this "looks".

In step 1, there is one variable, $a, which contains the text "this is" and has (by default) a reference count of 1.

In step 2, we assign variable $a to variables $b and $c. No copy of the variable container is made; only the refcount value is incremented by 1 for each variable that is assigned to the container. Because we assign two more variables here, the refcount goes from 1 to 2 and ends up being 3 after the two assignment statements.

In step 3, you might wonder what happens when the variable $c is changed. Two things can happen, depending on the value of the refcount. If the value is 1, the container is simply updated with its new value (and possibly its type, too). If the refcount value is larger than 1, a new variable container is created containing the new value (and type). You can see this in step 3 of Figure 2. The refcount value of the variable container that is linked to the variable $a is decreased by one, so that the variable container that belongs to variables $a and $b now has a refcount of 2, and the newly created container has a refcount of 1.

In step 4, unset() is called on a variable. This decreases by one the refcount value of the variable container linked to the variable being unset, which is what happens when we call unset($b). If the refcount value drops below 1, the PHP engine frees the variable container.

In step 5, the variable container is then destroyed.
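
The steps above can be followed in a short script. This is only a sketch of the visible behaviour: the variable names follow the text, the value 42 assigned to $c is an assumption, and the reference counts themselves are not printed. They can be inspected with debug_zval_dump(), but what it reports differs between PHP versions.

```php
<?php
// Step 1: one variable, one container.
$a = "this is";

// Step 2: two assignments by value; per the article, nothing is copied yet.
$b = $a;
$c = $a;

// Step 3: writing to $c separates it from $a and $b (42 is an assumed value).
$c = 42;

// Step 4: unset() removes the name $b; $a keeps its value.
unset($b);

var_dump($a);        // $a is still "this is"
var_dump($c);        // $c is 42
var_dump(isset($b)); // $b no longer exists
```

Whatever the engine does with the containers internally, the script only ever observes value semantics: changing $c never changes $a.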

2. Passing Variables to Functions

Besides the global symbol table that every script has, every call to a user-defined function creates a symbol table where the function locally stores its variables. Every time a function is called, such a symbol table is created, and every time a function returns, this symbol table is destroyed. A function returns either by using the return statement, or implicitly because the end of the function has been reached.

In Figure 3, I illustrate exactly how variables are passed to functions.

In step 1, we assign a value to the variable $a, again "this is". We pass this variable to the do_something() function, where it is received in the variable $s.

In step 2, you can see that it is practically the same operation as assigning a variable to another one (like we did in the previous section with $b = $a), except that the variable is stored in a different symbol table (the one that belongs to the called function) and that the reference count is increased twice, instead of the normal once. The reason for this is that the function's stack also contains a reference to the variable container.

In step 3, we assign a new value to the variable $s. The refcount of the original variable container is decreased by one and a new variable container is created, containing the new value.

In step 4, we return the variable with the return statement. The returned variable gets an entry in the global symbol table and the refcount value is increased by 1. When the function ends, the function's symbol table is destroyed. During the destruction, the engine goes over all variables in the symbol table and decreases the refcount of each variable container. When the refcount of a variable container reaches 0, the variable container is destroyed. As you see, the variable container is again not copied when returning it from the function, thanks to PHP's reference counting mechanism.

If the variable $s had not been modified in step 3, then variables $a and $b would still point to the same variable container, which would have a refcount value of 2. In this situation, no copy of the variable container created by the statement $a = "this is" would have been made.
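
A minimal sketch of the script the figure describes might look like the following. The names do_something(), $a, $b and $s come from the text; the value "was" assigned inside the function is an assumption, since the figure itself is not reproduced here.

```php
<?php
function do_something($s)
{
    // Step 3: writing to the parameter only affects the local variable.
    $s = "was"; // assumed value, chosen for illustration
    // Step 4: the value is handed back to the caller.
    return $s;
}

// Step 1: create $a in the global symbol table.
$a = "this is";

// Step 2: $a is passed by value and received as $s.
$b = do_something($a);

var_dump($a); // $a is unchanged: "this is"
var_dump($b); // $b holds the returned value: "was"
```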

3. Introducing References

References are a method of having two names for the same variable. A more technical description would be: references are a method of having two keys in a symbol table pointing to the same zval container. References can be created with the reference assignment operator =&.


Figure 4 gives a schematic overview of how references work in combination with reference counting. 

In step 1, we create a variable $a that contains the string "this is".

In step 2, we create two references ($b and $c) to the same variable container. The refcount increases normally for each assignment, making the final refcount 3 after both assignments by reference ($b =& $a and $c =& $a), but because the reference assignment operator is used, the other value, is_ref, is now set to 1.

This value is important for two reasons. The second one I will divulge a little later in this article. The first is what happens when we assign a new value to one of the three variables that all point to the same variable container. If the is_ref value is 0 when a new value is set for a specific variable, the PHP engine creates a new variable container, as you could see in step 3 of Figure 2. But if the is_ref value is 1, the PHP engine does not create a new variable container and simply updates the value to which the variable names point, as you can see in step 3 of Figure 4.

In step 3, the exact same result would be reached if the statement $a = 42 were used instead of $b = 42. After the variable container is modified, all three variables $a, $b and $c contain the value 42.

In step 4, we use the unset() language construct to remove a variable, in this case variable $c. Using unset() on a variable means that the refcount value of the variable container that the variable points to is decreased by 1. This works exactly the same for referenced variables. There is one difference, though, which shows in step 5.

In step 5, when the reference count of a variable container reaches 1 and the is_ref value is set to 1, the is_ref value is reset to 0. The reason for this is that a variable container can only be marked as a referenced variable container when more than one variable points to it.
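
The first four steps can be tried out with a few lines of PHP. This is a sketch of the observable behaviour only; the final assignment of "still linked" is an addition for illustration and is not part of the figure.

```php
<?php
// Step 1: one variable.
$a = "this is";

// Step 2: two more names for the same variable.
$b =& $a;
$c =& $a;

// Step 3: writing through any of the names changes all of them.
$b = 42;
var_dump($a, $b, $c); // all three are 42

// Step 4: unset() removes only the name $c, not the shared value.
unset($c);

// Illustrative addition: $a and $b are still bound together.
$a = "still linked";
var_dump($b); // "still linked"
```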

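4. Mixing Assign-by-Value and Assign-by-Reference

Mixing the two styles does not save memory; it uses more, because after the reference assignment the engine has to allocate a separate copy for the referenced variables.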
Something interesting—and perhaps unexpected—happens if you mix an assign-by-value call and an assign-by-reference call. This shows in Figure 5.

 

In step 1, we create two variables, $a and $b, where the latter is assigned by value from the former. This creates a situation where there is one variable container with is_ref set to 0 and refcount set to 2. This should be familiar by now.

In step 2, we proceed by assigning variable $c by reference to variable $b. Here, the PHP engine creates a copy of the variable container. The variable $a keeps pointing to the original variable container, but its refcount is, of course, decreased to 1, as only one variable points to this container now. The variables $b and $c point to the copied container, which now has a refcount of 2 and an is_ref value of 1.

You can see that in this case using a reference does not save you any memory; it actually uses more memory, as the original variable container had to be duplicated. The container had to be copied, because otherwise the PHP engine would have no way of knowing how to deal with the reassignment of one of the three variables: two of them ($b and $c) were references to the same container, while the other was not supposed to be a reference. With only one container with refcount set to 3 and is_ref set to 1, it is impossible to figure that out. That is the reason why the PHP engine needs to create a copy of the container when you do an assignment by reference.

If we switch the order of assignments (first we assign $a by reference to $b, and then we assign $a by value to $c), something similar happens. Figure 6 shows how this is handled.

In step 1, we assign the string "this is" to the variable $a and then assign $a by reference to variable $b. We now have one variable container where is_ref is 1 and refcount is 2.

In step 2, we assign variable $a by value to variable $c. Now a copy of the variable container is made so that the PHP engine can handle modifications to the variables correctly, for the same reasons as stated in the previous paragraph. But if you go back to step 2 of Figure 2, where we assign the variable $a to both $b and $c, you see that no copy is made there.
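
Both orderings can be sketched as follows. The first half uses the names from Figure 5; the second half uses $x, $y and $z (hypothetical names, standing in for $a, $b and $c of Figure 6) so that both cases fit in one script. The value 42 is also an assumption.

```php
<?php
// Figure 5 order: by value first, then by reference.
$a = "this is";
$b = $a;   // $a and $b share one value
$c =& $b;  // $b and $c become references to each other, separate from $a
$c = 42;
var_dump($a); // still "this is"
var_dump($b); // 42, because $b and $c are the same variable

// Figure 6 order: by reference first, then by value.
$x = "this is";
$y =& $x;  // $x and $y are references to each other
$z = $x;   // $z is an independent copy
$y = 42;
var_dump($x); // 42
var_dump($z); // still "this is"
```

In both cases the variable that was assigned by value stays independent of the reference pair, which is why the engine cannot keep all three names on one container.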

5. Passing References to Functions
Variables can also be passed by reference to functions. This is useful when a function needs to modify the value of a specific variable when it is called. The script in Figure 7 is a slightly modified version of the script that you have already seen in Figure 3.

The only difference is the ampersand (&) in front of the $s variable in the declaration of the function do_something(). This ampersand instructs the PHP engine that the variable to which the ampersand is applied is going to be passed by reference and not by value. A different name for a passed-by-reference variable is an "out variable". When a variable is passed by reference to a function, the new variable in the function's symbol table points to the old container and the refcount value is increased by 2 (one for the symbol table, and one for the stack). Just as in a normal assignment by reference, the is_ref value inside the variable container is also set to 1, as you can see in step 2. From here on, the same things happen as with a normal reference: in step 3, no copy of the variable container is made when we assign a new value to the variable $s.

The return $s; statement is basically the same as the $c = $a statement in step 2 of Figure 6. The global variable $a and the local variable $s are both references to the same variable container, and the logic dictates that if is_ref is set to 1 for a specific container and this container is assigned to another variable by value, the container needs to be duplicated. This is exactly what happens here, except that the newly created variable is created in the global symbol table by the assignment of the return value of the function with the statement $b = do_something($a).
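
A sketch of the modified script, under the same assumptions as before (the value "was" is illustrative, and the helper variable $after_call and the final reassignment of $a are additions that make the result visible):

```php
<?php
function do_something(&$s)
{
    // $s is another name for the caller's variable, so this changes $a too.
    $s = "was"; // assumed value, chosen for illustration
    // Returned by value: the caller receives an independent copy.
    return $s;
}

$a = "this is";
$b = do_something($a);

$after_call = $a; // "was": the function modified the caller's variable
$a = "changed";   // illustrative addition: $b is not affected by this

var_dump($after_call); // "was"
var_dump($b);          // "was"
var_dump($a);          // "changed"
```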

6. Returning by Reference
Another feature in PHP is the ability to "return by reference". This is useful, for example, if you want to select a variable for modification with a function, such as selecting an array element or a node in a tree structure. Figure 8 shows how returning by reference works by means of an example.

 

In step 1, we define a $tree variable (which is actually not a tree, but a simple array) that contains three elements. The three elements have key values of 1, 2 and 3, and each of them points to a string with the English word that matches the key's value (i.e. one, two and three).

In step 2, this array is passed to the find_node() function by reference, along with the key of the element that find_node() should look for and return. We need to pass by reference here; otherwise we cannot return a reference to one of the elements, as we would be returning a reference to a copy of $tree. When $tree is passed to the function it has a refcount of 3 and is_ref is set to 1. Nothing new here.

In step 3, the first statement in the function, $item =& $node[$key], causes a new variable to be created in the symbol table of the function, which points to the array element whose key is 3 (because the variable $key is set to 3). Creating $item by assigning it by reference to the array element causes the refcount value of the variable container that belongs to the array element to be increased by 1. The is_ref value of that variable container is now 1, too, of course.

In step 4, the interesting things happen: we return $item (by reference) back to the calling scope and assign it (by reference) to $node. This causes the refcount of the variable container to which the third array key points to be set to 3. At this point $tree[3], $item (from the function's scope) and $node (global scope) all point to this variable container.

In step 5, when the symbol table of the function is destroyed, the refcount value decreases from 3 to 2. $node is now a reference to the third element in the array. If the variable $node had not been assigned by reference to the return value of the find_node() function, but had instead been assigned by value, then $node would not have been a reference to $tree[3]. In that case, the refcount value of the variable container to which $tree[3] points is 1 after the function ends, but for some strange reason the is_ref value is not reset to 0 as you might expect. My tests did not find any problems with this, though, in this simple example. If find_node() had not been a "return-by-reference function", then again the $node variable would not be a reference to $tree[3]. In that case, the is_ref value of the variable container ($tree) would have been reset to 0.

In step 6, finally, we modify the value in the variable container to which both $node and $tree[3] point.

Please do note that it is harmful not to accept a reference from a function that returns a reference. In some cases, PHP will get confused and cause memory corruption which is very hard to find and debug. It is also not a good idea to return a static value by reference, as the PHP engine has problems with that too. In PHP 4.3, both cases can lead to very hard-to-reproduce bugs and crashes of PHP and the web server. In PHP 5, this all works a little bit better. Here you can expect a warning and it will behave "properly". Hopefully, a backported fix for this problem makes it into a new minor version of PHP 4: PHP 4.4.
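
The example described in the steps could be written roughly as follows. The names find_node(), $tree, $node, $item and $key appear in the text, but the parameter order and the new value "THREE" are assumptions, as the figure's listing is not reproduced here.

```php
<?php
// Return-by-reference function; the parameter order is an assumption.
function &find_node($key, &$node)
{
    // Step 3: $item becomes a reference to the selected array element.
    $item =& $node[$key];
    // Step 4: the reference itself is returned.
    return $item;
}

// Step 1: not really a tree, just a simple array.
$tree = array(1 => 'one', 2 => 'two', 3 => 'three');

// Steps 2 to 5: pass $tree by reference and accept the result by reference.
$node =& find_node(3, $tree);

// Step 6: writing to $node writes to $tree[3] ("THREE" is an assumed value).
$node = 'THREE';

var_dump($tree[3]); // "THREE"
var_dump($tree[1]); // "one", untouched
```

Both ampersands at the call boundary matter: one in the function declaration and one in the assignment that accepts the result.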

7. The Global Keyword
PHP has a feature that allows the use of a global variable inside a function: you can make this connection with the global keyword. This keyword creates a reference between the local variable and the global one. Figure 9 shows this in an example.


In steps 1 and 2, we create the variable $var and call the function update_var() with the string literal "one" as the sole parameter. At this point, we have two variable containers. The first one is pointed to from the global variable $var, and the second one is the $val variable in the called function. The latter variable container has a refcount value of 2, as both the variable on the stack and the local variable $val point to it.

In step 3, the global $var statement in the function creates a new variable in the local scope, which is created as a reference to the variable with the same name in the global scope. As you can see in step 3, this increases the refcount of the variable container from 1 to 2 and also sets the is_ref value to 1.

In step 4, we unset the variable $var. Against some people's expectation, the global variable $var does not get unset, as the unset() was done on a reference to the global variable $var and not on that variable itself.

In step 5, to reestablish the reference, we employ the global keyword again. As you can see, we have re-created the same situation as in step 3. Instead of using global $var we could just as well have used $var =& $GLOBALS['var'], as it would have created the exact same situation.

In step 6, we assign the function's $val argument to the $var variable. This changes the value to which both the global variable $var and the local variable $var point; this is what you would expect from a referenced variable.

In step 7, when the function ends, the reference from the variable in the scope of the function disappears, and we end up with one variable container with a refcount of 1 and an is_ref value of 0.
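
A sketch of the script these steps describe is given below. update_var(), $var and $val are named in the text; the initial value "zero", the $global_survived check and the return value are additions made only to show that the global outlives the unset().

```php
<?php
// Step 1: the global variable (the initial value is an assumption).
$var = "zero";

function update_var($val)
{
    // Step 3: local $var becomes a reference to the global $var.
    global $var;

    // Step 4: removes only the local name, not the global variable.
    unset($var);
    $global_survived = isset($GLOBALS['var']); // illustrative check

    // Step 5: re-establish the reference.
    global $var;

    // Step 6: this write reaches the global variable.
    $var = $val;

    return $global_survived;
}

// Step 2: call the function with the string literal "one".
$global_survived = update_var("one");

var_dump($global_survived); // true: the global was never unset
var_dump($var);             // "one"
```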

8. Abusing References

In this section, I will give a few examples that show you how references should not be used. In some cases these examples might even cause memory corruption in PHP 4.3 and lower.

Example 1: “Returning static values by-reference”. In Figure 10, we have a very small script with a return-by-reference function called definition().

This function simply returns an array that contains some elements. Returning by reference makes no sense here, as the exact same things would happen internally if the variable container holding the array were returned by value, except that in the intermediate step (step 3) the is_ref value of the container would not be set to 1, of course. If the $def variable in the function's scope had been referenced by another variable, something that might happen in a class method where you do $def = $this->def, then the return-by-reference properties of the function would have copied the array, because this creates a similar situation as in step 2 of Figure 5.

Example 2: "Accepting references from a function that doesn't return references". This is potentially dangerous; PHP 4.3 (and lower) does not handle this properly. In Listing 1, you see an example of something that is not going to work properly.

<?php
	// Anti-pattern, shown on purpose: preg_split() does not return a
	// reference, so both ampersands below gain nothing.
	function &split_list($emails)
	{
		$emails =& preg_split("/[,;]/", $emails);
		return $emails;
	}

	$emails =
	split_list('derick@php.net;derick@derickrethans.nl;dr@ez.no');

This function was implemented with performance in mind, trying not to copy variable containers by using references. As you should know after reading this article, this is not going to buy you anything. There are a few reasons why it doesn't work. The first reason is that the PHP internal function preg_split() does not return by reference; actually, no internal function in PHP can return anything by reference. So, assigning the return value by reference from a function that doesn't return a reference is pointless. The second reason why there is no performance benefit here is the same one as in Example 1: you're returning a static value, not a reference to a variable, so it does not make sense to make the split_list() function return by reference.
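
Following that reasoning, a plain by-value version is enough. This is one possible rewrite rather than code from the original article; it keeps the split_list() name and the same input string.

```php
<?php
// By-value variant: no reference operators anywhere.
function split_list($emails)
{
    // preg_split() returns a new array, which is returned as-is.
    return preg_split('/[,;]/', $emails);
}

$emails = split_list('derick@php.net;derick@derickrethans.nl;dr@ez.no');

var_dump(count($emails)); // 3 addresses
var_dump($emails[0]);     // "derick@php.net"
```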

 

9. Conclusion

After reading this article, I hope that you now fully understand how references, refcounting, and variables work in PHP. It should also have explained that assigning by reference does not always save you memory; it's better to let the PHP engine handle this optimization. Do not try to outsmart PHP yourself here, and only use references when they are really needed. In PHP 4.3, there are still some problems with references, for which patches are in the works. These patches are backports from PHP 5-specific code, and although they work fine, they will break binary compatibility, meaning that compiled extensions no longer work after those patches are put into PHP. In my opinion, those hard-to-reproduce memory corruption errors should be fixed in PHP 4 too, though, so perhaps this creates the need for a PHP 4.4 release. If you're having problems, you can try the patch located at http://files.derickrethans.nl/patches/ze1-return-refrence-20050429.diff.txt

The PHP Manual also has some information on references, although it does not explain the internals very well. The URL for the section in PHP's Manual is http://php.net/language.references
