A new PECL package was created today that brings weak references to PHP.
There was already an RFC asking for weak reference support in the upcoming
PHP 5.4, but it was not well received by the PHP core developers.
As they suggested, a new feature that doesn't need changes in the Zend
Engine (the backend of PHP) should first be tried as an extension and later,
if a majority of users really want it, may be included in the PHP core.
Installation
If you want it now, you can download the PECL package from SVN and
compile it. If you have compiled other extensions before, it will take you less
than five minutes (if you haven't compiled anything yet, try reading the
official manual).
svn co http://svn.php.net/repository/pecl/weakref/trunk/ weakref
cd weakref/
phpize && ./configure && make && sudo make install
Don't forget to load this extension in php.ini (by adding
extension=weakref.so).
What are they good for?
Weak references really have nothing to do with references as they are known
in PHP. The concept of weak references is, however, common in other programming
languages, and they have now been added to PHP as well, taking inspiration
from Java.
Normally in PHP, if you store an object in a variable, the Zend Engine
knows it's being used and won't remove it from memory. Thus, you can access
that object at any time. Pretty simple and obvious. On the other hand, when you
store only a weak reference to an object in a variable, the Zend Engine
does not count it as a use of that object. As soon as the object is not used
anywhere else, it can be removed and its memory freed. Thus, if you want to
access a weakly referenced object, you first have to make sure that it still
exists.
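The lifecycle described above can be sketched in a few lines. The snippet uses the WeakRef class from the package with its valid() and get() methods. The fallback at the top is an assumption added only so the sketch also runs where the extension is not loaded (it wraps PHP's built-in WeakReference, available since PHP 7.4), and the Product class is hypothetical.

```php
<?php
// Fallback shim (assumption, not part of the PECL package): emulate WeakRef
// with PHP's built-in WeakReference when the extension is not loaded.
if (!class_exists('WeakRef')) {
    class WeakRef {
        private $ref;
        public function __construct($object) { $this->ref = WeakReference::create($object); }
        public function valid() { return $this->ref->get() !== null; }
        public function get() { return $this->ref->get(); }
    }
}

// Hypothetical class used only for illustration.
class Product {
    public $name;
    public function __construct($name) { $this->name = $name; }
}

$product = new Product('Keyboard');  // strong reference keeps the object alive
$weak = new WeakRef($product);       // weak reference does not

$validBefore = $weak->valid();       // the object still exists
$nameBefore = $weak->get()->name;    // so it can be fetched and used

unset($product);                     // the last strong reference is gone

$validAfter = $weak->valid();        // the object has been destroyed
$objectAfter = $weak->get();         // nothing left to return
```

The point of the sketch is the order of operations: always check valid() (or test the result of get()) before using a weakly referenced object.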
This may well sound confusing. Why would anyone use a weak reference and risk
that the object gets removed from memory?
Weak references won't be very useful in small and simple scripts, but bigger
frameworks or libraries can benefit from them. They can save memory and/or CPU
time, especially when the data can be reloaded from an external source, such as
a database.
Consider a very simple database layer that provides access to products
stored in a database. Such a class needs only one method:
class ProductDatabase {
    // Returns the object representing the product with the given id.
    function getProduct($productId) { /* ... */ }
}
Based on the given $productId, ProductDatabase will return an
object representing the product with that particular id. In a complex
application, this database can be accessed from many different parts of the
code, and the app may ask for the same object several times. Until now you had
two options for implementing such a database layer.
The first and easier way is to load the object every time it's requested, which
may lead to weird behavior:
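class ProductDatabase {
    function getProduct($productId) {
        return mysql_fetch_object(mysql_query("SELECT * FROM product WHERE id=$productId"));
    }
}
$prodA = $prodDb->getProduct(1);
$prodB = $prodDb->getProduct(1);
var_dump($prodA === $prodB); // false: two separate instances
echo $prodA->price;
$prodA->price += 10;
$prodA->save();      // assuming save() writes the object back to the database
$prodB->price += 10; // $prodB still holds the old price
$prodB->save();      // overwrites the value saved through $prodA
echo $prodDb->getProduct(1)->price; // raised by 10 only, not by 20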
When you load the same object several times, multiple instances are actually
created. This may lead to higher memory consumption and also to some
insidious bugs.
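The bug is easier to see in a self-contained sketch. Here an in-memory array stands in for the product table, and the load-every-time layer is reduced to two hypothetical methods, getProduct() and save(). None of this is real database code, just an illustration of the stale-copy problem under those assumptions.

```php
<?php
// Hypothetical stand-in for the database layer: an array plays the table.
class NaiveProductDatabase {
    private $rows = array(1 => array('id' => 1, 'price' => 100));

    // Builds a brand-new object on every call, like mysql_fetch_object() would.
    public function getProduct($productId) {
        return (object) $this->rows[$productId];
    }

    // Writes the object's fields back to the "table".
    public function save($product) {
        $this->rows[$product->id] = (array) $product;
    }
}

$prodDb = new NaiveProductDatabase();
$prodA = $prodDb->getProduct(1);
$prodB = $prodDb->getProduct(1);
$sameInstance = ($prodA === $prodB);  // two separate copies of the same row

$prodA->price += 10;
$prodDb->save($prodA);                // the table now holds 110
$prodB->price += 10;                  // $prodB still started from 100
$prodDb->save($prodB);                // writes 110 again, the first update is lost

$finalPrice = $prodDb->getProduct(1)->price;
```

Two increments of 10 were requested, yet the stored price ends up only 10 higher, because each copy was modified independently.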
The second, more advanced approach uses an identity
map. After an object is loaded from the database, it is stored
in an internal array under its id. When a second request is made for the same
id, the existing object is returned.
class ProductDatabase {
    private $identityMap = array();

    function getProduct($productId) {
        // Query the database only on the first request for this id.
        if (!isset($this->identityMap[$productId])) {
            $this->identityMap[$productId] = mysql_fetch_object(mysql_query("SELECT * FROM product WHERE id=$productId"));
        }
        return $this->identityMap[$productId];
    }
}
$prodA = $prodDb->getProduct(1);
$prodB = $prodDb->getProduct(1);
var_dump($prodA === $prodB); // true: both variables hold the same instance
This approach seems to be better. Not only are the returned objects identical,
but the database is also queried only when necessary. However, every object is
kept in memory, which may cause trouble when you work with millions
of products at the same time.
This is where weak references come in. The code will
be similar to the previous example, but the "real" objects won't be stored in
the identity map. We will store only weak references, and the objects they
point to will disappear as soon as they are not used anywhere else in the
application.
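class ProductDatabase {
    private $identityMap = array();

    function getProduct($productId) {
        if (isset($this->identityMap[$productId]) && $this->identityMap[$productId]->valid()) {
            return $this->identityMap[$productId]->get();
        }
        // Keep a strong reference in $obj until we return; otherwise the object
        // would be destroyed right after the WeakRef is created.
        $obj = mysql_fetch_object(mysql_query("SELECT * FROM product WHERE id=$productId"));
        $this->identityMap[$productId] = new WeakRef($obj);
        return $obj;
    }
}
$prodA = $prodDb->getProduct(1);
$prodB = $prodDb->getProduct(1);
var_dump($prodA === $prodB); // true: both variables hold the same instance
unset($prodA, $prodB);            // no strong references left, the object is freed
$prodA = $prodDb->getProduct(1);  // the database is queried again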
As you can see in the example, with weak references we can benefit
from the identity map and not create new instances when they already exist, and
at the same time we avoid keeping unused objects in memory. It's a clear win
for weak references!
It's not a cache…
Weak references won't solve all the problems by themselves. In the second
example, the identity map stored real object instances and thus also worked as a
cache. When you asked for a product for the first time, it was loaded from the
database and stored in an array. On every later request, the same object would
still be there, ready for you to use. It was optimal in terms of database
queries.
With weak references this no longer holds. As shown in the third
example, after all strong references are gone (the variables $prodA and
$prodB), the object is completely removed from memory. The next time you request
the product with the same id, the database must be queried again. As you can
see, weak references do not work as a cache.
… so use it with a cache
Don't worry, it's not that bad. You can very easily combine weak
references with a cache to create an optimal
solution. We get reasonable memory consumption from the weak
references and a low number of database queries from the cache.
class ProductDatabase {
    const CACHE_SIZE = 100;
    private $identityMap = array();
    private $cache = array();

    function getProduct($productId) {
        if (isset($this->cache[$productId])) {
            return $this->cache[$productId];
        } elseif (isset($this->identityMap[$productId]) && $this->identityMap[$productId]->valid()) {
            return $this->identityMap[$productId]->get();
        } else {
            // Flush the whole cache once it is full.
            if (count($this->cache) >= self::CACHE_SIZE) {
                $this->cache = array();
            }
            $obj = mysql_fetch_object(mysql_query("SELECT * FROM product WHERE id=$productId"));
            $this->identityMap[$productId] = new WeakRef($obj);
            $this->cache[$productId] = $obj;
            return $obj;
        }
    }
}
$prodA = $prodDb->getProduct(1);
$prodB = $prodDb->getProduct(1);
var_dump($prodA === $prodB); // true: both variables hold the same instance
unset($prodA, $prodB);            // the cache still holds the object
$prodA = $prodDb->getProduct(1);  // served from the cache, no database query
The caching algorithm is very simple, but it's fast and easy. As long as the
cache is not full, it keeps objects and does not load them again from the
database, even when no other references to them exist in the rest of the
application. And even if the cache in the database layer is flushed, objects
that are still stored in variables in other parts of the system remain
accessible via weak references.
It's a complete win for us!
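To watch the cache and the weak references cooperate without a database, here is a minimal sketch in which a counter stands in for real queries. DemoProductDatabase, its $queries counter, and the tiny cache size are all hypothetical. The fallback at the top is an assumption so the sketch also runs without the extension, using PHP's built-in WeakReference (PHP 7.4+).

```php
<?php
// Fallback shim (assumption, not part of the PECL package): emulate WeakRef
// with PHP's built-in WeakReference when the extension is not loaded.
if (!class_exists('WeakRef')) {
    class WeakRef {
        private $ref;
        public function __construct($object) { $this->ref = WeakReference::create($object); }
        public function valid() { return $this->ref->get() !== null; }
        public function get() { return $this->ref->get(); }
    }
}

// Hypothetical in-memory variant of the class above.
class DemoProductDatabase {
    const CACHE_SIZE = 2;     // tiny on purpose so a flush is easy to trigger
    public $queries = 0;      // counts simulated database hits
    private $identityMap = array();
    private $cache = array();

    public function getProduct($productId) {
        if (isset($this->cache[$productId])) {
            return $this->cache[$productId];
        }
        if (isset($this->identityMap[$productId])) {
            $obj = $this->identityMap[$productId]->get();
            if ($obj !== null) {
                return $obj;  // still alive somewhere in the application
            }
        }
        if (count($this->cache) >= self::CACHE_SIZE) {
            $this->cache = array();  // crude flush, as in the example above
        }
        $this->queries++;
        $obj = (object) array('id' => $productId, 'price' => 100);  // simulated row
        $this->identityMap[$productId] = new WeakRef($obj);
        $this->cache[$productId] = $obj;
        return $obj;
    }
}

$prodDb = new DemoProductDatabase();
$prodA = $prodDb->getProduct(1);   // query 1, cached
$prodDb->getProduct(2);            // query 2, cached; the cache is now full
$prodDb->getProduct(3);            // query 3; cache flushed first, product 2 is freed
$prodB = $prodDb->getProduct(1);   // not cached, but $prodA keeps it alive
$sameInstance = ($prodA === $prodB);
$queriesSoFar = $prodDb->queries;
$prodDb->getProduct(2);            // nobody held product 2, so it is loaded again
$queriesAfter = $prodDb->queries;
```

Product 1 survives the flush because $prodA still holds it, while product 2 had no holder and costs another query.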
Summary
Database layers like Zend Db or Doctrine 2 ORM can easily be
extended to benefit from this new PECL package, reducing both memory
consumption and the number of database queries and making our applications
faster. If you are able to compile a package, you can start using it today.