PhpStorm plugin development series

I started developing a custom language plugin for PhpStorm, the best IDE for PHP I've come across so far. But since the documentation for plugin development is rather poor and I have to dig into the code a lot, I decided to share my hard-won knowledge with others.

So, the first things you'll (and I'll) need to get started:

I'll be working on support for the Latte templating language (part of the Nette Framework); you can follow my progress on GitHub.

Wish me luck, and see you next week.

Mock vs final fights in testing

There has always been a war between testers, who love mocking all classes, and library developers, who prevent misuse of their code by means of the final keyword. You cannot have both. Once a class (or method) is declared final, it cannot be mocked (extended). Since testers need to test their applications built on top of third-party libraries, they always move second in this war. That's unfortunate, since once final is declared, it cannot be undone.

Or can it?

Reflection – extended

Here I present a mighty weapon for all the testers; or rather just a prototype for now. It gives you the power to remove existing final from classes/methods and thus make them mockable. Here is how to use it:

final class B { ... }
$refl = new ReflectionClass('B');
$refl->setFinal(false); // needs the experimental patch; stock PHP has no setFinal()
// follow the link above for more examples

Voilà, the class is not final anymore and you can mock it freely.

How to

A patch for PHP is needed at the moment, as this is an experimental feature. The syntax is experimental too, so please add your suggestions.
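Without the patch, stock PHP's Reflection API can only report finality via isFinal() on ReflectionClass and ReflectionMethod; it cannot change it. A minimal sketch of a test helper built on that, where mockBlockers() and the sample classes are hypothetical names, lists what stops a class from being mocked by subclassing:

```php
<?php
// Sample classes: one final class, one class with a final method.
final class B {
    public function greet() { return 'hi'; }
}

class C {
    final public function id() { return 1; }
    public function name() { return 'c'; }
}

// Hypothetical helper: lists what prevents mocking a class by subclassing.
// Stock reflection can only *report* finality; it cannot remove it.
function mockBlockers($className) {
    $refl = new ReflectionClass($className);
    if ($refl->isFinal()) {
        return array("class $className is final");
    }
    $blockers = array();
    foreach ($refl->getMethods() as $method) {
        if ($method->isFinal()) {
            $blockers[] = "method {$method->class}::{$method->name}() is final";
        }
    }
    return $blockers;
}

print_r(mockBlockers('B')); // class B is final
print_r(mockBlockers('C')); // method C::id() is final
```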

PS: This idea was presented by @JanTvrdik at a Nette Brain Cloud meeting.

Extending Nette Debugger

Nette Debugger (or Laděnka, by its beautiful Czech name) is very helpful when solving problems, because it displays exceptions in a lovely way. Sometimes that's not enough, though, and you'd like to see even more, which is perfectly reasonable and possible.

Exception details

You can register panels, which are just callbacks executed when an uncaught exception is thrown. Standard exceptions have just a message, which is often enough to understand the problem, but not always. For example, in my app, when a ForbiddenException is thrown, it contains not only an error message but also an array of required privileges (which weren't met). These details are not displayed by default, but that's not a problem for us: we can register a panel to display them:

use Nette\Diagnostics\Debugger;
Debugger::$blueScreen->addPanel(function(\Exception $ex = null) {
        if($ex !== null && method_exists($ex, 'getDetails')) {
                return array(
                        'tab'   => 'Details',
                        'panel' => Debugger::dump($ex->getDetails(), true),
                );
        }
});

From now on, every time an exception with a getDetails() method is thrown, these details will be displayed before the call stack. So you know which privileges weren't met, and when an angry customer calls you a minute later, you can just adjust their profile to make them happy.
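The panel only relies on the exception exposing a getDetails() method. Here is a minimal sketch of such an exception; the ForbiddenException constructor and the 'requiredPrivileges' key are assumptions, and print_r() stands in for Debugger::dump() so the snippet runs without Nette:

```php
<?php
// Hypothetical exception carrying extra details for the bluescreen panel.
class ForbiddenException extends Exception {
    private $requiredPrivileges;

    public function __construct($message, array $requiredPrivileges) {
        parent::__construct($message);
        $this->requiredPrivileges = $requiredPrivileges;
    }

    // Duck-typed hook that the panel callback looks for via method_exists().
    public function getDetails() {
        return array('requiredPrivileges' => $this->requiredPrivileges);
    }
}

// Same logic as the panel callback, with print_r() instead of Debugger::dump().
$detailsPanel = function ($ex = null) {
    if ($ex !== null && method_exists($ex, 'getDetails')) {
        return array(
            'tab'   => 'Details',
            'panel' => '<pre>' . htmlspecialchars(print_r($ex->getDetails(), true)) . '</pre>',
        );
    }
    return null; // no panel for exceptions without details
};

$panel = $detailsPanel(new ForbiddenException('Access denied', array('article.edit')));
echo $panel['tab']; // Details
```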

Environmental panels

In my app, I also want some more info about the environment where the exception occurred: details of the user logged in to the app, the config files which have been processed, and probably also info about the cluster node which processed the request. These panels should be displayed at the bottom of the bluescreen, similarly to how the HTTP request and HTTP response are displayed. This can be achieved by returning 'bottom' => true from the callback:

Debugger::$blueScreen->addPanel(function() use($context) {
  static $displayed = false;
  if($displayed) return; // show only once if exception has parents
  $displayed = true;

  return array(
    'tab'    => 'User details',
    'panel'  => Debugger::dump($context->user, true),
    'bottom' => true, // render with the environment panels at the bottom
  );
});
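The static $displayed guard matters because, as the comment implies, panel callbacks may run once for every exception in the getPrevious() chain. A small simulation, where renderPanels() is a hypothetical stand-in for the bluescreen's loop rather than Nette's actual code, shows the panel appearing only once:

```php
<?php
// Hypothetical stand-in for the bluescreen: calls every panel callback for
// each exception in the getPrevious() chain and collects the tab names.
function renderPanels(Exception $ex, array $callbacks) {
    $tabs = array();
    for ($e = $ex; $e !== null; $e = $e->getPrevious()) {
        foreach ($callbacks as $callback) {
            $panel = call_user_func($callback, $e);
            if ($panel) {
                $tabs[] = $panel['tab'];
            }
        }
    }
    return $tabs;
}

// Environmental panel with the same once-only guard as above.
$userPanel = function () {
    static $displayed = false;
    if ($displayed) return null; // show only once if exception has parents
    $displayed = true;
    return array('tab' => 'User details', 'panel' => 'user info here', 'bottom' => true);
};

// Two chained exceptions: without the guard there would be two "User details" tabs.
$chain = new RuntimeException('outer', 0, new LogicException('inner'));
$tabs = renderPanels($chain, array($userPanel));
echo implode(', ', $tabs); // User details
```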

Summary

Make sure you have all the info you need to process an exception log when it happens: add your app-specific panels to Nette Debugger.

Entity Select-boxes & Nette

Forms often contain a selection; e.g. when creating a user in your app, you may need a role to be selected. If you use an ORM, you'd probably have an IdentityEntity class in your app referring to a RoleEntity class. How will you create such a form? By adding a plain select and converting the role entity to its ID and back?

// creating form
$roles = $roleRepository->findAll()->fetchPairs('id', 'name');
$frm->addSelect('role', 'Role', $roles);

// populating form
$frm['role']->setValue($user->role->getId());

// processing form
$user->role = $roleRepository->find($frm['role']->getValue());

It doesn't look that nice :( I think it would look much nicer if we could have a select box for entities. Imagine:

// creating form
$frm->addEntitySelect('role', 'Role', $roleRepository); // providing repository here

// populating form
$frm['role']->setValue($user->role); // setting an entity

// processing form
$user->role = $frm['role']->getValue(); // getting entities back

It looks cleaner, doesn't it? We can also create an EntityMultiSelectBox (similar to the classic MultiSelectBox). And it works great with data binding, so I don't need to write any boilerplate code when populating or saving the form:

$frm->addEntitySelect('role', 'Role',
   $this->getRepository(RoleEntity::getClassName()), 'name')
  ->setRequired()->bind('role');

Filling the form from entity and vice-versa is done automatically by data-binding.

Sample implementation in a gist.
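The gist isn't reproduced here, but the core of such a control is converting between entities and IDs in setValue()/getValue(). A framework-free sketch, assuming a hypothetical repository with findAll() and find($id) and entities with getId() and a name property; a real control would extend Nette's SelectBox instead of standing alone:

```php
<?php
// Hypothetical entity and in-memory repository standing in for the ORM.
class RoleEntity {
    public $id;
    public $name;
    public function __construct($id, $name) { $this->id = $id; $this->name = $name; }
    public function getId() { return $this->id; }
}

class RoleRepository {
    private $roles = array();
    public function __construct(array $roles) {
        foreach ($roles as $role) $this->roles[$role->getId()] = $role;
    }
    public function findAll() { return array_values($this->roles); }
    public function find($id) { return isset($this->roles[$id]) ? $this->roles[$id] : null; }
}

// Sketch of the select box: stores an ID internally, speaks entities outside.
class EntitySelectBox {
    private $repository;
    private $labelProperty;
    private $selectedId;

    public function __construct($repository, $labelProperty = 'name') {
        $this->repository = $repository;
        $this->labelProperty = $labelProperty;
    }

    // id => label pairs, which is what a plain select box renders
    public function getItems() {
        $items = array();
        foreach ($this->repository->findAll() as $entity) {
            $items[$entity->getId()] = $entity->{$this->labelProperty};
        }
        return $items;
    }

    public function setValue($entity) {
        $this->selectedId = $entity === null ? null : $entity->getId();
        return $this;
    }

    public function getValue() {
        return $this->selectedId === null ? null : $this->repository->find($this->selectedId);
    }
}

$repo = new RoleRepository(array(new RoleEntity(1, 'admin'), new RoleEntity(2, 'editor')));
$select = new EntitySelectBox($repo);
$select->setValue($repo->find(2));
echo $select->getValue()->name; // editor
```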

Article::getClassName() or Article::className

I miss a feature in PHP which would allow me to reference classes easily. For example, in Java, if you have a class cz.juzna.abc.Article and you want to pass it to a variable/method, you can reference it as cz.juzna.abc.Article.class.

In PHP, this can only be done with a string containing the class name ("cz\juzna\abc\Article"), which is just a string and not a reference. For example, an IDE won't see it as a usage of the class (it won't autocomplete it, won't find it in usages, …). This makes it very easy to make errors or typos in such definitions, or to break the system while refactoring.

It may be difficult to extend PHP itself, but we can add this feature to Nette\Object:

abstract class Object {
  ...
  static function getClassName() {
    return get_called_class();
  }
}

With this, I can use cz\juzna\abc\Article::getClassName(), which is a piece of code; it's „alive“ and will fail if mistaken.
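The trick relies on late static binding: get_called_class() returns the class the static method was called on, not the one where it's defined. A minimal runnable sketch, with BaseObject standing in for Nette\Object (the name Object is reserved since PHP 7.2) and hypothetical entity classes:

```php
<?php
namespace cz\juzna\abc;

// Stand-in for Nette\Object, renamed because "Object" is reserved in PHP 7.2+.
abstract class BaseObject {
    // Late static binding: the name of the class this was called on,
    // not the class where the method is defined.
    public static function getClassName() {
        return get_called_class();
    }
}

class Article extends BaseObject {}
class RoleEntity extends BaseObject {}

echo Article::getClassName(), "\n";    // cz\juzna\abc\Article
echo RoleEntity::getClassName(), "\n"; // cz\juzna\abc\RoleEntity
```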

Advantages

  • No need for full namespaces, if you have use-statement for the class
  • Auto-completion in IDE; you start typing Rol and it offers RoleEntity for you
  • Go to definition or Ctrl+click works in IDE
  • Warnings in IDE if the class is misspelled
  • Found in „find usages“
  • Can be refactored
  • etc.

Summary: it's a live class now, instead of a dead string.

Patching PHP

This feature would make more sense if it were available in vanilla PHP, for every class and not only those extending Nette\Object. I experimented with PHP's source code and created a small patch which adds a magic constant MyClass::className to every class. It is available for PHP 5.3.8 here. It's not production-ready; it's rather a proof of concept for developers to test whether it's worth having.

I'm also experimenting with a magic constant MyClass::class which would contain MyClass's reflection object (similar to what we have in Java), but there are still some open questions about it. Mainly, such a constant must be in the language core, while PHP's reflection is an extension (and a core depending on an extension is not a very good idea). Any suggestions are welcome!

Experience

I added this to my Nette fork and have been using it for a month now; it's very useful to have it in every class, with the IDE's hints as well, especially if you work with an ORM like Doctrine.

If you're using an ORM and you often type class names as strings, try it for a while and you'll see ;)

Note: I also wrote a post on Nette forum regarding this issue.

Data binding in Nette Forms

Forms in your app sometimes match the fields of one entity exactly, but that's not always the case. Data binding should therefore be more general, e.g. one form can provide editing of a hierarchy of entities at once. And it should be bidirectional, pulling data from the model and also storing it back.

Usually you need to write lots of boilerplate code: one chunk which populates the form fields and another which reads the user's values and stores them back into the entity's properties. In a Java course at my university, we were shown how data binding works there using ExpressionEngine, which inspired me. Something similar can be added to Nette Forms easily, in three steps:

1/ binding – I slightly extended Nette by adding one property to each form control. You can see the change in my fork, but it's not in the official release yet. To use it, just bind the controls to the model's properties like in this example:

$frm->addText('name', 'Name')->setRequired()->bind('name');
$frm->addText('username', "Username")->setRequired()->bind('credentials[0].username');

It doesn't do anything by itself; each control just knows what it should be mapped to and from. And as you can see, you can specify non-trivial bindings like credentials[0].username.

2/ populating the form is done in a separate component, e.g. in the form itself (see this example of an EntityForm class). You just call $frm->bind($entity); and the form loads everything it needs from the entity.

3/ populating the entity is easy again, e.g. with $frm->populateEntity($entity);, which just takes all values in the form and stores them into the existing entity.
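The heart of steps 2 and 3 is resolving a binding path against an object graph in both directions. A minimal sketch of that, assuming public properties and hypothetical helper names (parseBindingPath(), readBinding(), writeBinding()), not taken from the fork:

```php
<?php
// Hypothetical: "credentials[0].username" -> array('credentials', 0, 'username')
function parseBindingPath($path) {
    preg_match_all('/([A-Za-z_]\w*)|\[(\d+)\]/', $path, $matches, PREG_SET_ORDER);
    $steps = array();
    foreach ($matches as $match) {
        // Group 2 (an [index]) is present only when the second alternative matched.
        $steps[] = isset($match[2]) ? (int) $match[2] : $match[1];
    }
    return $steps;
}

// Follow property names and array indexes from $current.
function walkSteps($current, array $steps) {
    foreach ($steps as $step) {
        $current = is_int($step) ? $current[$step] : $current->$step;
    }
    return $current;
}

function readBinding($root, $path) {
    return walkSteps($root, parseBindingPath($path));
}

// Assumes the path ends with an object property; intermediate arrays only
// need to hold objects, which PHP passes around as handles.
function writeBinding($root, $path, $value) {
    $steps = parseBindingPath($path);
    $last = array_pop($steps);
    if (is_int($last)) {
        throw new InvalidArgumentException("Path '$path' must end with a property name");
    }
    $target = walkSteps($root, $steps);
    $target->$last = $value;
}

// Hypothetical entities with public properties.
class PasswordEntity { public $username; }
class IdentityEntity { public $name; public $credentials = array(); }

$user = new IdentityEntity;
$user->credentials[] = new PasswordEntity;

// control name => binding path, as set up with ->bind() in the form
$bindings = array('name' => 'name', 'username' => 'credentials[0].username');
$submitted = array('name' => 'Jan', 'username' => 'juzna');

// "populateEntity": copy each submitted value along its binding path
foreach ($bindings as $control => $path) {
    writeBinding($user, $path, $submitted[$control]);
}
echo readBinding($user, 'credentials[0].username'); // juzna
```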

How to create new models?

After a form for adding a new user has been submitted, you can create an empty model and have it populated from the form:

// Create empty model
$user = new IdentityEntity;
$user->addCredentials(new PasswordEntity($user));

// Populate it
$frm->populateEntity($user);

Editing existing data

… is very easy: all you need is to load the existing model from the repository and bind it to the form.

Work in progress

So far this is only a rough idea, and I'm still working on it. Comments are welcome…

Abstract properties

I hope you know you can declare classes and methods as abstract in most object-oriented languages (including PHP), and if you write high-quality object-oriented code, I guess you use them quite often. There is no debate about whether they're good or not (at least I hope so; and I'd like to hear your opinions).

But what about abstract properties: do you use them? And do you use them in PHP? Are they a good or a bad habit? We use them in my company pretty often, in a similar way to Zend Framework. But we still miss official support for them on PHP's side.

Little bit of history

Many years ago, abstract methods and classes were not supported in PHP, but people used them anyway. They just had no means to enforce them in PHP itself, so they started adding comments or annotations saying that a method is abstract. They knew how to use such classes, they knew they shouldn't instantiate them, and everything was fine. Almost.

From PHP's point of view, there were no abstract classes, so it let you instantiate them, which could lead to various problems. It was time for PHP 5 to come and show its great potential.

Note: have you noticed we've been using annotations for ages even though PHP doesn't support them natively either? ;)

Native abstract classes and methods

PHP 5 brought native support for abstract classes and abstract methods. It is able to enforce them and shows you meaningful error messages at compile time (compiling to bytecode). It prevents errors from being introduced or from being found too late. And that's not the only advantage: IDEs now know the abstract keyword as well and can generate scaffolding code for you. Everyone benefits from native support.

Why not properties?

Abstract properties are not natively supported by PHP, I guess because nobody saw a good reason for them, and also because many people consider them a bad solution that points to a badly designed application. Everyone tells me to use abstract methods instead, because they lead to a clean object-oriented design.

Let me introduce my problem and possible solutions.

Example: Renderers

The Renderer design pattern consists of classes which render objects (into strings). Each renderer class can render one or more domain classes, e.g. MovieNameRenderer can take a Movie object and give you its name, or MovieGenreRenderer can convert a movie's genre constant into a human-readable genre. With such renderers, you can have a generic ListView widget which takes a DataSource and renders each object using the specified Renderer.

interface IRenderer {
  /** @return string */
  function render($obj);
}

When you want to have many simple renderers which just take a property of given object, you have basically two options:

abstract class PropertyRenderer implements IRenderer {
  protected $propertyName; // abstract, should be defined in descendants

  function render($obj) {
    return $obj->{$this->propertyName};
  }
}

// Definition of particular renderer becomes easy
class MovieNameRenderer extends PropertyRenderer {
  protected $propertyName = 'name';
}

This is however not a clean solution, as PHP won't force you to define $propertyName. You can solve this by using abstract methods instead:

abstract class PropertyRenderer implements IRenderer {
  abstract function getPropertyName();

  function render($obj) {
    return $obj->{$this->getPropertyName()};
  }
}

// It becomes more complicated and less efficient
class MovieNameRenderer extends PropertyRenderer {
  function getPropertyName() {
    return 'name';
  }
}

(Check all the examples in the Gist, with more relevant methods.)

It becomes even more complicated and much less efficient with label renderers. Compare the two implementations from the Gist: one using abstract properties, the other abstract methods. The one which uses properties is shorter, seems clearer to me and is much faster. Every function call brings some overhead, which in my test case made the second implementation roughly 75% slower.

So, there are some good reasons to use abstract properties (either natively supported or not).
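Until there's native support, a common workaround (not the patch described below) is to check the property at runtime in the base class constructor. A minimal sketch based on the PropertyRenderer example; note that it only fails when the class is instantiated, not at compile time, which is exactly the gap native abstract properties would close:

```php
<?php
interface IRenderer {
    /** @return string */
    function render($obj);
}

// Runtime stand-in for an abstract property: the base constructor refuses to
// build a renderer whose subclass forgot to set $propertyName.
abstract class PropertyRenderer implements IRenderer {
    protected $propertyName; // "abstract": must be defined in descendants

    public function __construct() {
        if ($this->propertyName === null) {
            throw new LogicException(get_class($this) . ' must define $propertyName');
        }
    }

    public function render($obj) {
        return $obj->{$this->propertyName};
    }
}

class MovieNameRenderer extends PropertyRenderer {
    protected $propertyName = 'name';
}

class BrokenRenderer extends PropertyRenderer {} // forgot $propertyName

$movie = new stdClass;
$movie->name = 'Metropolis';
$renderer = new MovieNameRenderer();
echo $renderer->render($movie), "\n"; // Metropolis

$brokenMessage = null;
try {
    new BrokenRenderer();
} catch (LogicException $e) {
    $brokenMessage = $e->getMessage();
}
echo $brokenMessage; // BrokenRenderer must define $propertyName
```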

Example 2 – Zend Db

For another example, have a look at Zend Db, where you define the table name using a property $_name (it's not strictly abstract here, because you can omit it, but it could and probably should be). Class Zend_Db_Table_Abstract wants all its descendants to define this property, so it should be abstract and PHP would be able to report compile errors.

Implementation

Because I'm trying to learn how PHP works internally, I decided to add abstract properties to the language. It turned out to be pretty easy: you can use the patch against trunk (it also works against 5.3), and I also wrote some tests. It doesn't let you do anything new; it just enforces what you already do and thus makes your code cleaner. Maybe we will see it in PHP 5.5?

Conclusion

I think I need abstract properties to make my application clean and efficient at the same time. I can't see any reason why abstract properties would be bad. I would love them in official PHP. What do you think?

Next time: native annotations? ;)

Optimizing class cache in PHP

Some time ago I read an article about hacking PHP internals to improve its performance. I liked the idea of avoiding unnecessary syscalls, because context switching is considered quite an expensive operation. And I realized there is quite a significant amount of work that needs to be performed during class autoloading. So I decided to hack it and make it a little bit better.

Classes and autoloading

I need to start with some theory and explain how PHP works with classes. I hope you all already know that PHP compiles source code into bytecode, which is then executed one opcode at a time (read up on this if you're not familiar with it).

Whenever you want to create a new instance of a class (e.g. new MyClass();), at least three opcodes are generated:

0  >   ZEND_FETCH_CLASS            :0      'MyClass'
1      NEW                                 :0
2      DO_FCALL_BY_NAME         0

(you can use Vulcan Logic Disassembler to explore how PHP code is translated to bytecode)

The first of these three opcodes is responsible for finding the class definition, the second one allocates memory for the new object and prepares execution of the constructor, and finally the last one executes the constructor (if present; Zend Engine skips it if there is no constructor). The latter two are pretty simple and don't do much work in general; however, we can't say the same about the first opcode. If we're lucky and the class has already been defined, ZEND_FETCH_CLASS just finds its definition in a HashTable. However, when we use a class for the first time within a request, Zend Engine doesn't know its definition yet and has to find it, which triggers rather heavy autoloading.

It is important to realize that autoloading happens in every single request and PHP does all the work from scratch (some of it can be avoided with a cache; I'll cover that later in this post).

Autoloading in big web apps

If you happen to have a big PHP application, you usually end up with multiple places where your classes can be stored. In my company, get_include_path sometimes contains more than 7 locations, and I guess it can be even more. A standard autoloader will search for the requested class in all locations, which means up to 7 syscalls in my case just to test whether a given file exists. Zend Autoloader behaves in a similar way.

When the file containing the requested class is found, it is include()'d. That means PHP will parse it and execute its content, which also tells PHP that a new class is available. In more complex cases and class hierarchies, this whole procedure is repeated for each extended class.
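To make the probing cost concrete, here is a hypothetical demo that mimics an include-path autoloader over seven directories, with the class living in the last one, and counts the file_exists() checks. The temporary directory layout is made up for the example:

```php
<?php
// Seven made-up library directories; the class lives only in the last one.
$base = sys_get_temp_dir() . '/autoload-demo-' . getmypid();
$paths = array();
for ($i = 1; $i <= 7; $i++) {
    $paths[] = "$base/lib$i";
    if (!is_dir("$base/lib$i")) {
        mkdir("$base/lib$i", 0777, true);
    }
}
file_put_contents("$base/lib7/MyClass1.php", '<?php class MyClass1 {}');

$probes = 0; // number of file_exists() checks, each a stat-like syscall

// Mimics a standard autoloader: try every directory until the file exists.
spl_autoload_register(function ($className) use ($paths, &$probes) {
    foreach ($paths as $dir) {
        $probes++;
        $file = "$dir/$className.php";
        if (file_exists($file)) {
            include $file;
            return;
        }
    }
});

new MyClass1(); // first use: 7 probes, then include + compile
new MyClass1(); // already in the class table: autoloader not called
echo $probes;   // 7

// Clean up the demo files.
unlink("$base/lib7/MyClass1.php");
foreach ($paths as $dir) {
    rmdir($dir);
}
rmdir($base);
```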

The whole process of fetching a class can get pretty long:

  • ZEND_FETCH_CLASS (zend_vm_def.h) is the opcode being executed, and it calls
  • zend_fetch_class_by_name (zend_execute_API.c) which calls
  • zend_hash_quick_find – SUCCESS when already exists, otherwise calls
  • autoload which executes one or more PHP functions for autoloading, they call
  • file_exists to test each possible filename, each means a syscall and possibly hard drive access, then
  • include which calls
  • open syscall to get handle of included file, then
  • compile_file + read syscalls and
  • zend_execute.
  • possibly again if it needs a parent class…

That's a lot of work just to instantiate a class, huh?

Caching and APC

„Haha, I'm not a fool,“ you probably say. I guess you use APC to cache bytecode across multiple requests and thus save all the expensive compiling. But do you know how much you actually save by that? APC only caches the bytecode of entire files, which means you avoid only the compile_file function. Everything else, including many access syscalls and even the open syscall, is executed even with APC.

Caching just classes

In many PHP applications, class names are unique. So why can't we make a cache based on class names? We would avoid all the work, from autoloading, through all the syscalls, to executing the containing file. After all, in most cases we only want the class itself, not to execute the file where it is defined.

I asked myself this question and came to the conclusion: „Actually, we can!“ We can update Zend Engine to consult an external function when searching for a class, and a PHP module can cache classes. To be honest, I was probably not the first one with a similar idea. I based my work on a patch from shire and extended APC a little further.

I started with shire's patches and added a simple modification: APC can install classes from its cache when asked (read: copy them from shared memory to the thread's private memory). It means that once you have loaded a class, it can be fetched directly from the cache without any other work. So when a class is found in the cache, the autoloader is not called at all on subsequent requests.
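The engine-level cache needs the patches, but a weaker, userland technique, a persistent class map, removes at least the include-path probing. To be clear, this is a different approach from the APC patch: the include and compile steps still happen, and only the file_exists() calls go away. A minimal sketch with made-up file locations:

```php
<?php
// Made-up library layout for the demo.
$base = sys_get_temp_dir() . '/classmap-demo-' . getmypid();
if (!is_dir("$base/lib")) {
    mkdir("$base/lib", 0777, true);
}
file_put_contents("$base/lib/MyClass2.php", '<?php class MyClass2 {}');

// Build step (e.g. once per deployment): write the class => file map as PHP.
$mapFile = "$base/classmap.php";
$map = array('MyClass2' => "$base/lib/MyClass2.php");
file_put_contents($mapFile, '<?php return ' . var_export($map, true) . ';');

// Request time: one array lookup instead of probing every include-path entry.
$classMap = include $mapFile;
$autoloadCalls = 0;
spl_autoload_register(function ($className) use ($classMap, &$autoloadCalls) {
    $autoloadCalls++;
    if (isset($classMap[$className])) {
        include $classMap[$className];
    }
});

new MyClass2();
new MyClass2();        // already defined in this request: autoloader not called
echo $autoloadCalls;   // 1

// Clean up the demo files.
unlink("$base/lib/MyClass2.php");
unlink($mapFile);
rmdir("$base/lib");
rmdir($base);
```

Like the APC cache, the map has to be rebuilt by hand whenever files move, which fits the once-per-deployment workflow described under Pitfalls.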

Example

Here is a simple piece of code which may not make much sense in general, but should illustrate well how such a cache works.

// Custom autoloader
spl_autoload_register(function($className) {
  echo "Autoloading $className\n";
  include "$className.php";
});

// More paths with libraries
set_include_path('/www/lib1:/www/lib2:/www/lib3:' . get_include_path());

file_exists('/tmp/start'); // helps tracking syscalls
new MyClass1();
new MyClass2();
...
file_exists('/tmp/end');

When you execute it for the first time, you will see an echo for each class being used. And it's not just the echo: there are many syscalls, as you can see in the strace log.

But when it's executed for the second time, all classes seem to already exist, as they're immediately copied from the cache. Autoload is never called anymore and there are no syscalls during execution of the script. Have a look at the second strace log and try to find the accesses to /tmp/start and /tmp/end, which mark where the script itself began and ended.

Pitfalls

Someone may object that such a cache cannot be invalidated automatically when you change a file, and they'd be right. But it doesn't have to be a big problem for bigger applications, which you update manually once a week or even less often. Once you have pushed all files to the production server, you can manually clear the whole cache. After that, all classes will get cached and PHP won't need to access your hard drive at all when they're needed. That said, this can be a good optimization for production servers, but it makes no sense in development.

Evaluation

I haven't tested it in production yet, so I can't tell how much time it actually saves. Also, my patches are in alpha stage and will need improvements. My point was just to find out how much autoloading work we can avoid.

You can get the PHP patch from GitHub and, for now, the APC patch from my website.

Stealing Ruby’s yield for PHP

I tried to learn Ruby once because it's so popular and cool these days. But I must admit I didn't like it much. You may think I'm kinda old-school, but having question or exclamation marks in method names is just not for me. And letting anyone modify methods of a previously defined class is the kind of thing I will probably never open my mind to. Result: no, thanks! Ruby is just not the language for me.

Anyway, I don't want to throw all the Ruby stuff away. There are also some very nice features I like and often miss in PHP. So in my diving experiment I decided to swim into Ruby's world and steal some of its beauty for my experimental PHP fork.

PHP iterators

Ruby has a different approach to iterators than PHP has. In our (PHP) world we can create iterators by means of special types of classes. These implement either the Iterator or the IteratorAggregate interface; Iterator lets us create powerful iterators by implementing „just“ five methods. This can be pretty complicated and isn't even needed for most iterators we would like to create, so people tend to use imperative approaches like

while($row = mysql_fetch_assoc($result)) { ... }

This may not seem kosher to everyone, and certainly not to me. It doesn't really describe what we want to do.

Here IteratorAggregate comes to help us create new iterators: we need to implement only one method and we're done. One would usually create a big array (if it doesn't already exist somewhere in the application) and pass it to ArrayIterator. This may result in simpler code, but high memory consumption.
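For reference, an IteratorAggregate implementation really is just one method. A minimal sketch with a hypothetical FileList class, which also shows the downside mentioned above: the whole array has to exist before iteration starts:

```php
<?php
// One method, getIterator(), makes the object usable in foreach.
class FileList implements IteratorAggregate {
    private $paths; // everything is held in memory up front

    public function __construct(array $paths) {
        $this->paths = $paths;
    }

    public function getIterator(): Iterator {
        return new ArrayIterator($this->paths);
    }
}

$list = new FileList(array('/tmp/a.txt', '/tmp/b.txt'));
$seen = array();
foreach ($list as $i => $path) {
    $seen[] = "$i: $path";
}
echo implode("\n", $seen);
```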

When loading data from a database, you would probably end up with something like this:

// Update relations between file entries in DB
$relatedState = '...';
$filesTable = Db_Manager::getTable('files');
foreach($filesTable->fetchAll() as $file) {
  $fp = fopen($file->path, 'r');
  $relatedFiles = $filesTable->findRelatedFileTo($file, $relatedState)->fetchAll();
  foreach($relatedFiles as $relatedFile) {
    // do something with related file
  }
  ...
  fclose($fp);
}

Ruby iterators

Ruby has a different approach to iterators, using the yield statement. It's much easier compared with PHP. You don't need to create classes and implement many methods; a single function is enough. When such a generator function has the next iteration value ready, it uses the yield statement to run the associated block once, and then execution comes back to the generator. The generator can then focus on generating another value and yielding again (examples can be found in Ruby's documentation or on Google).

A similar approach can be used in PHP today. Database access like in the previous example would probably look like this:

$relatedState = '...';
$filesTable = Db_Manager::getTable('files');
$filesTable->iterate(function($file) use ($filesTable, $relatedState) {
  $fp = fopen($file->path, 'r');
  $filesTable->findRelatedFileTo($file, $relatedState)->iterate(function($relatedFile) use ($file, $filesTable) {
    // do something with related files
  });
  ...
  fclose($fp);
});

This is easy to implement in PHP with callbacks (example implementation below). I wanted to make it even easier, as it is in Ruby, and it turned out not to be so difficult with my experimental code blocks. Local variables are shared between a code block and the function the block is defined within (i.e. in the same way as a normal block after foreach behaves), unlike closures, which have to import them via use.

class Db_Table {
  // Executes callback for each row in DB
  function iterate(Closure $callback) {
    $result = $this->db->query(...);
    while($row = mysql_fetch_object($result)) $callback->__invoke($row);
  }
  // more table methods would be here
}
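The callback version can be tried without a database. A runnable sketch where the rows come from a hypothetical in-memory array instead of mysql_fetch_object(); note that a closure has to capture outer variables explicitly with use, by reference if it modifies them:

```php
<?php
// Same iterate() idea, fed from an in-memory array instead of MySQL.
class Db_Table {
    private $rows;

    public function __construct(array $rows) {
        $this->rows = $rows;
    }

    // Executes $callback once per row, like the mysql-based version.
    public function iterate(Closure $callback) {
        foreach ($this->rows as $row) {
            $callback((object) $row);
        }
    }
}

$files = new Db_Table(array(
    array('path' => '/tmp/a.txt'),
    array('path' => '/tmp/b.txt'),
));

$paths = array();
// The closure must capture $paths by reference to append to it.
$files->iterate(function ($file) use (&$paths) {
    $paths[] = $file->path;
});
echo implode(', ', $paths); // /tmp/a.txt, /tmp/b.txt
```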

Cleaning syntax

I tried to take the idea of Ruby's yield and add support for it to the PHP language. We can rewrite the PHP code from the last paragraph to take advantage of it and make it shorter and much easier to read and understand.

class Db_Table {
  // Executes callback for each row in DB
  function iterate() { // no arguments here
    $result = $this->db->query(...);
    while($row = mysql_fetch_object($result)) yield $row; // yield to associated code block
  }
}

$relatedState = '...';
$filesTable = Db_Manager::getTable('files');
$filesTable->iterate() { |$file| // this code block will be used by *yield*; it receives one argument named $file
  $fp = fopen($file->path, 'r');
  $filesTable->findRelatedFileTo($file, $relatedState)->iterate() { |$relatedFile|
    // do something with related files
  }
  ...
  fclose($fp);
}

This may be a bit of a shock, as the syntax won't be familiar to you at all. But it's actually very easy. Whenever you call a function, you can associate a block of code with it, in the same way you would with a foreach loop or an if statement. The function can then yield, which runs the block. Zero, one or more arguments can be passed with yield and they will be received by the block. The syntax is very similar to Ruby's.

More fun with yield

You can have much more fun with yield. The easiest start is to search for how Ruby programmers use it. A basic example to start with is generating the Fibonacci sequence:

function genFibonacci($limit) {
  yield 1, 1; // value and position
  yield 1, 2;
  $a = $b = 1;
  $i = 2;
  while($i < $limit) {
    $tmp = $a;
    $a = $a + $b;
    $b = $tmp;
    yield $a, ++$i; // value and position
  }
}

// Generate 10 values
genFibonacci(10) { |$x, $pos| echo "Fibonacci number: $x at position $pos\n"; }
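For comparison, stock PHP later got its own yield in PHP 5.5, but with Python-style generator semantics rather than Ruby-style blocks: the function returns a Generator which the caller drives with foreach. It can yield only one value (optionally with a key) at a time, so this sketch yields the position as the key:

```php
<?php
// Native PHP generator: yields position => Fibonacci number.
function genFibonacci($limit) {
    $current = 1;
    $next = 1;
    for ($pos = 1; $pos <= $limit; $pos++) {
        yield $pos => $current;
        $tmp = $current + $next;
        $current = $next;
        $next = $tmp;
    }
}

// Generate 10 values; the loop body plays the role of the Ruby block.
$numbers = array();
foreach (genFibonacci(10) as $pos => $x) {
    echo "Fibonacci number: $x at position $pos\n";
    $numbers[$pos] = $x;
}
```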

Summary

Actually, this isn't something that was impossible before, just a nicer way inspired by Ruby and Python. It's highly experimental, created mainly for fun, and can be found in my PHP fork on GitHub. Comments are welcome as always.

Weak references in PHP

Today a new PECL package was created which brings weak references to PHP. Weak references already had an RFC asking for their support in the upcoming PHP 5.4, which was, however, not well received by PHP core developers. As they suggested, a new feature like this, which doesn't need changes in the Zend Engine (the backend of PHP), should first be tried as an extension and later, if really wanted by the majority of users, may be included in the PHP core.

Installation

If you want it now, you can just download the PECL package from SVN and compile it. If you have compiled other extensions before, it will take you less than five minutes (if you haven't compiled anything yet, try reading the official manual).

svn co http://svn.php.net/repository/pecl/weakref/trunk/ weakref
cd weakref/
phpize && ./configure && make && sudo make install

Don't forget to load the extension in php.ini (by adding extension=weakref.so).

What are they good for?

Weak references have nothing to do with references as they are known in PHP. The concept of weak references is, however, common in other programming languages, and taking inspiration from Java, they have now been added to PHP as well.

Normally in PHP, if you store an object in a variable, the Zend engine knows it's being used and won't remove it from memory. Thus, you can access that object at any time. Pretty simple and obvious. On the other hand, when you store a weak reference to an object in a variable, the Zend engine doesn't take it into account. As soon as the object is not used anywhere else, it can be removed and its memory freed. Thus, if you want to access a weakly referenced object, you have to make sure it still exists first. This may well sound confusing. Why would someone use it and risk their object being removed from memory?

Weak references won't be very useful in small and simple scripts, but bigger frameworks or libraries can benefit from them. They can save memory and/or CPU time, especially when data can be loaded from an external source, like a database.

Consider a very simple database layer, which provides access to products in a database. Such a class can have only one method:

abstract class ProductDatabase {
    abstract function getProduct($productId); // returns the product object
}

Based on the given $productId, ProductDatabase returns an object representing the product with that particular id. In a complex application, this database can be accessed from many different parts of the code, and the app may ask for the same object several times. Previously you had two options for implementing such a database layer.

The first and easier way is to load the object every time it's requested, which may lead to weird behavior:

class ProductDatabase {
  function getProduct($productId) {
    return mysql_fetch_object(mysql_query("SELECT * FROM product WHERE id=$productId"));
  }
}
$prodA = $prodDb->getProduct(1); // queries DB and creates object
$prodB = $prodDb->getProduct(1); // queries DB and creates object
var_dump($prodA === $prodB); // false

echo $prodA->price; // 60 (assume 60 is actually stored in db)
$prodA->price += 10;
$prodA->save();

$prodB->price += 10;
$prodB->save();

echo $prodDb->getProduct(1)->price; // 70 (expected 80?)

When you load the same object several times, multiple instances are actually created. This may lead to higher memory consumption and also to some insidious bugs.

The second, more advanced approach uses an identity map. After an object is loaded from the database, it's stored in an internal array under its id. When a second request is made for the same id, the existing object is returned.

class ProductDatabase {
  private $identityMap;
  function getProduct($productId) {
    // Load object from DB if not yet loaded
    if(!isset($this->identityMap[$productId])) {
      $this->identityMap[$productId] = mysql_fetch_object(mysql_query("SELECT * FROM product WHERE id=$productId"));
    }
    return $this->identityMap[$productId];
  }
}
$prodA = $prodDb->getProduct(1); // queries DB and creates object
$prodB = $prodDb->getProduct(1); // object taken from identity map
var_dump($prodA === $prodB); // true

This approach seems better. Not only are the returned objects identical, but the database is also queried only when necessary. However, every object is kept in memory, which may cause trouble when you work with millions of products at the same time.

This is the right time to bring in weak references. The code will be similar to the previous example, but „real“ objects won't be stored in the identity map. We'll store only weak references, which become invalid as soon as the objects are not used anywhere else in the application.

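class ProductDatabase {
  private $identityMap = array();
  function getProduct($productId) {
    // Reuse the object if its weak reference is still valid
    if(isset($this->identityMap[$productId]) && $this->identityMap[$productId]->valid()) {
      return $this->identityMap[$productId]->get();
    }
    // Otherwise load it from DB; $obj holds a strong reference until it is returned,
    // without it the new object would be destroyed right after WeakRef is created
    $obj = mysql_fetch_object(mysql_query("SELECT * FROM product WHERE id=" . (int) $productId));
    $this->identityMap[$productId] = new WeakRef($obj);
    return $obj;
  }
}
$prodDb = new ProductDatabase();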
$prodA = $prodDb->getProduct(1); // queries DB and creates object
$prodB = $prodDb->getProduct(1); // object taken from identity map
var_dump($prodA === $prodB); // true

unset($prodA, $prodB); // removes the last references to the product, so the weak reference becomes invalid
$prodA = $prodDb->getProduct(1); // queries DB and creates object

As the example shows, with weak references we benefit from the identity map and don't create new instances when they already exist, and at the same time we avoid keeping unused objects in memory. It's a clear win for weak references!
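For readers on PHP 7.4 or later without the PECL package, here is a runnable sketch of the same weak identity map built on the core WeakReference class. The class name WeakProductMap and the $loader callable (standing in for the database query) are hypothetical, and $queries only counts loads for illustration.

```php
<?php
// Weak identity map sketch (PHP 7.4+), built-in WeakReference instead of PECL WeakRef.
final class WeakProductMap {
    /** @var array<int, WeakReference> */
    private array $identityMap = [];
    /** @var callable hypothetical loader standing in for the DB query */
    private $loader;
    public int $queries = 0;

    public function __construct(callable $loader) {
        $this->loader = $loader;
    }

    public function getProduct(int $productId): object {
        // Reuse the object while some other code still holds it
        $existing = isset($this->identityMap[$productId])
            ? $this->identityMap[$productId]->get()
            : null;
        if ($existing !== null) {
            return $existing;
        }
        // Keep a strong reference in $product until it is returned
        $this->queries++;
        $product = ($this->loader)($productId);
        $this->identityMap[$productId] = WeakReference::create($product);
        return $product;
    }
}

$map = new WeakProductMap(fn (int $id) => (object) ['id' => $id, 'price' => 60]);
$prodA = $map->getProduct(1); // loads the product
$prodB = $map->getProduct(1); // taken from the identity map
var_dump($prodA === $prodB);  // bool(true)

unset($prodA, $prodB);        // last strong references are gone
$prodC = $map->getProduct(1); // must be loaded again
echo $map->queries, "\n";     // 2
```

As in the PECL version, an entry for a destroyed object is only replaced on the next lookup of that id, so the array itself still grows with the number of distinct ids; only the product objects are freed.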

It's not a cache…

Weak references won't solve all the problems by themselves. In the second example, the identity map stored real object instances and thus also worked as a cache. When you asked for a product for the first time, it was loaded from the database and stored in an array. Every later time, the same object was still there, ready for you to use. It was optimal in terms of database queries.

With weak references this doesn't hold anymore. As shown in the third example, after all instances have been released (variables $prodA and $prodB), the object is completely removed from memory. The next time you request a product with the same id, the database must be queried again. Weak references alone do not work as a cache.

… so use it with a cache

Don't worry, it's not that bad. You can easily combine weak references with a cache to create a much better solution: we get the fair memory consumption of weak references and the low number of database queries of a cache.

class ProductDatabase {
  const CACHE_SIZE = 100;
  private $identityMap = array();
  private $cache = array(); // must be an array, count() is called on it below
  function getProduct($productId) {
    // if in cache, it's easy
    if(isset($this->cache[$productId])) return $this->cache[$productId];

    // or if a weak reference is valid, use it
    elseif(isset($this->identityMap[$productId]) && $this->identityMap[$productId]->valid()) {
      return $this->identityMap[$productId]->get();
    }

    // we must do the hard work
    else {
      // clear cache if full; not optimal but easy and fast
      if(count($this->cache) >= self::CACHE_SIZE) $this->cache = array();

      $obj = mysql_fetch_object(mysql_query("SELECT * FROM product WHERE id=" . (int) $productId));
      $this->identityMap[$productId] = new WeakRef($obj); // store weak reference
      $this->cache[$productId] = $obj; // store in cache

      return $obj;
    }
  }
}
$prodDb = new ProductDatabase();
$prodA = $prodDb->getProduct(1); // queries DB and creates object
// code which loads thousands of products here...
$prodB = $prodDb->getProduct(1); // cache has been flushed, but weak reference is still valid
var_dump($prodA === $prodB); // true

unset($prodA, $prodB); // removes instances, but still in cache
$prodA = $prodDb->getProduct(1); // taken from cache

The caching algorithm is simple, but fast. While the cache is not full, it keeps objects and does not load them from the database again, even when no other instances are held by the rest of the application. And even after the cache in the database layer is flushed, objects still held in variables elsewhere in the system remain accessible via weak references. It's a complete win for us!
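Flushing the whole cache when it fills up is the simplest policy. If you want something gentler, one option (not the article's method) is a least-recently-used cache, which PHP arrays make easy because they keep insertion order. The sketch below assumes PHP 7.4+, uses the built-in WeakReference instead of PECL WeakRef, and the names CachedProductMap and $loader are hypothetical.

```php
<?php
// Weak identity map plus a small LRU cache of strong references (PHP 7.4+).
final class CachedProductMap {
    /** @var array<int, WeakReference> */
    private array $identityMap = [];
    /** @var array<int, object> strong references, least recently used first */
    private array $cache = [];
    /** @var callable hypothetical loader standing in for the DB query */
    private $loader;
    private int $cacheSize;
    public int $queries = 0;

    public function __construct(callable $loader, int $cacheSize = 100) {
        $this->loader = $loader;
        $this->cacheSize = $cacheSize;
    }

    public function getProduct(int $productId): object {
        if (isset($this->cache[$productId])) {
            $product = $this->cache[$productId];
            unset($this->cache[$productId]); // re-inserted below as most recently used
        } else {
            $product = isset($this->identityMap[$productId])
                ? $this->identityMap[$productId]->get()
                : null;
            if ($product === null) {
                $this->queries++;
                $product = ($this->loader)($productId);
                $this->identityMap[$productId] = WeakReference::create($product);
            }
        }
        $this->remember($productId, $product);
        return $product;
    }

    private function remember(int $productId, object $product): void {
        $this->cache[$productId] = $product;
        // Evict the least recently used entry once the cache is over capacity
        if (count($this->cache) > $this->cacheSize) {
            unset($this->cache[array_key_first($this->cache)]);
        }
    }
}

$map = new CachedProductMap(fn (int $id) => (object) ['id' => $id], 2);
$prodA = $map->getProduct(1);  // loaded (query 1)
$map->getProduct(2);           // loaded (query 2)
$map->getProduct(3);           // loaded (query 3), evicts id 1 from the cache
$prodB = $map->getProduct(1);  // not cached, but $prodA keeps it alive
var_dump($prodA === $prodB);   // bool(true)
unset($prodA, $prodB);
$map->getProduct(1);           // cached again by the previous call
echo $map->queries, "\n";      // 3
```

Evicting one entry at a time keeps recently used products cached, at the cost of a little more bookkeeping on every hit.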

Summary

Database layers like Zend Db or Doctrine 2 ORM can easily be extended to benefit from this new PECL package, reducing both memory consumption and the number of database queries and making our applications much faster. If you are able to compile a PECL extension, you can start using it today.