Nikita Popov:
PHP's new hashtable implementation
Dec 26, 2014 @ 10:20:10

In his latest post Nikita Popov gives a detailed look at PHP's new hashtable implementation and the improvements it offers over the previous one. The hashtable is the internal structure the language uses to store the arrays created during the execution of a script.

About three years ago I wrote an article analyzing the memory usage of arrays in PHP 5. As part of the work on the upcoming PHP 7, large parts of the Zend Engine have been rewritten with a focus on smaller data structures requiring fewer allocations. In this article I will provide an overview of the new hashtable implementation and show why it is more efficient than the previous implementation.

He starts with an introduction to the concept of hashtables, describing them as "ordered dictionaries" of key/value pairs that (internally) reference values in an array. He looks at the old method PHP used to make these links and how the new version, with the help of the new zval handling, is different. He covers how the new implementation maintains the order of elements and performs lookups, and introduces "packed" and "empty" hashtables. He ends the post with a look at the new implementation's memory utilization and the performance gains we can expect when it arrives in PHP 7.
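To make the "ordered dictionary" idea concrete, here is a minimal Python sketch, not PHP's actual C code. It assumes a design along the lines the summary describes: values live in a dense array in insertion order, and a separate hash index maps each slot to a position in that array. The class and method names are hypothetical, and resizing and deletion are left out.

```python
class OrderedHashTable:
    """Toy ordered dictionary: a dense bucket list plus a hash index."""

    EMPTY = -1  # marks an unused slot or the end of a collision chain

    def __init__(self, size=8):
        self.size = size                   # number of hash slots, a power of two
        self.slots = [self.EMPTY] * size   # slot -> index of first bucket in its chain
        self.buckets = []                  # [key, value, next_index], in insertion order

    def _slot(self, key):
        # Mask the hash down to a slot number.
        return hash(key) & (self.size - 1)

    def _find(self, key):
        # Follow the collision chain for this key's slot.
        idx = self.slots[self._slot(key)]
        while idx != self.EMPTY:
            if self.buckets[idx][0] == key:
                return idx
            idx = self.buckets[idx][2]
        return self.EMPTY

    def set(self, key, value):
        idx = self._find(key)
        if idx != self.EMPTY:
            self.buckets[idx][1] = value   # overwriting keeps the original position
            return
        slot = self._slot(key)
        # The new bucket is appended and becomes the head of its slot's chain.
        self.buckets.append([key, value, self.slots[slot]])
        self.slots[slot] = len(self.buckets) - 1

    def get(self, key, default=None):
        idx = self._find(key)
        return default if idx == self.EMPTY else self.buckets[idx][1]

    def items(self):
        # Iteration order is simply the order of the bucket list.
        return [(key, value) for key, value, _ in self.buckets]
```

Because the buckets sit in one contiguous list, iterating in insertion order needs no extra pointers, which is one way a layout like this can save memory and allocations.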

tagged: hashtable array implementation php7 performance memory lookup

Link: http://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html

Sherif Ramadan:
A Closer Look Into PHP Arrays: What You Don’t See
Oct 29, 2012 @ 11:43:33

In a new post Sherif Ramadan takes an in-depth look at PHP arrays and what happens behind the scenes when they're put to use.

PHP is one unique language where the array data type has been highly generalized to suit a very broad set of use cases. [...] I’m going to share with you some of the underlying details of how the PHP array data type works, why it works the way that it does, how it’s different from other languages, and what behaviors the PHP array has that you may not be fully aware of.

He starts with a section looking at what arrays actually are in PHP (and how they compare to the lower level C arrays). He gives a C-based array example and shows how it's stored in memory. He points out how PHP arrays differ from arrays in other languages and shows the C code that works behind the scenes to create the array (actually a hashtable). He then explains array iteration in detail, with basic benchmarks of several iteration methods, and goes more in-depth on foreach (including subarrays and arrays containing references).

tagged: array language c hashtable indepth variable


Nikita Popov's Blog:
Understanding PHP's internal array implementation (Part 4)
Mar 29, 2012 @ 09:16:02

Nikita Popov has posted the fourth part of the "PHP's Source Code for PHP Developers" series he and Anthony Ferrara have been posting. In this latest article in the series, Nikita looks specifically at PHP's array implementation and how it's handled "behind the scenes".

Welcome back to the fourth part of the "PHP's Source Code for PHP Developers" series, in which we’ll cover how PHP arrays are internally represented and used throughout the code base.

He starts with an obvious foundation: "everything's a hash table" (even properties, classes and yes, arrays). He describes what a hash table is and talks about the two structures that implement it in the PHP source - HashTable and Bucket. He gets into their usage a bit and compares this to the corresponding PHP code that uses a standard array.
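As a rough illustration of how a HashTable and its Buckets can fit together, the Python sketch below assumes a PHP 5-style layout in which every bucket is on two lists at once: the collision chain for its hash slot and a table-wide doubly linked list that records insertion order. The field and method names are simplified and hypothetical. They are not the names used in the PHP source.

```python
class Bucket:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.chain_next = None  # next bucket in the same hash slot
        self.list_prev = None   # previous bucket in insertion order
        self.list_next = None   # next bucket in insertion order


class HashTable:
    def __init__(self, size=8):
        self.size = size            # number of slots, a power of two
        self.slots = [None] * size  # head of the collision chain for each slot
        self.head = None            # first bucket inserted
        self.tail = None            # last bucket inserted

    def _slot(self, key):
        return hash(key) & (self.size - 1)

    def _find(self, key):
        bucket = self.slots[self._slot(key)]
        while bucket is not None and bucket.key != key:
            bucket = bucket.chain_next
        return bucket

    def get(self, key, default=None):
        bucket = self._find(key)
        return default if bucket is None else bucket.value

    def set(self, key, value):
        bucket = self._find(key)
        if bucket is not None:
            bucket.value = value
            return
        bucket = Bucket(key, value)
        slot = self._slot(key)
        bucket.chain_next = self.slots[slot]  # push onto the collision chain
        self.slots[slot] = bucket
        bucket.list_prev = self.tail          # append to the order list
        if self.tail is None:
            self.head = bucket
        else:
            self.tail.list_next = bucket
        self.tail = bucket

    def delete(self, key):
        slot = self._slot(key)
        prev, bucket = None, self.slots[slot]
        while bucket is not None and bucket.key != key:
            prev, bucket = bucket, bucket.chain_next
        if bucket is None:
            return False
        # Unlink from the collision chain.
        if prev is None:
            self.slots[slot] = bucket.chain_next
        else:
            prev.chain_next = bucket.chain_next
        # Unlink from the insertion-order list.
        if bucket.list_prev is None:
            self.head = bucket.list_next
        else:
            bucket.list_prev.list_next = bucket.list_next
        if bucket.list_next is None:
            self.tail = bucket.list_prev
        else:
            bucket.list_next.list_prev = bucket.list_prev
        return True

    def keys(self):
        # Walk the order list from the first inserted bucket.
        result, bucket = [], self.head
        while bucket is not None:
            result.append(bucket.key)
            bucket = bucket.list_next
        return result
```

The doubly linked order list is what lets a delete keep the remaining elements in their original order without shifting anything, at the cost of extra pointers in every bucket.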

tagged: source code developers language internal array hashtable bucket


PHP 5.3.10 Released (Security Fix - Recommended Upgrade)
Feb 03, 2012 @ 08:01:29

The PHP development team has officially announced the release of the latest version of PHP in the 5.3.x series - PHP 5.3.10:

The PHP development team would like to announce the immediate availability of PHP 5.3.10. This release delivers a critical security fix. [...] Fixed arbitrary remote code execution vulnerability reported by Stefan Esser, CVE-2012-0830.

It is highly recommended that users upgrade to this latest version to avoid falling victim to this recently introduced bug, which relates to the new "max_input_vars" setting added to protect against the hash collision denial-of-service issue recently brought up in the PHP community.

tagged: release security fix maxinputvars hashtable collision dos vulnerability


Nikita Popov's Blog:
Supercolliding a PHP array
Dec 29, 2011 @ 12:15:30

In a new post to his blog Nikita Popov talks about a little trick with inserting values into arrays that can make the inserts take far longer than they should (because of how PHP stores its array values in hashtables).

PHP internally uses hashtables to store arrays. The above creates a hashtable with 100% collisions (i.e. all keys will have the same hash). [...] Because every hash function has collisions this C array doesn't actually store the value we want, but a linked list of possible values. [...] Normally there will be only a small number of collisions, so in most cases the linked list will only have one value. But the [included script] creates a hash where all elements collide.

He explains why it works, noting that it's relatively simple to do in PHP because of how it applies a table mask. The slowness comes in when PHP is forced to go through the entire list each time it tries to insert. Because of this issue, there's the potential for a Denial of Service attack that could take a server down. There's a fix already in place for the problem, though, so keep an eye out for the next release (which will include a max_input_vars setting to prevent it).
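The effect can be sketched in a few lines of Python. The sketch assumes a table that uses an integer key directly as its hash and picks the slot by masking with the table size minus one, so keys that are multiples of the table size all land in slot 0. The function name and the table size of 1024 are illustrative choices, not values from the post.

```python
def insert_all(keys, size):
    """Insert keys into a chained hash table; return the number of key comparisons."""
    mask = size - 1                    # size is assumed to be a power of two
    slots = [[] for _ in range(size)]  # each list stands in for a linked list
    comparisons = 0
    for key in keys:
        chain = slots[key & mask]      # the integer key is used directly as its hash
        found = False
        for existing in chain:         # walk the chain looking for a duplicate
            comparisons += 1
            if existing == key:
                found = True
                break
        if not found:
            chain.append(key)
    return comparisons


size = 1024
spread = insert_all(range(size), size)                     # one key per slot
colliding = insert_all(range(0, size * size, size), size)  # every key lands in slot 0
```

With the keys spread out, no insert ever has to compare against an existing key. With the colliding keys, the n-th insert walks past all n - 1 earlier keys, so the total work grows quadratically with the number of elements.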

tagged: collision array hashtable mask denialofservice overload


Johannes Schluter's Blog:
Aug 23, 2010 @ 08:58:43

Johannes Schluter has a new post to his blog on another PHP internals-related topic - hashtables.

While preparing my "PHP Under The Hood" talk for the Dutch PHP Conference there was a question on IRC about extension_loaded() being faster than function_exists(), which might be strange as both of them are simple hash lookups and a hash lookup is said to be O(1). I started to write some slides for it but figured out that I won't have the time to go through it during that presentation, so I'm doing this now.

He talks about array storage (a "real" array), numeric and string-based keys, the internals of how each is stored and how the differences make one function faster than the other (hint: it's all about collisions).
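A small Python sketch of the two key types, under stated assumptions: integer keys are used as their own hash, while string keys go through a "times 33" hash of the kind PHP's string hashing is usually described as (DJBX33A, starting from 5381). The 32-bit truncation and all function names here are simplifications for illustration.

```python
def times33_hash(key: str) -> int:
    """DJBX33A-style hash: start at 5381, multiply by 33 and add each byte."""
    h = 5381
    for byte in key.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF  # keep the value in 32 bits (an assumption)
    return h


def slot_for(key, size):
    """Integer keys are used as-is; string keys are hashed first."""
    h = key if isinstance(key, int) else times33_hash(key)
    return h & (size - 1)                 # size is assumed to be a power of two


def chain_lengths(keys, size):
    """Count how many keys share each slot."""
    counts = [0] * size
    for key in keys:
        counts[slot_for(key, size)] += 1
    return counts
```

Calling chain_lengths on two different sets of names shows how evenly each set spreads over the slots. A lookup that lands in a longer chain has to compare more keys before it finds a match, which is how two "O(1)" lookups can still differ in speed.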

tagged: hashtable array storage variable functionexists extensionloaded