Recently I was processing and storing large arrays in PHP (around 100 000 items) and I found out some quite surprising facts that are very useful in performance-critical applications.
Maybe you just want to see the results.
When I started looking for a benchmark, I found the article Cache a large array: JSON, serialize or var_export?. It is really good, but I wanted to compare a few more things (e.g. how the type of the stored data influences save and load times), so I decided to extend that article a little.
What's this benchmark all about
In this test I'll compare the save time, load time and serialized size of 4 different types of arrays, each in five sizes. Just to make it clear:
Tested array types:
- Type #1: string => string (e.g. array('helloabcde' => 'blablablab')). Key-value pairs where both the key and the value are 10-character strings.
- Type #2: string => int (e.g. array('helloabcde' => 42)). Key-value pairs where the key is a 10-character string and the value is a number from 0 to 100.
- Type #3: int => string (e.g. array(5 => 'helloabcde')). Key-value pairs where the key is a number from 0 to 100 and the value is a 10-character string.
- Type #4: int => int (e.g. array(5 => 42)). Key-value pairs where both the key and the value are numbers from 0 to 100.
Array sizes: 10, 100, 1 000, 10 000 and 100 000 items.
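To make the four array types concrete, here is a minimal sketch of how such test data could be generated. The helper names (randomString, makeTestArray) are hypothetical and not taken from the original benchmark. The sketch also assumes sequential integer keys, because random keys from 0 to 100 would collide and an array could never grow past 101 items.

```php
<?php
// Hypothetical helpers for building test data; names are illustrative only.

function randomString(int $length = 10): string
{
    $alphabet = 'abcdefghijklmnopqrstuvwxyz';
    $result = '';
    for ($i = 0; $i < $length; $i++) {
        $result .= $alphabet[mt_rand(0, strlen($alphabet) - 1)];
    }
    return $result;
}

function makeTestArray(string $keyType, string $valueType, int $size): array
{
    $data = [];
    $i = 0;
    // Loop on count() so that a rare duplicate random key does not shrink the array.
    while (count($data) < $size) {
        // Assumption: int keys are sequential (0, 1, 2, ...) so every key is unique.
        $key = $keyType === 'int' ? $i++ : randomString();
        $data[$key] = $valueType === 'int' ? mt_rand(0, 100) : randomString();
    }
    return $data;
}

$sizes = [10, 100, 1000, 10000, 100000];
$example = makeTestArray('string', 'int', $sizes[0]);
print_r($example);
```

Calling makeTestArray('int', 'int', 100000) would give the largest Type #4 array under these assumptions.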
I ran each of the 3 functions 5 times, and all values here are the averages of those runs.
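The averaging can be sketched roughly like this. It is not the original benchmark script: averageTime is a hypothetical helper, the data is a small placeholder array, and only serialize and JSON are timed. Arrow functions require PHP 7.4 or newer.

```php
<?php
// Hypothetical timing helper: runs $fn $runs times and returns the mean duration in seconds.
function averageTime(callable $fn, int $runs = 5): float
{
    $total = 0.0;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $fn();
        $total += microtime(true) - $start;
    }
    return $total / $runs;
}

// Placeholder int => int data; swap in the array you actually want to measure.
$data = [];
for ($i = 0; $i < 1000; $i++) {
    $data[$i] = $i % 101;
}

$serialized = serialize($data);
$json = json_encode($data);

$times = [
    'serialize save' => averageTime(fn() => serialize($data)),
    'serialize load' => averageTime(fn() => unserialize($serialized)),
    'json save'      => averageTime(fn() => json_encode($data)),
    'json load'      => averageTime(fn() => json_decode($json, true)),
];

$sizes = [
    'serialize size' => strlen($serialized),
    'json size'      => strlen($json),
];

foreach ($times as $label => $seconds) {
    printf("%s: %.6f s\n", $label, $seconds);
}
foreach ($sizes as $label => $bytes) {
    printf("%s: %d bytes\n", $label, $bytes);
}
```

Timings from a single short script like this are noisy, so treat the numbers as a rough comparison rather than exact measurements.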
Because 5 array sizes mean 5 charts for each array type, this article would be too long and hard to read. That's why I'm showing only the results for 10, 1 000 and 100 000 items here. If you want to see the results for all array sizes, check this XLSX with all results and charts.
Type #1: string => string
Type #2: string => int
Type #3: int => string
Type #4: int => int
There's one general rule: always try to use integers for keys and values where possible (use type casting like (int)) and make sure numbers are stored as integers, not as strings. I know you usually can't avoid strings, but it's better to keep this rule in mind during development than to rewrite half of your code later.
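As a small illustration of that rule, the sketch below casts numeric strings to integers before storing them and prints both encodings. The variable names are made up for this example.

```php
<?php
// Numbers often arrive as strings, e.g. from a form, a CSV file or a database driver.
$asStrings = ['a' => '42', 'b' => '7'];

// Cast every value to int before storing it.
$asInts = [];
foreach ($asStrings as $key => $value) {
    $asInts[$key] = (int) $value;
}

echo serialize($asStrings), PHP_EOL;   // a:2:{s:1:"a";s:2:"42";s:1:"b";s:1:"7";}
echo serialize($asInts), PHP_EOL;      // a:2:{s:1:"a";i:42;s:1:"b";i:7;}
echo json_encode($asStrings), PHP_EOL; // {"a":"42","b":"7"}
echo json_encode($asInts), PHP_EOL;    // {"a":42,"b":7}
```

The integer versions are shorter in both formats because they need no quotes and, for serialize, no length prefix.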
For smaller arrays, up to around 1 000 items, it's usually better to use serialize. In some cases it depends on which action you need to perform more often (e.g. for an int => int array, saving in JSON is much faster than loading). I recommend running a benchmark to see which method is better for your particular case.
For large arrays (let's say 100 000 items and more) it's always better to use JSON. Loading arrays with unserialize tends to be extremely slow.
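For reference, storing an array as JSON in a file and loading it back might look like the sketch below. The function names saveCache and loadCache and the file name are hypothetical, and the fallback to an empty array on unreadable data is just one possible choice.

```php
<?php
// Hypothetical cache helpers; function names and file path are illustrative only.
function saveCache(string $file, array $data): void
{
    // LOCK_EX avoids two processes writing the file at the same time.
    file_put_contents($file, json_encode($data), LOCK_EX);
}

function loadCache(string $file): array
{
    // Passing true makes json_decode return associative arrays instead of stdClass objects.
    $decoded = json_decode((string) file_get_contents($file), true);
    // Assumption: treat unreadable or invalid data as an empty cache.
    return is_array($decoded) ? $decoded : [];
}

$file = sys_get_temp_dir() . '/array-cache-example.json';
$data = ['helloabcde' => 42, 'blablablab' => 7];

saveCache($file, $data);
$loaded = loadCache($file);

var_dump($loaded === $data);
unlink($file);
```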
Of course, serialize can also convert objects to strings and restore them as instances of their original class, which is not possible with JSON (it keeps only the public properties as plain data). But in performance-critical applications it's good to be very careful and check twice that you really need to use objects.
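The difference with objects can be seen in a short sketch. The Point class is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical value class used only for this illustration.
class Point
{
    public $x;
    public $y;

    public function __construct(int $x, int $y)
    {
        $this->x = $x;
        $this->y = $y;
    }
}

$point = new Point(1, 2);

// serialize keeps the class name, so unserialize restores a Point.
$fromSerialize = unserialize(serialize($point));

// JSON keeps only the public properties; decoding gives a generic stdClass.
$fromJson = json_decode(json_encode($point));

var_dump($fromSerialize instanceof Point); // bool(true)
var_dump($fromJson instanceof Point);      // bool(false)
var_dump($fromJson instanceof stdClass);   // bool(true)
```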