Monday, December 12, 2011

Node.js and process.nextTick - why you don't use it

Lately I have been messing with a new tool in my metaphorical toolbox: node.js.  Node.js is a platform for developing server-side applications in javascript (built on V8, the javascript engine used in Google Chrome).  The prevalent paradigm in node is event-driven programming.  A node process runs your code in a single thread, and all of the IO calls (network, file access, database, etc) are asynchronous.  Most of this is done "under the hood."  One thing that node does NOT do, however, is run your code in parallel (unlike languages with threads that can truly run in parallel, such as Java).  This has the benefit of freeing you from worrying about shared resources (read: memory), but the drawback that a CPU intensive task blocks everything else in the process until it finishes.

Why is this important?  Well, one of the mantras of node.js is to make sure you don't write blocking code.  Some developers hear this and try to find a way to make CPU intensive processes run asynchronously.  Browsing the documentation, they come across a method on the process object called "nextTick", which you can pass a callback to.  The documentation says that this pushes the execution of that callback to the next loop around the event loop.  This often gets interpreted as "runs the function in parallel."  False.  It simply defers the execution of that callback until the current execution is finished (read:  finished blocking the CPU), and when the callback does run, it blocks the CPU just the same.  So if you have some really CPU intensive code, don't attempt to use process.nextTick to prevent it from blocking requests.  There are some ways to mitigate this, such as spawning a new node process (not terribly efficient, but it gets the job done).
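To make that concrete, here is a minimal sketch (not taken from the node documentation) that records the order in which things run.  The burnCpu function and the timer standing in for "a waiting request" are made up for illustration, and the iteration count is arbitrary.

```javascript
// Records the order in which things run, to show that process.nextTick
// defers a callback but never runs it alongside other code.
const order = [];

// Hypothetical CPU-bound task: a busy loop that holds the only thread.
function burnCpu(iterations) {
  let total = 0;
  for (let i = 0; i < iterations; i++) {
    total += i % 7;
  }
  return total;
}

process.nextTick(() => {
  order.push('nextTick callback start');
  burnCpu(1e7); // still blocks: nothing else can run until this returns
  order.push('nextTick callback end');
});

// A zero-delay timer stands in for an incoming request waiting its turn.
setTimeout(() => {
  order.push('timer (a waiting request)');
  console.log(order.join('\n'));
}, 0);

order.push('synchronous code end');
```

The synchronous code finishes first, then the nextTick callback runs from start to end, and only then does the timer get its turn.  The heavy work was moved later, not moved off the thread.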

One important thing to note is that control flow libraries like async.js have some misleading method names.  For example, async.js has a method called "parallel."  This is very misleading, because it does not create any threads; it simply starts each function in turn on the same thread and waits for all of their callbacks.  Parallel is really used to coordinate the execution of several functions that make asynchronous IO calls.  The code, however, is always single threaded (although you cannot guarantee the order in which the functions complete).  So if you are trying to parallelize simple CPU heavy code blocks using this library, well, you are out of luck.
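Here is a rough sketch of what a "parallel" style helper boils down to.  To be clear, runAll and fakeIo are names I made up for this example; this is not the async.js source, just the general idea, with timers standing in for real IO.

```javascript
// Minimal stand-in for a "parallel" helper (hypothetical, not async.js's source).
// It starts every task in turn on the same thread and collects results by index.
function runAll(tasks, done) {
  const results = new Array(tasks.length);
  let pending = tasks.length;
  let failed = false;
  if (pending === 0) return done(null, results);
  tasks.forEach((task, index) => {
    task((err, value) => {
      if (failed) return; // ignore callbacks that arrive after a failure
      if (err) {
        failed = true;
        return done(err);
      }
      results[index] = value;
      if (--pending === 0) done(null, results);
    });
  });
}

const log = [];

// Simulated IO: the timer waits outside of javascript, so the waits overlap.
function fakeIo(name, ms) {
  return (callback) => {
    log.push('start ' + name);
    setTimeout(() => {
      log.push('finish ' + name);
      callback(null, name);
    }, ms);
  };
}

runAll([fakeIo('slow', 30), fakeIo('fast', 5)], (err, results) => {
  console.log(log);     // both start in order, but 'fast' finishes first
  console.log(results); // results keep the task order: [ 'slow', 'fast' ]
});
```

The only things that overlap are the waits.  Swap the timers for busy loops and each task would simply run to completion before the next one even starts.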

So, to sum this all up, if you are using process.nextTick to try to keep CPU heavy code from blocking, there is a better than good chance you are looking at your project the wrong way.  Remember that while IO is asynchronous, node.js code is not.  Even though it's typically very, very fast :)  I'm quickly growing to love this platform for development, because it scales very well.  Nginx uses a very similar event-driven structure to produce an extremely resource efficient, highly scalable server.

Happy coding.

Monday, June 27, 2011

PHP class x has no unserializer

I apologize for not having posted in quite a while, but I have been quite busy lately.  Writing lots of new code, developing a few new systems, trying to make everyone's lives better.  And, in doing that, I would like to post a rare problem on here, because this is one of the few problems I've had where Google was no help whatsoever.  Here goes:


Where I work, we use memcache to help make things a little bit faster.  For those who don't know, memcache is a daemon that basically accepts primitive data types (except anything equating to false) and keeps them in memory until they are needed again (written in C, very fast).  The PHP extension for memcache takes the data, serializes it (turns it into a string representation of whatever object you have given it), and stores it in memory.  (This post can also be my argument for avoiding serialization wherever possible, especially given its lack of interoperability with other programming languages.)  Recently, we have been getting an intermittent error when retrieving items out of the cache:

Warning:  Class Collection has no unserializer (Cache line 73)


A quick Google search for this returns almost nothing useful.  A good friend and colleague of mine and I took to the memcache source code, and finally the PHP source code, to try to find a solution.  There was certainly nothing obvious, except that the error is triggered when the object is unserialized (duh).  Also, we could not reproduce this error in our development environment (only in production).  The only things we did know were that it only happened on our 'Collection' class, and that it only happened in specific systems.  My incredibly keen colleague figured out that these systems were all part of the same Zend cluster, which was one of our last clusters still on PHP 5.2 (we are in the process of upgrading).  I should also point out that our Collection class extends 'ArrayObject', a built-in SPL class that is great for treating arrays as objects.  Well, as luck would have it, in 5.3 ArrayObject implements an interface that it does not implement in 5.2, called 'Serializable'.  This interface allows a class to customize how it is serialized and unserialized.  The warning we were receiving occurred when a collection was cached by a script running on a 5.3 server and then retrieved by a script running on a 5.2 server.  It seems like this should have been a fatal error, but apparently it isn't, because of the way the PHP source registers the custom serializations.

Anyways, the simple answer is to create a new memcache pool for our machines still on 5.2, separate from the 5.3 boxes.  The long term solution is to finish the upgrade :)

The lesson here: watch out for problems that don't seem like they could stem from a version upgrade.  Also, always be wary of using serialize / unserialize for transporting data.  If you ever wanted to use a different language to parse the data, you would be pretty much SOL.  I recommend json_encode / json_decode wherever possible, because then you start thinking in terms of data, as opposed to specific language constructs.
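To illustrate the interoperability point from another language's side, here is a small javascript sketch.  The two strings are what PHP's serialize and json_encode produce for array('a', 'b'); the tryParse helper is just something made up for this example.

```javascript
// What PHP's serialize(array('a', 'b')) produces: a PHP-specific format.
const phpSerialized = 'a:2:{i:0;s:1:"a";i:1;s:1:"b";}';
// What PHP's json_encode(array('a', 'b')) produces: plain JSON.
const jsonEncoded = '["a","b"]';

// Hypothetical helper: try to read a cached value with the standard JSON parser.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, value: null };
  }
}

console.log(tryParse(jsonEncoded));   // parses into a normal array
console.log(tryParse(phpSerialized)); // fails: not something other languages read natively
```

Reading the serialized version from anything but PHP means writing or finding a parser for PHP's format, which is exactly the kind of coupling I would rather avoid.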

Happy debugging