After the horrible experience with Avro, I considered using Protocol Buffers and Thrift for the company. Protocol Buffers' strongest point is that it is stable (not much has changed in the past few years). It is used in every single possible service in Google, it has gone through a very stringent code-review process, it has been written by the world's most seasoned and anal engineers, and thus it has been well battle-tested. However, I consciously passed over the opportunity to suggest Protocol Buffers for the company, partly because I'm considered a biased party, and suggesting it would simply reinforce the idea that "Kevin is a Googler so he's obviously biased. He thinks everything coming out of Google is amazing." To be fair, I really think that Google cranks out shit end-user products most of the time (Wave, Buzz, G+, Location, Google Base, Android, etc.). Sometimes Google happens to make good end-user products, but only because Google throws a billion darts in the dark and occasionally one of the darts hits the bullseye. That's all.
I tested Thrift, and it is acceptable. In terms of features, it is very similar to Protocol Buffers. The first thing I tested was message backward and forward compatibility. There was no problem in either case. Whereas Avro returns an error saying that the message format is different, the Thrift server gracefully (and correctly) disregards fields it does not recognize in newer messages and copes with fields missing from older ones.
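To make the compatibility behavior concrete, here is a minimal Python sketch of the general idea: every field carries a numeric id, and a reader skips ids it does not know. This is a toy model, not Thrift's real wire format or API; the schemas, field names, and the `decode` helper are all made up for illustration.

```python
# Toy model of tagged-field decoding. This is NOT Thrift's wire format.
# An encoded message is just a list of (field_id, value) pairs.

# Hypothetical old and new versions of the same schema.
KNOWN_FIELDS_V1 = {1: "firstname", 2: "lastname"}
KNOWN_FIELDS_V2 = {1: "firstname", 2: "lastname", 3: "email"}


def decode(pairs, known_fields):
    """Build a record from (field_id, value) pairs, skipping unknown ids."""
    record = {name: None for name in known_fields.values()}
    for field_id, value in pairs:
        name = known_fields.get(field_id)
        if name is None:
            continue  # field written by a newer schema: ignore it
        record[name] = value
    return record


new_message = [(1, "Ada"), (2, "Lovelace"), (3, "ada@example.com")]
old_message = [(1, "Ada"), (2, "Lovelace")]

# Old reader, new message: the unknown field 3 is dropped.
old_reader_result = decode(new_message, KNOWN_FIELDS_V1)

# New reader, old message: the missing field 3 simply stays unset.
new_reader_result = decode(old_message, KNOWN_FIELDS_V2)
```

Neither direction raises an error, which is the behavior I was hoping for from Avro.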
In Java Thrift, you can set your Thrift objects using getters and setters, which is great because if a field changes (name or type), the Java compiler will give you an error immediately. In Python Thrift, you can also set your Thrift objects using the constructor, and the runtime system will catch name errors. In contrast, Avro does not do any of this, so your program will just run along happily even though you're setting my_integer="Not an integer", and somewhere down the line your program crashes and you're scratching your head.
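The difference is easy to see in plain Python. The sketch below uses a hand-written `Person` class as a stand-in for a generated struct (the class and its fields are hypothetical, not output of the Thrift compiler) and a bare dictionary as a stand-in for a loosely checked record. Note that it only demonstrates the name check; nothing here validates types.

```python
class Person:
    """Hand-written stand-in for a generated struct class (hypothetical)."""

    def __init__(self, firstname=None, lastname=None, my_integer=None):
        self.firstname = firstname
        self.lastname = lastname
        self.my_integer = my_integer


# Correct field names: construction succeeds.
good = Person(lastname="Lovelace", my_integer=7)

# Misspelled field name: Python rejects the unknown keyword argument
# at the call site, not somewhere far down the line.
try:
    Person(last_name="Lovelace")
    name_error_caught = False
except TypeError:
    name_error_caught = True

# A dictionary-style record accepts any key and any value without
# complaint, so the mistake surfaces only when the value is used later.
record = {}
record["my_integer"] = "Not an integer"
```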
One last thing I love about Thrift: there is an asynchronous transport!!! This is exactly what powers AdSense, and allows people to easily prototype distributed computation architectures.
http://blog.rapleaf.com/dev/2010/06/23/fully-async-thrift-client-in-java/
There are a few Thrift "bugs" that should be fixed. For example, suppose you write the following in a message definition:
2: string lastname = "last_default",
7: string lastname = "HO",
...
The above should signal a compiler error (e.g. "Same field name not allowed."). There are many other mistakes that should signal an error but do not. I guess the Thrift developers are either too busy, too lazy, or just expect the target-language compiler (either C or Java) to catch the error.
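The check I have in mind is not complicated. Here is a rough Python sketch of it, not anything from the Thrift compiler itself: the function name and the regular expression are my own, and the pattern only understands simple lines shaped like the two above (an id, a one-word type, a name), so modifiers or container types would need a real parser.

```python
import re
from collections import Counter

# Matches simple field lines of the form "<id>: <type> <name> ...".
FIELD_PATTERN = re.compile(r"^\s*(\d+)\s*:\s*(\w+)\s+(\w+)")


def find_duplicate_field_names(definition_lines):
    """Return field names that appear more than once, in first-seen order."""
    names = []
    for line in definition_lines:
        match = FIELD_PATTERN.match(line)
        if match:
            names.append(match.group(3))  # third group is the field name
    counts = Counter(names)
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


fields = [
    '2: string lastname = "last_default",',
    '7: string lastname = "HO",',
]
duplicates = find_duplicate_field_names(fields)
```

A compiler could run a pass like this per struct and refuse to generate code when the returned list is not empty.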
One other minor difference between Protocol Buffers and Thrift: in Thrift, there is no deprecation keyword. In Protocol Buffers, a field marked as deprecated carries that marking into the generated Java code, so the Java compiler warns anyone who uses the field and programmers know to update. It's not a big deal for most people, but it may be a big deal for companies that keep updating contracts between two services.
In the end, my take on Avro vs. Thrift is like this. Avro is like the Microsoft Zune. The Zune has all the bells and whistles: AM radio, recorder, more buttons, higher display resolution, external HD, blah blah blah. The iPod, on the other hand, just does one thing. On paper, the Zune is superior to the iPod. On paper, Avro is superior to Thrift. But in the end, Avro just doesn't work well (no forward/backward compatibility, buggy buggy buggy, and the developers don't even respond to my bug reports). What looks good on paper isn't necessarily good in practice. You can't trust everything you read. You have to play with it.