Wednesday, March 28, 2012

Coding Standard In The Company

Dear Engineers,

Our engineering organization is going to grow. We will hire many more people, and our code base will explode. In a large organization, coding is not just a form of communication between human and computer. It is also a form of communication between humans. In this sense, coding is like writing an essay. An essay consists of paragraphs, sentences, words, etc. In addition, writers follow conventions of organization/structure, spelling, syntax, punctuation, ..., and they do so to make communication more effective. There is a simple automated tool to check for the Python equivalent of punctuation/syntax/spelling (PEP8). There is also a Python equivalent of grammar checking (Pylint). Unfortunately, for high-level concepts such as organization/structure/patterns, there are no tools out there (and it would be quite a feat to write a compiler that performs analysis beyond context-sensitive grammar). For these high-level concepts, the state of the art is basically the code review.

Over the past few months I've been reviewing many people's code. People tell me that my guidelines seem arbitrary and "made up." Let me make it clear to everybody that I didn't make up these guidelines. The guidelines below are very, VERY common sound engineering practices. I will also admit that I too am guilty of breaking these guidelines from time to time, and I hope people will point that out so that I can continually improve. In the end, the purpose of a coding guideline is to raise the overall software quality so that the code is 1) easier for other people to read (it is a lot more painful to read and fix other people's code than to write your own, right?) 2) easier to debug 3) easier to maintain 4) easier to extend.

Below are some of the most common coding issues. I understand that almost everyone has coded for X number of years and knows how to code, but there are always little things that increase the quality of the code. In short, when you submit code for review, check for the items below, because your code review responses/comments are actually quite predictable from the common issues below:


1) "string typing" -- this is one of the most common bad habits programmers fall into. String typing is the use of bare strings as parameters. Let's say you have a function that takes the strings "active", "disabled", and "deleted". As a programmer, you've done this for over 10 years and it has worked well, so why should anyone care? Well, when some new hire comes in and makes a typo in the code (an upper/lower case typo, or "activated" mistyped instead of "activate", etc.), the code will be broken! In addition, refactoring (renaming string constants in your code) in a 1-million-line project is extremely error prone and NOT fun: what if the strings "disable" and "disabled" appear everywhere in the code, and many of them aren't even relevant to you?

Better: use constants (in a class). For example, class Status may contain the attributes active, disabled, deleted. These are auto-completed by IDEs, so a refactor (rename, checking, etc.) is done in less than 1 second. Lastly, static analysis tools such as Pylint (or PyChecker, ...) can properly infer whether the variables are valid BEFORE the code runs. Automated tools are your friends, so when you try to be clever or lazy (because typing a string is so much easier, right?), you are on your own.
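A minimal sketch of the constants-in-a-class idea (the Status class and is_valid_status helper are hypothetical names, not an existing API):

```python
class Status(object):
    """One canonical home for the status values."""
    ACTIVE = 'active'
    DISABLED = 'disabled'
    DELETED = 'deleted'

ALL_STATUSES = (Status.ACTIVE, Status.DISABLED, Status.DELETED)

def is_valid_status(status):
    """Validate raw input at the boundary; internal code uses Status.* only."""
    return status in ALL_STATUSES

# A typo like Status.ACTIV raises AttributeError immediately, and
# Pylint/an IDE flags it before the code ever runs; the bare string
# 'activ' would just silently fail some comparison at runtime.
```

Renaming Status.ACTIVE later is a one-click IDE refactor instead of a project-wide string search.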


2) "string hashing" -- this is similar to string typing. It is the use of a dynamic language's hash (dict) to pass data structures from one function to another. In other words, you are relying on a particular *convention* to hold the program together, so if anyone else forgets and/or breaks your convention, you are f***'ed at runtime! For example, you may have data = {'id': 123, 'name': 'Joe Blow', ...}. When you refactor or make a typo, well... you won't know you've made a mistake until the program runs [in production]!

Better: use classes to hold your data. This way, automated tools can tell you whether there's a mistake BEFORE the program even runs.
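A sketch of a class replacing the convention-based dict (the User class and format_greeting function are made-up names for illustration):

```python
class User(object):
    """Replaces the convention-based dict {'id': ..., 'name': ...}."""
    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name

def format_greeting(user):
    # A typo like user.nmae is an AttributeError that Pylint and the
    # IDE catch statically; data['nmae'] on a dict fails only at runtime.
    return 'Hello, %s (#%d)' % (user.name, user.user_id)
```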


3) Reliance on a certain naming convention. This is similar to the above two. For example, let's say your code reads from a text file that contains the entries "delete" and "create". You try to be clever and make two local functions, deleteName(...) and createName(...), and perform a dynamic call via locals()[my_str + 'Name'](name) or getattr(self, my_str + 'Name')(name). Well, your code is making certain assumptions, so if someone else changes the text file, you get a weird runtime error, your intern spends 6.5 hours looking at the weird issue, and time is wasted.

Better: don't get creative, and never assume your input is always clean, because someone else will change your input; if not now, then definitely later.
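One defensive alternative, sketched with hypothetical create_name/delete_name handlers: keep an explicit dispatch table and reject unknown input loudly at the boundary:

```python
def create_name(name):
    return 'created %s' % name

def delete_name(name):
    return 'deleted %s' % name

# Explicit table instead of getattr()/locals() string concatenation:
# every handler is a real name that Pylint and the IDE can verify.
HANDLERS = {
    'create': create_name,
    'delete': delete_name,
}

def handle_entry(action, name):
    if action not in HANDLERS:
        # Bad input from the text file fails here, with a clear message,
        # instead of as a cryptic KeyError deep inside the call stack.
        raise ValueError('unsupported action: %r' % action)
    return HANDLERS[action](name)
```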


4) Importing from *, or importing from relative paths. This is problematic both from a readability standpoint and from a PATH/runtime-resolution point of view. Frequently, automated tools (such as nose) get confused by relative paths and output erroneous messages.

Better: first, for readability, import the full path explicitly (instead of *) so the reader knows exactly which component you're using. This is super easy with IDEs. Secondly, if you must import a bunch of components, import one level up instead of using *. For example, "from xyz import ttypes" and use ttypes.TypeA, ttypes.TypeB, ttypes.TypeC, ... This has the added advantage that if you have a namespace collision (e.g. ttypes.Rad and models.Rad), you can disambiguate by writing ttypes.Rad and models.Rad.
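A small sketch of the same idea using the standard library's os.path as a stand-in for the project's own modules (xyz.ttypes in the text):

```python
# Bad: the reader cannot tell where normpath or expanduser came from,
# and two star-imports exporting the same name silently shadow each other.
#   from os.path import *

# Better: import the module one level up and qualify every use.
import os.path

def normalized(path):
    return os.path.normpath(os.path.expanduser(path))
```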


5) Lacking simple unit tests. Let's say you are calling a library function bear.call(rabbit, berry). Function bear.call takes two parameters, rabbit and berry. When bear.call's parameters are changed, Jenkins happily allows the check-in, and you don't realize your code is broken until runtime in production.

Better: unit tests exist to *protect* your code from underlying changes. Even if you don't have time to increase coverage, a simple API-usage unit test of only 5 lines goes a long way.
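A sketch of such a minimal API-usage test; since bear.call is the text's made-up library, a stand-in call function is defined here so the example runs on its own:

```python
import unittest

def call(rabbit, berry):
    """Stand-in for the hypothetical bear.call library function."""
    return 'bear ate %s and %s' % (rabbit, berry)

class BearCallApiTest(unittest.TestCase):
    def test_call_accepts_rabbit_and_berry(self):
        # If call() ever renames or grows a required parameter, this
        # one test fails in CI instead of blowing up in production.
        self.assertEqual(call('rabbit', 'berry'), 'bear ate rabbit and berry')

if __name__ == '__main__':
    unittest.main()
```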


6) Catch-all. This is the "try: ... except Exception: pass" pattern, AKA the "sweep dirt under the rug" pattern. Typically the programmer is trying to perform some operation, is unsure whether the operation will succeed, but wants to handle failure gracefully. So instead of addressing failures, the programmer blindly catches and ignores *all exceptions*. Why is this a problem? Because 1) you are hiding problems that can compound and/or propagate later. For example, if you have an arithmetic precision error and the bad results are ignored, then the errors eventually propagate to other functions and are possibly stored in the form of data. I sure wouldn't want my bank to catch-all and save bad arithmetic results in the storage layer. 2) if you hide problems, programmers never get a chance to know there's an error, and therefore never fix it.

Better: catch exactly what you need to catch, and propagate other errors back to the original call site. For example, if you perform some math function, then only catch math errors. If the math function raises a file error for whatever reason, just allow that error to propagate upward to the caller, so that the full stack trace is shown to programmers.

There are of course some rare cases where you do in fact catch all errors. For example, if you're writing a library layer that can call anything and everything (and the program must continue to run), then you have to catch all. In this case, you SHOULD log or print the errors instead of blindly doing nothing.
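A sketch of the narrow-catch idea (safe_ratio is a made-up function for illustration):

```python
def safe_ratio(numerator, denominator):
    # Catch only the math error we expect and can handle here.
    # Anything else (TypeError, IOError, ...) propagates upward with
    # its full stack trace intact, so programmers actually see it.
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return float('inf')
```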


7) Function parameters named a, b, value, max_val, and so on. Why is this bad? Because the reader of the code has no clue what they are, and even IDEs can't help! Another signature that is really hard to read is the "pass-everything" *args + **kwargs. The reader has no choice but to dig into your code and READ which parameters are accepted.

Better: describe your parameters, like max_iteration, baseline_threshold, and so on. This way, when someone calls your function and types ALT-P, the names of the parameters pop up immediately. In situations where you must use *args and **kwargs, document them clearly in your function's docstring.
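A sketch of the difference, with a made-up run_until_converged function using the parameter names from the text:

```python
# Opaque: def run(a, b) forces every caller to read the body.
# Self-describing: the signature alone documents the call.
def run_until_converged(max_iteration, baseline_threshold):
    """Halve a stand-in error until it drops below baseline_threshold
    or max_iteration passes have run; return (iterations, error)."""
    error = 1.0
    iteration = 0
    while iteration < max_iteration and error > baseline_threshold:
        error /= 2.0  # stand-in for one step of real work
        iteration += 1
    return iteration, error
```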


8) Rely on runtime magic. You take a string and magically turn it into a function to call. Or you start using a bunch of introspection methods (find a function that starts with letter X and ends with Z, and then call that function). Why is this to be avoided in general? Because automated tools cannot infer anything about the safety of reflection/dynamic calls, so your only assurance that the code is bug free is running it with all possible inputs (yeah, right). Typically this type of code is full of dynamic getattr, locals()['func_name'](parameter), and so on.

Better: KISS. Don't try to out-clever automated tools. Automated tools (Pylint, PyCharm, PyChecker) are here to help you, so if you try to out-clever these tools, they become useless. And consider the reader of your code: the more complexity he needs to follow, the more likely a bug will be passed on.
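A sketch contrasting the magic and KISS versions (the Report class and render function are hypothetical):

```python
class Report(object):
    def render_html(self):
        return '<h1>report</h1>'

    def render_text(self):
        return 'report'

# Magic version tools cannot check:
#   getattr(report, 'render_' + fmt)()
# KISS version: every call site is a plain name that Pylint and the
# IDE can verify, and unknown formats fail with a clear error.
def render(report, fmt):
    if fmt == 'html':
        return report.render_html()
    if fmt == 'text':
        return report.render_text()
    raise ValueError('unknown format: %r' % fmt)
```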


9) Assumptions about timing. I see this from time to time. Real example: the author of project R observes that a script always exits within 45 seconds, so he creates a cron job that forks off that script every minute. Well, as the data set gets larger, the script starts taking longer and longer. Eventually the script takes so long to execute, while cron still launches it every minute, that the machine has a zillion of these processes in flight, thrashes, and runs out of virtual memory. The moral of the story: if you assume something behaves a certain way 99.99% of the time, then the other 0.01% will really screw you up.

Better: don't make assumptions. In the case above, use a [priority] task queue.
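A minimal sketch of the task-queue shape using the standard library's queue and threading modules: one worker drains jobs in order, so slow jobs delay later ones instead of piling up in parallel the way a minutely cron plus a slow script does.

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    while True:
        job = jobs.get()
        if job is None:  # sentinel tells the worker to shut down
            break
        results.append('processed %s' % job)  # stand-in for real work
        jobs.task_done()

thread = threading.Thread(target=worker)
thread.start()
for name in ('a', 'b', 'c'):
    jobs.put(name)
jobs.put(None)
thread.join()
```

In production you would use a real [priority] task queue (a job server), but the principle is the same: pending work queues up instead of forking unboundedly.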


10) Busy waiting. If you write a [multi-threaded] program with a for/while-poll-sleep pattern, you're busy waiting. For every busy-waiting pattern out there, there is an efficient sleep-signal-wake implementation/pattern to use.

Better: this opens a very long discussion of I/O, OS waits, locks, and signals. Hit me up and I'll give you references to publications and/or textbooks.
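As one concrete illustration, Python's threading.Event gives the sleep-signal-wake pattern directly; the waiting thread blocks in the OS until signaled instead of polling in a loop:

```python
import threading

data_ready = threading.Event()
received = []

def consumer():
    # Blocks until the producer calls set() -- no "while not ready:
    # time.sleep(0.1)" polling loop burning wakeups.
    data_ready.wait(timeout=5)
    received.append('woke up')

thread = threading.Thread(target=consumer)
thread.start()
data_ready.set()  # the producer signals; the consumer wakes immediately
thread.join()
```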


11) Global variables. You stick everything in global variables because it's "easier." Now your functions have side effects. It may be easier to write programs using global variables, but such programs are difficult to understand, because other people need to read the whole source to see what every other function is doing to the global variables (the side effects). Global-variable programming is the antithesis of OO programming.

Better: OO, because it increases the level of communication and increases flexibility (even if it takes a bit more typing).
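A tiny sketch of the contrast, with a hypothetical counter:

```python
# Global version: any function anywhere may mutate `count`, so the
# reader must audit the entire program to reason about its value.
#
#   count = 0
#   def increment():
#       global count
#       count += 1
#
# OO version: the state lives in one object and can only change
# through its own methods.
class Counter(object):
    def __init__(self):
        self._count = 0

    def increment(self):
        self._count += 1
        return self._count

    @property
    def count(self):
        return self._count
```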



There will be others from time to time. I hope these guidelines help people program in a way that maximizes communication between programmers, so that the code base can grow, stay readable, stay debuggable, stay maintainable, and stay extensible.