Code Review for Application Notes
Jerome Kelleher wrote an impassioned letter about the state of the code that gets submitted to journals as software (here).
I appreciate his position and can honestly say that I hate looking at poorly written code that may or may not perform as advertised. Jesus, I’ve written a lot of code like that, and I regularly revisit it to remind myself how much I’ve actually learned.
I applaud him for providing some sensible and probably not terribly challenging solutions to the problem.
That being said, I think his major complaint can be summed up in this sentence from his letter:
“although the note may be well written and the descriptions of the algorithms involved impeccably detailed and rigourous, the implementation may be hopelessly flawed”
He later touches on this key concept of computer programming:
“Computer programming is an art (Knuth, 1974), and writing maintainable, efficient and above all correct code requires training and experience.”
The last portion of that sentence is the most important part of the statement. Code must be correct. If you claim to have implemented an algorithm, it had better behave as described. This is the most important part of software development, and it should be the primary standard by which code is reviewed and software valued.
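To make "behaves as described" concrete, here is a minimal sketch in Python. The function `reverse_complement` and its checks are my own hypothetical example, not anything from Kelleher's letter or from the program he reviewed; the point is only that a claim about behaviour can be written down and checked, whatever the code looks like.

```python
def reverse_complement(seq):
    """Claimed behaviour: return the reverse complement of a DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    # Walk the sequence backwards and swap each base for its partner.
    return "".join(complement[base] for base in reversed(seq))


def check_reverse_complement():
    """Return True if the function matches its description on two checks."""
    # A case worked out by hand: ATGC reversed is CGTA, complemented is GCAT.
    hand_worked = reverse_complement("ATGC") == "GCAT"
    # A property the description implies: applying it twice is a no-op.
    round_trip = reverse_complement(reverse_complement("GATTACA")) == "GATTACA"
    return hand_worked and round_trip


print(check_reverse_complement())
```

Neither check says anything about style. They only ask whether the code does what the author says it does.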
In his guest post he outlines an example of an interaction he had with a program written in Python. During his review he looked at the code and, because it had poor stylistic quality, he immediately formed a judgement that the application was bad. He enumerates in great detail the stylistic problems with the code (and also highlights one of the reasons I hate Python as a whitespace-delimited language), but at no point does he make a statement about the functionality of the code. He admits to negatively judging it because an obviously novice programmer wrote it.
As he continues his discussion, both in the post and in its comment section, I think it becomes clear that his real goal is for people to write neat, readable, documented, stylistically standardized code.
That’s a great goal, but it’s not something anyone should be getting a rejection letter over.
My two cents:
This whole problem boils down to how we go about training and developing bioinformatics projects/students/researchers.
The lamentations in the article and the comments that follow highlight the difference between learning to code and understanding how to program. The latter requires an understanding and application of algorithm design, knowledge of data structures, maths, and maybe some hardware.
Currently way too many life scientists are shade tree mechanics kludging unwieldy contraptions together, using double float for everything, with little or no knowledge of how the individual pieces work. This may make for ugly, bloated code that is a pain in the ass to look at and run. It should not detract from the fact that they’re building something that is (hopefully) useful to the community, and assuming it works as advertised they should get recognition for it.
Until the life sciences understand the difference between taking a C++/Python/Java class and a more “classical” computer science education, this problem will not go away.
There have also been some comments made that I take exception to:
“How do we know that a program does what it is supposed to do? Well, we don’t ever really know this unless the program is very simple.”
I hope that what Kelleher and Standage mean is that it is tedious to understand code that isn’t well documented, has poorly chosen variable names, or is intentionally obfuscated. While code may invoke stochastic processes, upon execution it is static and will behave in a very predictable manner.
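As a small illustration of that last point, here is a sketch using Python's standard `random` module. The function name `simulate_coin_flips` and the seed value are my own assumptions for the example. Once the seed is fixed, a "stochastic" routine returns the same result on every run.

```python
import random


def simulate_coin_flips(n_flips, seed):
    """Return a list of n_flips values (0 or 1) drawn from a seeded generator."""
    # A private generator, so the result depends only on the arguments
    # and not on any global random state.
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n_flips)]


first_run = simulate_coin_flips(10, seed=42)
second_run = simulate_coin_flips(10, seed=42)

# Same inputs (including the seed) give the same "random" output.
print(first_run == second_run)
```

This assumes the author exposes the seed as an input. When the seed is hidden or taken from the clock, the code is still deterministic, but the reviewer can no longer reproduce a particular run.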
“we don’t expect perfection from scientific code”
I heartily disagree. We should expect perfection in scientific code. It’s probably the only time in the life sciences when we should always be able to predict how something will behave given a set of inputs.
That being said, I’m defining perfection as “does it do what the author says it does?”, not “was it written in the fewest lines and optimized as far as possible for speed?”