I know it is not really popular in the research world to think about the quality of code. After all, researchers are judged by the citations of their papers, not by the readability, robustness or reusability of their software code. If anything, software is a tool for getting the results for writing the next paper, right?
I think this is a dramatic error, and unfortunately a widespread one in the research world.
I have spent years building up and maintaining a number of big software systems. Some of these are research systems: The text-to-speech synthesis system MARY TTS, and the component integration framework for emotion-oriented interactive systems SEMAINE API.
Basically I learned the hard way why code quality is important, because I have seen the result of what happens if you don't strive for it:
Worst of all, the chances of building new functionality on top of existing code grow slimmer with every shortcut that your team takes to make "that deadline". Ultimately, you throw the code away and start a new project, which will itself be thrown away in a few years...
Instead, the way I would like to work is cumulative: Getting one piece of functionality to work reliably so that you can re-use it as a building block for the next, more complex functionality. In order to do that I must be able to trust the code -- and that's where the code quality issues come into play.
Self-testing code is really a great invention, and I think we owe tribute to the inventors of unit testing -- I believe that was Kent Beck and Erich Gamma. I have summarised the reasons why I like test-driven development in the following slideset:
Good resources in my view are the following:
Have you ever worked in a team where your colleague forgot to check in a new source file? You need to make that urgent change in one part of the code and discover that some other part of the code doesn't even compile?
Preventing this kind of situation is just one of many benefits of Continuous Integration (CI). Even if your team has the diligence to always run all tests before committing their changes (lucky you!), automatically running all tests on a CI machine after every commit gives you that extra bit of confidence that "it runs on more than one machine".
The one thing that all team members must understand is that keeping the build stable is one of the highest priorities in the team. I have seen a single failing test spread through a group of ten repositories because one contribution ignored it. Fortunately, the fix was just as quick, but it is very easy for people to stop caring, and once the build stays red, the benefit of CI disappears.
I have summarised the concept in the following slideset:
Relevant resources:
How do you turn ugly code that you cannot understand into beautiful code that is a pleasure to read? One option is to throw away the old code and rewrite it from scratch. Sometimes that is the right choice – but often it isn't. Often you should rather "massage" the code, in small steps, so that it continues to work but slowly gets more readable and more trustworthy.
Maybe the most essential principle here is the DRY principle: Don't Repeat Yourself. Copy-paste programming may be quicker at the moment, but it will bite you, or somebody else, in the bottom later.
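As a small illustration of the DRY principle, here is a minimal sketch in Java. The class `PriceFormatter` and its methods are hypothetical names made up for this example; they are not taken from any of the systems mentioned on this page. The point is only that the formatting rule lives in exactly one place, so a change to it cannot be forgotten in a copy.

```java
import java.util.Locale;

// Hypothetical example class, invented for illustration only.
class PriceFormatter {

    // Before refactoring, both public methods contained their own
    // copy of the String.format(...) call. A change to the number
    // format then had to be made twice, and was easy to miss once.

    static String formatNet(double amount) {
        return format(amount);
    }

    static String formatGross(double amount, double taxRate) {
        return format(amount * (1.0 + taxRate));
    }

    // The single place that knows how a price is rendered.
    private static String format(double value) {
        // Locale.ROOT keeps the decimal separator independent of the machine's locale.
        return String.format(Locale.ROOT, "%.2f EUR", value);
    }
}
```

If the currency or the number of decimals ever changes, only the private `format` method needs to be touched, and both callers pick up the change.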
Martin Fowler put together a really useful catalogue of techniques for refactoring code:
I also very much like the notion of "Software Craftsmanship" put forward by Bob Martin – seeing software development as a craft: a novice can do simple things quickly, but mastering the art takes many years of intense practice. And the skill doesn't improve through mere repetition: it takes deliberate effort to get better.
To gain trust in your system working correctly, it is really helpful to know that it will complain adequately if something goes wrong, rather than hiding the problem from you. The number of times I have seen try-catch blocks with printStackTrace() statements is really frustrating.
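To make the contrast concrete, here is a minimal sketch in Java. The class `ConfigLoader` and its two methods are hypothetical and exist only for this illustration; they are not part of MARY TTS. It assumes Java 11 or later for `Files.readString`. The first method shows the pattern I find frustrating, the second shows one possible alternative that lets the problem surface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class, invented for illustration only.
class ConfigLoader {

    // Anti-pattern: the stack trace goes to stderr, where it is easily
    // overlooked, and the caller receives null. The program then fails
    // later, far away from the actual cause.
    static String loadSwallowing(Path file) {
        try {
            return Files.readString(file);
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    // One possible alternative: fail right away, say what was being
    // attempted, and keep the original exception as the cause.
    static String loadOrFail(Path file) {
        try {
            return Files.readString(file);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read configuration file " + file, e);
        }
    }
}
```

Whether an unchecked exception is the right choice depends on the situation; the point of the sketch is only that the second version complains immediately instead of hiding the problem.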
For our MARY TTS system I put together some guidelines on how to decide on the appropriate way to handle different types of error conditions. These may not be the final answer, but they seem to be a good start:
These considerations owe a lot of conceptual scaffolding to the excellent Software Engineering Radio:
Not only the development techniques but also the organisation of the team effort as a whole can benefit drastically from agile principles. I very much like the notions of Scrum to create transparency within the team and with the stakeholders; to allow the team to adapt as required; and to shield the team against outside interference during a Sprint, giving them the possibility to plan their effort.
Here are some relevant resources: