The topic of commenting source code has been discussed over and over.
So why blog about it? First, on this blog, I'm calling the shots, thank you very much. Second, it's easy to read about why comments are horrible and what bad comments look like, but much harder to find out what good comments look like.
In this post I'd like to share with you the good commenting practices I've honed over the years.
I dabbled mainly in system programming and my go-to language is C++. Some of the things we discuss apply differently for languages with introspective features or internal documentation facilities. Additionally, keep in mind system code is notoriously very dense.
The claim that "good code is self-explanatory and doesn't require comments" is a non sequitur. It totally misses the point, as comments shouldn't plagiarize the code in the first place.
The most interesting part about the code isn't the code itself but why the code exists. This why can only be explained in comments.
Another thing you might hear is "if you need to explain what your code does, then your code needs a rewrite". Again, this misses the point and falls into the trap of believing that comments should repeat for the human being what has just been told to the compiler.
Before seeing what I consider to be good comments, let's agree on what constitutes bad comments.
Let's start with the basics and explain why a comment must not plagiarize the code.
An obvious "I write in English what my code does" example:
// set i to five
int i = 5;
Thanks for the info, I thought you were doing a Fourier transform. Fortunately, your comment lifted the ambiguity and now I know you are setting this integer to 5.
Not all code can be made simple, but most code can. If you feel you need to explain what the code you wrote does, why is that? Is it because the problem you are solving has some peculiar idiosyncrasies or is it because you are not satisfied with how you solved it?
There are, of course, valid cases where you have to write the what. For example, in assembly it can be common to describe what a group of instructions does. Heavily optimized code may also require some explanation.
Let's imagine we are writing a device driver and that we are doing Direct Memory Access (DMA).
The device is big endian, and your test platform is little endian. However, your driver should work on any platform.
On every OS I know, you have one or several functions for byte swapping, so somewhere in your code you might have something like:
buffer[i] = RtlUlongByteSwap(v);
We used a Windows kernel example because the function name is very explicit.
You feel you should write a comment, because there's a trap here. And your gut feeling is right. So you write:
// change endianness
buffer[i] = RtlUlongByteSwap(v);
Which is plagiarism. Your reader knows what RtlUlongByteSwap does, and if he doesn't, he can read the documentation.
You commented what the code does, not why you do it.
A better comment would be:
// the device is big endian while our platform is
// little endian
// see section 3.4 of the device specification
// for more information
In this comment you explain why you change the endianness of the data and you even give a pointer to the reader to know more about the topic should the need arise.
You might retort: what about writing a normalize_endianness function and having self-documenting code?
Again you miss the point, because although this explains better "what" you do, the why is still missing. Someone might come later and say "we don't need to normalize endianness as they are identical, let's remove this". An error a comment would prevent.
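To make this concrete, here is a minimal sketch of a helper that carries its why with it. The name load_be32 is hypothetical and chosen for illustration; the reference to section 3.4 reuses the one from the comment above. It assembles the value with shifts, so it gives the same result on little and big endian hosts.

```cpp
#include <cstdint>

// The device writes its data in big endian (see section 3.4 of the
// device specification), while the host may be little or big endian.
// We assemble the value with shifts instead of swapping bytes, so the
// result is correct whatever the host byte order is. Do not remove this
// because "both sides are identical" on your machine.
// load_be32 is a hypothetical name used for illustration.
inline std::uint32_t load_be32(const unsigned char * bytes)
{
    return (static_cast<std::uint32_t>(bytes[0]) << 24)
         | (static_cast<std::uint32_t>(bytes[1]) << 16)
         | (static_cast<std::uint32_t>(bytes[2]) << 8)
         |  static_cast<std::uint32_t>(bytes[3]);
}
```

The function name tells you what happens; only the comment tells the next maintainer why it must stay.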
You shouldn't be explaining every function of your code, but code without a single comment is a bad sign.
It can also be a sign that you have too many abstractions or layers that hide the heart of your functionality.
Linus Torvalds wrote about the overuse of type definitions; unnecessary abstractions can have the same effect on your code.
You certainly have something insightful to say about your decisions and pointers to other parts of the code. It shouldn't be a replacement for the design documentation, but can be a reference to it. In other words, you are giving a context to the reader to understand your code faster.
What did you think about when you wrote your code? What guided you in your choices? What were your constraints?
Talking about your constraints also helps people who might be refactoring your code.
// this is the base class of events, we use the
// CRTP idiom as it gives us better performance in
// that context and reduces the size of objects by
// saving a vtable pointer; remember we can have a
// lot of events at any given point in time
template <typename ConcreteEvent>
struct base_event
{
    // the StrangeParameter is provided by the arena
    // that allocates events.
    // allocating objects this way slightly increases
    // code complexity but greatly increases
    // performance
    // the mysterious_hint is used for debugging and
    // benchmarking purposes
    // and can safely be set to 0
    template <typename StrangeParameter>
    explicit base_event(StrangeParameter && sp, int mysterious_hint = 0);
};
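For readers who haven't met CRTP, the size argument in that comment can be checked with a small sketch. All the type names below are made up for illustration and are not taken from any real code base; the point is only that the virtual version stores a vtable pointer in every object while the CRTP version does not.

```cpp
#include <cstdint>

// Classic approach: dynamic dispatch, every object carries a vtable pointer.
struct virtual_event
{
    virtual ~virtual_event() = default;
    virtual int id() const = 0;
};

struct virtual_tick : virtual_event
{
    std::uint32_t payload = 0;
    int id() const override { return 1; }
};

// CRTP approach: the base knows the concrete type at compile time,
// so the call is resolved statically and no vtable pointer is stored.
template <typename ConcreteEvent>
struct crtp_event
{
    int id() const
    {
        return static_cast<const ConcreteEvent &>(*this).id_impl();
    }
};

struct crtp_tick : crtp_event<crtp_tick>
{
    std::uint32_t payload = 0;
    int id_impl() const { return 1; }
};
```

On common ABIs, sizeof(crtp_tick) is smaller than sizeof(virtual_tick), which matters when you have a lot of events alive at once. The exact sizes depend on your platform.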
// we are in the context of a signal handler
// we must exit as soon as possible and signal
// our background thread handler
// for this we are using a synchronization
// primitive which is async-signal-safe: sem_post
// For further information, read the sem_post
// man page.
sem_post(__sh_thread_sem);
The problem with comments is that they are not verified by the compiler and can therefore be out of sync with the code. It's a problem of sloppiness but every developer in the world has been sloppy at some point in time.
The Eiffel language is famous for the notion of design by contract, where you can have invariants, preconditions and postconditions. It helps explain the intent of your code, but doesn't replace comments as it doesn't answer the "Why?" question.
In system programming assertions are heavily used because stepping through a program isn't always possible and they can help trigger breakpoints in places you didn't expect. Assertions also convey information to the reader.
void * p = magic_allocator(30);
// by contract magic_allocator returns cache
// aligned pointers, we absolutely need cache
// aligned pointers to avoid a performance hit
// caused by false sharing
// if the pointer isn't aligned, it will not cause
// a crash
assert(cache_aligned(p));
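The cache_aligned helper isn't shown above. A minimal sketch of what it might look like follows, assuming 64-byte cache lines. That value is an assumption that holds on many x86-64 machines but is not universal, so check your target.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, which is common on x86-64 but not
// universal. Adjust for your target.
constexpr std::size_t cache_line_size = 64;

// Returns true when p sits on a cache line boundary.
inline bool cache_aligned(const void * p)
{
    return reinterpret_cast<std::uintptr_t>(p) % cache_line_size == 0;
}
```

The assertion then documents the contract and catches a violation in debug builds, while the comment explains why anyone should care.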
C++11 static assertions have a built-in error message that saves you from writing a comment:
static_assert(std::is_signed<Integer>::value,
"This function expects a signed integer as a parameter");
A very powerful comment is to write about your mistakes. It's not unlike sharing your design decisions and can be very helpful to your colleagues or even to future you.
// don't use a spin mutex here, contention is high
// and latency isn't paramount
// we tried using a spin mutex and it greatly
// increased CPU usage without
// *any* performance gain
std::mutex _buffer_mutex;
// the buffer has to be kept alive for the duration of
// the callback; pass the buffer by value to increase the
// reference counter
auto obscure_callback = [buffer]()
{
// [...]
return std::error_code{};
};
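The comment above only makes sense if buffer is reference counted. Here is a minimal sketch of that situation, assuming the buffer is held in a std::shared_ptr. The function name make_size_callback is hypothetical and only serves the illustration.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical helper: builds a callback that reads the buffer later.
// The buffer is captured by value on purpose: copying the shared_ptr
// bumps the reference count, so the data outlives the caller's handle.
inline std::function<std::size_t()>
make_size_callback(std::shared_ptr<std::vector<char>> buffer)
{
    return [buffer]() { return buffer->size(); };
}
```

If the caller resets its own pointer after creating the callback, the callback still owns the data. Capturing by reference instead would compile just as well and leave the callback with a dangling reference, which is exactly why the comment is worth writing.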
// use a universal reference as we want to accept
// objects that are movable but not copyable
template <typename T>
void f(T && x) {}
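As a rough illustration of what that comment is protecting, here is a sketch of a function that accepts a move-only object through a universal (forwarding) reference. The function store and its Container parameter are made up for the example.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sink used for illustration. T && is a forwarding
// ("universal") reference, so the function accepts lvalues and rvalues;
// std::forward preserves the value category, which lets move-only
// objects such as std::unique_ptr go through without a copy.
template <typename Container, typename T>
void store(Container & out, T && x)
{
    out.push_back(std::forward<T>(x));
}
```

Someone tidying this up into a const reference parameter would break every caller passing a std::unique_ptr, and the comment is what tells them so before the compiler does.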
I call fetishist comments things like:
/**************************************/
/* FUNCTION BLAH */
/* Author : John Kakashka */
/* Revision 2 */
/**************************************/
/* TODO: describe function */
/**************************************/
One day, someone forgot to write such a comment for a function. Then the software crashed. That's because the compiler reads the comments and optimizes accordingly. If the comment is missing, the compiler generates random bytes coming from a dedicated chip on the motherboard. Also releases should only be done during a full moon.
The other explanation is that your organization doesn't know about Version Control Systems (VCS) and obsessive compulsive behavior is the norm.
The documentation for our C API is written using Doxygen. This is a case where you have to be systematic about your comments, and it can really look like the fetishist code from the 80s.
The cumbersomeness of the process is balanced by the fact that it's much easier to keep the documentation and the API in sync. It doesn't do the whole job, but it really makes it easier.
No sanctuary should exist within a code base, but these comments serve the purpose of generating a discussion:
// this function can be called several thousands of
// times per second
// before you succumb to the desire of improving,
// refactoring or simply adding your personal touch
// to this function, ask yourself this question:
// "Do I feel lucky?"
// after you ignored this warning, made changes
// then reverted them, kindly increment
// the following counter: 23
Let's conclude with some rules of thumb for good commenting:
I'm perfectly aware there is almost no comment in the Brigand library of which I am a co-author.
That's because it's a marketing ploy to make you read the book Joel Falcou and I have been busy writing. A book that speaks about template meta-programming but whose sole purpose is to slowly cast a cloud of insanity over mankind, contributing to the return of the Old Ones (and I'm not talking about the standard committee).
Before that happens, we will just see Brigand as a manifestation of "For every principle, there is at least one valid counter-example" and leave it at that.