Jussi Pakkanen, author of the Meson build system, recently wrote a nice article called Does C++ deserve its bad rap ..? where he defends C++ against its reputation as a language that is hard to read and unnecessarily complex.

I've not been exposed to much C++ in a professional context. I learned C++98 at university, and the minor update C++03 was published while I was there, but I was turned off by the language's complexity and abandoned it in favour of writing my projects in C where that was acceptable.

C++11 and onward have famously added ergonomic features that have led to a usability renaissance: range iteration, smart pointers, lambdas, auto/type inference, and more. To show that these have made C++ "good, actually", or at least "not that bad", Jussi implemented a small program:

Find all files with the extension .txt recursively in the subdirectories of the current directory, count the number of times each word appears in them and print the ten most common words and the number of times they are used.

Jussi's C++ code is available in full, and in his post he gives a helpful rundown of the highlights. Up front, I agree that the C++ program Jussi wrote is a lot nicer than the C++ that I cut my teeth on. But what's it like compared to languages that have been created since 1985?

I decided to write a Go implementation that follows the basic structure of Jussi's, and do a rundown comparing the readability of the two. Go and C++ are really similar. Go is statically typed, designed by C programmers, and disrespected by people who don't use it.

Because Go is not exactly like C++, the implementation does a few things differently. Go's regexp package does not contain an iterating regular expression implementation, so to keep things as behaviourally similar as possible I chose to use bufio.Scanner instead. Also, because Go reports errors through explicit return values, the Go version checks and reports on error conditions, which adds to its length.
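A minimal sketch of the shape such a Go program might take, assuming whitespace-separated tokens via bufio.ScanWords. The helper names countWords and topN, and the alphabetical tie-break, are illustrative assumptions rather than details of the actual implementation:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// WordCount pairs a word with the number of times it was seen.
type WordCount struct {
	word  string
	count int
}

// countWords lowercases every whitespace-separated token read from r
// and adds it to counts. (Hypothetical helper.)
func countWords(r io.Reader, counts map[string]int) error {
	scanner := bufio.NewScanner(r)
	scanner.Split(bufio.ScanWords)
	for scanner.Scan() {
		counts[strings.ToLower(scanner.Text())]++
	}
	return scanner.Err()
}

// topN returns the n most frequent words, most frequent first.
// Ties are broken alphabetically (an assumption) so output is deterministic.
func topN(counts map[string]int, n int) []WordCount {
	words := make([]WordCount, 0, len(counts))
	for word, count := range counts {
		words = append(words, WordCount{word, count})
	}
	sort.Slice(words, func(i, j int) bool {
		if words[i].count != words[j].count {
			return words[i].count > words[j].count
		}
		return words[i].word < words[j].word
	})
	if len(words) > n {
		words = words[:n]
	}
	return words
}

func main() {
	counts := make(map[string]int)
	// Walk the current directory recursively, counting words in .txt files.
	err := filepath.Walk(".", func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if info.IsDir() || filepath.Ext(path) != ".txt" {
			return nil
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		return countWords(f, counts)
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, wc := range topN(counts, 10) {
		fmt.Printf("%d %s\n", wc.count, wc.word)
	}
}
```

Roughly a third of main is error handling, which is the extra length mentioned above.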

Now, let's compare readability.

First, some general things that come up throughout the code. The main body of the C++ code does not import the std namespace, meaning that std::X appears throughout. This is what designers call "busy" or "noisy": too many unnecessary elements in too little space. It can be avoided with using namespace std, so it's not a huge deal. I don't know enough about the C++ community to know whether this style is representative, but since the additional verbosity is the default, it's worth noting up front.

Next is the way that C++ object construction looks. In older versions, this was explicit, but verbose. C++11 brought with it several shortcuts that cut out some repetition, but they overload syntax in confusing ways. For a simple form of this, take the input file stream (ifstream) declaration for reading from each file:

std::ifstream current_file(e.path());

In this statement, std::ifstream is the type, current_file is a variable name, and (e.path()) is... an argument list to one of std::ifstream's constructors. Spelled out in full, this would look like:

std::ifstream current_file = std::ifstream(e.path());

Which is stuttery and very noisy, but it's explicit and consistent. In another section of the code, a different style of initialization is used, where the variable receives the value of the argument. This is done with alternate punctuation, braces:

std::string word{it->str()};

The basic function of this code is easy to understand, but the specific operation isn't because of the implicitness and the many-ways-to-do-it. Go's value construction is more limited in expressiveness, but much more consistent. The equivalent lines in the Go program are:

// to open a file stream
f, err := os.Open(path)
// to create a new string with the value of another string
word := strings.ToLower(scanner.Text())

This leads us to another readability tradeoff that is relatively widespread, which is Go's more typical use of type inference. While there is type inference used in the C++ code as well, with the expression const auto &e for the file walk iterator, it's built into Go via the := operator (sometimes called the walrus operator in other language communities). The tradeoff here is that type names are absent in the text, which can make it harder to tell at a glance what types are in use in the program.

Go's design encourages omit-by-default. This leads to less stutter and less noise, which creates cleaner looking code, even if the semantics are identical.

It's an approach that works well with modern editors that are equipped to provide rich type information and provide other search facilities even when the types are being inferred in the text of the program. These were not widespread in 1985 when C++ was first designed, and nonexistent in 1972 when its predecessor was created.

Go has explicit type declaration syntax, var word string = strings.ToLower(...), just like C++ has auto for inference, but the explicit form is rarely used outside of cases where type inference is not allowed by the language. In the past, people have argued that type inference does indeed hurt readability, e.g. Gilad Bracha argued against adding let-style inference to Java in 2001:

Humans benefit from the redundancy of the type declaration in two ways. First, the redundant type serves as valuable documentation - readers do not have to search for the declaration of getMap() to find out what type it returns. Second, the redundancy allows the programmer to declare the intended type, and thereby benefit from a cross check performed by the compiler.

Despite this style of objection, auto types came to Java via var in Java 10 in 2018, and I think inference is generally considered a productivity and readability benefit in all languages that have adopted it.
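The two declaration styles can be put side by side in a short sketch. The typeName helper is hypothetical and exists only to print the type the compiler settled on. The int64 case is one illustrative situation where spelling out the type changes the result:

```go
package main

import (
	"fmt"
	"strings"
)

// typeName reports the dynamic type of v, which for these examples is the
// type the compiler chose for the variable. (Hypothetical helper.)
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	// Inferred: the type comes from the right-hand side.
	word := strings.ToLower("Hello")

	// Explicit: the type is spelled out and checked against the right-hand side.
	var explicit string = strings.ToLower("Hello")

	// An untyped constant defaults to int under inference, so an explicit
	// declaration is needed to get a different integer type.
	inferred := 10
	var wide int64 = 10

	fmt.Println(typeName(word), typeName(explicit)) // string string
	fmt.Println(typeName(inferred), typeName(wide)) // int int64
}
```

Both spellings of word produce the same variable; the explicit form only adds the cross-check Bracha describes.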

These design tradeoffs (plus mandatory semi-colons) combine to make the C++ code noticeably busier than the Go code. I think a lot of programmers consider these minor details compared to semantic complexity, but other things being equal, they have an impact on readability.

However, for all these differences, parts of the code do read very similarly, and not just because I tried to write the code in the same way:

struct WordCount {
    std::string word;
    int count;
};

type WordCount struct {
    word  string
    count int
}

At the top of the program, the WordCount type looks pretty much identical in both languages. The uses of std::X in the C++ code would simplify if we were in the std namespace, which we could do, so let's drop that even though it contributes to visual noise. In C++, structs are implicitly typedef-ed, unlike in C, which is a little confusing compared to Go's more consistent type declaration syntax.

for(const auto &[word, count]: word_counts) {
    word_array.emplace_back(WordCount{word, count});
}

Looping through the word_counts map feels similar, though C++ again introduces new syntax. Go's version uses the append builtin, which, while being much better named than emplace_back, is more awkward to use:

for word, count := range wordCounts {
    wordArray = append(wordArray, WordCount{word, count})
}

This is an idiom that melts away with enough Go experience, but wordArray.append() or append(&wordArray, ...) would definitely be easier for the uninitiated to understand.
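To make the comparison concrete, here is a sketch of the built-in idiom next to the two alternative spellings. WordList, Append and appendTo are hypothetical names invented for this illustration; none of them exist in Go's standard library:

```go
package main

import "fmt"

type WordCount struct {
	word  string
	count int
}

// WordList is a hypothetical named slice type, introduced only so that a
// method can be attached to it.
type WordList []WordCount

// Append is the wordArray.append() spelling: it hides the reassignment.
func (l *WordList) Append(wc WordCount) {
	*l = append(*l, wc)
}

// appendTo is the append(&wordArray, ...) spelling.
func appendTo(l *[]WordCount, wc WordCount) {
	*l = append(*l, wc)
}

func main() {
	// Built-in idiom: append returns the (possibly reallocated) slice,
	// so the result has to be assigned back.
	var wordArray []WordCount
	wordArray = append(wordArray, WordCount{"the", 3})

	// Pointer-taking helper: the caller's slice is updated in place.
	appendTo(&wordArray, WordCount{"and", 2})

	// Method form on the named slice type.
	var list WordList
	list.Append(WordCount{"of", 1})

	fmt.Println(len(wordArray), len(list)) // 2 1
}
```

The reassignment exists because append may have to allocate a larger backing array; the compiler rejects a bare append(wordArray, ...) statement whose result is discarded.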

The sort in C++ is a partial sort, which will likely be faster than the full sort that's being done in Go. This is nice to have at your disposal, but much wider API surface area makes C++ a much bigger thing to learn.

// count_order_desc lambda definition:
auto count_order_desc = [](const WordCount &w1, const WordCount &w2) { return w2.count < w1.count; };
// ...
std::partial_sort(word_array.begin(), word_array.begin() + final_size, word_array.end(), count_order_desc);

In Go, we're sorting the whole slice, and we're defining our functor inline, but it's much the same.

sort.Slice(wordArray, func(i, j int) bool {
    return wordArray[i].count > wordArray[j].count
})
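As far as I know, Go's standard library has no direct counterpart to std::partial_sort. One way to approximate it is a bounded min-heap built on container/heap, sketched below. The minHeap type and topK function are hypothetical names, and this is a different technique from the sort.Slice call above, not a replacement taken from either program:

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

type WordCount struct {
	word  string
	count int
}

// minHeap keeps the smallest count at index 0 so it can be evicted cheaply.
type minHeap []WordCount

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].count < h[j].count }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(WordCount)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

// topK returns the k entries with the highest counts, in descending order,
// without sorting the whole input.
func topK(counts map[string]int, k int) []WordCount {
	if k <= 0 {
		return nil
	}
	h := make(minHeap, 0, k)
	for word, count := range counts {
		if h.Len() < k {
			heap.Push(&h, WordCount{word, count})
		} else if count > h[0].count {
			// Replace the current minimum and restore heap order.
			h[0] = WordCount{word, count}
			heap.Fix(&h, 0)
		}
	}
	// Only the k survivors are sorted.
	sort.Slice(h, func(i, j int) bool { return h[i].count > h[j].count })
	return h
}

func main() {
	counts := map[string]int{"the": 9, "of": 7, "and": 5, "go": 2, "c": 1}
	for _, wc := range topK(counts, 3) {
		fmt.Printf("%d %s\n", wc.count, wc.word)
	}
}
```

This is considerably more code than the single std::partial_sort call, which is the other side of the API surface tradeoff.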

C++ can't really do much about its inheritance of C's inside-out type declarations, but Go's sharing of one type across several parameters, as in func(i, j int), is really nice in parameter lists. The C++ lambda becomes more concise if we imagine an equivalent feature: [](const WordCount &w1, &w2) { return w2.count < w1.count; }. I suppose it may be personal preference whether that's more or less readable.

If we zoom out and look at the files as shapes, a clear picture emerges:

[Figure: the shapes of the Go and C++ source files, side by side]

The Go program has much shorter, sparser lines. This is the result of the aggregate noise omission in the language: parentheses around loop and if conditions, line endings, type noise. Go's line of sight is much better, but this is a little superficial; the C++ program's deeper nesting is due to the Go program scanning the entire file at once, instead of iterating by line and then by token. If this were a more complex operation, their left alignment would be more similar.

This wouldn't change the C++ program's longer lines. A small amount of this is stylistic, e.g. using f for the file handle, but a lot of it comes down to excess line noise in the type machinery in C++.

The Go program also works with Unicode text files, and compiles and works across Linux, OSX and Windows. Fixing the C++ code for this would likely further harm its readability.
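A small sketch of why this comes more or less for free in Go, assuming the input is UTF-8 and tokens are split with bufio.ScanWords. The words helper is a hypothetical name used only here:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// words splits text on whitespace and lowercases each token.
// Go strings are treated as UTF-8, so strings.ToLower maps non-ASCII
// letters as well as ASCII ones. (Hypothetical helper.)
func words(text string) []string {
	var out []string
	scanner := bufio.NewScanner(strings.NewReader(text))
	scanner.Split(bufio.ScanWords)
	for scanner.Scan() {
		out = append(out, strings.ToLower(scanner.Text()))
	}
	return out
}

func main() {
	// Mixed-case French and Greek input; both are lowercased per code point.
	fmt.Println(words("ÉCOLE École Δ"))
}
```

Case folding here is per code point, so it does not perform Unicode normalization; composed and decomposed forms of the same letter would still count as different words.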

While I think that cleanliness is a nice property to have, I think it's overrated when it comes to readability. People will gladly write much more code in order to make all of their functions short and simple, forgetting that the readability we're most often interested in is an understanding of the program in its context and not of each component in isolation.

Despite my having used Go for some time, engaging with this exercise honestly has uncovered some weak spots in Go that I've learned to live with, and they're worth mentioning.

The zero you pass when making a slice with reserved capacity in make(T, 0, cap) isn't great. The way range behaves differently for map and slice types is sensible but a little nuanced, and there's a third behaviour for channels. Also, breaking the loop for filepath.Walk does feel worse than having an explicit iterator, but the lack of an iterator protocol for use in range loops makes them less than ergonomic.
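The first two of these rough edges fit in a short sketch. The total and drain helpers are hypothetical names used only for illustration:

```go
package main

import "fmt"

// total sums a map's values. Ranging over a map yields key and value,
// in unspecified order.
func total(counts map[string]int) int {
	sum := 0
	for _, count := range counts {
		sum += count
	}
	return sum
}

// drain collects values from ch until it is closed. Ranging over a channel
// yields a single value per iteration, with no index or key.
func drain(ch <-chan int) []int {
	var out []int
	for v := range ch {
		out = append(out, v)
	}
	return out
}

func main() {
	// Length 0, capacity 3: space is reserved but the slice starts empty.
	words := make([]string, 0, 3)
	words = append(words, "the", "and", "of")
	fmt.Println(len(words), cap(words)) // 3 3

	// Leaving out the zero gives length 3, so append lands after three
	// empty strings.
	padded := make([]string, 3)
	padded = append(padded, "the")
	fmt.Println(len(padded)) // 4

	// Ranging over a slice yields index and element, in order.
	for i, w := range words {
		fmt.Println(i, w)
	}

	fmt.Println(total(map[string]int{"the": 3, "and": 2})) // 5

	ch := make(chan int, 2)
	ch <- 1
	ch <- 2
	close(ch)
	fmt.Println(drain(ch)) // [1 2]
}
```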

Despite that, I can't help but come to the clear conclusion that C++, even in its modern state, is still complex and difficult to read. C++ programs carry the additional burden of complexity-by-omission: the techniques and APIs that were superseded by new techniques weren't removed, so they still exist. On the style front, places in the language where the verbosity has been reduced have adopted context-aware reuse of syntax that used to have a single meaning, making each statement more difficult to decode.

Oct 16 2020