C++ needs to modernize its legacy #include mechanism to truly become a new language

C++ stagnated for many years, and many developers were confident that the language would share the destiny of Cobol, Fortran and VB6: no new projects would be started with it, and C++ developers would only maintain existing code. But against all odds C++ was reborn from its ashes, and the new standards changed a great deal about how the language is used.

But the legacy #include mechanism is still there. After the modernisation of the language, it became the next weak link to improve. Indeed, it has many disadvantages; here are some of them, quoted from this interesting document.

  • Compile-time scalability: Each time a header is included, the compiler must preprocess and parse the text in that header and every header it includes, transitively. This process must be repeated for every translation unit in the application, which involves a huge amount of redundant work. In a project with N translation units and M headers included in each translation unit, the compiler is performing M x N work even though most of the M headers are shared among multiple translation units. C++ is particularly bad, because the compilation model for templates forces a huge amount of code into headers.
  • Fragility: #include directives are treated as textual inclusion by the preprocessor, and are therefore subject to any active macro definitions at the time of inclusion. If any of the active macro definitions happens to collide with a name in the library, it can break the library API or cause compilation failures in the library header itself. For an extreme example, #define std "The C++ Standard" and then include a standard library header: the result is a horrific cascade of failures in the C++ Standard Library's implementation. More subtle real-world problems occur when the headers for two different libraries interact due to macro collisions, and users are forced to reorder #include directives or introduce #undef directives to break the (unintended) dependency. (A short sketch of this failure mode follows the list.)
  • Conventional workarounds: C programmers have adopted a number of conventions to work around the fragility of the C preprocessor model. Include guards, for example, are required for the vast majority of headers to ensure that multiple inclusion doesn't break the compile. Macro names are written with LONG_PREFIXED_UPPERCASE_IDENTIFIERS to avoid collisions, and some library/framework developers even use __underscored names in headers to avoid collisions with "normal" names that (by convention) shouldn't even be macros. These conventions are a barrier to entry for developers coming from non-C languages, are boilerplate for more experienced developers, and make our headers far uglier than they should be. (See the include-guard sketch after this list.)
  • Tool confusion: In a C-based language, it is hard to build tools that work well with software libraries, because the boundaries of the libraries are not clear. Which headers belong to a particular library, and in what order should those headers be included to guarantee that they compile correctly? Are the headers C, C++, Objective-C++, or one of the variants of these languages? What declarations in those headers are actually meant to be part of the API, and what declarations are present only because they had to be written as part of the header file?
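To make the fragility point concrete, here is a small, deliberately broken sketch (the file name is invented for illustration): because #include is purely textual, a macro that is active at the point of inclusion leaks into the header's own code.

// fragile.cpp -- illustrative only: an active macro leaks into the included header.
#define std "The C++ Standard"  // collides with the std namespace
#include <iostream>             // every 'std' inside the header now expands to
                                // "The C++ Standard", producing a cascade of errors
int main() { return 0; }        // never reached: the header itself fails to compile

And here is the conventional workaround mentioned above, an include guard built from a long, prefixed, uppercase macro name (the names are made up for the example):

// mylib_widget.h -- a typical guarded header
#ifndef MYLIB_WIDGET_H          // include guard: prevents multiple inclusion
#define MYLIB_WIDGET_H

struct Widget {
    int id;
};

#endif // MYLIB_WIDGET_H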

The solution addressing these problems, the Modules feature, was already introduced in the first drafts of C++0x. However, it has been postponed at each new standard specification.

Modules improve access to libraries with a more robust and more efficient semantic model. From the user's perspective, the code looks only slightly different, because one uses an import declaration rather than a #include preprocessor directive. Here's an example from the C++0x draft:

import std; // Module import directive.
int main() {
    std::cout << "Hello World\n";
}

There is no need to include many STL headers; a single import is sufficient, and the code becomes cleaner. The module import loads a binary representation of the std module and makes its API available to the application directly. Preprocessor definitions that precede the import declaration have no impact on the API provided by std, because the module itself was compiled as a separate, standalone module.
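To give a rough idea of how a library author would define a module of their own, here is a minimal sketch using the syntax from the Modules proposals; the module name, file names and function are invented for illustration, and the exact syntax may still change before standardisation.

// greetings.cppm -- a module interface unit: it defines and exports an API.
export module greetings;        // this translation unit declares the 'greetings' module

// Only names marked 'export' are part of the module's API.
export const char* hello() {
    return "Hello from a module!";
}

// main.cpp -- a consumer: the import loads the compiled module, not header text.
import greetings;
#include <iostream>

int main() {
    std::cout << hello() << '\n';   // uses the exported function
}

Because greetings was compiled separately, any macros defined in main.cpp (or before the import) cannot alter what the module exports, which is exactly the isolation described above. Build commands vary between compilers and proposal revisions, so they are omitted here.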

Introducing modules in C++ will not be an easy task, and standardisation could be postponed for many more years: the feature appeared in the C++0x drafts, but each time it was deferred to the next standard. We hope that it will not end up like Jigsaw (the module system of Java), which was postponed for many years.

The good news is that Clang implemented the Modules feature for Objective-C many years ago, so it should be easier for its developers to adapt it to C++ after standardisation.

Finally, if you can, please participate in this poll, so that we can get an idea of what C++ developers want.

[poll id="2"]