mongo/docs/cpp_style.md
Steve McClure 4eccf36ba5 SERVER-124986: Fix markdown alert formatting (#52294)
GitOrigin-RevId: 54fa142af331b129327a079ef94ba9102ee59a4b
2026-04-28 18:57:32 +00:00

1045 lines
43 KiB
Markdown

# MongoDB Server C++ Style Guide
This document describes common conventions used in the MongoDB server codebase. The document is
about C++, but there are a few places where JavaScript style is discussed as well.
A firmly established style guide can make source files unsurprising as they are more easily
navigable and regular in shape.
Style rules can eliminate wasted time on minor issues in code reviews. An author should endeavor to
be style-compliant before sending a pull request for review. This should accelerate code reviews and
establish consistent expectations on code.
The guide is carefully considered by very experienced C++ engineers. C++ code can be complex, and
there are subtle correctness and maintainability risks that can arise from certain antipatterns
addressed by the guide. Style adherence enables code authors and their reviewers to productively
write safer code without having to first rediscover those problems for themselves.
## Feedback (MongoDB internal)
This is maintained by the Server Programmability team.
- Use `#server-programmability` on Slack for discussion and clarifications. Contributors outside of
MongoDB can use Jira instead.
- For change proposals, please feel free to add entries to the MongoDB C++ Style Guide Proposals
document pinned to that channel.
- Jira and PRs are fine for small fixes unrelated to C++ style, such as typos, formatting, phrasing,
and comments.
## Style
## Names of Identifiers
There's some truth in the old joke that naming is the hardest problem in programming. It's
impossible to write catch-all rules for naming, but we can set guidelines with the intention of
avoiding friction in reviews and having some expectation of general consistency across our codebase.
- Types use `TitleCase`. First letter of each word is uppercase. Following letters are lowercase.
- Functions and variables use `camelCase`. First letter of each word after the first is uppercase.
The first letter of each word, except the first, is uppercase.
- Namespaces use `snake_case`. No uppercase letters, and words separated by underscores. (See
"[Namespaces](#namespaces)" section below).
- Spelling: Take care to avoid misspellings in names. This is more than aesthetic. It is easier on
readers. Misspelled names can harm confidence in code quality. Misspelled names might be skipped
by code searches. Our convention is to use US English spelling.
- Identifier names should be short but clear. Long sentence-like names become a laborious comparison
exercise for readers, and can form a "wall of text" that can bury significant C++ keywords and
operators. Local variable names can be particularly brief without causing confusion, provided that
the enclosing functions remain compact and focused.
- Repetition and redundancy in names should be avoided. A function name doesn't need to restate the
types of its arguments, for example. The arguments can usually speak for themselves, but explicit
disambiguation may be desirable in some cases.
- Word abbreviations should be used carefully. When used, they should be applied very consistently
and documented well. This keeps users from having to guess which words are abbreviated and which
are not.
- Private members are usually named with a leading underscore (e.g. `_detail`). This applies to data
members more consistently than to functions. Identifiers with a leading underscore followed by an
uppercase letter are reserved by C++, and must not be used. Therefore, the leading `_` should not
be used with private types and typedefs. Double underscores `__` must be avoided as well. See
[article](https://devblogs.microsoft.com/oldnewthing/20230109-00/?p=107685).
### Constants
Constants are either ordinary variables `varName` or with a `k` as a prefix word, like `kVarName`.
You'll see both in the codebase and either is acceptable. You may also find some older code using
`MACRO_STYLE` for constants. That should not be used in new code outside of macros.
### Test Access
Some entities are defined in an API purely to facilitate test access and testability. We
conventionally tack a `_forTest` suffix (or a `ForTest` suffix for types) onto its name as an
indicator that it should not be used by non-test code.
## Class Definitions
While class and struct are largely equivalent in C++, this codebase uses a convention where structs
are used for simple collections of data (possibly with methods), while classes are used for new
abstractions. As a rule, all data in a struct should be public and all data in a class should be
private. If you are unsure which to use, consider whether there are any invariants that need to be
upheld, either within or between members. If there are not, then a struct may be appropriate.
If a type is a struct or struct-like class, then consider omitting all constructors and letting it
be a [C++ aggregate](https://en.cppreference.com/w/cpp/language/aggregate_initialization), which
allows some flexibility in initialization syntax.
If a type has invariant-preserving constructors, special behaviors, and internal private details,
it's not a `struct`. It's subjective, but structs should be a mostly straightforward aggregation of
data members.
Consider a somewhat canonical example of a `Date`, consisting of `year`, `month`, `dayOfMonth`. The
valid range of a `dayOfMonth` depends on `year` and `month`, so this type either has an invariant,
or it has to be allowed to be in an invalid state. If the invariants of this type are enforced by
the type's constructors and setters, then it should be a `class`.
It's possible to leave such a `Date` type as a `struct` and enforce these invariants from the
outside through careful discipline among its users. This is what C APIs have to do. We should prefer
using data encapsulation and `class` for such complex objects.
### Order of Class Members
Within a class or struct definition, try to stick to this ordering by default. A consistent
convention makes it easier for a reader to quickly understand and navigate a class declaration.
Group public API at the top, and details at the bottom.
- `public`
- `protected`
- `private`
Within each of these visibility sections, there's a preferred order of declarations.
- Attributes of the class come first:
- Types and type aliases, including declarations and enums
- Static constants and static data members
- Static functions
- Then declarations that are relevant to each instance of the class:
- Constructors
- Destructor
- Copy and assignment operators
- Member functions
- Data members
As always, technical concerns override style, and this order sometimes cannot be exactly followed
for technical reasons, but it should be the predominant weakly-binding preference when laying out a
class in the absence of motivation to diverge from it. Private data members have a leading
underscore followed by a camel case name like `_fooBarBaz`. Protected members may or may not have a
leading underscore, depending on how logically internal they are. This convention doesn't apply to
types.
### Naming of Class Members
```c++
class Foo {
public:
// This is just for demonstration purposes. Classes/structs should rarely
// have a mix of public and private data members.
int publicMember;
protected:
// We've never had a convention about protected members. Both are
// widespread, so either is okay. It depends on how "private" the variable
// is to the derived classes.
int x;
int _y;
private:
int _privateMember;
};
```
### User-facing Names That Include Units (not strictly a C++ issue)
This section applies to names that users can see, like BSON field names or server parameters, but
not necessarily to C++ identifiers.
In things like `serverStatus`, include the units in the field name if there is any chance of
ambiguity. For example, `writtenMB` or `timeMs`.
- For bytes: use `MB` and show in megabytes unless you know it will be tiny. Note you can use a
float so `0.1MB` is fine to show.
- Durations:
- Use milliseconds by default. Prefer the suffix `Millis`, but be aware that `Ms` is also used.
- Use `Secs` and a floating point number for times that are expected to be very long.
- For microseconds, use `Micros` as the suffix (e.g., `timeMicros`).
## Documentation
- API docs should appear directly above the thing being documented and use `/**` or `///` style
comments.
- If it fits, a comment can be to the right of a variable with `///< doc`. (See
[Doxygen syntax](https://www.doxygen.nl/manual/docblocks.html#memberdoc)). The `<` is important,
as it tells tooling such as clangd to bind backwards to the preceding decl rather than the
following one.
- We don't run Doxygen or recommend other Doxygen markup, this style of comment delimiter
distinguishes API docs from other comments.
- Use complete, grammatical sentences for API docs. Reviewers should pay attention to the clarity of
documentation as it would appear to a reasonably-experienced server engineer who may not be a
domain expert on the code.
- Avoid overly conversational tone, unnecessary personal references (like "I", or "Pat"), slang, or
jargon. Comments should strive for professionalism, but without rigid formality.
- Comment syntax
```c++
stdx::thread _thread; ///< Empty until init is called.
/** Single line doc. */
void easyFunction(int x, int y);
/**
* Multi line doc. Spans multiple lines.
* The top and bottom lines of this comment block are blank.
*/
void complexFunction(int x, int y) {
// Interior implementation details use line comments like this.
return someFunc(x + y);
}
```
- Give the right amount of information. Make some attempt to give the gist of complex processes.
Avoid being unnecessarily vague to avoid explanation that would be helpful to the consumer of the
API. Conversely, try to avoid going too much into implementation details in doc-comments (or at
least clearly state when doing so using words like "currently") unless those details are part of
the API that consumers should rely on.
- Comments should be descriptive rather than imperative, e.g. "Frobnicates the widget", not
"Frobnicate the widget". The subject of the initial sentence is assumed to be the thing being
documented and should generally be omitted, e.g. don't say "This function frobnicates the widget".
```c++
/** Calculates the sum. (GOOD: descriptive verb) */
/** Calculate the sum. (BAD: imperative verb) */
```
There's no need to be very formal about their formatting or use elaborate Doxygen/Javadoc etc tags.
A smattering of text-like markdown is good. Some IDE features or other tooling might pick up on it,
but it shouldn't interfere with the primary use case of viewing the comments as text while browsing
a header file.
Reader attention is a precious resource, so try to write concise comments, and obvious things need
not get a comment. Comments should be adding information. Do not restate the name and signature,
unless there is a subtle detail that should be highlighted.
Assume the reader knows the language. Special member functions like the copy constructor do not need
comments saying what they are. `operator==` should only get a comment if there is something
interesting about it like omitting a member, or being order-sensitive.
Most classes and functions should default to having at least a 1-liner comment, but sometimes
context and good naming can make even that a redundant formality to be omitted. While this is a
subjective decision, remember that later readers will need more hints than the original
implementers.
```c++
/**
* If the current command does not override Foo, then it comes from a system-wide default
* value set by the "foo" server parameter. (GOOD: nonobvious).
*/
Foo getFoo() const;
/** Gets the bar (BAD: obvious, no info). */
const Bar& getBar() const;
```
### TODOs
To cite a ticket as a TODO in the code, use this format, with a short reason for the link. A Jira
bot will create reminders when the cited target ticket is resolved. The target of the TODO cannot be
the current ticket. Suppose SERVER-12345 was a ticket to fix the frobber, and we're documenting some
workaround code:
```c++
// TODO(SERVER-12345): Remove this code when the frobber works again.
```
In comments, a function may be referred to using just its name `foo`, or by `foo()`, or
`foo(int,int)`, depending on context and whether the other forms are ambiguous.
## C++ Code
Much of the guide has been about cosmetics like layout and formatting, comments, and naming
conventions. This section presents more substantial technical issues.
### Minimal Syntax
If a keyword or operator is a "noise" word with no technical benefit, omit it. The philosophy here
is that it's better to write the code as plainly as possible. Code should not look like it's doing
something special when it isn't.
Some examples of "noise" syntax:
- Redundantly marking members and bases as `public`, `protected` or `private`, etc when they already
are.
- Marking a function decl to be `extern` (they're already extern).
- Using `virtual` on a function that's already `override` or `final` (see
"[Overriding Virtuals](#overriding-virtuals)").
### Constructors
Constructors that can be called with single arguments should be `explicit`, unless implicit
conversion is desired, in which case use `explicit(false)` to explicitly show that intent. Non-unary
constructors should NOT be `explicit` unless it is important to disable bare braced initialization.
If a constructor takes a variable number of arguments such that it is possibly unary, make it
`explicit`.
### `= default`
Prefer `= default;` when needed over defining an empty or trivial function body `{}`. But where
possible, it is usually better to omit the declarations for lifetime methods entirely and let the
compiler declare them implicitly.
Consider that for some classes it may be useful to declare a function normally in a `.h` file and
provide `= default;` as the implementation in a `.cpp` file.
### Noexcept
The `noexcept` feature is easy to overuse. Do not use it solely as "documentation" since it affects
runtime behavior. It's a large topic, covered in the
[Exception Architecture](https://github.com/mongodb/mongo/blob/master/docs/exception_architecture.md#using-noexcept)
document.
### Overriding Virtuals
Use `override` wherever it can be used. Tighten this to `final` when necessary, and where further
overrides would introduce opportunities to break base class guarantees.
Each declaration should have at most one `virtual`, `override`, or `final`.
Like many style rules, there are rare technical situations to bend this rule. In this case it can be
used to force compilation errors on unintentional hiding.
If a class is known to be a leaf in a hierarchy of polymorphic types, annotating the class with
`final` can be a useful optimization to enable its `virtual` functions to be devirtualized in some
contexts.
### Rules For `.h` Files
- Use `#pragma once` as an include guard, as the first line after the copyright notice.
- No unnamed namespaces in headers at all. (See the "Namespaces" section below).
- Use `inline` or `extern` on namespace-scope variables in headers, so that each translation unit
does not get its own copy. Note that `inline` variables provide some init order guarantees which
may add a small startup cost, so define them as `constexpr` or `constinit` if possible.
- Keep complex code out of headers. If a function is not performance sensitive, and it is longer
than a few lines, put it in the corresponding .cpp file. This practice should help to reduce the
number of include statements needed in headers, which is good for modularity and for compilation
speed. That said, simple getters and setters should generally be inline.
### Rules For `.cpp` Files
Entities with "external linkage" are usable from outside the .cpp file where they are defined. It's
the default linkage for functions, variables, and types defined at namespace scope, making this
unintentional exporting a common error in C++.
Export with intent. Avoid defining anything with external linkage unless it's declared in the
header. We don't want to have surprising link-time name collisions or other multi-definition
problems as the codebase evolves. When code has no more callers, it can be readily identified as
dead code if it has internal linkage.
Use either unnamed namespaces or `static` to make definitions with "internal linkage". These are
private to the .cpp file in which they appear. (See
"[Linkage](https://en.cppreference.com/w/cpp/language/storage_duration#Linkage)").
### API Conventions
#### Integer Ranks
We don't typically use the `long` or `long long` integer ranks, except in the BSON API or when
interfacing with third_party or system APIs. In particular, we should never use plain `long`
directly unless required by some outside API since it is 32 bits on some of our supported platforms.
We use `int`, `size_t`, and the explicit width typedefs `int32_t`, `uint32_t`, `int64_t`,
`uint64_t`, etc. Prefer `size_t` for string/array/container/sequence sizes and indexes, since that's
what C++ does.
#### `const`
- Our code uses "west const" (`const X x;`) rather than "east const" (`X const x;`).
- `const` is not required on local variables.
- Making `const` data members of a movable class can lead to problems with move and assign
operations, and is usually not necessary. On the other hand, it can be useful for types that are
never moved or copied. In particular, for types that are accessed concurrently it is useful to
mark members that are not modified after construction as `const` because they cannot participate
in data races.
- Don't use `volatile` qualifications. It's an oft-misunderstood feature and only appropriate in
very precise technical scenarios.
### Strings
- We do not use `std::string_view`. Use `StringData` from `base/string_data.h` instead. For
interoperability with functions that accept or return `std::string_view` (e.g. `std::string`), use
the pair of conversion functions `toStdStringViewForInterop` and `toStringDataForInterop`.
- Working with `char*` strings can be notoriously error-prone. Convert such data to `StringData` or
`std::string` for safety, or use utilities in `util/str.h` for this sort of thing.
### Performing String Formatting
There are at least two kinds of generic string formatting available. We have stream-oriented
formatting with `StringBuilder` and its wrapper `str::stream()` (using a stripped-down
`std::ostream`-like API), and newer `libfmt` formatting (using Python-like syntax). We do not use
`std::format`. `sprintf`-style formatting is very rarely used.
```c++
#include <fmt/format.h>
takesString(fmt::format("x={}, y={}\n", xValue, yValue));
```
```c++
#include "mongo/util/str.h"
takesString(str::stream() << "x=" << xValue << ", y=" << yValue << "\n");
```
### Output Parameters
Use pointers or mutable references as "in/out" or "output" parameters, but prefer returning values
to using pure output parameters. Mutable references used to be banned, but this is no longer the
case, and they are now encouraged for many cases, especially if the callee will not require the
reference to be valid after returning. That said, some types, such as `OperationContext` are
conventionally passed by pointer. It is best to stick to established conventions for such types to
avoid needing a lot of additional `&opCtx` and `*opCtx` noise at call sites between functions using
different conventions.
```c++
void appendData(const std::string& tag, std::vector<MyType>& out) {
out.push_back(_getData(tag));
}
```
### Namespaces
- Namespace names use `snake_case`. No uppercase letters, and words separated by underscores.
- Contents of `namespace` scopes are not indented.
- Close namespaces with a comment. `clang-format` automatically adds these comments.
```c++
namespace foo {
int fooVar;
namespace bar {
int barVar;
} // namespace bar
} // namespace foo
```
- Do not use "using directives" (i.e. `using namespace foo;`) for arbitrary namespaces as a naming
shortcut. Some namespaces are designed to be used this way in restricted contexts, but still never
at namespace-scope in header files. These carefully curated namespaces contain only a few
definitions. Examples of these limited exceptional namespaces would include:
- The `std::literals`, `fmt::literals`, and similar namespaces that hold user-defined literal
operators. Using directives are necessary for importing user-defined literals.
- The `std::placeholders` namespace containing `_1`, `_2`, for use with the `std::bind` API (which
we have banned anyway).
As an alternative, a namespace _alias_ may help to declutter local scopes.
```c++
namespace bc = timeseries::bucket_catalog;
namespace bfs = boost::filesystem;
```
- No unnamed namespaces in headers at all. They can produce subtle correctness risks, particularly
in the form of
[ODR (One Definition Rule)](https://en.cppreference.com/w/cpp/language/definition#One_Definition_Rule)
violations.
- In .cpp files, use unnamed namespaces to strip definitions of their linkage. Headers should
generally only be declaring entitiees with external linkage.
- Most server code should be in the `mongo` namespace, and we have several sub-namespaces nested
within that, often used to help organize code by team, by project, or by large feature.
- Defining a new nested namespace as an API point is cheap, but can be a little fiddly for users if
we have too many of them, so they should be substantial and relatively coarse-grained (a handful
per team).
- Use a component-unique namespace, eg `future_details` or `duration_detail`, to give names to
pseudo-"private" details in headers. It's important to include the component name here. Using
`mongo::detail` or `mongo::internal` doesn't mitigate the problem of name collisions between
components.
- As a matter of namespace etiquette and modularity, avoid using anything in a component's `detail`
or `internal` -suffixed namespaces from outside the component. If you need to use such a private
name, that should ideally involve a conversation with the code owners about promoting it out of
the detail namespace.
- Combine immediately-nested namespace blocks where possible:
```c++
namespace mongo::foo::bar {
int barVar;
} // namespace mongo::foo::bar
```
### Control flow
- Place exceptional path first.
- Return early.
- Avoid `else` after a returning `if` statement.
```c++
Status ifElseSpaghetti() {
Status err;
if (err = doStuff1(); err.isOK()) {
if (err = doStuff2(); err.isOK()) {
if (err = doStuff3(); err.isOK()) {
if (err = doStuff4(); err.isOK()) {
// Expected path obscure and indented
} else {
}
} else {
}
} else {
}
} else {
}
return err;
}
Status withEarlyReturns() {
if (auto err = doStuff1(); !err.isOK())
return err;
if (auto err = doStuff2(); !err.isOK())
return err;
if (auto err = doStuff3(); !err.isOK())
return err;
if (auto err = doStuff4(); !err.isOK())
return err;
// Expected path obvious and prominent.
return Status::OK();
}
```
#### Range-Based `for` Loops
[Range-based for loops](https://en.cppreference.com/w/cpp/language/range-for) can have subtle
issues. The usual practice is to use a forwarding reference (`auto&&`) as the item variable.
Applying this pattern as a default practice prevents subtle copies and conversions of the range
elements.
```c++
for (auto&& item : someRange)
```
For ranges that have pair or tuple elements, particularly maps, it's common to use structured
bindings to give names to the parts of the item:
```c++
for (auto&& [key, value]: someMap)
```
It's worth a note of caution about the dangers of the range expression in a range-based for loop, as
this is a common and subtle source of bugs.
The range expression is bound to an implicit range variable, and its lifetime will be extended if
it's a temporary, as usual with C++ initializers.
But other temporaries created in the initializer expression will die after the initializer. They are
not extended to the lifetime of the for loop.
```c++
// ok: temporary is bound to implicit range variable.
for (auto&& item: makeVector())
// BUG: the result of obj() is destroyed.
for (auto&& item: obj().view())
```
The rules here change in C++23, such that all temporaries in the range initializer are extended. The
fix is a theoretically a breaking change for some code. But the risk tradeoff overwhelmingly favored
making this change anyway.
<!-- prettier-ignore -->
> [!WARNING]
> The compilers we are using have not all implemented this feature yet, even on the v5 toolchain. So
> we still need to be extremely careful with range expressions that rely on intermediate temporaries.
It would be helpful to read the
[CppReference](https://en.cppreference.com/w/cpp/language/range-for#Temporary_range_initializer) on
this topic. Some good
[bug examples](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2644r0.pdf) are listed in
the single-page ISO C++ proposal to fix the problem.
### Assertions
This is a large topic. See the
[Exception Architecture](https://github.com/mongodb/mongo/blob/master/docs/exception_architecture.md)
architecture guide.
### Logging and Output
We use a custom logging system, documented in the
[Logging](https://github.com/mongodb/mongo/blob/master/docs/logging.md) architecture guide. Direct
output to `stdout` or `stderr` streams is only done by special server code.
### Numeric Constants
Large, round numeric constants should be written in a user-friendly way.
- If a number is derived from a simple numeric expression, expressing it as an expression can help a
reader verify and maintain it. For example, prefer `50 * 1024 * 1024` to `52'428'800`.
- Use digit separators `'` for large numeric constants. 3-digit groups for decimal. Conventionally,
use 4-digit or 8-digit groups for hexadecimal or binary.
- Use a bit-shifted form for power-of-two exponentiation. eg, `1<<13` to express 2<sup>13</sup>.
Make sure the "1" is wide enough for the shift if it's large (e.g. `uint64_t{1} << 52`). A
`* 1024` sequence is also acceptable, as it's a recognizable idiom for kiB and MiB expressions.
- Do not assume suffixes like `ULL` will produce specifically typed quantities like `uint64`. Use a
numeric literal and the compiler will give it a wide-enough type. Where the exact type matters,
use an explicitly typed expression.
```c++
const int tenMillion = 10'000'000;
const int miBiByte = 1 << 20;
const uint64 exBiByte = 1ull << 60; // Arithmetic expressions may need a particular type.
const uint32 crc32Polynomial = 0xEDB8'8320;
const uint32 asciiMask = 0b0111'1111;
arrayBuilder.append(uint64_t{1234}); // Force argument type.
```
### Casting
- Do not use C-style cast syntax (parentheses around the preceding type) ever. See
[this CGL rule](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es49-if-you-must-use-a-cast-use-a-named-cast)
and [this Google rule](https://google.github.io/styleguide/cppguide.html#Casting) for discussion.
- Use `static_cast` as needed. Use `const_cast` when necessary.
- Be aware that `dynamic_cast`, unlike other casts, is done at runtime. You should always check for
`dynamic_cast<T*>` returning null pointer.
- `reinterpret_cast` should be used sparingly. It is typically done for low-level layout conversions
and accessing objects in ways that may break the protections of the type system and exhibit
undefined behavior if misapplied.
- When down-casting from a base type where the program logic guarantees that the runtime type is
correct, consider using `checked_cast` from `mongo/base/checked_cast.h`. It is equivalent to
`static_cast` in release builds, but adds an invariant to debug builds that ensures the cast is
valid.
### RAII and Smart Pointers
- Embrace RAII (Resource Acquisition Is Initialization). This means that resources should generally
be managed by objects that automatically release them when going out of scope.
- By default, the assumption in our codebase is that raw pointers are views/borrows and never
owning. Document exceptions to that rule, and try to avoid having owning raw pointers as part of
your API.
- Make heavy use of smart pointers such as `std::unique_ptr` and `std::shared_ptr`. For some types
we use `boost::intrusive_ptr` instead.
- Generally, bare calls to `new`/`delete` and `malloc`/`free` outside of the implementation of an
RAII type should be red flags and draw extra scrutiny in review. Prefer factory functions like
`std::make_unique` and `std::make_shared`.
- Use `ScopeGuard` or `ON_BLOCK_EXIT` to protect other resources that must be released (e.g.
`fopen`/`fclose` pairs), or perform some other action when leaving scope. It is often a good idea
to put "undo X" logic right after the "do X" logic rather than at the bottom of the function to
ensure that the logic stays correct if someone adds an early return or throws. Or, write an object
to do this for you via its constructor and destructor.
### The `WithLock` Convention
It is common practice in our codebase for a larger "business logic" class to have an obvious primary
mutex member. These tend to have some private functions that require that this mutex be held. These
functions often take a `WithLock` as the first parameter to document the contract and provide some
checking of the callers. The parameter should usually be unnamed. This is a technical check that
forces callers to present a lock-holding resource handle (e.g. `unique_lock`) to call the function.
See [with_lock.h](../src/mongo/util/concurrency/with_lock.h).
## Files (Physical Design)
### Components
A component is a grouping of classes, entities, and functions that is built as a single packaged
unit. There are 1 or more components in a library. A component should represent a grouping of
functionality and interrelated classes and functions that work together.
A component normally consists of a `.h`, a `.cpp`, and a `_test.cpp` file. Source filenames use
lowercase words separated by underscores (i.e. snake_case).
In uncommon cases, there are other files in the component for technical or internal organizational
reasons. These might be a `foo_internal.h` auxiliary header, or a `foo_test_part4.cpp` test
fragment, but these extra files are not meant to serve as its main interface or present its main
idea. They're helper details and they should have the component name as a prefix of their file
names.
A component will commonly be dominated by a single dominant class, and for discoverability, it
should therefore use that class name, in snake_case, as its filename. That said, we have no rule
limiting the number of declarations in a file, and it is useful to define related classes together
in a single component.
### Using `#include`
- To make a declaration available, we require inclusion of a header file that provides it. There
should not be any implicit reliance on transitive includes, even if the code compiles. As an
exception to this general rule, `foo.cpp` and `foo_test.cpp` do not need to duplicate the includes
from `foo.h`.
- Do not make forward declarations to avoid an inclusion. It may be tempting to do this as an
optimization, but we don't do it, as there are correctness and modularity risks.
- Do not include headers that are not needed. Do not blindly copy large blocks of include
statements.
- An "umbrella" interface header may provide several related transitive includes, but these umbrella
headers should be documented as such, and they should be provided by the library maintainer. Use
IWYU (include what you use) pragma comments to prevent tools and editors from incorrectly
auto-suggesting the private headers.
In the public header (e.g. `unittest/unittest.h`):
```c++
#include "mongo/unittest/assert.h" // IWYU pragma: export
```
In the private headers (e.g. `unittest/assert.h`):
```c++
// IWYU pragma: private, include "mongo/unittest/unittest.h"
// IWYU pragma: friend "mongo/unittest/.*"
```
- A header should also be "self-contained", and include everything it needs. It must not rely on
other headers having been included above it by its users.
- Use "double quotes" to include headers under `mongo/`, and \<angle brackets\> for headers under
`third_party/`, or for system libraries.
- Always use the forward relative path from `mongo/src/`. "Forward" means to not refer to the parent
directory `../`.
- Don't use `third_party/` as part of include paths. Use `<>` and omit it.
```c++
#include <boost/optional.hpp> // Yes
#include "third_party/boost/optional.hpp" // No: omit "third_party/" and use <>
#include "boost/optional.hpp" // No: use <>
#include "mongo/db/namespace_details.h" // Yes
#include "../db/namespace_details.h" // No: ".." is disallowed
```
### Ordering and Grouping of C++ `#include` Directives
We have a standard order for the include directives at the top of a C++ file. It is automatically
applied by our configuration of clang-format. The purpose of this ordering is to keep the list
organized to aid in visual scanning, and to catch headers that are missing includes.
The include directives are organized into several blocks. Within each block, the include directives
are sorted alphabetically. Follow each block with a blank line.
- Main header
For the `.cpp` and `_test.cpp` files of a component, include the component's `.h` file if
applicable as the first include. This is a safety practice that helps us ensure that a `.h` file
doesn't rely on any preceding inclusions.
- First-party headers
All include directives using `""` and starting with `mongo/`.
E.g. `"mongo/db/db.h"`.
- C++ stdlib headers
Include directives using `<>`, with no `/` or `.` in path.
E.g. `<vector>`, `<cmath>`.
- Unnamespaced headers
Include directives using `<>`, with no `/` in path. Typically these are system C headers ending in
`.h`
E.g. `<unistd.h>`.
- Remaining third-party headers
Include directives using `<>`, with `/` in path.
E.g. `<boost/optional/optional.hpp>`, `<sys/types.h>`.
To summarize, a typical .cpp file "classy.cpp" might have up to 5 sorted blocks of include
directives:
```c++
/** (Copyright notice would appear at the top, then...) */
#include "mongo/db/classy.h"
#include "mongo/db/db.h"
#include "mongo/db/namespace_details.h"
#include "mongo/util/concurrency/qlock.h"
#include <cstdio>
#include <string>
#include <unistd.h>
#include <boost/thread/thread.hpp>
```
Any headers that are conditionally included under the control of `#if` directives (if technically
possible) will appear after these blocks.
Clang-format will not reorder includes across anything other than a blank line or other includes. In
the rare case where some header must be included before or after all other headers, you can use a
comment line to separate it from other includes like:
```cpp
#include <last/normal/header.h>
// This header must be after all others:
#include <a/weird/header.h>
```
If you see a comment line in old code that is unintentionally preventing proper header ordering, you
are encouraged to clean that up when adding or removing includes.
### For `js` Files (JavaScript only)
- Disable formatting for
[template literals](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals)
```js
// clang-format off
newCode = `load("${overridesFile}"); (${jsCode})();`;
// clang-format on
```
### Copyright Notices
- All new C++ files added to the MongoDB code base that will be upstreamed for public consumption
(such as anything upstreamed to `mongodb/mongo`) should use the following copyright notice and
SSPL license language, substituting the current year for `YYYY` as appropriate:
```c++
/**
* Copyright (C) YYYY-present MongoDB, Inc.
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the Server Side Public License, version 1,
* as published by MongoDB, Inc.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* Server Side Public License for more details.
*
* You should have received a copy of the Server Side Public License
* along with this program. If not, see
* <http://www.mongodb.com/licensing/server-side-public-license>.
*
* As a special exception, the copyright holders give permission to link the
* code of portions of this program with the OpenSSL library under certain
* conditions as described in each individual source file and distribute
* linked combinations including the program with the OpenSSL library. You
* must comply with the Server Side Public License in all respects for
* all of the code used other than as permitted herein. If you modify file(s)
* with this exception, you may extend this exception to your version of the
* file(s), but you are not obligated to do so. If you do not wish to do so,
* delete this exception statement from your version. If you delete this
* exception statement from all source files in the program, then also delete
* it in the license file.
*/
```
- Enterprise source code is not SSPL, and must bear a shorter copyright notice:
```c++
/**
* Copyright (C) YYYY-present MongoDB, Inc.
*/
```
## Basic Formatting Conventions in C++ Code
There are several matters of file formatting expected in source files, and we enforce these when we
can. If you use our recommended
[config](https://github.com/mongodb/mongo/blob/master/.vscode_defaults/linux-virtual-workstation.code-workspace)
for VSCode, much of this will be handled automatically for you.
### Whitespace
- Use spaces, no TAB characters.
- 4 spaces per indentation.
- Limit lines to 100 columns.
- Use Posix text format for source files. All lines (including the final line) end with a LF (ASCII
"line feed" aka `\n`) character. We don't use the Windows CRLF (`\r\n`) line endings in source
files.
In VS Code, `files.eol` should be set to "\n", and `files.insertFinalNewline` set to true to help
with this. A Git config option on Windows can convert line endings automatically
(`core.autocrlf`).
### Braces
Our braces style is that the opening brace appears at the end of the line. We do not open a new line
just for the opening brace that is part of a control flow structure (`if`, `while`, etc). Braces are
optional for sufficiently simple statements.
```c++
if (condition)
doStuff();
if (condition) {
doStuff();
}
while (condition)
doStuff();
while (condition) {
doStuff();
}
do {
doStuff();
} while (condition);
```
### ESLint (JavaScript only)
All JS files must be linted by ESLint before they are formatted by clang-format.
We use [ESLint](http://eslint.org/) to lint JS code. ESLint is a JS linting tool that uses the
config file located at `.eslintrc.yml`, in the root of the mongo repository, to control the linting
of the JS code.
[Plugins](http://eslint.org/docs/user-guide/integrations) are available for most editors that will
automatically run ESLint on file save. It is recommended to use one of these plugins.
Use the wrapper script `buildscripts/eslint.py` to check that the JS code is linted correctly as
well as to fix linting errors in the code. This wrapper selects the appropriate version of eslint to
be used.
```sh
python buildscripts/eslint.py lint # lint js code
python buildscripts/eslint.py fix # auto-fix js code
```
### Clang-Format
All code changes must be formatted by [clang-format](http://clang.llvm.org/docs/ClangFormat.html)
before they are checked in. Use `bazel run format` to reformat C++ and JS code. Clang-format is a
C/C++ & JS code formatting tool that uses the config files located at `src/mongo/.clang-format` and
`jstests/.clang-format` to control the format of the code. The version and configuration of
clang-format is selected by `bazel run format`.
Plugins are available for most editors that will automatically run clang-format on file save.
Clang-format is essential, but we should not let it create unreadable code. There are some ways to
keep it from producing a mess:
- It will not join a line that ends in a (potentially empty) `//` comment.
- It also recognizes comma-terminated lists as significant hints.
- As a last resort, it honors `clang-format off` and `clang-format on` in comments. This should only
be used where it is really important, since it may result in indentation drift with the
surrounding code as we upgrade clang-format or change settings.
```c++
void clangFormatExamples() {
// Trailing comma prevents joining braces with data.
std::array arr{
123, 234, 456, 567, 678,
};
std::vector<std::vector<int>> vvi{
{
123,
345,
},
{
456,
},
};
// Just one leading EOL comment '//' prevents joining all lines.
b //
.append(x, 123)
.append(y, 234)
.append(z, 345);
}
// Example tabular data that would be harmed by reformatting.
// clang-format off
#define EXPAND_TABLE(X) \
/* (id, val , shortName , logName , parent) */ \
X(kDefault, = 0 , "default" , "-" , kNumLogComponents) \
X(kAccessControl, , "accessControl", "ACCESS" , kDefault) \
X(kAssert, , "assert" , "ASSERT" , kDefault) \
X(kCommand, , "command" , "COMMAND" , kDefault) \
X(kControl, , "control" , "CONTROL" , kDefault) \
X(kExecutor, , "executor" , "EXECUTOR", kDefault) \
X(kGeo, , "geo" , "GEO" , kDefault)
// clang-format on
```
---
# Additional Learning Resources
- [Learn C++](http://learncpp.com), free C++ tutorial.
- CppCon "Back to Basics" track playlist.
[link](https://www.youtube.com/playlist?list=PLHTh1InhhwT4TJaHBVWzvBOYhp27UO7mI)
- "A Tour of C++", Stroustrup. ISBN: 9780133549003
- "Large-Scale C++: Process and Architecture, Volume 1", Lakos. ISBN 9780133927665
- All of Herb Sutter's "Exceptional" series of books.
- All of Alexandrescu books
- All of Scott Meyer's "Effective" books (getting very old but still great)
# References
- [MongoDB C++ Style Guide Proposals](https://docs.google.com/document/d/1nvmEnjw-5DNFIoXPa7WzM1PbOOl1fN19jl1sz9cpzAg)
Roadmap and suggestion box for this document.
- [Server Code Style](https://github.com/mongodb/mongo/wiki/Server-Code-Style) on mongo github wiki
to be replaced by this document.
- [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html) We used to default to
this for all things not explicitly covered by our own guide, but that is no longer the case.
- [C++ Core Guidelines](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines) Interesting
reading. Diverges significantly at times from our style.
- [cppreference.com](https://cppreference.com) The best C++ reference site
- [C++ SUPER FAQ](https://isocpp.org/faq)
- [Compiler Explorer](https://goldbolt.org) Great for demonstrating C++ ideas on multiple compilers.
- [VSCode workspace file](https://github.com/mongodb/mongo/blob/master/.vscode_defaults/linux-virtual-workstation.code-workspace)
A default configuration for server engineers who use VSCode. It's configured to handle editor
configuration and formatting issues in accordance with this guide.