SERVER-124136 Format markdown via prettier: wrap lines and use width of 100 (#52231)

GitOrigin-RevId: 3305c1e2ee3a6a2c3a5b2b7883b0f491a59ed646
Steve McClure 2026-04-21 14:30:35 -04:00 committed by MongoDB Bot
parent e66373f938
commit 32e8f260de
205 changed files with 12720 additions and 8000 deletions

@ -2,18 +2,24 @@
This folder is for custom pull request templates. Templates are Markdown (\*.md) files. This folder is for custom pull request templates. Templates are Markdown (\*.md) files.
These custom templates can be used for example, by individual teams to have a custom pull request template with team specific testing or documentation instructions. These custom templates can be used for example, by individual teams to have a custom pull request
template with team specific testing or documentation instructions.
Read more in [Github's docs](https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository) Read more in
[Github's docs](https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository)
If you update the default PR template, you also need to update the commit metadata in github branch rulesets. If you update the default PR template, you also need to update the commit metadata in github branch
rulesets.
# How To Use This Folder # How To Use This Folder
To create a custom template, create a new markdown file in this folder. To create a custom template, create a new markdown file in this folder.
Then create a link of the form `https://github.com/mongodb/mongo/compare/main...my-branch?quick_pull=1&template=your_new_template.md` Then create a link of the form
`https://github.com/mongodb/mongo/compare/main...my-branch?quick_pull=1&template=your_new_template.md`
Share that link in your team docs to use for creating PRs. By selecting an unused value for `my-branch` it should show a branch selector when following the link. Share that link in your team docs to use for creating PRs. By selecting an unused value for
`my-branch` it should show a branch selector when following the link.
Read more in [Github's docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/using-query-parameters-to-create-a-pull-request) Read more in
[Github's docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/using-query-parameters-to-create-a-pull-request)
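
The link format described above can also be assembled programmatically. The following is a minimal sketch in Python: the helper name `template_pr_link` and its default arguments are illustrative assumptions, while the `quick_pull` and `template` query parameters and the `compare/main...my-branch` path are taken from the text above.

```python
from urllib.parse import urlencode


def template_pr_link(template, branch="my-branch", base="main", repo="mongodb/mongo"):
    """Build a pull request link that preselects a custom template (hypothetical helper)."""
    # quick_pull=1 and template=<file> are the query parameters named in the README text.
    query = urlencode({"quick_pull": 1, "template": template})
    return f"https://github.com/{repo}/compare/{base}...{branch}?{query}"


print(template_pr_link("your_new_template.md"))
```

Leaving `branch` at a value that does not exist, such as the `my-branch` placeholder, matches the README's advice about getting a branch selector when the link is followed.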

@ -1 +1,2 @@
Anything in this description will be included in the commit message. Replace or delete this text before merging. Add links to testing in the comments of the PR. Anything in this description will be included in the commit message. Replace or delete this text
before merging. Add links to testing in the comments of the PR.

@ -15,6 +15,13 @@
"parser": "yaml", "parser": "yaml",
"tabWidth": 4 "tabWidth": 4
} }
},
{
"files": "*.md",
"options": {
"proseWrap": "always",
"printWidth": 100
}
} }
] ]
} }
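
To get a feel for what the `proseWrap: always` and `printWidth: 100` override does to a paragraph, here is a rough approximation in Python. It is only a sketch of the width constraint on plain prose, written with the standard `textwrap` module. It is not Prettier's algorithm, which also understands Markdown syntax such as links, lists, and code spans. The function name `wrap_paragraph` is a hypothetical helper.

```python
import textwrap

PRINT_WIDTH = 100  # mirrors "printWidth": 100 in the override above


def wrap_paragraph(paragraph: str, width: int = PRINT_WIDTH) -> str:
    """Re-wrap one plain-prose paragraph to the given width (approximation only)."""
    # Collapse existing line breaks first, then wrap again from scratch.
    collapsed = " ".join(paragraph.split())
    # Long tokens such as URLs are kept whole, so a line holding one may exceed the width.
    return textwrap.fill(
        collapsed,
        width=width,
        break_long_words=False,
        break_on_hyphens=False,
    )
```

This also explains why the long documentation URLs in the diffs above sit on their own lines and exceed 100 characters: a single unbreakable token cannot be wrapped.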

@ -49,8 +49,7 @@ You can install compass using the `install_compass` script packaged with MongoDB
$ ./install_compass $ ./install_compass
``` ```
This will download the appropriate MongoDB Compass package for your platform This will download the appropriate MongoDB Compass package for your platform and install it.
and install it.
## Drivers ## Drivers
@ -88,9 +87,9 @@ https://www.mongodb.com/cloud/atlas
## LICENSE ## LICENSE
MongoDB is free and the source is available. Versions released prior to MongoDB is free and the source is available. Versions released prior to October 16, 2018 are
October 16, 2018 are published under the AGPL. All versions released after published under the AGPL. All versions released after October 16, 2018, including patch fixes for
October 16, 2018, including patch fixes for prior versions, are published prior versions, are published under the
under the [Server Side Public License (SSPL) v1](LICENSE-Community.txt). [Server Side Public License (SSPL) v1](LICENSE-Community.txt). See individual files for details
See individual files for details which will specify the license applicable which will specify the license applicable to each file. Files subject to the SSPL will be noted in
to each file. Files subject to the SSPL will be noted in their headers. their headers.

@ -1,10 +1,13 @@
# Building Bazel from Source to target the PPC64LE Architecture # Building Bazel from Source to target the PPC64LE Architecture
Bazel doesn't release to the PPC64LE architecture. To address this, MongoDB maintains our own Bazel build that we perform on our PPC64LE development systems. Bazel doesn't release to the PPC64LE architecture. To address this, MongoDB maintains our own Bazel
build that we perform on our PPC64LE development systems.
# JDK? # JDK?
Bazel usually comes with a built-in JDK. However, the tooling used to build the built-in JDK doesn't support PPC64LE. To get around this, an external JDK must be present on both the system compiling the Bazel executable itself as well as the host running Bazel as a build system. Bazel usually comes with a built-in JDK. However, the tooling used to build the built-in JDK doesn't
support PPC64LE. To get around this, an external JDK must be present on both the system compiling
the Bazel executable itself as well as the host running Bazel as a build system.
On the MongoDB PPC64LE Evergreen static hosts and dev hosts, the OpenJDK 21 installation exists at: On the MongoDB PPC64LE Evergreen static hosts and dev hosts, the OpenJDK 21 installation exists at:

@ -1,10 +1,13 @@
# Building Bazel from Source to target the S390X Architecture # Building Bazel from Source to target the S390X Architecture
Bazel doesn't release to the S390X architecture. To address this, MongoDB maintains our own Bazel build that we perform on our S390X development systems. Bazel doesn't release to the S390X architecture. To address this, MongoDB maintains our own Bazel
build that we perform on our S390X development systems.
# JDK? # JDK?
Bazel usually comes with a built-in JDK. However, the tooling used to build the built-in JDK doesn't support S390X. To get around this, an external JDK must be present on both the system compiling the Bazel executable itself as well as the host running Bazel as a build system. Bazel usually comes with a built-in JDK. However, the tooling used to build the built-in JDK doesn't
support S390X. To get around this, an external JDK must be present on both the system compiling the
Bazel executable itself as well as the host running Bazel as a build system.
On the MongoDB S390X Evergreen static hosts and dev hosts, the OpenJDK 21 installation exists at: On the MongoDB S390X Evergreen static hosts and dev hosts, the OpenJDK 21 installation exists at:

@ -1,3 +1,4 @@
# MongoDB Bazel Best Practices # MongoDB Bazel Best Practices
Please refer to https://bazel.build/configure/best-practices as a baseline. This doc will be updated with MongoDB-specific best practices as they're defined. Please refer to https://bazel.build/configure/best-practices as a baseline. This doc will be updated
with MongoDB-specific best practices as they're defined.

@ -4,7 +4,8 @@ This document describes the Server Developer workflow for modifying Bazel build
# Creating a new BUILD.bazel file # Creating a new BUILD.bazel file
A build target is defined in the directory where its source code exists. To create a target that compiles **src/mongo/hello_world.cpp**, you would create **src/mongo/BUILD.bazel**. A build target is defined in the directory where its source code exists. To create a target that
compiles **src/mongo/hello_world.cpp**, you would create **src/mongo/BUILD.bazel**.
src/mongo/BUILD.bazel would contain: src/mongo/BUILD.bazel would contain:
@ -15,7 +16,8 @@ src/mongo/BUILD.bazel would contain:
], ],
} }
Once you've obtained bazel by running **python buildscripts/install_bazel.py**, you can then build this target via "bazel build": Once you've obtained bazel by running **python buildscripts/install_bazel.py**, you can then build
this target via "bazel build":
bazel build //src/mongo:hello_world bazel build //src/mongo:hello_world
@ -23,13 +25,17 @@ Or run this target via "bazel run":
bazel run //src/mongo:hello_world bazel run //src/mongo:hello_world
The full target name is a combination between the directory of the BUILD.bazel file and the target name: The full target name is a combination between the directory of the BUILD.bazel file and the target
name:
//{BUILD.bazel dir}:{targetname} //{BUILD.bazel dir}:{targetname}
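
The naming rule above can be expressed as a small function. This is a minimal sketch; `target_label` is a hypothetical helper written for illustration, not part of the MongoDB build scripts.

```python
from pathlib import PurePosixPath


def target_label(build_file: str, target_name: str) -> str:
    """Combine the BUILD.bazel directory and a target name into a full label."""
    package_dir = PurePosixPath(build_file).parent.as_posix()
    # A BUILD.bazel file at the workspace root has an empty package path.
    if package_dir == ".":
        package_dir = ""
    return f"//{package_dir}:{target_name}"


print(target_label("src/mongo/BUILD.bazel", "hello_world"))
```

For the example in this document, the function yields `//src/mongo:hello_world`, the same label passed to `bazel build` and `bazel run` above.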
# Adding a New Header / Source File # Adding a New Header / Source File
Bazel makes use of static analysis wherever possible to improve execution and querying speed. As part of this, source and header files must not be declared dynamically (ex. glob, wildcard, etc). Instead, you'll need to manually add a reference to each header or source file you add into your build target. Bazel makes use of static analysis wherever possible to improve execution and querying speed. As
part of this, source and header files must not be declared dynamically (ex. glob, wildcard, etc).
Instead, you'll need to manually add a reference to each header or source file you add into your
build target.
mongo_cc_binary( mongo_cc_binary(
name = "hello_world", name = "hello_world",
@ -44,13 +50,15 @@ Bazel makes use of static analysis wherever possible to improve execution and qu
## Adding a New Library ## Adding a New Library
The DevProd Build Team created MongoDB-specific macros for the different types of build targets you may want to specify. These include: The DevProd Build Team created MongoDB-specific macros for the different types of build targets you
may want to specify. These include:
- mongo_cc_binary - mongo_cc_binary
- mongo_cc_library - mongo_cc_library
- idl_generator - idl_generator
Creating a new library is similar to the steps above for creating a new binary. A new **mongo_cc_library** definition would be created in the BUILD.bazel file. Creating a new library is similar to the steps above for creating a new binary. A new
**mongo_cc_library** definition would be created in the BUILD.bazel file.
mongo_cc_library( mongo_cc_library(
name = "new_library", name = "new_library",
@ -61,7 +69,9 @@ Creating a new library is similar to the steps above for creating a new binary.
## Declaring Dependencies ## Declaring Dependencies
If a library or binary depends on another library, this must be declared in the **deps** section of the target. The syntax for referring to the library is the same syntax used in the bazel build/run command. If a library or binary depends on another library, this must be declared in the **deps** section of
the target. The syntax for referring to the library is the same syntax used in the bazel build/run
command.
mongo_cc_library( mongo_cc_library(
name = "new_library", name = "new_library",
@ -82,16 +92,20 @@ If a library or binary depends on another library, this must be declared in the
## Running clang-tidy via Bazel ## Running clang-tidy via Bazel
(Note: This feature is still in development; see https://jira.mongodb.org/browse/SERVER-80396 for details) (Note: This feature is still in development; see https://jira.mongodb.org/browse/SERVER-80396 for
details)
To run clang-tidy via Bazel, do the following: To run clang-tidy via Bazel, do the following:
1. To analyze all code, run `bazel build --config=clang-tidy src/...` 1. To analyze all code, run `bazel build --config=clang-tidy src/...`
2. To analyze a single target (e.g. `environment_buffer`), run the following command (note the `_with_debug` suffix on the target): `bazel build --config=clang-tidy src/mongo/db/commands:environment_buffer_with_debug` 2. To analyze a single target (e.g. `environment_buffer`), run the following command (note the
`_with_debug` suffix on the target):
`bazel build --config=clang-tidy src/mongo/db/commands:environment_buffer_with_debug`
Testing notes: Testing notes:
- If you want to test whether clang-tidy is in fact finding bugs, you can inject the following code into a `cpp` file to generate a `bugprone-incorrect-roundings` warning: - If you want to test whether clang-tidy is in fact finding bugs, you can inject the following code
into a `cpp` file to generate a `bugprone-incorrect-roundings` warning:
``` ```
const double f = 1.0; const double f = 1.0;
@ -105,12 +119,24 @@ const int foo = (int)(f + 0.5);
Follow this loop to figure out where the header needs to be added Follow this loop to figure out where the header needs to be added
1. Build directly with bazel to speed up the loop: `bazel build //src/...` 1. Build directly with bazel to speed up the loop: `bazel build //src/...`
2. This will fail on the first missing header dependency. Search the bazel build files for the library the header is defined on. Currently there are cases where headers are incorrectly located, so you'll need to use your best judgement. If the header exists on some library, add that library as a dep. For example, `scoped_timer.h` is part of the `scoped_timer` library, so add `//src/mongo/db/exec:scoped_timer` to the deps field (this will take care of `scoped_timer.h` transitive dependencies). If not, add the header directly to the hdrs field of the library that's failing to compile. 2. This will fail on the first missing header dependency. Search the bazel build files for the
library the header is defined on. Currently there are cases where headers are incorrectly
located, so you'll need to use your best judgement. If the header exists on some library, add that
library as a dep. For example, `scoped_timer.h` is part of the `scoped_timer` library, so add
`//src/mongo/db/exec:scoped_timer` to the deps field (this will take care of `scoped_timer.h`
transitive dependencies). If not, add the header directly to the hdrs field of the library that's
failing to compile.
3. Build directly with bazel `bazel build //src/...` 3. Build directly with bazel `bazel build //src/...`
4. If there is a cycle remove the dependency from Step #2, add the header as direct dependency to the hdrs field, and then start back at Step #1 4. If there is a cycle remove the dependency from Step #2, add the header as direct dependency to
the hdrs field, and then start back at Step #1
### The header I want to add is referenced in dozens or more locations, and adding it to the proper location requires a large refactor that is blocking critical work, what should I do? ### The header I want to add is referenced in dozens or more locations, and adding it to the proper location requires a large refactor that is blocking critical work, what should I do?
If you've put in a significant amount of work to try to get a header added and have found that getting it added to the right place (usually alongside the associated .cpp file, with all dependents adding that library as a dep) will take a significant refactor, create a SERVER ticket explaining the problem, solution, and complexity required to resolve it. Then, open up src/mongo/BUILD.bazel and add the header to the "core_headers" file group, referencing your ticket in a TODO comment. If you've put in a significant amount of work to try to get a header added and have found that
getting it added to the right place (usually alongside the associated .cpp file, with all dependents
adding that library as a dep) will take a significant refactor, create a SERVER ticket explaining the
problem, solution, and complexity required to resolve it. Then, open up src/mongo/BUILD.bazel and add
the header to the "core_headers" file group, referencing your ticket in a TODO comment.
This is very much a last resort and should only be done if the refactor will take a very significant amount of time and is blocking other work. This is very much a last resort and should only be done if the refactor will take a very significant
amount of time and is blocking other work.

@ -1,7 +1,9 @@
# EngFlow Certification Installation # EngFlow Certification Installation
MongoDB uses EngFlow to enable remote execution with Bazel. This dramatically speeds up the build process, but is only available to internal MongoDB employees. MongoDB uses EngFlow to enable remote execution with Bazel. This dramatically speeds up the build
process, but is only available to internal MongoDB employees.
Bazel uses a wrapper script to check the credentials on each invocation. If for some reason that's not working, you can also perform this process manually with this command: Bazel uses a wrapper script to check the credentials on each invocation. If for some reason that's
not working, you can also perform this process manually with this command:
python buildscripts/engflow_auth.py python buildscripts/engflow_auth.py

@ -1,8 +1,12 @@
# Header Relocation and Cycle Resolution # Header Relocation and Cycle Resolution
1. Locate all the targets that reference the header file in BUILD.bazel files. 1. Locate all the targets that reference the header file in BUILD.bazel files.
2. Find an ideal target to declare the header under. This is usually under the target that features the .cpp file of the same name. Otherwise, the header can be placed in its own library. 2. Find an ideal target to declare the header under. This is usually under the target that features
3. Ensure that all the targets that need this header can depend on the target the header was moved to. the .cpp file of the same name. Otherwise, the header can be placed in its own library.
4. Run `bazel build //src/...` to check for build failures (look for failures related to dependency cycles). 3. Ensure that all the targets that need this header can depend on the target the header was moved
5. If the build fails because of a dependency cycle, you may need to split up the dependent library or relocate the header. to.
4. Run `bazel build //src/...` to check for build failures (look for failures related to dependency
cycles).
5. If the build fails because of a dependency cycle, you may need to split up the dependent library
or relocate the header.
6. Once the build succeeds, please create a PR and include `devprod-build` for review. 6. Once the build succeeds, please create a PR and include `devprod-build` for review.
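
Step 1 of the list above can be approximated with a short script. The following is a sketch only: it performs a plain-text search for a quoted header name in `BUILD.bazel` files, does not parse Starlark, and reports matching files rather than individual targets. The function name `build_files_referencing` and the default `root="src"` are assumptions made for illustration.

```python
import os
import re


def build_files_referencing(header_name, root="src"):
    """Return BUILD.bazel files under root that mention header_name as a quoted string."""
    # Match "foo.h", ":foo.h", or "//pkg/path:foo.h", but not "other_foo.h".
    pattern = re.compile(r'"(?:[^"]*[:/])?' + re.escape(header_name) + r'"')
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "BUILD.bazel" not in filenames:
            continue
        build_file = os.path.join(dirpath, "BUILD.bazel")
        with open(build_file, encoding="utf-8") as f:
            if pattern.search(f.read()):
                matches.append(build_file)
    return sorted(matches)
```

Each returned file still needs to be opened to see which target lists the header, which is the information steps 2 and 3 depend on.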

@ -1,8 +1,7 @@
# Remote execution images # Remote execution images
The Dockerfiles for remote execution images are autogenerated to pin all The Dockerfiles for remote execution images are autogenerated to pin all versions and allow for
versions and allow for updates at the same time. To repin the image hashes and updates at the same time. To repin the image hashes and package versions:
package versions:
```bash ```bash
# With Bazel # With Bazel

@ -1,16 +1,22 @@
# About # About
This documents some useful tools, concepts, and debugging strategies for bazel toolchains. This documents some useful tools, concepts, and debugging strategies for bazel toolchains. This
This information was gathered while developing the WASI SDK toolchain. information was gathered while developing the WASI SDK toolchain.
# Concepts # Concepts
[Toolchain](https://bazel.build/extending/toolchains#debugging-toolchains) and [Platform](https://bazel.build/extending/platforms) are the core relevant concepts. [Toolchain](https://bazel.build/extending/toolchains#debugging-toolchains) and
Toolchains define the tools used to compile, and the platform defines either the execution platform (for the compilation/compiler tools) or the target platform (for the binary). [Platform](https://bazel.build/extending/platforms) are the core relevant concepts. Toolchains
Bazel tries to search for a toolchain based on these constraints. define the tools used to compile, and the platform defines either the execution platform (for the
compilation/compiler tools) or the target platform (for the binary). Bazel tries to search for a
toolchain based on these constraints.
We also made use of [transitions](https://bazel.build/rules/lib/builtins/transition) which allow bazel to reconfigure itself before building a target to avoid passing irrelevant or incorrect compiler flags (e.g. WASI SDK doesn't support shared objects). We also made use of [transitions](https://bazel.build/rules/lib/builtins/transition) which allow
Similarly, we used [actions](https://bazel.build/docs/cc-toolchain-config-reference#using-action-config) instead of the tool paths attribute because of, [possibly historical, lack of support for remote resources in tool paths](https://stackoverflow.com/questions/73504780/bazel-reference-binaries-from-packages-in-custom-toolchain-definition/73505313#73505313). bazel to reconfigure itself before building a target to avoid passing irrelevant or incorrect
compiler flags (e.g. WASI SDK doesn't support shared objects). Similarly, we used
[actions](https://bazel.build/docs/cc-toolchain-config-reference#using-action-config) instead of the
tool paths attribute because of,
[possibly historical, lack of support for remote resources in tool paths](https://stackoverflow.com/questions/73504780/bazel-reference-binaries-from-packages-in-custom-toolchain-definition/73505313#73505313).
# Debugging tools # Debugging tools
@ -20,13 +26,15 @@ Similarly, we used [actions](https://bazel.build/docs/cc-toolchain-config-refere
bazel ... --toolchain_resolution_debug=.* ... bazel ... --toolchain_resolution_debug=.* ...
``` ```
The above flag can be used to debug toolchain resolution as bazel tries to automatically satisfy constraints. The above flag can be used to debug toolchain resolution as bazel tries to automatically satisfy
constraints.
## Debugging Remote Resources ## Debugging Remote Resources
Toolchains may be remotely fetched, but the directory structure of the build environment after these remote resources are fetched may not be clear. Toolchains may be remotely fetched, but the directory structure of the build environment after these
`bazel info` can be used to find the bazel directory and inspect it `bazel info output_base`. remote resources are fetched may not be clear. `bazel info` can be used to find the bazel directory
Note: this may be different depending on your configuration and level of sandboxing. and inspect it `bazel info output_base`. Note: this may be different depending on your configuration
and level of sandboxing.
This is particularly useful when used in combination with the `find` command as shown below. This is particularly useful when used in combination with the `find` command as shown below.
@ -42,10 +50,11 @@ Note: this command is directory dependent because output_base is per bazel insta
bazel ... -s ... bazel ... -s ...
``` ```
This will show verbose output such as cd actions and compiler/linker invocations. This will show verbose output such as cd actions and compiler/linker invocations. Note: bazel may
Note: bazel may recast paths relative to the exec directory. recast paths relative to the exec directory.
## Debugging on Engflow ## Debugging on Engflow
Engflow has a lot of helpful views showing remote execution stats and the remote file structure. Engflow has a lot of helpful views showing remote execution stats and the remote file structure. We
We don't intend to duplicate their documentation but be careful as some of their data (particularly remotely executed actions) may not be accurate immediately after execution. don't intend to duplicate their documentation but be careful as some of their data (particularly
remotely executed actions) may not be accurate immediately after execution.

@ -38,18 +38,21 @@ resmoke_suite_test(
### Test Sharding ### Test Sharding
Test sharding allows you to split a large test suite across multiple parallel test executions, significantly reducing total test time. When `shard_count` is specified, Bazel will: Test sharding allows you to split a large test suite across multiple parallel test executions,
significantly reducing total test time. When `shard_count` is specified, Bazel will:
1. Run the test target multiple times in parallel (up to the specified shard count) 1. Run the test target multiple times in parallel (up to the specified shard count)
2. Each shard receives a unique shard index (0 to N-1) 2. Each shard receives a unique shard index (0 to N-1)
3. The resmoke runner uses these values to determine which subset of tests to run in each shard 3. The resmoke runner uses these values to determine which subset of tests to run in each shard
4. Each shard produces its own test output and logs 4. Each shard produces its own test output and logs
Note: sharding is an alternative to the resmoke `--jobs` flag, which should not be used with `resmoke_suite_test`. Note: sharding is an alternative to the resmoke `--jobs` flag, which should not be used with
`resmoke_suite_test`.
### Test Logs and Output Directory ### Test Logs and Output Directory
Bazel creates a dedicated output directory for each test run under the `bazel-testlogs` symlink in your workspace root. Bazel creates a dedicated output directory for each test run under the `bazel-testlogs` symlink in
your workspace root.
For a test target `//jstests/suites/query-execution:core`, the outputs are like: For a test target `//jstests/suites/query-execution:core`, the outputs are like:
@ -78,7 +81,8 @@ bazel test //jstests/suites/query-execution:core --test_sharding_strategy=disabl
#### Run with additional resmoke flags: #### Run with additional resmoke flags:
Any `--test_arg` in the bazel command will be propagated as a flag to resmoke.py. To modify the resmoke invocation with any of resmoke's flags, add them as `--test_arg`s. Any `--test_arg` in the bazel command will be propagated as a flag to resmoke.py. To modify the
resmoke invocation with any of resmoke's flags, add them as `--test_arg`s.
``` ```
# Runs all tests from the core suite with timeseries in their name, twice, with all feature flags enabled. # Runs all tests from the core suite with timeseries in their name, twice, with all feature flags enabled.

@ -11,7 +11,8 @@ To use the WASI SDK apply the `wasi_compatible` with a select statement:
}) })
``` ```
If your target is defined in terms of a traditional bazel C/C++ target you can use the WASI transition in order to ensure the bazel options are WASI compatible. If your target is defined in terms of a traditional bazel C/C++ target you can use the WASI
transition in order to ensure the bazel options are WASI compatible.
```python ```python
load("//bazel/toolchains/cc/wasm/toolchain:with_wasi_config.bzl", "with_wasi_config") load("//bazel/toolchains/cc/wasm/toolchain:with_wasi_config.bzl", "with_wasi_config")

@ -17,8 +17,8 @@ For background on Antithesis, the base images, and the broader CI pipeline, see
Scripts must be executable and live directly under the template directory (not in subdirectories). Scripts must be executable and live directly under the template directory (not in subdirectories).
The prefix of the filename determines scheduling behavior. Any file that doesn't match a known The prefix of the filename determines scheduling behavior. Any file that doesn't match a known
prefix — including files in subdirectories or files prefixed with `helper_` — is ignored by prefix — including files in subdirectories or files prefixed with `helper_` — is ignored by Test
Test Composer and can be used for shared logic. Composer and can be used for shared logic.
### Driver commands ### Driver commands
@ -27,18 +27,18 @@ Run during fault injection periods. At least one driver or `anytime_*` command i
- **`parallel_driver_<name>`** — runs concurrently with other parallel drivers, including itself. - **`parallel_driver_<name>`** — runs concurrently with other parallel drivers, including itself.
Use for continuous client operations, parallel workloads, and availability checks under faults. Use for continuous client operations, parallel workloads, and availability checks under faults.
- **`singleton_driver_<name>`** — runs as the only active driver in a history branch. - **`singleton_driver_<name>`** — runs as the only active driver in a history branch. Use for
Use for porting existing integration tests or workloads that shouldn't overlap with other drivers. porting existing integration tests or workloads that shouldn't overlap with other drivers.
- **`serial_driver_<name>`** — runs only when no other driver commands are active. - **`serial_driver_<name>`** — runs only when no other driver commands are active. Use for
Use for validation steps and operations that require quiescence. validation steps and operations that require quiescence.
### Quiescent commands ### Quiescent commands
Run in the absence of faults. Run in the absence of faults.
- **`first_<name>`** — optional one-time setup that runs once before any driver commands start. - **`first_<name>`** — optional one-time setup that runs once before any driver commands start. Use
Use for data initialization, schema setup, and bootstrapping. for data initialization, schema setup, and bootstrapping.
- **`eventually_<name>`** — runs after driver commands start; halts all drivers and stops faults, - **`eventually_<name>`** — runs after driver commands start; halts all drivers and stops faults,
creating a new history branch. Use for testing eventual consistency and post-recovery state. creating a new history branch. Use for testing eventual consistency and post-recovery state.
@ -57,8 +57,8 @@ Run in the absence of faults.
### `basic_js_commands` ### `basic_js_commands`
Parallel JavaScript workload against a single `mongod`. All commands share retry logic defined in Parallel JavaScript workload against a single `mongod`. All commands share retry logic defined in
[`js/commands.js`](basic_js_commands/js/commands.js) that handles transient network errors, [`js/commands.js`](basic_js_commands/js/commands.js) that handles transient network errors, server
server selection failures, and retryable write errors. selection failures, and retryable write errors.
| Script | Function | Notes | | Script | Function | Notes |
| ------------------------------------------------ | ----------------------------- | --------------------------------------------------------------------------- | | ------------------------------------------------ | ----------------------------- | --------------------------------------------------------------------------- |
@ -86,13 +86,13 @@ infrastructure for Test Composer. Both scripts use
## Best practices ## Best practices
- **Retry logic** — always handle transient network errors and server selection failures. - **Retry logic** — always handle transient network errors and server selection failures. See
See [`commands.js`](basic_js_commands/js/commands.js) for a reusable retry wrapper. [`commands.js`](basic_js_commands/js/commands.js) for a reusable retry wrapper.
- **Randomize** — the more variation you introduce, the more state space Antithesis can explore. - **Randomize** — the more variation you introduce, the more state space Antithesis can explore.
Antithesis controls and can reproduce the random seed, so interesting paths can be re-explored. Antithesis controls and can reproduce the random seed, so interesting paths can be re-explored.
- **Idempotency** — design scripts to tolerate being killed and restarted at any point. - **Idempotency** — design scripts to tolerate being killed and restarted at any point.
- **Start simple** — begin with a `singleton_driver_*` to port an existing test, then evolve - **Start simple** — begin with a `singleton_driver_*` to port an existing test, then evolve toward
toward parallel drivers as confidence grows. parallel drivers as confidence grows.
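
The retry advice above can be sketched as a small wrapper. This is an illustrative Python example, not a port of `commands.js`: the exception types treated as transient, the backoff parameters, and the name `with_retries` are all assumptions, and a real script would catch whichever errors its driver raises for network and server selection failures.

```python
import random
import time

# Assumption: stand-ins for a driver's transient network and server selection errors.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)


def with_retries(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying transient failures with jittered exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TRANSIENT_ERRORS:
            if attempt == attempts:
                raise  # out of attempts: surface the last transient error
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.5))
```

Because a script may be killed and restarted at any point, the wrapped operation should itself be safe to repeat, which ties the retry and idempotency bullets together.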
## Running locally ## Running locally
@ -126,8 +126,8 @@ docker compose -f docker_compose/<suite_name>/docker-compose.yml \
/opt/antithesis/test/v1/basic_js_commands/parallel_driver_mongod_aggregate.sh /opt/antithesis/test/v1/basic_js_commands/parallel_driver_mongod_aggregate.sh
``` ```
The `/scripts/print_connection_string.sh` helper used by each script is generated automatically The `/scripts/print_connection_string.sh` helper used by each script is generated automatically from
from the resmoke fixture's connection string and placed in the config image during the build step. the resmoke fixture's connection string and placed in the config image during the build step.
## Adding a new template ## Adding a new template


@ -4,13 +4,19 @@ This directory is a bazel rule we use to ship common code between bazel repos
# Using in your repo # Using in your repo
1. Look at the latest version in [this](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/pyproject.toml) file 1. Look at the latest version in
[this](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/pyproject.toml)
file
2. Get the sha of the latest release at https://mdb-build-public.s3.amazonaws.com/bazel_rules_mongo/{version}/bazel_rules_mongo.tar.gz.sha256 2. Get the sha of the latest release at
https://mdb-build-public.s3.amazonaws.com/bazel_rules_mongo/{version}/bazel_rules_mongo.tar.gz.sha256
3. Get the link to the latest version at https://mdb-build-public.s3.amazonaws.com/bazel_rules_mongo/{version}/bazel_rules_mongo.tar.gz 3. Get the link to the latest version at
https://mdb-build-public.s3.amazonaws.com/bazel_rules_mongo/{version}/bazel_rules_mongo.tar.gz
4. Add this as an http archive to your repo and implement the dependencies listed in the
   [WORKSPACE](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/WORKSPACE.bazel)
   file. It will look something like this:
``` ```
# Poetry rules for managing Python dependencies # Poetry rules for managing Python dependencies
@ -50,7 +56,8 @@ poetry(
) )
``` ```
5. Use the rule however you see fit! For example to add `bazel run codeowners` to your repo you can add the following to your root `BUILD.bazel` file 5. Use the rule however you see fit! For example to add `bazel run codeowners` to your repo you can
add the following to your root `BUILD.bazel` file
``` ```
alias( alias(
@ -61,5 +68,7 @@ alias(
# Deploying # Deploying
When you are ready for a new version to be released, bump the version in the [pyproject.toml](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/pyproject.toml) file. When you are ready for a new version to be released, bump the version in the
This will be deployed the next time the `package_bazel_rules_mongo` task runs (nightly). You can schedule this earlier in the waterfall when your pr is merged if you want it quicker. [pyproject.toml](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/pyproject.toml)
file. This will be deployed the next time the `package_bazel_rules_mongo` task runs (nightly). You
can schedule this earlier in the waterfall when your pr is merged if you want it quicker.


@ -3,4 +3,5 @@ This is cltcache.py.txt taken from
CLTCACHE_URL = "https://raw.githubusercontent.com/freedick/cltcache/1.2.2/src/cltcache/cltcache.py" CLTCACHE_URL = "https://raw.githubusercontent.com/freedick/cltcache/1.2.2/src/cltcache/cltcache.py"
CLTCACHE_SHA256 = "30d9bf6d3615eab1826d5e24aea54873de034014c1e77506c9ff983e1e858b3c" CLTCACHE_SHA256 = "30d9bf6d3615eab1826d5e24aea54873de034014c1e77506c9ff983e1e858b3c"
A small simple clang tidy cacher used with vscode which does not use bazel to run clang tidy. The extension is used to avoid linting and changing the file from its source. A small simple clang tidy cacher used with vscode which does not use bazel to run clang tidy. The
extension is used to avoid linting and changing the file from its source.
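A pinned hash like `CLTCACHE_SHA256` is typically checked against the downloaded bytes before the file is used. The sketch below shows one way to do that with Python's standard `hashlib`; the `verify_sha256` helper is a hypothetical name for illustration and is not taken from the repository's build rules.

```python
import hashlib

# Value copied from the pinned constant above.
CLTCACHE_SHA256 = "30d9bf6d3615eab1826d5e24aea54873de034014c1e77506c9ff983e1e858b3c"


def verify_sha256(data: bytes, expected_hex: str) -> None:
    """Raise ValueError if the SHA-256 digest of data differs from expected_hex."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_hex.lower():
        raise ValueError(f"sha256 mismatch: expected {expected_hex}, got {actual}")
```

A caller would read the downloaded file in binary mode and pass its contents together with `CLTCACHE_SHA256`.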


@ -18,7 +18,8 @@ source python3-venv/bin/activate
(python3-venv) bazel build --config=opt install-devcore (python3-venv) bazel build --config=opt install-devcore
``` ```
3. Run mongod instance (only for CBR calibration, because join_start.py manages mongod's lifecycle itself): 3. Run mongod instance (only for CBR calibration, because join_start.py manages mongod's lifecycle
itself):
```sh ```sh
(python3-venv) bazel-bin/install-mongod/bin/mongod --setParameter internalMeasureQueryExecutionTimeInNanoseconds=true (python3-venv) bazel-bin/install-mongod/bin/mongod --setParameter internalMeasureQueryExecutionTimeInNanoseconds=true
@ -74,16 +75,21 @@ source cm/bin/activate
```sh ```sh
(cm) python join_start.py (cm) python join_start.py
``` ```
To skip the constant calibration (warm scan, CPU, sequential I/O, random I/O) and only run the join algorithm comparison: To skip the constant calibration (warm scan, CPU, sequential I/O, random I/O) and only run the
join algorithm comparison:
```sh ```sh
(cm) python join_start.py --join-only (cm) python join_start.py --join-only
``` ```
To iterate quickly on cost model changes, reuse pre-recorded execution times from a previous full run. This skips actual query execution, only running `queryPlanner` explains to collect fresh cost estimates: To iterate quickly on cost model changes, reuse pre-recorded execution times from a previous full
run. This skips actual query execution, only running `queryPlanner` explains to collect fresh cost
estimates:
```sh ```sh
(cm) python join_start.py --execution-times join_output/join_times_in-cache.csv join_output/join_times_exceeds-cache.csv (cm) python join_start.py --execution-times join_output/join_times_in-cache.csv join_output/join_times_exceeds-cache.csv
``` ```
**Note:** For CBR calibration, the first time it will take a while since it has to generate the data. Afterwards, as long as you aren't modifying the collections, you can comment out `await generator.populate_collections()` in `start.py` - this will make it a lot faster. **Note:** For CBR calibration, the first time it will take a while since it has to generate the
data. Afterwards, as long as you aren't modifying the collections, you can comment out
`await generator.populate_collections()` in `start.py` - this will make it a lot faster.
8. When done, deactivate the environment: 8. When done, deactivate the environment:


@ -1 +1,2 @@
> Content moved to [buildscripts/resmokeconfig/suites/README.md](../../buildscripts/resmokeconfig/suites/README.md). > Content moved to
> [buildscripts/resmokeconfig/suites/README.md](../../buildscripts/resmokeconfig/suites/README.md).


@ -1,13 +1,14 @@
# mongo gpg builds # mongo gpg builds
This directory contains a script to produce **portable `gpg` binaries** for all our supported linux platforms: This directory contains a script to produce **portable `gpg` binaries** for all our supported linux
platforms:
- **Linux** (`manylinux2014` glibc 2.17 baseline): `x86_64`, `aarch64`, `s390x`, `ppc64le` - **Linux** (`manylinux2014` glibc 2.17 baseline): `x86_64`, `aarch64`, `s390x`, `ppc64le`
In particular, it builds gnupg-2.5.16 from source. In particular, it builds gnupg-2.5.16 from source.
This script is used to generate the binaries that we bring into bazel as a dependency to sign
test extensions. All artifacts are placed in the `dist/` directory.
--- ---
@ -61,8 +62,8 @@ ARCH=ppc64le PLATFORM=linux/ppc64le ./build_gpg_manylinux.sh
## 📜 License & Attribution ## 📜 License & Attribution
These scripts build **gpg** and its required dependencies from sources originally obtained from: These scripts build **gpg** and its required dependencies from sources originally obtained from: 👉
👉 <https://www.gnupg.org/ftp/gcrypt/gnupg/> and <https://gnupg.org/download/index.html> <https://www.gnupg.org/ftp/gcrypt/gnupg/> and <https://gnupg.org/download/index.html>
The exact sources can be obtained at the following URLs: The exact sources can be obtained at the following URLs:


@ -1,12 +1,14 @@
# mongo rapidyaml wheel builds # mongo rapidyaml wheel builds
This directory contains scripts to produce versioned `rapidyaml` wheels that can be uploaded to S3 and consumed directly instead of building from the git dependency in `pyproject.toml`. This directory contains scripts to produce versioned `rapidyaml` wheels that can be uploaded to S3
and consumed directly instead of building from the git dependency in `pyproject.toml`.
The scripts default to the `rapidyaml` commit currently pinned in `pyproject.toml`: The scripts default to the `rapidyaml` commit currently pinned in `pyproject.toml`:
- `a5d485fd44719e1c03e059177fc1f695fc462b66` - `a5d485fd44719e1c03e059177fc1f695fc462b66`
They also require `RAPIDYAML_VERSION` to be set explicitly. The MongoDB fork does not currently publish git tags, so `setuptools-scm` cannot infer a stable release version on its own. They also require `RAPIDYAML_VERSION` to be set explicitly. The MongoDB fork does not currently
publish git tags, so `setuptools-scm` cannot infer a stable release version on its own.
All artifacts are written to `dist/`. All artifacts are written to `dist/`.
@ -47,11 +49,14 @@ RAPIDYAML_VERSION=0.9.0.post0 ARCH=ppc64le PLATFORM=linux/ppc64le ./build_rapidy
### macOS ### macOS
Run the script on each target macOS architecture you want to publish. The script intentionally builds for the host arch only, which keeps wheel tags and interpreter usage straightforward. Run the script on each target macOS architecture you want to publish. The script intentionally
builds for the host arch only, which keeps wheel tags and interpreter usage straightforward.
The script creates and uses a temporary virtualenv, so it works with Homebrew-managed Python installations that reject direct `pip install` into the system environment. The script creates and uses a temporary virtualenv, so it works with Homebrew-managed Python
installations that reject direct `pip install` into the system environment.
It also leaves `Python.framework` external during delocation, so the wheel should be built with the same Python distribution family you expect consumers to use. It also leaves `Python.framework` external during delocation, so the wheel should be built with the
same Python distribution family you expect consumers to use.
```bash ```bash
RAPIDYAML_VERSION=0.9.0.post0 PYTHON_BIN=python3.13 ./build_rapidyaml_macos.sh RAPIDYAML_VERSION=0.9.0.post0 PYTHON_BIN=python3.13 ./build_rapidyaml_macos.sh
@ -67,15 +72,19 @@ $env:PYTHON_BIN = "C:\Python313\python.exe"
.\build_rapidyaml_windows_x64.ps1 .\build_rapidyaml_windows_x64.ps1
``` ```
Note: `pyproject.toml` currently excludes `rapidyaml` on Windows, so a Windows wheel is only needed if that marker changes later. Note: `pyproject.toml` currently excludes `rapidyaml` on Windows, so a Windows wheel is only needed
if that marker changes later.
## Build Behavior ## Build Behavior
- The Linux script builds inside the appropriate `manylinux2014` image and runs `auditwheel repair`. - The Linux script builds inside the appropriate `manylinux2014` image and runs `auditwheel repair`.
- The macOS script creates a temporary virtualenv, installs its build tooling there, and runs `delocate-wheel` while excluding `Python.framework` from bundling. - The macOS script creates a temporary virtualenv, installs its build tooling there, and runs
`delocate-wheel` while excluding `Python.framework` from bundling.
- The Windows script runs `delvewheel repair` after building. - The Windows script runs `delvewheel repair` after building.
- Every script clones the `mongodb-forks/rapidyaml` repo, checks out the requested ref, initializes submodules, builds a wheel, and performs a simple `import ryml` smoke test. - Every script clones the `mongodb-forks/rapidyaml` repo, checks out the requested ref, initializes
- Linux defaults to `cp313-cp313`, which matches the repo's current Python version. Override that when you need a wheel for a different interpreter. submodules, builds a wheel, and performs a simple `import ryml` smoke test.
- Linux defaults to `cp313-cp313`, which matches the repo's current Python version. Override that
when you need a wheel for a different interpreter.
## Environment Variables ## Environment Variables
@ -94,7 +103,8 @@ Note: `pyproject.toml` currently excludes `rapidyaml` on Windows, so a Windows w
## Consuming the Wheels ## Consuming the Wheels
Once the wheels are uploaded, you can replace the current git dependency in `pyproject.toml` with URL-based entries scoped by platform markers. Once the wheels are uploaded, you can replace the current git dependency in `pyproject.toml` with
URL-based entries scoped by platform markers.
For example: For example:


@ -1,12 +1,14 @@
# mongo ripgrep builds # mongo ripgrep builds
This directory contains scripts to produce **portable, high-performance `ripgrep` binaries** for all major platforms: This directory contains scripts to produce **portable, high-performance `ripgrep` binaries** for all
major platforms:
- **Linux** (`manylinux2014` glibc 2.17 baseline): `x86_64`, `aarch64`, `s390x`, `ppc64le` - **Linux** (`manylinux2014` glibc 2.17 baseline): `x86_64`, `aarch64`, `s390x`, `ppc64le`
- **macOS** universal2 (`x86_64` + `arm64`) - **macOS** universal2 (`x86_64` + `arm64`)
- **Windows** x86_64 (MSVC) - **Windows** x86_64 (MSVC)
Each build uses **bundled static PCRE2**, **LTO**, and conservative CPU baselines to maximize portability. Each build uses **bundled static PCRE2**, **LTO**, and conservative CPU baselines to maximize
portability.
All artifacts are placed in the `dist/` directory. All artifacts are placed in the `dist/` directory.
--- ---


@ -1,54 +1,79 @@
# Block-on-Red # Block-on-Red
> **TL;DR:** During times of high BF volume, code approvals and merging in 10gen/mongo master will be restricted to only allow changes that help reduce BFs, Bugs, Performance Regressions, and paying down technical debt. > **TL;DR:** During times of high BF volume, code approvals and merging in 10gen/mongo master will
> be restricted to only allow changes that help reduce BFs, Bugs, Performance Regressions, and
> paying down technical debt.
### Motivation ### Motivation
The master branch should remain stable to develop the Server efficiently, and to be within 30 days of releasing at all times. If it becomes too unstable, or "too red," we want to aggressively focus on getting it back into the green. As a side benefit to releasability, a "greener" build should make patch build failures more meaningful. This will also reduce release time stress by having the release time period look and feel more like normal business. The master branch should remain stable to develop the Server efficiently, and to be within 30 days
of releasing at all times. If it becomes too unstable, or "too red," we want to aggressively focus
on getting it back into the green. As a side benefit to releasability, a "greener" build should make
patch build failures more meaningful. This will also reduce release time stress by having the
release time period look and feel more like normal business.
### Strategy ### Strategy
Each team carries a quota (see below for details). When a team exceeds their quota - they enter a "code lockdown". Each team carries a quota (see below for details). When a team exceeds their quota - they enter a
"code lockdown".
- **Team Level**: The intention here is to stop work with a small blast radius in the first
  instance, and address the releasability risk from that team and their owned code.
- **VP Level**: We roll the quotas up to a VP's entire organization as the next step of "code
  lockdown". The expectation is that redirecting resources within a VP's organization to help
  address BFs is likely more effective and less disruptive than a global freeze.
- **Global Level**: Finally, if the global quota is exceeded, the entire server organization enters
  a "code lockdown" until we meet the threshold for unfreezing.
## Impact of a "Code Lockdown" ## Impact of a "Code Lockdown"
### Allowed Code Changes ### Allowed Code Changes
During a "code lockdown," Code Owners are expected to only approve **work that closes BFs or helps
us reduce/avoid the _next_ Blocking state**, i.e. work aimed at fixing a BF, a class of BFs, bugs,
performance regressions, etc.
If your PR does not meet these criteria, it may be pending for some time until the system becomes
unblocked. There are of course reasonable exceptions, below.
### Feature Work ### Feature Work
**All feature work stops** during a "code lockdown." **All feature work stops** during a "code lockdown." In exceptional circumstances VPs can approve
In exceptional circumstances VPs can approve exceptions. exceptions.
### Non-feature Work ### Non-feature Work
We understand that in many cases addressing the larger BF problem requires refactoring, modularity
improvements, changes to our tests and paying down other kinds of **technical debt**. During a "code
lockdown" this work is **expressly permitted and mergeable** - with the guidance that teams index
heavily on risk when deciding what to work on. If a piece of work feels like it makes the BF problem
worse before it gets better, talk to your director about how to proceed.
Allowable Examples (not exclusive): Allowable Examples (not exclusive):
- Refactoring components to make them more unit testable - Refactoring components to make them more unit testable
- Increasing code coverage through high quality tests that block PRs - Increasing code coverage through high quality tests that block PRs
- Making the development loop faster (decreasing build times, fixing slow tests, etc) - Making the development loop faster (decreasing build times, fixing slow tests, etc)
- Improving guardrails that improve code quality (fixing clang-tidy warnings, compiler warnings, etc) - Improving guardrails that improve code quality (fixing clang-tidy warnings, compiler warnings,
etc)
If a team is in a lockdown, but the rest of the org is not - their focus should likely skew towards work that expedites their lockdown exit. If a team is in a lockdown, but the rest of the org is not - their focus should likely skew towards
work that expedites their lockdown exit.
If the org is in a lockdown, but a team doesn't have BFs to work on - they should balance helping
other teams with the work they've identified as addressing the underlying BF problem.

The higher the risk of the work, the more involvement the Staff+ engineers and the Director/VP
should have in the decision about what is ok to merge and what isn't.
### Code Owner Responsibilities ### Code Owner Responsibilities
Code Owners should join the `#10gen-mongo-code-lockdown` Slack channel to receive daily updates on the status of the build. It produces daily metrics with instructions if there is a state change. Code Owners should join the `#10gen-mongo-code-lockdown` Slack channel to receive daily updates on
the status of the build. It produces daily metrics with instructions if there is a state change.
If we change to a blocking state, code owners should use their discretion to only approve changes that are allowed (see above). If we exit the blocking state, code owners should approve PRs as usual. If we change to a blocking state, code owners should use their discretion to only approve changes
that are allowed (see above). If we exit the blocking state, code owners should approve PRs as
usual.
## Quotas and State-Changes ## Quotas and State-Changes
@ -74,21 +99,31 @@ This shows relevant JIRA queries for a more live and interactive view of the sta
### BFs remaining open only on older branches ### BFs remaining open only on older branches
Some teams may fix a BF in master, but are "waiting for fix" on older branches, which keeps the BF counted against the thresholds. Guidance here is currently evolving. Some teams may fix a BF in master, but are "waiting for fix" on older branches, which keeps the BF
counted against the thresholds. Guidance here is currently evolving.
If the build failure is not frequently occurring, it can be marked as P5-Trivial, and it won't count
towards your team's build failures for the block merge.
As we iterate on our processes for this, the `exclude-from-master-quota` label can be used to exclude BFs that should not be included in these quotas. The expectation is that this is an interim solution as we improve our processes especially around BFs that remain open pending backports. As we iterate on our processes for this, the `exclude-from-master-quota` label can be used to
exclude BFs that should not be included in these quotas. The expectation is that this is an interim
solution as we improve our processes especially around BFs that remain open pending backports.
Specifically: Specifically:
- If a BF is only waiting for a backport on a branch older than master, apply the
  `exclude-from-master-quota` label to the ticket.
- If a BF is failing on master, not a serious bug (or a test-only issue that can't affect the real
  clients), not noisy, and we are choosing not to fix it, set the Priority to `P5 - Trivial` and
  apply the `keep-trivial` label.
- If a BF is failing on an older branch and we are choosing not to backport a fix, set the
  Priority to `P5 - Trivial` and apply the `keep-trivial-X.Y` label appropriately.
## Contributing ## Contributing
For any new proposals, changes to thresholds, or concerns regarding their application, please escalate to your Director/VP. **We want advocacy from all levels to make this a successful change to our engineering culture.** For any new proposals, changes to thresholds, or concerns regarding their application, please
escalate to your Director/VP. **We want advocacy from all levels to make this a successful change to
our engineering culture.**
### CLI ### CLI
@ -100,7 +135,9 @@ python buildscripts/monitor_build_status/cli.py --help
### Testing locally ### Testing locally
For Jira API authentication, use the `JIRA_AUTH_PAT` env variable. More about Jira Personal Access Tokens (PATs) can be found [here](https://wiki.corp.mongodb.com/pages/viewpage.action?pageId=218995581). For Jira API authentication, use the `JIRA_AUTH_PAT` env variable. More about Jira Personal Access
Tokens (PATs) can be found
[here](https://wiki.corp.mongodb.com/pages/viewpage.action?pageId=218995581).
Use your PAT to run the following and output its results: Use your PAT to run the following and output its results:
@ -112,4 +149,6 @@ The above will _not_ send notifications to the Slack channel.
### Slack Notifications ### Slack Notifications
Slack notifications use a webhook from the Devprod Correctness Slack app (rather than user credentials) for security. The webhook URL is read from the `mongo-code-lockdown-webhook` Evergreen expansion, which points to the `#10gen-mongo-code-lockdown` Slack channel. Slack notifications use a webhook from the Devprod Correctness Slack app (rather than user
credentials) for security. The webhook URL is read from the `mongo-code-lockdown-webhook` Evergreen
expansion, which points to the `#10gen-mongo-code-lockdown` Slack channel.


@ -3,27 +3,24 @@
## Summary ## Summary
Matrix Suites are defined as a combination of explicit
[suite files](../../../buildscripts/resmokeconfig/suites/README.md) and a set of "overrides" for
specific keys. The intention is to avoid duplication of suite definitions as much as possible with
the eventual goal of having most suites be fully composed of reusable sections.
## Usage ## Usage
Matrix suites behave like regular suites for all functionality in resmoke.py, Matrix suites behave like regular suites for all functionality in resmoke.py, including
including `list-suites`, `find-suites` and `run --suites=[SUITE]`. `list-suites`, `find-suites` and `run --suites=[SUITE]`.
## Writing a matrix suite mapping file. ## Writing a matrix suite mapping file.
Matrix suites consist of a mapping, and a set of overrides in Matrix suites consist of a mapping, and a set of overrides in their eponymous directories. When you
their eponymous directories. When you are done writing the mapping file, you must are done writing the mapping file, you must
[generate the matrix suite file.](#generating-matrix-suites) [generate the matrix suite file.](#generating-matrix-suites)
The "mappings" directory contains YAML files that each contain a suite definition. The "mappings" directory contains YAML files that each contain a suite definition. Each suite
Each suite definition includes `base_suite`, and a list of definition includes `base_suite`, and a list of modifiers. There is also an optional `description`
modifiers. There is also an optional `description` field that will get output field that will get output with the local resmoke invocation.
with the local resmoke invocation.
The fields of modifiers are the following: The fields of modifiers are the following:
@ -33,30 +30,29 @@ The fields of modifiers are the following:
4. extends 4. extends
Each modifier field is a dot-delimited-notation representing the file and field of the modification.
All modifier fields must be in a yaml file in the `overrides` directory. For example,
`encryption.mongodfixture_ese` would reference the `mongodfixture_ese` field inside of the
`encryption.yml` file inside of the `overrides` directory.
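The dot-delimited lookup described above can be illustrated with a short Python sketch. It is not resmoke's implementation: `resolve_modifier` is a hypothetical helper, and splitting on the first dot and using a `.yml` extension are assumptions drawn from the `encryption.mongodfixture_ese` example.

```python
from pathlib import Path


def resolve_modifier(reference, overrides_dir="overrides"):
    """Split '<file>.<field>' into the override file path and the field name inside it."""
    # Assumption: the first dot separates the file stem from the field name.
    file_stem, sep, field = reference.partition(".")
    if not sep or not file_stem or not field:
        raise ValueError(f"expected '<file>.<field>', got {reference!r}")
    return Path(overrides_dir) / f"{file_stem}.yml", field
```

For `encryption.mongodfixture_ese`, this yields the `encryption.yml` file under `overrides` and the field name `mongodfixture_ese`.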
### overrides ### overrides
All fields referenced in the `overrides` section of the mappings file will overwrite the specified
fields in the `base_suite`. The `overrides` modifier takes precedence over the `excludes` and `eval`
modifiers. The `overrides` list will be processed in order, so order can matter if multiple override
modifiers try to overwrite the same field in the base_suite.
### excludes ### excludes
All fields referenced in the `excludes` section of the mappings file will append to the specified All fields referenced in the `excludes` section of the mappings file will append to the specified
`exclude` fields in the base suite. `exclude` fields in the base suite. The only two valid options in the referenced modifier field are
The only two valid options in the referenced modifier field are `exclude_with_any_tags` and `exclude_with_any_tags` and `exclude_files`. They are appended in the order they are specified in
`exclude_files`. They are appended in the order they are specified in the mappings file. the mappings file.
### eval ### eval
All fields referenced in the `eval` section of the mappings file will append to the specified All fields referenced in the `eval` section of the mappings file will append to the specified
`config.shell_options.eval` field in the base suite. `config.shell_options.eval` field in the base suite. They are appended in the order they are
They are appended in the order they are specified in the mappings file. specified in the mappings file.
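The `overrides`, `excludes` and `eval` behaviors described so far can be summarized in a rough Python sketch. This is an illustration under stated assumptions rather than resmoke's code: `apply_modifiers` is a hypothetical function, overrides are modeled as top-level key replacement applied first, the exclude lists are assumed to live under `selector`, and joining eval snippets with `"; "` is a guess at how appending works.

```python
import copy


def apply_modifiers(base_suite, overrides=(), excludes=(), evals=()):
    """Return a new suite dict with the three modifier kinds applied to base_suite."""
    suite = copy.deepcopy(base_suite)

    # overrides: processed in list order, so a later entry wins on the same field.
    # Assumption: each override replaces whole top-level keys.
    for override in overrides:
        suite.update(copy.deepcopy(override))

    # excludes: only two keys are valid; values are appended in mapping order.
    # Assumption: the exclude lists sit under the suite's `selector` section.
    selector = suite.setdefault("selector", {})
    for exclude in excludes:
        for key, values in exclude.items():
            if key not in ("exclude_with_any_tags", "exclude_files"):
                raise ValueError(f"invalid excludes key: {key}")
            selector.setdefault(key, []).extend(values)

    # eval: snippets are appended to config.shell_options.eval in mapping order.
    # Assumption: snippets are joined with "; ".
    shell_options = suite.setdefault("config", {}).setdefault("shell_options", {})
    for snippet in evals:
        existing = shell_options.get("eval", "")
        shell_options["eval"] = f"{existing}; {snippet}" if existing else snippet

    return suite
```

The sketch does not model how precedence is resolved when an override and an exclude touch the same field; it only shows the ordering and append behavior within each modifier kind.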
### extends ### extends
@ -69,9 +65,8 @@ modifiers), the key being extended must already exist and also be a list.
The generated matrix suites live in the `buildscripts/resmokeconfig/matrix_suites/generated_suites` The generated matrix suites live in the `buildscripts/resmokeconfig/matrix_suites/generated_suites`
directory. These files may be edited for local testing but must remain consistent with the mapping directory. These files may be edited for local testing but must remain consistent with the mapping
files. There is a task in the commit queue that enforces this. To generate a new version of these files. There is a task in the commit queue that enforces this. To generate a new version of these
matrix suites, you may run matrix suites, you may run `buildscripts/resmoke.py generate-matrix-suites`. This command will
`buildscripts/resmoke.py generate-matrix-suites`. This command overwrite the current generated matrix suites on disk so make sure you do not have any unsaved
will overwrite the current generated matrix suites on disk so make sure you do not have any unsaved
changes to these files. changes to these files.
## Validating matrix suites ## Validating matrix suites
@ -82,5 +77,4 @@ ensures that the files are validated.
## FAQ ## FAQ
For questions about the user or authorship experience, For questions about the user or authorship experience, please reach out in #server-testing.
please reach out in #server-testing.


@ -2,7 +2,8 @@
Test "suites" are configuration files that group which tests to run, and how. Test "suites" are configuration files that group which tests to run, and how.
Yaml files enumerate the test files that the suite encompasses, as well as any test fixtures and their configurations to leverage, options for the shell, hooks, and more. Yaml files enumerate the test files that the suite encompasses, as well as any test fixtures and
their configurations to leverage, options for the shell, hooks, and more.
## Minimal Example ## Minimal Example
@ -64,7 +65,8 @@ Example:
test_kind: js_test test_kind: js_test
``` ```
See all supported kinds in [`buildscripts/resmokelib/testing/testcases`](../../../buildscripts/resmokelib/testing/testcases/README.md). See all supported kinds in
[`buildscripts/resmokelib/testing/testcases`](../../../buildscripts/resmokelib/testing/testcases/README.md).
## `selector` ## `selector`
@ -89,25 +91,34 @@ File path(s) of test files to include. If a path without a glob is provided, it
### `selector.root` ### `selector.root`
A file containing glob patterns, one per line, typically used by test_kind cpp_unit_test (usually build/unittests.txt). Specifies which tests to consider for including into the suite. If no other options are specified, these are the tests that will be run. Glob patterns are supported (and common) here. A file containing glob patterns, one per line, typically used by test_kind cpp_unit_test (usually
build/unittests.txt). Specifies which tests to consider for including into the suite. If no other
options are specified, these are the tests that will be run. Glob patterns are supported (and
common) here.
### `selector.include_files` ### `selector.include_files`
A list of strings representing glob patterns. Includes only this subset of tests in the suite. These files will be included even if they would otherwise be excluded by tags. Will error if a test specified here was not included in the roots. A list of strings representing glob patterns. Includes only this subset of tests in the suite. These
files will be included even if they would otherwise be excluded by tags. Will error if a test
specified here was not included in the roots.
### `selector.exclude_files` ### `selector.exclude_files`
A list of strings representing glob patterns. Excludes this list of tests from the suite. These files will be excluded even if they would otherwise be included by tags. Will error if a test specified here was not included in the roots. A list of strings representing glob patterns. Excludes this list of tests from the suite. These
files will be excluded even if they would otherwise be included by tags. Will error if a test
specified here was not included in the roots.
### `selector.include_with_any_tags` ### `selector.include_with_any_tags`
A list of strings. Only jstests which define a list of tags which includes any of these tags will be included in the suite, unless otherwise excluded by filename. A list of strings. Only jstests which define a list of tags which includes any of these tags will be
included in the suite, unless otherwise excluded by filename.
To see all tags referenced across suites, run `./buildscripts/resmoke.py list-tags`. To see all tags referenced across suites, run `./buildscripts/resmoke.py list-tags`.
### `selector.exclude_with_any_tags` ### `selector.exclude_with_any_tags`
A list of strings. Any jstest which defines a list of tags which includes any of these tags will be excluded from the suite, unless otherwise included by filename. A list of strings. Any jstest which defines a list of tags which includes any of these tags will be
excluded from the suite, unless otherwise included by filename.
To see all tags referenced across suites, run `./buildscripts/resmoke.py list-tags`. To see all tags referenced across suites, run `./buildscripts/resmoke.py list-tags`.
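
As a rough illustration of how the selector options described above interact, the sketch below filters a set of root tests. It is not resmoke's implementation: `select_tests`, the `tests` mapping of filename to tags, the tag names, and the use of `fnmatch` for glob matching are all assumptions made for this example.

```python
from fnmatch import fnmatch


def select_tests(tests, include_files=(), exclude_files=(),
                 include_with_any_tags=(), exclude_with_any_tags=()):
    """Hypothetical sketch: `tests` maps a root test path to its list of tags."""
    selected = []
    for name, tags in tests.items():
        # Filename exclusion wins, even if the tags would include the test.
        if any(fnmatch(name, pattern) for pattern in exclude_files):
            continue
        # When include_files is given, only that subset is kept,
        # even if the tags would otherwise exclude the test.
        if include_files:
            if any(fnmatch(name, pattern) for pattern in include_files):
                selected.append(name)
            continue
        # Otherwise fall back to tag-based filtering.
        if include_with_any_tags and not set(tags) & set(include_with_any_tags):
            continue
        if set(tags) & set(exclude_with_any_tags):
            continue
        selected.append(name)
    return selected


# Hypothetical test files and tags, for illustration only.
tests = {
    "jstests/core/a.js": ["requires_fcv"],
    "jstests/core/b.js": ["slow"],
    "jstests/core/c.js": [],
}
without_slow = select_tests(tests, exclude_with_any_tags=["slow"])
```

Here `without_slow` contains `a.js` and `c.js`; passing `include_files=["jstests/core/b.js"]` as well would keep only `b.js` despite its excluded tag.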
@ -118,9 +129,8 @@ Defines how the tests will be executed.
### `executor.config` ### `executor.config`
This section contains additional configuration for each test. The structure of this can vary This section contains additional configuration for each test. The structure of this can vary
significantly based on the `test_kind`. For specific information, you can look at the significantly based on the `test_kind`. For specific information, you can look at the implementation
implementation of the `test_kind` of concern in the `buildscripts/resmokelib/testing/testcases` of the `test_kind` of concern in the `buildscripts/resmokelib/testing/testcases` directory.
directory.
Example: Example:
@ -147,7 +157,9 @@ Any parameters (besides `global_vars`) will directly be passed to the mongo shel
##### `executor.config.shell_options.global_vars` ##### `executor.config.shell_options.global_vars`
Will use this as the base for the string passed to `--eval`. Anything specified in `shell_options.eval` will be appended after these. Formats any objects so that they will evaluate properly as a string. Will use this as the base for the string passed to `--eval`. Anything specified in
`shell_options.eval` will be appended after these. Formats any objects so that they will evaluate
properly as a string.
`global_vars` allows for setting global variables. A `TestData` object is a special global variable `global_vars` allows for setting global variables. A `TestData` object is a special global variable
that is used to hold testing data. Parts of `TestData` can be updated via `resmoke` command-line that is used to hold testing data. Parts of `TestData` can be updated via `resmoke` command-line
@ -156,8 +168,8 @@ intelligently and made available to the `js_test` running. Behavior can vary on
in general this is the order of precedence: (1) resmoke command-line (2) [suite].yml (3) in general this is the order of precedence: (1) resmoke command-line (2) [suite].yml (3)
runtime/default. runtime/default.
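
The general precedence order can be pictured as a layered lookup. This is only a sketch of the idea, not how resmoke merges `TestData`: `resolve_test_data` and the option names are hypothetical, and real behavior can vary per option as noted above.

```python
from collections import ChainMap


def resolve_test_data(cli_options, suite_options, defaults):
    """Hypothetical helper: earlier layers override later ones."""
    # ChainMap searches its maps left to right, so the command line wins
    # over the suite .yml, which wins over runtime/default values.
    return dict(ChainMap(cli_options, suite_options, defaults))


# Hypothetical option names, for illustration only.
defaults = {"optionA": 1, "optionB": 2, "optionC": 3}
suite_yml = {"optionB": 20, "optionC": 30}
command_line = {"optionC": 300}
resolved = resolve_test_data(command_line, suite_yml, defaults)
```

In this example `optionA` keeps its default, `optionB` comes from the suite file, and `optionC` comes from the command line.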
The mongo shell can also be invoked with flags & The mongo shell can also be invoked with flags & named arguments. Flags must have the `''` value,
named arguments. Flags must have the `''` value, such as in the case for `nodb` above. such as in the case for `nodb` above.
`eval` can also be used to run generic javascript code in the shell. You can directly include `eval` can also be used to run generic javascript code in the shell. You can directly include
javascript code, or you can put it in a separate script & `load` it. javascript code, or you can put it in a separate script & `load` it.
@ -166,11 +178,12 @@ javascript code, or you can put it in a separate script & `load` it.
Specify hooks to run before, after, and between individual tests to execute specified logic. Specify hooks to run before, after, and between individual tests to execute specified logic.
> Read more about hooks in [buildscripts/resmokelib/testing/hooks/README.md](../../../buildscripts/resmokelib/testing/hooks/README.md) > Read more about hooks in
> [buildscripts/resmokelib/testing/hooks/README.md](../../../buildscripts/resmokelib/testing/hooks/README.md)
The hook name in the `.yml` must match the Python class name of the hook. Parameters can also be included in the `.yml` The hook name in the `.yml` must match the Python class name of the hook. Parameters can also be
and will be passed to the hook's constructor (the `hook_logger` & `fixture` parameters are included in the `.yml` and will be passed to the hook's constructor (the `hook_logger` & `fixture`
automatically included, so those should not be included in the `.yml`). parameters are automatically included, so those should not be included in the `.yml`).
Example: Example:
@ -190,9 +203,11 @@ hooks:
Specify a test fixture to run around the tests. Specify a test fixture to run around the tests.
> Read more about fixtures in [buildscripts/resmokelib/testing/fixtures/README.md](../../../buildscripts/resmokelib/testing/fixtures/README.md). > Read more about fixtures in
> [buildscripts/resmokelib/testing/fixtures/README.md](../../../buildscripts/resmokelib/testing/fixtures/README.md).
The `class` sub-field corresponds to the Python class name of a fixture. All other sub-fields are passed into the constructor of the fixture. These sub-fields will vary based on the fixture used. The `class` sub-field corresponds to the Python class name of a fixture. All other sub-fields are
passed into the constructor of the fixture. These sub-fields will vary based on the fixture used.
Example: Example:
@ -238,4 +253,5 @@ Read more about [hooks](../../../buildscripts/resmokelib/testing/hooks/README.md
#### `executor.archive.tests` #### `executor.archive.tests`
Specify a list of test files to archive on failure. Wildcard selection is valid. Set to `true` to archive _all_ tests. Specify a list of test files to archive on failure. Wildcard selection is valid. Set to `true` to
archive _all_ tests.


@ -2,11 +2,13 @@
Resmoke is MongoDB's integration test runner. Resmoke is MongoDB's integration test runner.
The JS Tests it can run live in the `jstests/` directory - reference its [README](../../jstests/README.md) to learn about their content. The JS Tests it can run live in the `jstests/` directory - reference its
[README](../../jstests/README.md) to learn about their content.
## Build ## Build
Though the source is built with bazel, resmoke is not yet integrated. This means that the source has to be built prior to using resmoke, eg: Though the source is built with bazel, resmoke is not yet integrated. This means that the source has
to be built prior to using resmoke, eg:
``` ```
bazel build install-dist-test bazel build install-dist-test
@ -41,11 +43,13 @@ bazel build install-dist-test
Generate a mongod.conf and mongos.conf using config fuzzer. Generate a mongod.conf and mongos.conf using config fuzzer.
``` ```
Note: `bisect`, `setup-multiversion`, and `symbolize` commands have been moved to [`db-contrib-tool`](https://github.com/10gen/db-contrib-tool#readme). Note: `bisect`, `setup-multiversion`, and `symbolize` commands have been moved to
[`db-contrib-tool`](https://github.com/10gen/db-contrib-tool#readme).
## Suites ## Suites
Many of the above commands use the concept of a "suite". Loosely, suites group which tests run, and how. Many of the above commands use the concept of a "suite". Loosely, suites group which tests run, and
how.
Read more about suites [here](../../buildscripts/resmokeconfig/suites/README.md). Read more about suites [here](../../buildscripts/resmokeconfig/suites/README.md).
@ -59,43 +63,47 @@ The most typical approach is to run a particular JS test file given a suite, eg:
buildscripts/resmoke.py run --suites=no_passthrough jstests/noPassthrough/shell/js/string.js buildscripts/resmoke.py run --suites=no_passthrough jstests/noPassthrough/shell/js/string.js
``` ```
That executes the content of that file, using the suite configuration as a fixture setup. The suite "no_passthrough" is associated with the file [buildscripts/resmokeconfig/suites/no_passthrough.yml](../../buildscripts/resmokeconfig/suites/no_passthrough.yml). That executes the content of that file, using the suite configuration as a fixture setup. The suite
"no_passthrough" is associated with the file
[buildscripts/resmokeconfig/suites/no_passthrough.yml](../../buildscripts/resmokeconfig/suites/no_passthrough.yml).
Run has **100+ flags**! Use `resmoke run --help` to inspect them. To avoid risk of multiple sources of truth that can drift and become stale, **we do not attempt to document them all here** - they should each be self-descriptive and documented within the CLI help. Run has **100+ flags**! Use `resmoke run --help` to inspect them. To avoid risk of multiple sources
of truth that can drift and become stale, **we do not attempt to document them all here** - they
should each be self-descriptive and documented within the CLI help.
Below are very high-level descriptions for high-usage flags. Below are very high-level descriptions for high-usage flags.
### Suites (`--suites`) ### Suites (`--suites`)
The run subcommand can run suites (list of tests and the MongoDB topology and The run subcommand can run suites (list of tests and the MongoDB topology and configuration to run
configuration to run them against), and explicitly named test files. them against), and explicitly named test files.
A single suite can be specified using the `--suite` flag, and multiple suites A single suite can be specified using the `--suite` flag, and multiple suites can be specified by
can be specified by providing a comma separated list to the `--suites` flag. providing a comma separated list to the `--suites` flag.
Additional documentation on our suite configuration can be found in Additional documentation on our suite configuration can be found in
[buildscripts/resmokeconfig/suites/README.md](../../buildscripts/resmokeconfig/suites/README.md). [buildscripts/resmokeconfig/suites/README.md](../../buildscripts/resmokeconfig/suites/README.md).
### Testable Installations (`--installDir`) ### Testable Installations (`--installDir`)
resmoke can run tests against any testable installation of MongoDB (such resmoke can run tests against any testable installation of MongoDB (such as ASAN, Debug, Release).
as ASAN, Debug, Release). When possible, resmoke will automatically locate and When possible, resmoke will automatically locate and run with a locally built copy of MongoDB
run with a locally built copy of MongoDB Server, so long as that build was Server, so long as that build was installed to a subdirectory of the root of the git repository, and
installed to a subdirectory of the root of the git repository, and there is there is exactly one build. In other situations, the `--installDir` flag, passed to run subcommand,
exactly one build. In other situations, the `--installDir` flag, passed to run can be used to indicate the location of the mongod/mongos binaries.
subcommand, can be used to indicate the location of the mongod/mongos binaries.
As an alternative, you may instead prefer to use the resmoke.py wrapper script As an alternative, you may instead prefer to use the resmoke.py wrapper script located in the same
located in the same directory as the mongod binary, which will automatically directory as the mongod binary, which will automatically set `installDir` for you.
set `installDir` for you.
Note that this wrapper is unavailable in packaged installations of MongoDB Note that this wrapper is unavailable in packaged installations of MongoDB Server, such as those
Server, such as those provided by Homebrew, and other package managers. If you provided by Homebrew, and other package managers. If you would like to run tests against a packaged
would like to run tests against a packaged installation, you must explicitly installation, you must explicitly pass `--installDir` to resmoke.py.
pass `--installDir` to resmoke.py.
### Resmoke test telemetry ### Resmoke test telemetry
We capture telemetry from resmoke using open telemetry. We capture telemetry from resmoke using open telemetry.
Using open telemetry (OTel) we capture more specific information about the internals of resmoke. This data is used for improvements specifically when running in evergreen. This data is captured on every resmoke invocation but only sent to honeycomb when running in evergreen. More info about how we use OTel in resmoke can be found [here](otel_resmoke.md). Using open telemetry (OTel) we capture more specific information about the internals of resmoke.
This data is used for improvements specifically when running in evergreen. This data is captured on
every resmoke invocation but only sent to honeycomb when running in evergreen. More info about how
we use OTel in resmoke can be found [here](otel_resmoke.md).


@ -1,10 +1,12 @@
# Extensions # Extensions
This module provides utilities for setting up and configuring MongoDB extensions in resmoke test suites. This module provides utilities for setting up and configuring MongoDB extensions in resmoke test
suites.
## Overview ## Overview
Extensions are dynamically loaded shared objects (`.so` files) that provide additional functionality to MongoDB. The utilities in this folder can handle: Extensions are dynamically loaded shared objects (`.so` files) that provide additional functionality
to MongoDB. The utilities in this folder can handle:
1. Discovering extension `.so` files in build directories 1. Discovering extension `.so` files in build directories
2. Generating `.conf` configuration files for extensions 2. Generating `.conf` configuration files for extensions
@ -12,7 +14,8 @@ Extensions are dynamically loaded shared objects (`.so` files) that provide addi
## Configuration File Generation in Tests ## Configuration File Generation in Tests
Extension `.conf` files are YAML configuration files that tell the server how to load an extension. They contain: Extension `.conf` files are YAML configuration files that tell the server how to load an extension.
They contain:
- `sharedLibraryPath`: Path to the `.so` file - `sharedLibraryPath`: Path to the `.so` file
- `extensionOptions`: Optional configuration parameters for the extension - `extensionOptions`: Optional configuration parameters for the extension
@ -30,9 +33,11 @@ extensionOptions:
The `generate_extension_configs.py` module creates `.conf` files: The `generate_extension_configs.py` module creates `.conf` files:
1. Receives a list of `.so` file paths (either from automatic discovery via `find_and_generate_extension_configs.py`, or manually via `--so-files` command-line argument) 1. Receives a list of `.so` file paths (either from automatic discovery via
`find_and_generate_extension_configs.py`, or manually via `--so-files` command-line argument)
2. For each `.so`, creates a `.conf` file in the temp directory (`/tmp/mongo/extensions/`) 2. For each `.so`, creates a `.conf` file in the temp directory (`/tmp/mongo/extensions/`)
3. Looks up corresponding extension options from `src/mongo/db/extension/test_examples/configurations.yml`, if any are specified 3. Looks up corresponding extension options from
`src/mongo/db/extension/test_examples/configurations.yml`, if any are specified
4. Writes the config file with `sharedLibraryPath` and any `extensionOptions` 4. Writes the config file with `sharedLibraryPath` and any `extensionOptions`
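
The four steps above can be sketched roughly as follows. This is an illustrative sketch rather than the code in `generate_extension_configs.py`: the function name `write_extension_conf`, the choice to name the `.conf` file after the `.so` file, and the restriction to flat scalar options are assumptions.

```python
import os


def write_extension_conf(so_path, options, out_dir):
    """Hypothetical helper: write a minimal YAML .conf file for one extension."""
    # Assumption: the .conf file is named after the .so file.
    name = os.path.splitext(os.path.basename(so_path))[0]
    conf_path = os.path.join(out_dir, name + ".conf")

    lines = [f"sharedLibraryPath: {so_path}"]
    if options:
        # Only flat key/value options are handled in this sketch.
        lines.append("extensionOptions:")
        for key, value in options.items():
            lines.append(f"  {key}: {value}")

    with open(conf_path, "w") as conf_file:
        conf_file.write("\n".join(lines) + "\n")
    return conf_path
```

A caller would pass the directory that should receive the generated files (the README mentions `/tmp/mongo/extensions/`) and the options looked up for that extension, if any.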
### Automatic Discovery and Generation ### Automatic Discovery and Generation


@ -2,7 +2,10 @@
This is a testing feature of the mongod and mongos, built into resmoke.py! This is a testing feature of the mongod and mongos, built into resmoke.py!
The config fuzzer is a resmoke feature that randomizes various server parameters of both mongod and mongos on startup. These fuzzed parameters should not affect the correctness of any tests. Therefore, the config fuzzer can be enabled for any test or suite run with resmoke to ensure the database is resilient to abnormal server configurations. The config fuzzer is a resmoke feature that randomizes various server parameters of both mongod and
mongos on startup. These fuzzed parameters should not affect the correctness of any tests.
Therefore, the config fuzzer can be enabled for any test or suite run with resmoke to ensure the
database is resilient to abnormal server configurations.
More information can be displayed in the resmoke --help output: More information can be displayed in the resmoke --help output:
@ -25,15 +28,22 @@ The bulk of the fuzzing logic is in [mongo_fuzzer_configs.py](./mongo_fuzzer_con
## How does it work? ## How does it work?
The config fuzzer assigns random values to various tunable parameters. Server parameters and their ranges are specified manually by developers and are not discovered automatically in any way. The config fuzzer assigns random values to various tunable parameters. Server parameters and their
ranges are specified manually by developers and are not discovered automatically in any way.
When the above resmoke flags are used, the [plugin](./plugin.py) implicitly enables the [FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py) hook for testing. When the above resmoke flags are used, the [plugin](./plugin.py) implicitly enables the
[FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py)
hook for testing.
## Where and When does it run on evergreen? ## Where and When does it run on evergreen?
The config fuzzer is represented as a handful of evergreen tasks with "_config_fuzzer_" in the name. Search "config_fuzzer" in the [etc/](../../../etc) directory to find all the evergreen tasks. The config fuzzer is represented as a handful of evergreen tasks with "_config_fuzzer_" in the name.
Search "config_fuzzer" in the [etc/](../../../etc) directory to find all the evergreen tasks.
Arguably the simplest evergreen task, `config_fuzzer_jsCore`, runs the "core" (i.e. `jstests/core`) resmoke suite with the config fuzzer parameters to resmoke set, and excludes some incompatible tests ([src link](https://github.com/mongodb/mongo/blob/a2e7e83a135c3096de7f360b88de1b3cdc1caaf2/etc/evergreen_yml_components/tasks/resmoke/server_divisions/durable_transactions_and_availability/tasks.yml#L1956-L1975)). Here is a sampling of some of the task names: Arguably the simplest evergreen task, `config_fuzzer_jsCore`, runs the "core" (i.e. `jstests/core`)
resmoke suite with the config fuzzer parameters to resmoke set, and excludes some incompatible tests
([src link](https://github.com/mongodb/mongo/blob/a2e7e83a135c3096de7f360b88de1b3cdc1caaf2/etc/evergreen_yml_components/tasks/resmoke/server_divisions/durable_transactions_and_availability/tasks.yml#L1956-L1975)).
Here is a sampling of some of the task names:
- `config_fuzzer_concurrency_replication` - `config_fuzzer_concurrency_replication`
- `config_fuzzer_concurrency_sharded_replication` - `config_fuzzer_concurrency_sharded_replication`
@ -41,7 +51,10 @@ Arguably the simplest evergreen task, `config_fuzzer_jsCore`, runs the "core" (i
## Reproducing a config fuzzer failure ## Reproducing a config fuzzer failure
In the Evergreen task view, click on the Logs tab, then Task Logs, and open in Parsely. Search for "Fuzzed" ([source link](https://github.com/mongodb/mongo/blob/ca1c935aca43ca2e028507e2a878d4e12f50355b/buildscripts/resmokelib/run/__init__.py#L352-L366)). The output will look similar to this: In the Evergreen task view, click on the Logs tab, then Task Logs, and open in Parsely. Search for
"Fuzzed"
([source link](https://github.com/mongodb/mongo/blob/ca1c935aca43ca2e028507e2a878d4e12f50355b/buildscripts/resmokelib/run/__init__.py#L352-L366)).
The output will look similar to this:
<details> <details>
<summary>Logs</summary> <summary>Logs</summary>
@ -112,13 +125,22 @@ In the Evergreen task view, click on the Logs tab, then Task Logs, and open in P
</details> </details>
The log line starting with "resmoke.py invocation for local usage" and the one with "configFuzzSeed" provide an option `--configFuzzSeed=5583430894313922699` that can be used to generate the same fuzzed server parameters locally in resmoke. The log line starting with "resmoke.py invocation for local usage" and the one with "configFuzzSeed"
provide an option `--configFuzzSeed=5583430894313922699` that can be used to generate the same
fuzzed server parameters locally in resmoke.
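
The reason a seed is enough to reproduce a run is that a seeded random number generator yields the same sequence every time. The sketch below shows the idea only; `fuzz_parameters`, the `LIMITS` dictionary, and the parameter names are hypothetical and much simpler than the real logic in `mongo_fuzzer_configs.py`.

```python
import random

# Hypothetical parameters and ranges, for illustration only.
LIMITS = {
    "paramA": {"min": 1, "max": 100},
    "paramB": {"min": 0.0, "max": 1.0},
}


def fuzz_parameters(seed, limits):
    """Draw one value per parameter from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return {
        name: rng.uniform(spec["min"], spec["max"])
        for name, spec in limits.items()
    }


first_run = fuzz_parameters(5583430894313922699, LIMITS)
second_run = fuzz_parameters(5583430894313922699, LIMITS)
```

Both runs produce identical values because they share a seed, which is the property that `--configFuzzSeed` relies on.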
## Running the config fuzzer locally ## Running the config fuzzer locally
Before running the Resmoke config fuzzer command, you need to obtain the necessary binaries. You can download them from the "Files" section of the `archive_dist_test` task in Evergreen (e.g., binaries from the `amazon2-arm64-compile` variant). Alternatively, if you don't require those specific binaries, you can use `db-contrib-tool` to download the binaries (e.g., by running `bazel run db-contrib-tool -- setup-repro-env master`). Before running the Resmoke config fuzzer command, you need to obtain the necessary binaries. You can
download them from the "Files" section of the `archive_dist_test` task in Evergreen (e.g., binaries
from the `amazon2-arm64-compile` variant). Alternatively, if you don't require those specific
binaries, you can use `db-contrib-tool` to download the binaries (e.g., by running
`bazel run db-contrib-tool -- setup-repro-env master`).
To re-run a command locally that failed through the config fuzzer, you can navigate to the specific test that failed, and under files you can find a name titled "Resmoke.py Invocation for Local Usage". If you are replicating an older config fuzzer invocation, remove the command line argument "`--installDir=dist-test/bin`". A simple example command is shown below: To re-run a command locally that failed through the config fuzzer, you can navigate to the specific
test that failed, and under files you can find a name titled "Resmoke.py Invocation for Local
Usage". If you are replicating an older config fuzzer invocation, remove the command line argument
"`--installDir=dist-test/bin`". A simple example command is shown below:
``` ```
buildscripts/resmoke.py run jstests/noPassthrough/bulk_write_w0.js \ buildscripts/resmoke.py run jstests/noPassthrough/bulk_write_w0.js \
@ -127,7 +149,12 @@ buildscripts/resmoke.py run jstests/noPassthrough/bulk_write_w0.js \
--configFuzzSeed=7956511060361033919 --configFuzzSeed=7956511060361033919
``` ```
It is easiest to pipe the output to another text file and then to analyze the output through there. The format of the file is slightly different, as you will not be able to explicitly look up Fuzzed, but you can look up one of the fuzzed config parameters to find the list of fuzzed config parameter settings. A subset of a log from running the above command on [this version](https://github.com/mongodb/mongo/commit/856e4ecd8612b19c8ba281cf23450d74b5838650) of master is the following: It is easiest to pipe the output to another text file and then to analyze the output through there.
The format of the file is slightly different, as you will not be able to explicitly look up Fuzzed,
but you can look up one of the fuzzed config parameters to find the list of fuzzed config parameter
settings. A subset of a log from running the above command on
[this version](https://github.com/mongodb/mongo/commit/856e4ecd8612b19c8ba281cf23450d74b5838650) of
master is the following:
``` ```
js_test:bulk_write_w0] Skip waiting to connect to node with pid=2522712, port=20040 js_test:bulk_write_w0] Skip waiting to connect to node with pid=2522712, port=20040
@ -140,7 +167,8 @@ js_test:bulk_write_w0] Skip waiting to connect to node with pid=2522712, port=20
## Adding a new parameter to be fuzzed to the config fuzzer ## Adding a new parameter to be fuzzed to the config fuzzer
There are two broad categories of parameters in the config fuzzer, that each have two sub-categories of parameters: There are two broad categories of parameters in the config fuzzer, that each have two sub-categories
of parameters:
1. mongo parameters 1. mongo parameters
- mongod parameters - mongod parameters
@ -151,25 +179,43 @@ There are two broad categories of parameters in the config fuzzer, that each hav
### Adding new mongo parameters ### Adding new mongo parameters
Mongo parameters and their properties (e.g. min, max, default) are stored in [config_fuzzer_limits.py](./config_fuzzer_limits.py). Mongo parameters and their properties (e.g. min, max, default) are stored in
[config_fuzzer_limits.py](./config_fuzzer_limits.py).
Below is a list of ways to fuzz configs which are supported without having to also change [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). Below is a list of ways to fuzz configs which are supported without having to also change
Please ensure that you add it correctly to the `mongod` or `mongos` subdictionary. [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). Please ensure that you add it correctly to the
`mongod` or `mongos` subdictionary.
You need to specify if your parameter should be fuzzed at runtime, startup, or both by declaring the `fuzz_at` key for the parameter. The `fuzz_at` key should be a list that can contain the values `startup`, `runtime`, or both. The eligible values are specified in the `set_at` keys of the corresponding `.idl` files. You need to specify if your parameter should be fuzzed at runtime, startup, or both by declaring the
`fuzz_at` key for the parameter. The `fuzz_at` key should be a list that can contain the values
`startup`, `runtime`, or both. The eligible values are specified in the `set_at` keys of the
corresponding `.idl` files.
For a parameter that is only fuzzed at startup, the fuzzer will generate a fuzzed value for the parameter and set it when starting up the server. For a parameter that is only fuzzed at startup, the fuzzer will generate a fuzzed value for the
parameter and set it when starting up the server.
For a parameter fuzzed at runtime, the fuzzer will generate a fuzzed value for the parameter while running the server based on a `period` key that is required for fuzzed runtime parameters. For a parameter fuzzed at runtime, the fuzzer will generate a fuzzed value for the parameter while
The `period` key describes how often the parameter should be changed, in seconds. Every `period` seconds, the fuzzer will select a new random value for the parameter and use the setParameter command to update the value of the running the server based on a `period` key that is required for fuzzed runtime parameters. The
parameter on every node in the cluster while the suite is running. This is performed by the [FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py) hook. `period` key describes how often the parameter should be changed, in seconds. Every `period`
seconds, the fuzzer will select a new random value for the parameter and use the setParameter
command to update the value of the parameter on every node in the cluster while the suite is
running. This is performed by the
[FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py)
hook.
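
To make the role of `fuzz_at` and `period` concrete, the sketch below computes when a runtime-fuzzed parameter would receive a new value. It only models the schedule and is not the `FuzzRuntimeParameters` hook: `runtime_updates`, the parameter names, and the use of `rng.uniform` for every parameter are assumptions, and no `setParameter` command is issued.

```python
import random


def runtime_updates(limits, seed, duration_secs):
    """Return (time, name, value) tuples for each re-fuzz within the duration."""
    rng = random.Random(seed)
    updates = []
    for name, spec in limits.items():
        # Parameters that are only fuzzed at startup never change at runtime.
        if "runtime" not in spec.get("fuzz_at", []):
            continue
        period = spec["period"]  # seconds between new values
        for tick in range(period, duration_secs + 1, period):
            value = rng.uniform(spec["min"], spec["max"])
            updates.append((tick, name, value))
    return sorted(updates, key=lambda update: update[0])


# Hypothetical entries in the style of config_fuzzer_limits.py.
limits = {
    "runtimeParam": {"min": 1, "max": 10, "fuzz_at": ["startup", "runtime"], "period": 5},
    "startupParam": {"min": 1, "max": 10, "fuzz_at": ["startup"]},
}
schedule = runtime_updates(limits, seed=1, duration_secs=12)
```

Over a 12 second window, `runtimeParam` is re-fuzzed at 5 and 10 seconds, while `startupParam` keeps its startup value.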
For parameters with complex fuzzing logic or interdependencies with other parameters, you can set `"custom_fuzz_value_assignment": True` to bypass the standard fuzzing logic. Parameters with this flag must be handled explicitly in the special handling functions (`generate_special_mongod_startup_parameters()` for startup parameters or `generate_special_runtime_parameters()` for runtime parameters). Note that parameter dependency logic is currently only supported for startup fuzzing - runtime fuzzing operates on individual parameters. See the section below on parameters requiring special handling for more details. For parameters with complex fuzzing logic or interdependencies with other parameters, you can set
`"custom_fuzz_value_assignment": True` to bypass the standard fuzzing logic. Parameters with this
flag must be handled explicitly in the special handling functions
(`generate_special_mongod_startup_parameters()` for startup parameters or
`generate_special_runtime_parameters()` for runtime parameters). Note that parameter dependency
logic is currently only supported for startup fuzzing - runtime fuzzing operates on individual
parameters. See the section below on parameters requiring special handling for more details.
Let `choices = [choice1, choice2, ..., choiceN]` be an array of choices that the parameter can have as a value. Let `choices = [choice1, choice2, ..., choiceN]` be an array of choices that the parameter can have
The parameters are added in order of priority chosen in the if-elif-else statement in `generate_normal_mongo_parameters()` as a value. The parameters are added in order of priority chosen in the if-elif-else statement in
in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). `generate_normal_mongo_parameters()` in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). So, if
So, if you added the fields `default`, `min`, and `max` for a `param`, case 4 would get evaluated over case 5. you added the fields `default`, `min`, and `max` for a `param`, case 4 would get evaluated over
case 5.
1. `param = rng.uniform(min, max)` 1. `param = rng.uniform(min, max)`
@ -218,41 +264,59 @@ So, if you added the fields `default`, `min`, and `max` for a `param`, case 4 wo
"param": {"default": default} "param": {"default": default}
``` ```
> Note: For the default case, please add the value `"fuzz_at": ["startup"]` (the default value gets set at "startup"). > Note: For the default case, please add the value `"fuzz_at": ["startup"]` (the default value
> gets set at "startup").
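The priority order described above can be illustrated with a minimal sketch. It is not the code in `generate_normal_mongo_parameters()`: the helper name `pick_value` is hypothetical, and it assumes that the range-based case draws an integer between `min` and `max` and that the `default` case is checked last.

```python
import random


def pick_value(rng, spec):
    """Illustrative only: choose a value for one parameter from its limits entry."""
    # The range-based branch is checked first, so an entry that carries
    # `default`, `min`, and `max` is fuzzed over the range.
    if "min" in spec and "max" in spec:
        return rng.randint(spec["min"], spec["max"])
    # The bare default is only used when no earlier branch matched.
    elif "default" in spec:
        return spec["default"]
    raise ValueError("no supported fuzzing fields in spec")


rng = random.Random(0)
value = pick_value(rng, {"default": 5, "min": 1, "max": 10})
```

Because the branches are evaluated in order, adding more fields to an entry can change which case applies, so it is worth checking the order in the real function before adding a parameter.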
If you have a parameter that depends on another parameter being generated (see `throughputProbingInitialConcurrency` needing to be initialized before If you have a parameter that depends on another parameter being generated (see
`throughputProbingMinConcurrency` and `throughputProbingMaxConcurrency` as an example in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py)) or behavior that `throughputProbingInitialConcurrency` needing to be initialized before
differs from the above cases, please do the following steps: `throughputProbingMinConcurrency` and `throughputProbingMaxConcurrency` as an example in
[mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py)) or behavior that differs from the above cases,
please do the following steps:
1. Add the parameter and the needed information to [config_fuzzer_limits.py](./config_fuzzer_limits.py) (ensure to correctly add to the `mongod` or `mongos` sub-dictionary), including `"custom_fuzz_value_assignment": True` to indicate it requires special handling 1. Add the parameter and the needed information to
[config_fuzzer_limits.py](./config_fuzzer_limits.py) (ensure to correctly add to the `mongod` or
`mongos` sub-dictionary), including `"custom_fuzz_value_assignment": True` to indicate it
requires special handling
In [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py): In [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py):
2. Add the parameter's special handling in `generate_special_mongod_startup_parameters()` or `generate_special_mongos_startup_parameters()` for startup parameters, or `generate_special_runtime_parameters()` for runtime parameters 2. Add the parameter's special handling in `generate_special_mongod_startup_parameters()` or
`generate_special_mongos_startup_parameters()` for startup parameters, or
`generate_special_runtime_parameters()` for runtime parameters
> Note: Parameter dependencies (where one parameter's value constrains another) are currently only supported for startup fuzzing. Runtime fuzzing handles parameters individually. > Note: Parameter dependencies (where one parameter's value constrains another) are currently only
> supported for startup fuzzing. Runtime fuzzing handles parameters individually.
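As a rough sketch of what special handling for dependent startup parameters might look like, the example below generates the initial concurrency first and then picks the min and max around it. The function name, the bounds, and the `min <= initial <= max` constraint are assumptions made for illustration; the real logic lives in `generate_special_mongod_startup_parameters()` and may differ.

```python
import random

# Hypothetical bounds for illustration; the real entries live in
# config_fuzzer_limits.py and may use different fields and values.
LIMITS = {
    "throughputProbingInitialConcurrency": {"min": 4, "max": 128},
    "throughputProbingMinConcurrency": {"min": 4, "max": 128},
    "throughputProbingMaxConcurrency": {"min": 4, "max": 128},
}


def sketch_special_startup_parameters(rng, limits):
    """Generate the initial value first, then constrain min and max around it."""
    initial_spec = limits["throughputProbingInitialConcurrency"]
    initial = rng.randint(initial_spec["min"], initial_spec["max"])
    lowest = limits["throughputProbingMinConcurrency"]["min"]
    highest = limits["throughputProbingMaxConcurrency"]["max"]
    return {
        "throughputProbingInitialConcurrency": initial,
        # Assumed constraint: min <= initial <= max.
        "throughputProbingMinConcurrency": rng.randint(lowest, initial),
        "throughputProbingMaxConcurrency": rng.randint(initial, highest),
    }


params = sketch_special_startup_parameters(random.Random(0), LIMITS)
```

Generating the parameter that the others depend on first keeps every later draw within a valid range, which is why the order of initialization matters.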
If you add a flow control parameter, please add the parameter's name to `flow_control_params` in `generate_mongod_parameters`. If you add a flow control parameter, please add the parameter's name to `flow_control_params` in
`generate_mongod_parameters`.
> Note: The main distinction between min/max vs. lower-bound/upper_bound is there is some transformation involving the lower and upper bounds, > Note: The main distinction between min/max vs. lower-bound/upper_bound is there is some
> while the min/max should be the true min/max of the parameters. You should also include the true min/max of the parameter so this can be logged. > transformation involving the lower and upper bounds, while the min/max should be the true min/max
> If the min/max is not inclusive, this is added as a note above the parameter. > of the parameters. You should also include the true min/max of the parameter so this can be
> logged. If the min/max is not inclusive, this is added as a note above the parameter.
### Adding new WiredTiger parameters ### Adding new WiredTiger parameters
WiredTiger parameters and their properties (e.g. min, max, default) are stored in [config_fuzzer_wt_limits.py](./config_fuzzer_wt_limits.py). WiredTiger parameters and their properties (e.g. min, max, default) are stored in
[config_fuzzer_wt_limits.py](./config_fuzzer_wt_limits.py).
> These _cannot_ be fuzzed with the [FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py) hook because they are only set on startup (these parameters are used in the wt configuration string). > These _cannot_ be fuzzed with the
> [FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py)
> hook because they are only set on startup (these parameters are used in the wt configuration
> string).
Below is a list of ways to fuzz configs which are supported without having to also change [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). Below is a list of ways to fuzz configs which are supported without having to also change
Please ensure that you add it correctly to the `wt` (eviction parameters) or `wt_table` subdictionary.
Let `choices = [choice1, choice2, ..., choiceN]` be an array of choices that the parameter can have as a value.
The parameters are added in order of priority chosen in the if-elif-else statement in `generate_normal_wt_parameters()` in
[mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py).
Please ensure that you add it correctly to the `wt` (eviction parameters) or `wt_table`
subdictionary.
Let `choices = [choice1, choice2, ..., choiceN]` be an array of choices that the parameter can have
as a value.
The parameters are added in order of priority chosen in the if-elif-else statement in
`generate_normal_wt_parameters()` in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py).
1. `param = rng.choices(choices)`, where choices is an array 1. `param = rng.choices(choices)`, where choices is an array
Add: Add:
@ -281,25 +345,32 @@ The parameters are added in order of priority chosen in the if-elif-else stateme
"param": {"min": min, "max": max} "param": {"min": min, "max": max}
``` ```
If you have a parameter that depends on another parameter being generated (see `eviction_target` needing to be initialized before If you have a parameter that depends on another parameter being generated (see `eviction_target`
`eviction_trigger` as an example in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py)) or behavior that differs from the above cases, needing to be initialized before `eviction_trigger` as an example in
[mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py)) or behavior that differs from the above cases,
please do the following steps: please do the following steps:
1. Add the parameter and the needed information to [config_fuzzer_wt_limits.py](./config_fuzzer_wt_limits.py) (ensure to correctly add to the `wt` or `wt_table` sub-dictionary) 1. Add the parameter and the needed information to
[config_fuzzer_wt_limits.py](./config_fuzzer_wt_limits.py) (ensure to correctly add to the `wt`
or `wt_table` sub-dictionary)
In [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py): In [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py):
2. Add the parameter to `excluded_normal_params` in `generate_eviction_configs()` or `generate_table_configs()` 2. Add the parameter to `excluded_normal_params` in `generate_eviction_configs()` or
3. Add the parameter's special handling in `generate_special_eviction_configs()` or `generate_special_table_configs()` `generate_table_configs()`
3. Add the parameter's special handling in `generate_special_eviction_configs()` or
`generate_special_table_configs()`
> The main distinction between min/max vs. lower-bound/upper_bound is there is some transformation involving the lower and upper bounds, > The main distinction between min/max vs. lower-bound/upper_bound is there is some transformation
> while the min/max should be the true min/max of the parameters. You should also include the true min/max of the parameter so this can be logged. > involving the lower and upper bounds, while the min/max should be the true min/max of the
> If the min/max is not inclusive, this is added as a note above the parameter. > parameters. You should also include the true min/max of the parameter so this can be logged. If
> the min/max is not inclusive, this is added as a note above the parameter.
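The exclusion and special-handling steps for WiredTiger parameters can be sketched as follows. This is an illustration under stated assumptions, not the code in `generate_eviction_configs()`: the function name, the bounds, the `example_param` entry, and the rule that `eviction_trigger` must be greater than `eviction_target` are all hypothetical here.

```python
import random

# Hypothetical limits; the real ones live in config_fuzzer_wt_limits.py.
WT_LIMITS = {
    "eviction_target": {"min": 10, "max": 80},
    "eviction_trigger": {"min": 20, "max": 99},
    "example_param": {"min": 1, "max": 5},  # made-up parameter without dependencies
}


def sketch_eviction_configs(rng, limits):
    # Parameters with dependencies are skipped by the normal path.
    excluded_normal_params = ["eviction_target", "eviction_trigger"]
    params = {
        name: rng.randint(spec["min"], spec["max"])
        for name, spec in limits.items()
        if name not in excluded_normal_params
    }
    # Special handling: the target is generated first, then the trigger
    # is drawn from the part of its range that lies above the target.
    target = rng.randint(limits["eviction_target"]["min"], limits["eviction_target"]["max"])
    trigger_min = max(limits["eviction_trigger"]["min"], target + 1)
    params["eviction_target"] = target
    params["eviction_trigger"] = rng.randint(trigger_min, limits["eviction_trigger"]["max"])
    return params


configs = sketch_eviction_configs(random.Random(0), WT_LIMITS)
```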
## Exclusions ## Exclusions
- `jstests/libs/override_methods/config_fuzzer_incompatible_commands.js` - `jstests/libs/override_methods/config_fuzzer_incompatible_commands.js`
- These commands are too impactful to run with the config fuzzer - These commands are too impactful to run with the config fuzzer
- The `does_not_support_config_fuzzer` jstest tag - The `does_not_support_config_fuzzer` jstest tag
- Tests with this tag may manually specify server parameters modified by the fuzzer or read global state that is modified in some way by the fuzzer. - Tests with this tag may manually specify server parameters modified by the fuzzer or read global
state that is modified in some way by the fuzzer.
- Just because a test is failing does not mean it is incompatible with the config fuzzer. - Just because a test is failing does not mean it is incompatible with the config fuzzer.


@ -3,7 +3,9 @@
There are two main ways of running the core analyzer. There are two main ways of running the core analyzer.
1. Running the core analyzer with local core dumps and binaries. 1. Running the core analyzer with local core dumps and binaries.
2. Running the core analyzer with core dumps and binaries from an evergreen task. Note that some analysis might fail if you are not on the same AMI (Amazon Machine Image) that the task was run on. 2. Running the core analyzer with core dumps and binaries from an evergreen task. Note that some
analysis might fail if you are not on the same AMI (Amazon Machine Image) that the task was run
on.
To run the core analyzer with local core dumps and binaries: To run the core analyzer with local core dumps and binaries:
@ -11,7 +13,9 @@ To run the core analyzer with local core dumps and binaries:
python3 buildscripts/resmoke.py core-analyzer python3 buildscripts/resmoke.py core-analyzer
``` ```
This will look for binaries in the build/install directory, and it will look for core dumps in the current directory. If your local environment is different you can include `--install-dir` and `--core-dir` in your invocation to specify other locations. This will look for binaries in the build/install directory, and it will look for core dumps in the
current directory. If your local environment is different you can include `--install-dir` and
`--core-dir` in your invocation to specify other locations.
To run the core analyzer with core dumps and binaries from an evergreen task: To run the core analyzer with core dumps and binaries from an evergreen task:
@ -19,11 +23,15 @@ To run the core analyzer with core dumps and binaries from an evergreen task:
python3 buildscripts/resmoke.py core-analyzer --task-id={task_id} python3 buildscripts/resmoke.py core-analyzer --task-id={task_id}
``` ```
This will download all of the core dumps and binaries from the task and put them into the configured `--working-dir`, which defaults to the `core-analyzer` directory. This will download all of the core dumps and binaries from the task and put them into the configured
`--working-dir`, which defaults to the `core-analyzer` directory.
All of the task analysis will be added to the `analysis` directory inside the configured `--working-dir`. All of the task analysis will be added to the `analysis` directory inside the configured
`--working-dir`.
Note: Currently the core analyzer only runs on linux. Windows uses the legacy hang analyzer but will be switched over when we run into issues or have time to do the transition. We have not tackled the problem of getting core dumps on macOS so we have no core dump analysis on that operating system. Note: Currently the core analyzer only runs on linux. Windows uses the legacy hang analyzer but will
be switched over when we run into issues or have time to do the transition. We have not tackled the
problem of getting core dumps on macOS so we have no core dump analysis on that operating system.
### Getting core dumps ### Getting core dumps
@ -37,28 +45,33 @@ sequenceDiagram
Hang Analyzer ->> Core Dumps: Attach to pid and generate core dumps Hang Analyzer ->> Core Dumps: Attach to pid and generate core dumps
``` ```
When a task times out, it hits the [timeout](https://github.com/mongodb/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2694) section in the defined evergreen config. When a task times out, it hits the
In this timeout section, we run [this](https://github.com/mongodb/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2302) task which runs the hang-analyzer with the following invocation: [timeout](https://github.com/mongodb/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2694)
section in the defined evergreen config. In this timeout section, we run
[this](https://github.com/mongodb/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2302)
task which runs the hang-analyzer with the following invocation:
``` ```
python3 buildscripts/resmoke.py hang-analyzer -o file -o stdout -m exact -p python python3 buildscripts/resmoke.py hang-analyzer -o file -o stdout -m exact -p python
``` ```
This tells the hang-analyzer to look for all of the python processes (we are specifically looking for resmoke) on the machine and to signal them. This tells the hang-analyzer to look for all of the python processes (we are specifically looking
When resmoke is [signaled](https://github.com/mongodb/mongo/blob/08a99b15eea7ae0952b2098710d565dd7f709ff6/buildscripts/resmokelib/sighandler.py#L25), it again invokes the hang analyzer with the specific pids of its child processes. for resmoke) on the machine and to signal them. When resmoke is
It will look similar to this most of the time: [signaled](https://github.com/mongodb/mongo/blob/08a99b15eea7ae0952b2098710d565dd7f709ff6/buildscripts/resmokelib/sighandler.py#L25),
it again invokes the hang analyzer with the specific pids of its child processes. It will look
similar to this most of the time:
``` ```
python3 buildscripts/resmoke.py hang-analyzer -o file -o stdout -k -c -d pid1,pid2,pid3 python3 buildscripts/resmoke.py hang-analyzer -o file -o stdout -k -c -d pid1,pid2,pid3
``` ```
The things to note here are the `-k` which kills the process and `-c` which takes core dumps. The things to note here are the `-k` which kills the process and `-c` which takes core dumps. The
The resulting core dumps are put into the current running directory. resulting core dumps are put into the current running directory.
#### When a test times out #### When a test times out
An optional test timeout (`--testTimeout=N` seconds) can be used when running resmoke that will run the hang-analyzer on all processes related to that test. An optional test timeout (`--testTimeout=N` seconds) can be used when running resmoke that will run
When a test times out, it will analyze: the hang-analyzer on all processes related to that test. When a test times out, it will analyze:
- The process the testcase created. - The process the testcase created.
- Any child of the testcase process. - Any child of the testcase process.
@ -75,23 +88,31 @@ When a test times out, it will analyze:
| |-mongo (ENV_MARKER=2, pgid 9) | |-mongo (ENV_MARKER=2, pgid 9)
``` ```
Caution: Should a process be created in a new process group as `bar` is in the above example, it may be missed on MacOS. If `foo` crashes/exits, `bar` is orphaned and reparented to the `init` process. It is no longer a "child" and it is not generally possible to read environment variables of arbitrary processes on MacOS with System Integrity Protection (SIP) enabled. Caution: Should a process be created in a new process group as `bar` is in the above example, it may
be missed on MacOS. If `foo` crashes/exits, `bar` is orphaned and reparented to the `init` process.
It is no longer a "child" and it is not generally possible to read environment variables of
arbitrary processes on MacOS with System Integrity Protection (SIP) enabled.
#### When a task fails normally #### When a task fails normally
When a task fails normally, core dumps may also be generated by the linux kernel and put into the working directory. When a task fails normally, core dumps may also be generated by the linux kernel and put into the
working directory.
#### Note on archival/upload in Evergreen #### Note on archival/upload in Evergreen
We use a non-standard way of uploading core dumps to evergreen due to [timeout issues](https://jira.mongodb.org/browse/SERVER-73171) we were facing when archiving and uploading them normally through evergreen commands. We use a non-standard way of uploading core dumps to evergreen due to
After investigation of the above issue, we found that compressing and uploading core dumps was slow for a couple reasons: [timeout issues](https://jira.mongodb.org/browse/SERVER-73171) we were facing when archiving and
uploading them normally through evergreen commands. After investigation of the above issue, we found
that compressing and uploading core dumps was slow for a couple reasons:
1. Tarring all of the core dumps into one file takes up a lot of disk IO and disk IO was the bottleneck. 1. Tarring all of the core dumps into one file takes up a lot of disk IO and disk IO was the
bottleneck.
2. Gzip is single threaded. 2. Gzip is single threaded.
3. Uploading a big file synchronously is not fast. 3. Uploading a big file synchronously is not fast.
We made a [script](https://github.com/mongodb/mongo/blob/master/buildscripts/fast_archive.py) that gzips all of the core dumps in parallel and uploads them to S3 individually asynchronously. We made a [script](https://github.com/mongodb/mongo/blob/master/buildscripts/fast_archive.py) that
This solved all of the problems listed above. gzips all of the core dumps in parallel and uploads them to S3 individually asynchronously. This
solved all of the problems listed above.
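The general idea of compressing each core dump on its own and handing it off for upload as soon as it is ready can be sketched as below. This is not the linked `fast_archive.py`; it is a minimal illustration using a thread pool, and the `upload` callback is a hypothetical stand-in for the S3 upload, which is not performed here.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gzip_one(path: Path) -> Path:
    """Compress a single file next to the original and return the new path."""
    out = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out


def archive_all(paths, upload, max_workers=4):
    """Gzip every file in parallel and call `upload` on each result individually."""

    def work(path):
        compressed = gzip_one(path)
        upload(compressed)  # hypothetical callback; a real one would push to S3
        return compressed

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `paths`.
        return list(pool.map(work, paths))
```

Handling each file separately avoids building one large tarball first, so no worker has to wait for the others before its upload can start.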
### Generating the core analyzer task ### Generating the core analyzer task
@ -104,18 +125,26 @@ sequenceDiagram
Generated Task ->> Core Analyzer Output: Overwrite output with<br/> core dump analysis Generated Task ->> Core Analyzer Output: Overwrite output with<br/> core dump analysis
``` ```
In the [post task](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2665) section, we [define](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2184) the evergreen function used to generate the core analyzer task. In the
This [script](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/buildscripts/resmokelib/hang_analyzer/gen_hang_analyzer_tasks.py) runs on every task (passing or failing) and is independent of anything else that happened prior in the task and does all of the checks to ensure it should run. [post task](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2665)
These checks include: section, we
[define](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2184)
the evergreen function used to generate the core analyzer task. This
[script](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/buildscripts/resmokelib/hang_analyzer/gen_hang_analyzer_tasks.py)
runs on every task (passing or failing) and is independent of anything else that happened prior in
the task and does all of the checks to ensure it should run. These checks include:
1. The task is being run on an operating system supported by the core analyzer. 1. The task is being run on an operating system supported by the core analyzer.
2. The task has any core dumps uploaded and attached to it. 2. The task has any core dumps uploaded and attached to it.
3. At least one of the binaries uploaded is from a binary we know how to process. 3. At least one of the binaries uploaded is from a binary we know how to process.
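The three checks listed above could be expressed roughly as follows. This is only a sketch: the function name, its arguments, and the default set of known binaries are assumptions for illustration and are not taken from `gen_hang_analyzer_tasks.py`.

```python
def should_generate_core_analysis(
    os_name,
    core_dump_files,
    binary_names,
    supported_os=("linux",),
    known_binaries=("mongod", "mongos"),  # hypothetical list
):
    """Return True only if all three preconditions hold."""
    # 1. The task ran on an operating system the core analyzer supports.
    if os_name not in supported_os:
        return False
    # 2. The task has at least one core dump attached.
    if not core_dump_files:
        return False
    # 3. At least one uploaded binary is one we know how to process.
    return any(name in known_binaries for name in binary_names)
```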
The output from this script is a json file in the format evergreen expects. The output from this script is a json file in the format evergreen expects. We then pass this json
We then pass this json file into the `generate.tasks` evergreen command to generate the task. file into the `generate.tasks` evergreen command to generate the task.
After the task is generated, we have [another script](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2213) that finds the task that was just generated and attaches it to the current task being run. After the task is generated, we have
[another script](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2213)
that finds the task that was just generated and attaches it to the current task being run.
The reason we upload a temporary file to the original task is to attach that s3 file link to the task. The reason we upload a temporary file to the original task is to attach that s3 file link to the
Evergreen does not currently have a way to attach files to a task after it was run so we need to upload something while the original task is in progress. task. Evergreen does not currently have a way to attach files to a task after it was run so we need
to upload something while the original task is in progress.


@ -1,17 +1,15 @@
# Powercycle README # Powercycle README
Power cycling is the process of turning hardware off and then turning it on again. Power cycling is the process of turning hardware off and then turning it on again. Powercycle test
Powercycle test is designed to work across two machines, one machine is a "server" is designed to work across two machines, one machine is a "server" that controls and monitors the
that controls and monitors the workflow and a "client" that runs Mongo server and workflow and a "client" that runs Mongo server and is remotely crashed by "server" regularly.
is remotely crashed by "server" regularly.
In evergreen the localhost that runs the task acts as a "server" and the remote In evergreen the localhost that runs the task acts as a "server" and the remote host which is
host which is created by `host.create` evergreen command acts as a "client". created by `host.create` evergreen command acts as a "client".
Powercycle test is part of resmoke. Python 3.13+ with python venv is required to Powercycle test is part of resmoke. Python 3.13+ with python venv is required to run the resmoke
run the resmoke (python3 from [mongodbtoolchain](http://mongodbtoolchain.build.10gen.cc/) (python3 from [mongodbtoolchain](http://mongodbtoolchain.build.10gen.cc/) is highly recommended).
is highly recommended). Python venv can be set up by running in the root mongo repo Python venv can be set up by running in the root mongo repo directory:
directory:
``` ```
python3 -m venv python3-venv python3 -m venv python3-venv
@ -48,20 +46,18 @@ buildscripts/resmokelib/powercycle/__init__.py
### Set up EC2 instance ### Set up EC2 instance
1. `Evergreen host.create command` - in Evergreen the remote host is created with 1. `Evergreen host.create command` - in Evergreen the remote host is created with the same distro as
the same distro as the localhost runs and some initial connections are made to ensure the localhost runs and some initial connections are made to ensure it's up before further steps
it's up before further steps 2. `Resmoke powercycle setup-host command` - prepares remote host via ssh to run the powercycle
2. `Resmoke powercycle setup-host command` - prepares remote host via ssh to run test:
the powercycle test:
``` ```
python buildscripts/resmoke.py powercycle setup-host python buildscripts/resmoke.py powercycle setup-host
``` ```
Powercycle setup-host operations are located in Powercycle setup-host operations are located in
`buildscripts/resmokelib/powercycle/setup/__init__.py`. `buildscripts/resmokelib/powercycle/setup/__init__.py`. `expansions.yml` file is used to load the
`expansions.yml` file is used to load the configuration to run operations which is configuration to run operations which is created by `expansions.write` command in Evergreen.
created by `expansions.write` command in Evergreen.
It runs several operations via ssh: It runs several operations via ssh:
@ -69,12 +65,12 @@ It runs several operations via ssh:
- copy `buildscripts` and `mongoDB executables` from localhost to the remote host - copy `buildscripts` and `mongoDB executables` from localhost to the remote host
- set up python venv on the remote host - set up python venv on the remote host
- set up curator to collect system & process stats on the remote host - set up curator to collect system & process stats on the remote host
- install [NotMyFault](https://docs.microsoft.com/en-us/sysinternals/downloads/notmyfault) - install [NotMyFault](https://docs.microsoft.com/en-us/sysinternals/downloads/notmyfault) to crash
to crash Windows (only on Windows) Windows (only on Windows)
Remote operation via ssh implementation is located in Remote operation via ssh implementation is located in
`buildscripts/resmokelib/powercycle/lib/remote_operations.py`. `buildscripts/resmokelib/powercycle/lib/remote_operations.py`. The following operations are
The following operations are supported: supported:
- `copy_to` - copy files from the localhost to the remote host - `copy_to` - copy files from the localhost to the remote host
- `copy_from` - copy files from the remote host to the localhost - `copy_from` - copy files from the remote host to the localhost
@ -82,9 +78,8 @@ The following operations are supported:
### Run powercycle test ### Run powercycle test
`Resmoke powercycle run command` - runs the powercycle test on the localhost `Resmoke powercycle run command` - runs the powercycle test on the localhost which runs remote
which runs remote operations on the remote host via ssh and local validation operations on the remote host via ssh and local validation checks:
checks:
``` ```
python buildscripts/resmoke.py powercycle run \ python buildscripts/resmoke.py powercycle run \
@ -95,26 +90,26 @@ python buildscripts/resmoke.py powercycle run \
###### Resmoke powercycle run arguments ###### Resmoke powercycle run arguments
The arguments for resmoke powercycle run command are defined in `add_subcommand()` The arguments for resmoke powercycle run command are defined in `add_subcommand()` function in
function in `buildscripts/resmokelib/powercycle/__init__.py`. When powercycle test `buildscripts/resmokelib/powercycle/__init__.py`. When powercycle test runs remote operations on the
runs remote operations on the remote host it calls the copied version of this script remote host it calls the copied version of this script on the remote host. Thus, some resmoke
on the remote host. Thus, some resmoke powercycle run command arguments are needed powercycle run command arguments are needed for the remote call and shouldn't be used when calling
for the remote call and shouldn't be used when calling the script on the localhost. the script on the localhost.
`--taskName` argument is used to get powercycle task configurations that are stored `--taskName` argument is used to get powercycle task configurations that are stored in
in `buildscripts/resmokeconfig/powercycle/powercycle_tasks.yml` `buildscripts/resmokeconfig/powercycle/powercycle_tasks.yml`
There is a known issue with `--setParameter` mongod options incorrectly processed There is a known issue with `--setParameter` mongod options incorrectly processed from
from `mongod_options` that is described in [SERVER-47621](https://jira.mongodb.org/browse/SERVER-47621) `mongod_options` that is described in [SERVER-47621](https://jira.mongodb.org/browse/SERVER-47621)
###### Powercycle test implementation ###### Powercycle test implementation
The powercycle test main implementation is located in `main()` function in The powercycle test main implementation is located in `main()` function in
`buildscripts/resmokelib/powercycle/powercycle.py`. `buildscripts/resmokelib/powercycle/powercycle.py`.
The value of `--remoteOperation` argument is used to distinguish if we are running the script The value of `--remoteOperation` argument is used to distinguish if we are running the script on the
on the localhost or on the remote host. localhost or on the remote host. `remote_handler()` function performs the following remote
`remote_handler()` function performs the following remote operations: operations:
- `noop` - do nothing - `noop` - do nothing
- `crash_server` - internally crash the server - `crash_server` - internally crash the server
@ -157,17 +152,17 @@ When running on localhost the powercycle test loops do the following steps:
### Save diagnostics ### Save diagnostics
`Resmoke powercycle save-diagnostics command` - copies powercycle diagnostics `Resmoke powercycle save-diagnostics command` - copies powercycle diagnostics files from the remote
files from the remote host to the localhost (mainly used by Evergreen): host to the localhost (mainly used by Evergreen):
``` ```
python buildscripts/resmoke.py powercycle save-diagnostics python buildscripts/resmoke.py powercycle save-diagnostics
``` ```
Powercycle save-diagnostics operations are located in Powercycle save-diagnostics operations are located in
`buildscripts/resmokelib/powercycle/save_diagnostics/__init__.py`. `buildscripts/resmokelib/powercycle/save_diagnostics/__init__.py`. `expansions.yml` file is used to
`expansions.yml` file is used to load the configuration to run operations which is load the configuration to run operations which is created by `expansions.write` command in
created by `expansions.write` command in Evergreen. Evergreen.
It runs several operations via ssh: It runs several operations via ssh:
@ -188,15 +183,14 @@ It runs several operations via ssh:
### Remote hang analyzer (optional) ### Remote hang analyzer (optional)
`Resmoke powercycle remote-hang-analyzer command` - runs hang analyzer on the `Resmoke powercycle remote-hang-analyzer command` - runs hang analyzer on the remote host (mainly
remote host (mainly used by Evergreen): used by Evergreen):
``` ```
python buildscripts/resmoke.py powercycle remote-hang-analyzer python buildscripts/resmoke.py powercycle remote-hang-analyzer
``` ```
Powercycle remote-hang-analyzer command calls resmoke hang analyzer on the Powercycle remote-hang-analyzer command calls resmoke hang analyzer on the remote host and is
remote host and is located in located in `buildscripts/resmokelib/powercycle/remote_hang_analyzer/__init__.py`. `expansions.yml`
`buildscripts/resmokelib/powercycle/remote_hang_analyzer/__init__.py` file is used to load the configuration to run this command which is created by `expansions.write`
`expansions.yml` file is used to load the configuration to run this command which is command in Evergreen.
created by `expansions.write` command in Evergreen.


@ -4,24 +4,39 @@ Fixtures define a specific topology that tests run against.
## Supported Fixtures

Specify any of the following as the `fixture` in your
[Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config:

- [`BulkWriteFixture`](./bulk_write.py) - Fixture which provides JSTests with a set of clusters to
  run tests against.
- [`ExternalFixture`](./external.py) - Fixture which provides JSTests capability to connect to
  external (non-resmoke) cluster.
- [`ExternalShardedClusterFixture`](./shardedcluster.py) - Fixture to interact with external sharded
  cluster fixture.
- [`MongoDFixture`](./standalone.py) - Fixture which provides JSTests with a standalone mongod to
  run against.
- [`MongoTFixture`](./mongot.py) - Fixture which provides JSTests with a mongot to run alongside a
  mongod.
- [`MultiReplicaSetFixture`](./multi_replica_set.py) - Fixture which provides JSTests with a set of
  replica sets to run against.
- [`MultiShardedClusterFixture`](./multi_sharded_cluster.py) - Fixture which provides JSTests with a
  set of sharded clusters to run against.
- [`ReplicaSetFixture`](./replicaset.py) - Fixture which provides JSTests with a replica set to run
  against.
- [`ShardedClusterFixture`](./shardedcluster.py) - Fixture which provides JSTests with a sharded
  cluster to run against.
  - Used when the MongoDB deployment is started by the JavaScript test itself with `MongoRunner`,
    `ReplSetTest`, or `ShardingTest`.
- [`YesFixture`](./yesfixture.py) - Fixture which spawns several `yes` executables to generate lots
  of log messages.

## Interfaces

- [`Fixture`](./interface.py) - Base class for all fixtures.
- [`MultiClusterFixture`](./interface.py) - Base class for fixtures that may consist of multiple
  independent participant clusters.
  - The participant clusters can function independently without coordination, but are bound together
    only for some duration as they participate in some process such as a migration. The participant
    clusters are fixtures themselves.
- [`NoOpFixture`](./interface.py) - A Fixture implementation that does not start any servers.
- [`ReplFixture`](./interface.py) - Base class for all fixtures that support replication.

@ -4,84 +4,145 @@ Hooks are a mechanism to run routines _around_ the tests, at the test content bo
## Supported hooks

Specify any of the following as the `hooks` in your
[Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config:

- [`AnalyzeShardKeysInBackground`](./analyze_shard_key.py) - A hook for running `analyzeShardKey`
  commands while a test is running.
- [`AntithesisLogging`](./antithesis_logging.py) - Prints antithesis commands before & after test
  run.
- [`BackgroundInitialSync`](./initialsync.py) - Background Initial Sync
  - After every test, this hook checks if a background node has finished initial sync and if so
    validates it, tears it down, and restarts it.
  - This hook accepts a parameter `n` that specifies a number of tests after which it will wait for
    replication to finish before validating and restarting the initial sync node.
  - This requires the ReplicaSetFixture to be started with `start_initial_sync_node=True`. If used
    at the same time as `CleanEveryN`, the `n` value passed to this hook should be equal to the `n`
    value for `CleanEveryN`.
- [`CheckClusterIndexConsistency`](./cluster_index_consistency.py) - Checks that indexes are the
  same across chunks for the same collections.
- [`CheckMetadataConsistencyInBackground`](./metadata_consistency) - Check the metadata consistency
  of a sharded cluster.
- [`CheckOrphansDeleted`](./orphans.py) - Check if the range deleter failed to delete any orphan
  documents.
- [`CheckReplDBHashInBackground`](./dbhash_background.py) - A hook for comparing the dbhashes of all
  replica set members while a test is running.
- [`CheckReplDBHash`](./dbhash.py) - Check if the dbhashes match.
- [`CheckReplOplogs`](./oplog.py) - Check that `local.oplog.rs` matches on the primary and
  secondaries.
- [`CheckReplPreImagesConsistency`](./preimages_consistency.py) - Check that
  `config.system.preimages` is consistent between the primary and secondaries.
- [`CheckRoutingTableConsistency`](./routing_table_consistency.py) - Verifies the absence of
  corrupted entries in `config.chunks` and `config.collections`.
- [`CheckShardFilteringMetadata`](./shard_filtering_metadata.py) - Inspect filtering metadata on
  shards.
- [`CleanEveryN`](./cleanup.py) - Restart the fixture after it has run `n` tests.
- [`CleanupConcurrencyWorkloads`](./cleanup_concurrency_workloads.py) - Drop all databases, except
  those that have been excluded.
  - For concurrency tests that run on different DBs, drop all databases except ones in
    `exclude_dbs`. For tests that run on the same DB, drop all databases except ones in
    `exclude_dbs` and the DB used by the test/workloads. For tests that run on the same collection,
    drop all collections in all databases except for `exclude_dbs` and the collection used by the
    test/workloads.
  - On mongod-related fixtures, this will clear the dbpath.
- [`ClusterParameter`](./cluster_parameter.py) - Sets the specified cluster server parameter.
- [`ContinuousAddRemoveShard`](./add_remove_shards.py) - Continuously adds and removes shards at
  regular intervals. If running with `configsvr` transitions, will transition in/out of config shard
  mode.
- [`ContinuousInitialSync`](./continuous_initial_sync.py) - Periodically initial syncs nodes, then
  steps them up.
- [`ContinuousStepdown`](./stepdown.py) - Regularly connects to replica sets and sends a
  `replSetStepDown` command.
- [`ContinuousTransition`](./replicaset_transition_to_and_from_csrs.py) - Connects to replica sets
  and transitions them from replica set to CSRS node in the background.
- [`DoReconfigInBackground`](./reconfig_background.py) - A hook for running a safe reconfig against
  a replica set while a test is running.
- [`DropConfigCacheCollections`](./drop_config_cache_collections.py) - A hook for dropping random
  entries of `config.cache.collections` in shards.
- [`DropSessionsCollection`](./drop_sessions_collection.py) - A hook for dropping and recreating
  `config.system.sessions` while tests are running.
- [`DropUserCollections`](./drop_user_collections.py) - Drops all user collections.
- [`EnableSpuriousWriteConflicts`](./enable_spurious_write_conflicts.py) - Toggles write conflicts.
- [`FCVUpgradeDowngradeInBackground`](./fcv_upgrade_downgrade.py) - A hook to run background FCV
  upgrade and downgrade against test servers while a test is running.
- [`FuzzRuntimeParameters`](./fuzz_runtime_parameters.py) - Regularly connects to nodes and sends
  them a `setParameter` command; uses the
  [Config Fuzzer](../../../../buildscripts/resmokelib/generate_fuzz_config/README.md).
- [`FuzzRuntimeStress`](./fuzz_runtime_stress.py) - Test hook that periodically changes the amount
  of stress the system is experiencing.
- [`FuzzerRestoreSettings`](./fuzzer_restore_settings.py) - Cleans up unwanted changes from fuzzer.
- [`GenerateAndCheckPerfResults`](./generate_and_check_perf_results.py) - Combine JSON results from
  individual benchmarks and check their reported values against any thresholds set for them.
  - Combines test results from individual benchmark files to a single file. This is useful for
    generating the json file to feed into the Evergreen performance visualization plugin.
- [`HelloDelays`](./hello_failures.py) - Sets Hello fault injections.
- [`IntermediateInitialSync`](./initialsync.py) - Intermediate Initial Sync
  - This hook accepts a parameter `n` that specifies a number of tests after which it will start up
    a node to initial sync, wait for replication to finish, and then validate the data.
  - This requires the ReplicaSetFixture to be started with `start_initial_sync_node=True`.
- [`LagOplogApplicationInBackground`](./secondary_lag.py) - Toggles secondary oplog application lag.
- [`LibfuzzerHook`](./cpp_libfuzzer.py) - Merges inputs after a fuzzer run.
- [`MagicRestoreEveryN`](./magic_restore.py) - Open a backup cursor and run magic restore process
  after `n` tests have run.
  - Requires the use of `MagicRestoreFixture`.
- [`PeriodicKillSecondaries`](./periodic_kill_secondaries.py) - Periodically kills the secondaries
  in a replica set.
  - Also verifies that the secondaries can reach the SECONDARY state without having connectivity to
    the primary after an unclean shutdown.
- [`PeriodicStackTrace`](./periodic_stack_trace.py) - Test hook that sends the stacktracing signal
  to mongo processes at randomized intervals.
- [`QueryableServerHook`](./queryable_server_hook.py) - Starts the queryable server before each test
  for queryable restores. Restarts the queryable server between tests.
- [`RotateExecutionControlParams`](./rotate_execution_control_params.py) - Periodically rotates
  'executionControlConcurrencyAdjustmentAlgorithm' and deprioritization server parameters to random
  valid values.
- [`RunChangeStreamsInBackground`](./change_streams.py) - Run in the background full cluster change
  streams while a test is running. Open and close the change stream every `1..10` tests (random
  using `config.RANDOM_SEED`).
- [`RunDBCheckInBackground`](./dbcheck_background.py) - A hook for running `dbCheck` on a replica
  set while a test is running.
  - This includes dbhashes for all non-local databases and non-replicated system collections that
    match on the primary and secondaries.
  - It also will check the performance results against any thresholds that are set for each
    benchmark. If no thresholds are set for a test, this hook should always pass.
- [`RunQueryStats`](./run_query_stats.py) - Runs `$queryStats` after every test, and clears the
  query stats store before every test.
- [`SimulateCrash`](./simulate_crash.py) - A hook to simulate crashes.
- [`ValidateCollections`](./validate.py) - Run full validation.
- [`ValidateCollectionsInBackground`](./validate_background.py) - A hook to run background
  collection validation against test servers while a test is running.
  - This will run on all collections in all databases on every stand-alone node, primary replica-set
    node, or primary shard node.
- [`ValidateDirectSecondaryReads`](./validate_direct_secondary_reads.py) - Only supported in suites
  that use `ReplicaSetFixture`.
  - To be used with `set_read_preference_secondary.js` and `implicit_enable_profiler.js` in suites
    that read directly from secondaries in a replica set. Check the profiler collections of all
    databases at the end of the suite to verify that each secondary only ran the read commands it
    got directly from the shell.
- [`WaitForReplication`](./wait_for_replication.py) - Wait for replication to complete.
## Interfaces

All hooks inherit from the [`buildscripts.resmokelib.testing.hooks.interface.Hook`](./interface.py)
parent class and can override any subset of the following empty base methods:

- `before_suite`
- `before_test`
- `after_test`
- `after_suite`

At least one base method must be overridden; otherwise the hook will not do anything at all. During
test suite execution, each hook runs its custom logic at the corresponding points. Some customizable
tasks that hooks can perform include: _validating data, deleting data, performing cleanup_, etc.
- [`BGHook`](./bghook.py) - A hook that repeatedly calls `run_action()` in a background thread for
  the duration of the test suite.
- [`DataConsistencyHook`](./jsfile.py) - A hook for running a static JavaScript file that checks
  data consistency of the server.
  - If the mongo shell process running the JavaScript file exits with a non-zero return code, then
    an `errors.ServerFailure` exception is raised to cause resmoke.py's test execution to stop.
- [`Hook`](./interface.py) - Common interface all Hooks will inherit from.
- [`JSHook`](./jsfile.py) - A hook interface with a static JavaScript file to execute.
- [`PerClusterDataConsistencyHook`](./jsfile.py) - A hook that runs on each independent cluster of
  the fixture.
  - The independent cluster itself may be another fixture.
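As a rough illustration of the `BGHook` idea (calling `run_action()` repeatedly in a background
thread for the duration of the suite), here is a minimal sketch built only on the standard library.
The class name `BackgroundActionSketch`, the `interval_secs` parameter, and the start/stop wiring are
assumptions for illustration; the actual `BGHook` in `bghook.py` may be structured differently.

```python
import threading


class BackgroundActionSketch:
    """Hypothetical sketch: repeatedly calls run_action() until the suite ends."""

    def __init__(self, interval_secs=0.01):
        self._interval_secs = interval_secs
        self._stop = threading.Event()
        self._thread = None
        self.action_count = 0

    def run_action(self):
        # Subclasses would put their periodic work here.
        self.action_count += 1

    def _loop(self):
        while not self._stop.is_set():
            self.run_action()
            # Sleep between actions, but wake up immediately when asked to stop.
            self._stop.wait(self._interval_secs)

    def before_suite(self):
        self._stop.clear()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def after_suite(self):
        self._stop.set()
        self._thread.join()
```

Using an `Event` for both the stop flag and the sleep means `after_suite` does not have to wait out
a full interval before the thread exits.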

@ -1,33 +1,52 @@
# TestCases

TestCases extend Python-based `unittest.TestCase` objects that resmoke can run as different "kinds"
of tests.

## Supported TestCases

Specify any of the following as the `test_kind` in your
[Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config:

- `all_versions_js_test`: [`AllVersionsJSTestCase`](./jstest.py) - Alias for JSTestCase for
  multiversion passthrough suites.
  - It runs with all combinations of versions of replica sets and sharded clusters. The distinct
    name is picked up by task generation.
- `benchmark_test`: [`BenchmarkTestCase`](./benchmark_test.py) - A Benchmark test to execute.
- `bulk_write_cluster_js_test`: [`BulkWriteClusterTestCase`](./bulk_write_cluster_js_test.py) - A
  test to execute with connection data for multiple clusters passed through TestData.
- `cpp_integration_test`: [`CPPIntegrationTestCase`](./cpp_integration_test.py) - A C++ integration
  test to execute.
- `cpp_libfuzzer_test`: [`CPPLibfuzzerTestCase`](./cpp_libfuzzer_test.py) - A C++ libfuzzer test to
  execute.
- `cpp_unit_test`: [`CPPUnitTestCase`](./cpp_unittest.py) - A C++ unit test to execute.
- `db_test`: [`DBTestCase`](./dbtest.py) - A dbtest to execute.
- `fsm_workload_test`: [`FSMWorkloadTestCase`](./fsm_workload_test.py) - A wrapper for several
  copies of a `_SingleFSMWorkloadTestCase` to execute.
- `js_test`: [`JSTestCase`](./jstest.py) - A wrapper for several copies of a `_SingleJSTestCase` to
  execute
  - Around **75% of all suites use the `js_test` kind**. See
    [jstests/README.md](../../../../jstests/README.md) for specific guidance.
- `json_schema_test`: [`JSONSchemaTestCase`](./json_schema_test.py) - A JSON Schema test to execute.
- `magic_restore_js_test`: [`MagicRestoreTestCase`](./magic_restore_js_test.py) - A test to execute
  for running tests in a try/catch block.
- `mongos_test`: [`MongosTestCase`](./mongos_test.py) - A TestCase which runs a mongos binary with
  the given parameters.
- `multi_stmt_txn_passthrough`: [`MultiStmtTxnTestCase`](./multi_stmt_txn_test.py) - Test case for
  multi statement transactions.
- `parallel_fsm_workload_test`: [`ParallelFSMWorkloadTestCase`](./fsm_workload_test.py) - An FSM
  workload to execute.
- `pretty_printer_test`: [`PrettyPrinterTestCase`](./pretty_printer_testcase.py) - A pretty printer
  test to execute.
- `py_test`: [`PyTestCase`](./pytest.py) - A python test to execute.
- `query_tester_self_test`: [`QueryTesterSelfTestCase`](./query_tester_self_test.py) - A QueryTester
  self-test to execute.
- `query_tester_server_test`: [`QueryTesterServerTestCase`](./query_tester_server_test.py) - A
  QueryTester server test to execute.
- `sdam_json_test`: [`SDAMJsonTestCase`](./sdam_json_test.py) - Server Discovery and Monitoring JSON
  test case.
- `server_selection_json_test`: [`ServerSelectionJsonTestCase`](./server_selection_json_test.py) -
  Server Selection JSON test case.
- `sleep_test`: [`SleepTestCase`](./sleeptest.py) - SleepTestCase class.
- `tla_plus_test`: [`TLAPlusTestCase`](./tla_plus_test.py) - A TLA+ specification to model-check.
@ -36,26 +55,36 @@ Specify any of the following as the `test_kind` in your [Suite](../../../../buil
Top level interfaces:

- [`TestCase`](./interface.py) - A test case to execute. The `run_test` method must be implemented.
- [`ProcessTestCase`](./interface.py) - Base class for TestCases that execute an external process.
  The `_make_process` method must be implemented.
Subclasses:

- [`JSRunnerFileTestCase`](./jsrunnerfile.py) - A test case with a static JavaScript runner file to
  execute.
- [`MultiClientsTestCase`](./jstest.py) - A wrapper for several copies of a SingleTestCase to
  execute.
- [`TestCaseFactory`](./interface.py) - Convenience interface to initialize and build test cases
## Fixture TestCases

These are testcases that are used to coordinate fixture lifecycles via resmoke's internal
`FixtureTestCaseManager`.

> NOTE This design does lead to seeing "extra" tests in a run: a fixture sets up, your `N` tests
> run, and the fixture tears down, so you see `N+2` "tests" passing via resmoke.

- [`FixtureTestCase`](./fixture.py) - Base class for the fixture test cases.
- [`FixtureSetupTestCase`](./fixture.py) - TestCase for setting up a fixture.
- [`FixtureTeardownTestCase`](./fixture.py) - TestCase for tearing down a fixture.
- [`FixtureAbortTestCase`](./fixture.py) - TestCase for killing/aborting a fixture. Intended for use
  before archiving a failed test.
  - When resmoke detects that a test has failed (and
    [archiving](../../../../buildscripts/resmokeconfig/suites/README.md#executorarchive) is
    configured), it dynamically generates a new `FixtureAbortTestCase` for immediate execution.
    This test case sends a `SIGABRT` to each running mongod process.
## Testing TestCases ## Testing TestCases
Self-tests for the testcases themselves can be found in [buildscripts/tests/resmokelib/testing/testcases/](../../../../buildscripts/tests/resmokelib/testing/testcases/) Self-tests for the testcases themselves can be found in
[buildscripts/tests/resmokelib/testing/testcases/](../../../../buildscripts/tests/resmokelib/testing/testcases/)


@ -1,33 +1,55 @@
# S3 Binary

This is a small utility to help safely manage tool binaries that are stored in MongoDB's S3 bucket
for use in this repository's build, test, or release processes.

### Security

Any time a binary is pulled down from the internet and executed, there is a risk that the binary has
been modified unintentionally. This tool creates a hash of the binary that the developer uploads and
stores a record of it in a programmatically accessible Python script (see
`buildscripts/s3_binary/hashes.py`). When a tool uses the S3 binary, this interface forces a
checksum of the binary before the binary is run, verifying the result against the value stored in
`hashes.py` and stopping execution if it doesn't match.
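
The verify-before-run idea can be sketched in a few lines of Python. This is only an illustration
under stated assumptions, not the tool's actual implementation: `EXPECTED_HASHES`,
`sha256_of_file`, and `verify_binary` are hypothetical names, and the dictionary merely stands in
for whatever `buildscripts/s3_binary/hashes.py` really contains.

```python
import hashlib
from pathlib import Path

# Hypothetical stand-in for the path-to-hash mapping recorded in hashes.py.
# The bucket name and contents here are made up for the example.
EXPECTED_HASHES = {
    "s3://example-bucket/tool/v1/tool_linux": hashlib.sha256(b"tool bytes").hexdigest(),
}


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_binary(s3_path: str, local_path: Path) -> None:
    """Raise ValueError unless the downloaded file matches the recorded hash."""
    expected = EXPECTED_HASHES.get(s3_path)
    if expected is None:
        raise ValueError(f"No recorded hash for {s3_path}")
    actual = sha256_of_file(local_path)
    if actual != expected:
        raise ValueError(f"Hash mismatch for {s3_path}: expected {expected}, got {actual}")
```

A caller would run `verify_binary(...)` immediately after the download and only execute the binary
if no exception was raised.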
### Hermetic Guarantee

The other risk of relying on a binary stored in S3 is that if the binary is changed, it will change
the results of previously run tests or builds in continuous integration. This is not ideal since
there are often cases where an old commit needs to be rerun to reproduce user issues. Storing the
hash in the repository and preventing modifications prevents accidental compatibility breaks of
previous commits.
### Example Usage

Scenario: You have a developer tool called db-contrib-tool that you want to build into a binary, and
then use that binary as part of a test process in 10gen/mongo. To use the s3_binary tool you would:

1. Create your binaries and put them into a single directory on your local system, for example
   `/tmp/db-contrib-tool/db-contrib-tool-v1_windows.exe` and
   `/tmp/db-contrib-tool/db-contrib-tool-v1_linux`.
2. Invoke
   `bazel run buildscripts/s3_binary:upload -- /tmp/db-contrib-tool s3://mdb-build-public/db-contrib-tool/v1`.
3. Follow the prompts. This updates your local `buildscripts/s3_binary/hashes.py` file, mapping the
   S3 path of each binary to its SHA-256 hash.
4. Update your test code to call:
   `download_s3_binary(f"s3://mdb-build-public/db-contrib-tool/v1/db-contrib-tool-v1_{os}{ext}")`.
   This will then automatically verify the download matches the hash at runtime.
5. Create a commit with your new code that adds in the `download_s3_binary` call and the
   `buildscripts/s3_binary/hashes.py` modifications.

The case above covers usage in Python. If using another language like Starlark for Bazel
dependencies, you would follow the same flow but copy the hashes into the Starlark code instead of
relying on `hashes.py`. Please retain the modifications to `hashes.py` regardless, to make it easy
to use your binaries in Python.
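
The `{os}{ext}` placeholders in step 4 are not defined in the text above. One plausible way to fill
them in, assuming only the two binaries from the scenario (Windows and Linux) were uploaded, is
sketched below. `binary_s3_path` is a hypothetical helper written for this example; only the S3
prefix and file naming come from the scenario.

```python
import platform
from typing import Optional

# Prefix taken from the scenario above.
S3_PREFIX = "s3://mdb-build-public/db-contrib-tool/v1"

# Assumption: only these two platforms were uploaded in step 1.
_SUFFIXES = {
    "Windows": ("windows", ".exe"),
    "Linux": ("linux", ""),
}


def binary_s3_path(system: Optional[str] = None) -> str:
    """Return the S3 path of the binary for the given (or current) platform."""
    system = system or platform.system()
    if system not in _SUFFIXES:
        raise ValueError(f"No db-contrib-tool binary was uploaded for {system}")
    os_name, ext = _SUFFIXES[system]
    return f"{S3_PREFIX}/db-contrib-tool-v1_{os_name}{ext}"
```

The resulting string is what would be passed to `download_s3_binary`, and it must match a key
recorded in `hashes.py` exactly for the hash lookup to succeed.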
### Future Additions

In general, it's less error prone to have the entire flow of building, uploading, and using a binary
all happen in an automated pipeline without developer interaction. In the future, this tool will be
updated to be easily invocable from a continuous integration pipeline that performs the build and
either returns the hashes to the user to be later committed, or automatically submits a PR to update
them.


@ -55,8 +55,8 @@ bazel test --test_output=summary --test_tag_filters=-intermediate_debug,server-p
## Storage Execution

The smoke test suites for storage execution are divided up into components. The smoke test suite for
all of the components that storage execution owns can be run with the following:

```
bazel test --test_output=summary --test_tag_filters=-intermediate_debug,server-bsoncolumn,server-collection-write-path,server-external-sorter,server-index-builds,server-key-string,server-storage-engine-integration,server-timeseries-bucket-catalog,server-tracking-allocators,server-ttl //...
@ -76,7 +76,8 @@ There are currently no smoke test integration tests for this component.
### Server-Collection-Write-Path

The unit and integration tests for the server-collection-write-path component can be run with the
following:

```
bazel test --test_output=summary --test_tag_filters=-intermediate_debug,server-collection-write-path //...
@ -112,7 +113,8 @@ There are currently no smoke test integration tests for this component.
### Server-Storage-Engine-Integration

The unit and integration tests for the server-storage-engine-integration component can be run with
the following:

```
bazel test --test_output=summary --test_tag_filters=-intermediate_debug,server-storage-engine-integration //...


@ -10,7 +10,8 @@ mongodb_repo_root$ source python3-venv/bin/activate
(python3-venv) mongodb_repo_root$ python buildscripts/resmoke.py run --suites resmoke_end2end_tests
```

- Finer-grained control over which tests run is possible by invoking Python's unittest main by hand.
  E.g.:

```
(python3-venv) mongodb_repo_root$ python -m unittest -v buildscripts.tests.resmoke_end2end.test_resmoke.TestTestSelection.test_at_sign_as_replay_file


@ -4,24 +4,26 @@
Antithesis is a third party vendor with an environment that can perform network fuzzing. We can
upload images containing `docker-compose.yml` files, which represent various MongoDB topologies, to
the Antithesis Docker registry. Antithesis runs `docker-compose up` from these images to spin up the
corresponding multi-container application in their environment and run a test suite. Network fuzzing
is performed on the topology while the test suite runs & a report is generated by Antithesis
identifying bugs. Check out https://github.com/mongodb/mongo/wiki/Testing-MongoDB-with-Antithesis to
see an example of how we use Antithesis today.

## Base Images

The `base_images` directory consists of the building blocks for creating a MongoDB test topology.
These images are uploaded to the Antithesis Docker registry
[nightly](https://github.com/mongodb/mongo/blob/6cf8b162a61173eb372b54213def6dd61e1fd684/etc/evergreen_yml_components/variants/ubuntu/test_dev_master_and_lts_branches_only.yml#L28)
during the
[`antithesis image build and push`](https://github.com/mongodb/mongo/blob/020632e3ae328f276b2c251417b5a39389af6141/etc/evergreen_yml_components/definitions.yml#L2823)
function.

### mongo_binaries

This image contains the latest `mongo`, `mongos` and `mongod` binaries. It can be used to start a
`mongod` instance, `mongos` instance or execute `mongo` commands. This is the main building block
for creating the System Under Test topology.

### workload
@ -36,16 +38,16 @@ buildscript/resmoke.py run --suite antithesis_concurrency_sharded_with_stepdowns
**Every topology must have 1 workload container.**

Note: During `workload` image build, `evergreen/antithesis_image_build_and_push.sh` runs, which
generates "antithesis compatible" test suites and prepends them with `antithesis_`. These are the
test suites that can run in antithesis and are available from within the `workload` container.

### Dockerfile

This assembles an image with the necessary files for spinning up the corresponding topology. It
consists of a `docker-compose.yml`, a `logs` directory, a `scripts` directory and a `data`
directory. If this is structured properly, you should be able to copy the files & directories from
this image and run `docker-compose up` to set up the desired topology.

Example from what `buildscripts/resmokelib/testing/docker_cluster_image_builder.py` generates:
@ -67,8 +69,8 @@ therefore use `FROM scratch`.
### docker-compose.yml

This describes how to construct the corresponding topology using the `mongo-binaries` and `workload`
images.

Example from `buildscripts/antithesis/topologies/sharded_cluster/docker-compose.yml`:
@ -162,15 +164,15 @@ networks:
Each container must have a `command` in `docker-compose.yml` that runs an init script. The init
script belongs in the `scripts` directory, which is included as a volume. The `command` should be
set like so: `/bin/bash /scripts/[script_name].sh` or `python3 /scripts/[script_name].py`. This is a
requirement for the topology to start up properly in Antithesis.

When creating `mongod` or `mongos` instances, route the logs like so:
`--logpath /var/log/mongodb/mongodb.log` and utilize `volumes` -- as in `database1`. This enables us
to easily retrieve logs if a bug is detected by Antithesis.

The `ipv4_address` should be set to `10.20.20.130` or higher if you do not want that container to be
affected by network fuzzing. For instance, you would likely not want the `workload` container
to be affected by network fuzzing -- as shown in the example above.
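
When reviewing a topology, the address rule can be checked mechanically. The sketch below assumes
that "or higher" means a plain numeric comparison of IPv4 addresses against `10.20.20.130`; that
interpretation, and the helper name `is_exempt_from_fuzzing`, are assumptions made for illustration
and are not part of any Antithesis or resmoke API.

```python
import ipaddress

# Threshold quoted in the text above; addresses at or above it are assumed
# to be left alone by network fuzzing.
FUZZING_EXEMPT_START = ipaddress.ip_address("10.20.20.130")


def is_exempt_from_fuzzing(address: str) -> bool:
    """Return True if the container address is at or above the exempt threshold."""
    return ipaddress.ip_address(address) >= FUZZING_EXEMPT_START
```

For example, a `workload` container would be given an address for which this check returns `True`,
while the `mongod` and `mongos` containers under test would use lower addresses.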

Use the `evergreen-latest-master` tag for all images. This is updated automatically in
@ -182,20 +184,26 @@ Take a look at `buildscripts/antithesis/topologies/sharded_cluster/scripts/mongo
how to use util methods from `buildscripts/antithesis/topologies/sharded_cluster/scripts/utils.py`
to set up the desired topology. You can also use simple shell scripts as in the case of
`buildscripts/antithesis/topologies/sharded_cluster/scripts/database_init.py`. These init scripts
must not end in order to keep the underlying container alive. You can use an infinite while loop for
`python` scripts or you can use `tail -f /dev/null` for shell scripts.

## How do I create a new topology for Antithesis testing?

This should be done with care to ensure we are using our limited resources efficiently.

Create a new task extending the `antithesis_task_template`, tagged with `antithesis`, passing the
specified `suite` to the `antithesis image build and push` task. See other examples to get started.

## How do I test my suite in antithesis?

If you provide the evergreen parameter `schedule_antithesis_tests` to your evergreen patch, then
once the antithesis images are built in your patch, we send antithesis an API request to run your
newly created images for an hour. You will be emailed the report when the run finishes in
antithesis.

Important Note: This will happen for every antithesis task you schedule in your patch. Please do not
schedule more than 1 or 2 tasks with this parameter at a time or it will use up a lot of our testing
time allocated with antithesis.

`evergreen patch --param schedule_antithesis_tests=true`
@ -203,10 +211,10 @@ Important Note: This will happen for every antithesis task you schedule in your
### Normal resmoke testing

Antithesis constantly runs your resmoke suite with one random test from the suite at a time. We
support this out-of-the-box with most resmoke suites that use python fixtures. This is very similar
to how tests run in evergreen. Your antithesis tasks in evergreen will default to this if the
`antithesis_test_composer_dir` var is not specified on the task.

### Test Composer
@ -222,4 +230,5 @@ Evergreen configuration details, see
## Additional Resources

If you are interested in leveraging Antithesis feel free to reach out to #ask-devprod-correctness or
#server-testing on Slack.


@ -1,11 +1,10 @@
# Server-Internal Baton Pattern

Batons are lightweight job queues in _mongod_ and _mongos_ processes that allow recording the intent
to execute a task (e.g., polling on a network socket) and deferring its execution to a later time.
Batons, often by reusing `Client` threads and through the _Waitable_ interface, move the execution
of scheduled tasks out of line, potentially hiding the execution cost from the critical path. A
total of four baton classes are available today:

- [Baton][baton]
- [DefaultBaton][defaultBaton]
@ -14,72 +13,74 @@ path. A total of four baton classes are available today:
## Baton Basics

All baton implementations extend _Baton_. They are tightly associated with an `OperationContext` and
its `Client` thread. An `OperationContext` that belongs to a `ServiceContext` with a
`TransportLayer` uses an `AsioNetworkingBaton`, else a `DefaultBaton`. The baton is accessed through
the `OperationContext` with a call to `OperationContext::getBaton()`.

Each baton implementation exposes an interface to allow scheduling tasks on the baton, to demand the
awakening of the baton on client socket disconnect, and to create a _SubBaton_. A _SubBaton_, for
any of the baton types, is essentially a handle to a local object that proxies scheduling requests
to its underlying baton until it is detached (e.g., through destruction of its handle).

Additionally, a _NetworkingBaton_ enables consumers of a transport layer to execute I/O themselves,
rather than delegating it to other threads. They are special batons that are able to poll network
sockets, which is not feasible through other baton types. This is essential for minimizing context
switches and improving the readability of stack traces.

A baton runs automatically when blocking on its associated `OperationContext` with a call to
`OperationContext::waitForConditionOrInterrupt()`. Many different APIs that take in or use an
_Interruptible_ will eventually call into this method (e.g. `Future::get(...)`,
`OperationContext::sleepUntil(...)`, etc.).
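
The relationship between scheduling work and blocking can be modeled in a few lines of Python. This
is a toy model only: the server's batons are C++ classes, and `ToyBaton`, `schedule`, and `wait_for`
below are hypothetical names chosen for illustration. It shows one idea from the text, namely that
the thread that blocks waiting for a condition is the one that drains the queued tasks.

```python
import threading
import time
from collections import deque


class ToyBaton:
    """Toy job queue whose tasks run on whichever thread blocks in wait_for()."""

    def __init__(self):
        self._wakeup = threading.Condition()
        self._tasks = deque()

    def schedule(self, task):
        """Record the intent to run `task` later and wake up a blocked waiter."""
        with self._wakeup:
            self._tasks.append(task)
            self._wakeup.notify()

    def wait_for(self, predicate, timeout=1.0):
        """Block until predicate() is true, running scheduled tasks in the meantime.

        Returns True if the predicate became true, False if the timeout expired.
        """
        deadline = time.monotonic() + timeout
        while True:
            with self._wakeup:
                while True:
                    if predicate():
                        return True
                    if self._tasks:
                        task = self._tasks.popleft()
                        break
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        return False
                    self._wakeup.wait(remaining)
            # Run the task outside the lock so it may schedule more work.
            task()
```

In this model, a caller that schedules two tasks and then waits on a condition those tasks satisfy
never needs a second thread: the waiting thread performs the work itself, which is the "idle
cycles" idea described for client threads.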

### DefaultBaton

DefaultBaton is the most basic baton implementation. This baton provides the platform to execute
tasks while a client thread awaits an event or a timeout, essentially paving the way towards
utilizing idle cycles of client threads for useful work. Tasks can be scheduled on this baton
through its associated `OperationContext` and using `OperationContext::getBaton()->schedule(...)`.

Note that because _Baton_ extends an _OutOfLineExecutor_, it can be used as the executor to run work
on an `ExecutorFuture`.

### AsioNetworkingBaton

The AsioNetworkingBaton can schedule and run tasks similarly to the _DefaultBaton_, but it also
implements the _NetworkingBaton_ interface to provide a networking reactor. It can register sessions
to monitor and will utilize `poll(2)` and `eventfd(2)` to wait until I/O can be performed on the
socket or until interrupted.

This baton is primarily used for egress networking where it gets scheduled to send off a command
after a connection is made (see the relevant code [here][asioNetworkingBatonScheduling]). This means
that the AsioNetworkingBaton will normally perform socket I/O without needing to poll. It only
registers a session for polling if another read or write is needed on the socket (e.g. [registering
a session during socket read][asioNetworkingBatonPollingSetup]).

In order for an egress session to use the baton, it must be specified as an argument to
`TaskExecutor::scheduleRemoteCommand(...)`.

Note that this baton is only available for Linux.

## Example

For an example of scheduling a task on the `OperationContext` baton, see [here][example].

## Considerations

Since any task scheduled on a baton is intended for out-of-line execution, it must be non-blocking
and preferably short-lived to ensure forward progress.

[baton]:
  https://github.com/mongodb/mongo/blob/5906d967c3144d09fab6a4cc1daddb295df19ffb/src/mongo/db/baton.h#L61-L178
[defaultBaton]:
  https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/db/default_baton.h#L46-L75
[networkingBaton]:
  https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/transport/baton.h#L61-L96
[asioNetworkingBaton]:
  https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/transport/baton_asio_linux.h#L60-L529
[asioNetworkingBatonScheduling]:
  https://github.com/mongodb/mongo/blob/46b8c49b4e13cc4c8389b2822f9e30dd73b81d6e/src/mongo/executor/network_interface_tl.cpp#L910
[asioNetworkingBatonPollingSetup]:
  https://github.com/mongodb/mongo/blob/eab4ec41cc2b28bf0a38eb813f9690e1bfa6c9a6/src/mongo/transport/asio/asio_session_impl.cpp#L666-L696
[example]:
  https://github.com/mongodb/mongo/blob/262e5a961fa7221bfba5722aeea2db719f2149f5/src/mongo/s/multi_statement_transaction_requests_sender.cpp#L91-L99


@ -1,6 +1,7 @@
# Branching

This document describes the branching task: the file updates in the `10gen/mongo` repository that
should be done on a new branch immediately after a branch cut.

## Table of contents
@ -14,11 +15,14 @@ This document describes branching task regarding file updates in `10gen/mongo` r
### GitHub App credentials

Add GitHub app credentials (app id and key) in the new project settings, e.g.,
https://spruce.corp.mongodb.com/project/mongodb-mongo-v8.3/settings/github-app-settings (additional
MANA permissions may be required, else coordinate with Release team contacts).

## 2. Create working branch

To save time during the branch cut, these branching changes can be prepared beforehand (but not too
early, to avoid extra file conflicts) and then rebased on the new `vX.Y` branch.

Create a working branch from `master` or from a new `vX.Y` branch if it already exists:
@ -30,13 +34,16 @@ git checkout -b vX.Y-branching-task
## 2. Update files ## 2. Update files
**IMPORTANT!** Each of these changes should be a separate commit, but they should be pushed together in the same commit-queue task. **IMPORTANT!** Each of these changes should be a separate commit, but they should be pushed together
in the same commit-queue task.
The reason they should be pushed as separate commits is in the case of needing to revert one aspect of this entire task. The reason they should be pushed as separate commits is in the case of needing to revert one aspect
of this entire task.
> See [8.2 branching PR](https://github.com/mongodb/mongo/pull/38920/commits) for reference. > See [8.2 branching PR](https://github.com/mongodb/mongo/pull/38920/commits) for reference.
Some have some automated steps you can run, but please double-check their edits. Initialize the version here, used throughout: Some have some automated steps you can run, but please double-check their edits. Initialize the
version here, used throughout:
```sh ```sh
VERSION=8.3 VERSION=8.3
@ -51,7 +58,9 @@ sed -i "s/master/v$VERSION/g" copy.bara.sky
sed -i 's/branch = "master"/branch = "v'"$VERSION"'"/' buildscripts/sync_repo_with_copybara.py sed -i 's/branch = "master"/branch = "v'"$VERSION"'"/' buildscripts/sync_repo_with_copybara.py
``` ```
For each file [`copy.bara.sky`](../../copy.bara.sky) and [`sync_repo_with_copybara.py`](../../buildscripts/sync_repo_with_copybara.py), the "master" branch references should be replaced with the new branch name. For each file [`copy.bara.sky`](../../copy.bara.sky) and
[`sync_repo_with_copybara.py`](../../buildscripts/sync_repo_with_copybara.py), the "master" branch
references should be replaced with the new branch name.
### Evergreen YAML configurations ### Evergreen YAML configurations
@ -63,16 +72,23 @@ Run the following automation and verify results:
sed -i "s/suffix\"] = \"latest\"/suffix\"] = \"v$VERSION-latest\"/g" buildscripts/generate_version_expansions.py sed -i "s/suffix\"] = \"latest\"/suffix\"] = \"v$VERSION-latest\"/g" buildscripts/generate_version_expansions.py
``` ```
In the file [`buildscripts/generate_version_expansions.py`](../../buildscripts/generate_version_expansions.py), the "latest" suffixes should be replaced with the new branch name. In the file
[`buildscripts/generate_version_expansions.py`](../../buildscripts/generate_version_expansions.py),
the "latest" suffixes should be replaced with the new branch name.
#### 2. Nightly YAML #### 2. Nightly YAML
[`etc/evergreen_nightly.yml`](../../etc/evergreen_nightly.yml) will be used as YAML configuration in the new `mongodb-mongo-vX.Y` evergreen project. [`etc/evergreen_nightly.yml`](../../etc/evergreen_nightly.yml) will be used as YAML configuration in
the new `mongodb-mongo-vX.Y` evergreen project.
This will move some build variants from `etc/evergreen.yml` to continue running on a new branch project. More information about build variants after branching is [here](../evergreen-testing/yaml_configuration/buildvariants.md#build-variants-after-branching). This will move some build variants from `etc/evergreen.yml` to continue running on a new branch
project. More information about build variants after branching is
[here](../evergreen-testing/yaml_configuration/buildvariants.md#build-variants-after-branching).
- Copy over commit-queue aliases and patch aliases from [`etc/evergreen.yml`](../../etc/evergreen.yml) - Copy over commit-queue aliases and patch aliases from
- Update "include" section: comment out or uncomment file includes as instructions in the comments suggest. [`etc/evergreen.yml`](../../etc/evergreen.yml)
- Update "include" section: comment out or uncomment file includes as instructions in the comments
suggest.
#### 3. Burn-in tasks #### 3. Burn-in tasks
@ -82,7 +98,12 @@ Run the following automation and verify results:
sed -i '/burn_in_tag_include_build_variants/{N;N;N;d;}' etc/evergreen_yml_components/variants/misc/misc.yml sed -i '/burn_in_tag_include_build_variants/{N;N;N;d;}' etc/evergreen_yml_components/variants/misc/misc.yml
``` ```
In the file [`etc/evergreen_yml_components/variants/misc/misc.yml`](../../etc/evergreen_yml_components/variants/misc/misc.yml), build variant names in the ["burn_in_tag_include_build_variants" expansion](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/etc/evergreen_yml_components/variants/misc/misc.yml#L21) that are _not_ included in [`etc/evergreen_nightly.yml`](../../etc/evergreen_nightly.yml) are _removed_. In the file
[`etc/evergreen_yml_components/variants/misc/misc.yml`](../../etc/evergreen_yml_components/variants/misc/misc.yml),
build variant names in the
["burn_in_tag_include_build_variants" expansion](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/etc/evergreen_yml_components/variants/misc/misc.yml#L21)
that are _not_ included in [`etc/evergreen_nightly.yml`](../../etc/evergreen_nightly.yml) are
_removed_.
#### 4. Suggested to Required #### 4. Suggested to Required
@ -94,7 +115,9 @@ sed -i 's@display_name: "\* Amazon Linux 2023 arm64 Enterprise"@display_name: "!
sed -i 's/tags: \["suggested", "forbid_tasks_tagged_with_experimental"\]/tags: ["required", "forbid_tasks_tagged_with_experimental"]/g' etc/evergreen_yml_components/variants/amazon/test_dev.yml sed -i 's/tags: \["suggested", "forbid_tasks_tagged_with_experimental"\]/tags: ["required", "forbid_tasks_tagged_with_experimental"]/g' etc/evergreen_yml_components/variants/amazon/test_dev.yml
``` ```
For the variant `enterprise-amazon-linux2023-arm64` in [`etc/evergreen_yml_components/variants/amazon/test_dev.yml`](../../etc/evergreen_yml_components/variants/amazon/test_dev.yml), replace: For the variant `enterprise-amazon-linux2023-arm64` in
[`etc/evergreen_yml_components/variants/amazon/test_dev.yml`](../../etc/evergreen_yml_components/variants/amazon/test_dev.yml),
replace:
- "\*" with "!" in their display names - "\*" with "!" in their display names
- "suggested" variant tag with "required" - "suggested" variant tag with "required"
@ -116,10 +139,12 @@ sed -i 's/!.incompatible_all_feature_flags/!.requires_all_feature_flags/g' $FILE
For the build variant names: For the build variant names:
- in [`etc/evergreen_yml_components/variants/windows/test_dev.yml`](../../etc/evergreen_yml_components/variants/windows/test_dev.yml): - in
[`etc/evergreen_yml_components/variants/windows/test_dev.yml`](../../etc/evergreen_yml_components/variants/windows/test_dev.yml):
- `enterprise-windows-all-feature-flags-required` - `enterprise-windows-all-feature-flags-required`
- `enterprise-windows-all-feature-flags-non-essential` - `enterprise-windows-all-feature-flags-non-essential`
- in [`etc/evergreen_yml_components/variants/sanitizer/test_dev.yml`](../../etc/evergreen_yml_components/variants/sanitizer/test_dev.yml): - in
[`etc/evergreen_yml_components/variants/sanitizer/test_dev.yml`](../../etc/evergreen_yml_components/variants/sanitizer/test_dev.yml):
- `linux-debug-aubsan-lite-all-feature-flags-required` - `linux-debug-aubsan-lite-all-feature-flags-required`
@ -130,9 +155,12 @@ For the build variant names:
#### 6. Sys-perf YAML #### 6. Sys-perf YAML
[`etc/system_perf.yml`](../../etc/system_perf.yml) will be used as YAML configuration for a new `sys-perf-X.Y` evergreen project [`etc/system_perf.yml`](../../etc/system_perf.yml) will be used as YAML configuration for a new
`sys-perf-X.Y` evergreen project
> Ensure that [DSI](https://github.com/10gen/dsi/blob/master/evergreen/system_perf/README.md#branching) has been updated with new branches > Ensure that
> [DSI](https://github.com/10gen/dsi/blob/master/evergreen/system_perf/README.md#branching) has been
> updated with new branches
Run the following automation and verify results: Run the following automation and verify results:
@ -146,8 +174,13 @@ sed -i "s@evergreen/system_perf/master/variants.yml@evergreen/system_perf/$VERSI
In the file [`etc/system_perf.yml`](../../etc/system_perf.yml), the following should be reflected: In the file [`etc/system_perf.yml`](../../etc/system_perf.yml), the following should be reflected:
- Remove `evergreen/system_perf/master/master_variants.yml` from "include" section - Remove `evergreen/system_perf/master/master_variants.yml` from "include" section
- With the exception of `base.yml`, update all other entries that contain `master` in the path to contain `X.Y` in the path instead. (e.g. `evergreen/system_perf/master/variants.yml` should become `evergreen/system_perf/X.Y/variants.yml`). - With the exception of `base.yml`, update all other entries that contain `master` in the path to
- Update the [evergreen project variable](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-and-Distro-Settings#variables) `compile_project` in the new sys-perf-X.Y evergreen project to point to the new mongodb-mongo-vX.Y branch contain `X.Y` in the path instead. (e.g. `evergreen/system_perf/master/variants.yml` should become
`evergreen/system_perf/X.Y/variants.yml`).
- Update the
[evergreen project variable](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-and-Distro-Settings#variables)
`compile_project` in the new sys-perf-X.Y evergreen project to point to the new mongodb-mongo-vX.Y
branch
#### 7. Evergreen project validation #### 7. Evergreen project validation
@ -157,7 +190,10 @@ Run the following automation and verify results:
sed -i 's/RELEASE_BRANCH = False/RELEASE_BRANCH = True/g' buildscripts/validate_evg_project_config.py sed -i 's/RELEASE_BRANCH = False/RELEASE_BRANCH = True/g' buildscripts/validate_evg_project_config.py
``` ```
In file [`buildscripts/validate_evg_project_config.py`](../../buildscripts/validate_evg_project_config.py), the `RELEASE_BRANCH` variable should be set to `True` to leverage a specialized shortcut conditional to `evaluate` the project, not `validate`. In file
[`buildscripts/validate_evg_project_config.py`](../../buildscripts/validate_evg_project_config.py),
the `RELEASE_BRANCH` variable should be set to `True` to leverage a specialized shortcut conditional
to `evaluate` the project, not `validate`.
#### 8. Coverity #### 8. Coverity
@ -167,7 +203,8 @@ Run the following automation and verify results:
sed -i "s/stream: mongo.master/stream: mongo.v$VERSION/g" etc/coverity.yml sed -i "s/stream: mongo.master/stream: mongo.v$VERSION/g" etc/coverity.yml
``` ```
In the file [`etc/coverity.yml`](../../etc/coverity.yml), the "stream" should be updated to the new branch. In the file [`etc/coverity.yml`](../../etc/coverity.yml), the "stream" should be updated to the new
branch.
#### Finally: format and lint #### Finally: format and lint
@ -179,7 +216,8 @@ Run linters and formatters and fix anything that couldn't be autofixed.
## 3. Test changes ## 3. Test changes
In case working branch was created from `master` branch, rebase it on a new `vX.Y` branch and fix file conflicts if any. In case working branch was created from `master` branch, rebase it on a new `vX.Y` branch and fix
file conflicts if any.
Schedule required patch on a new `mongodb-mongo-vX.Y` project: Schedule required patch on a new `mongodb-mongo-vX.Y` project:
@ -187,7 +225,8 @@ Schedule required patch on a new `mongodb-mongo-vX.Y` project:
evergreen patch -p mongodb-mongo-vX.Y -a required evergreen patch -p mongodb-mongo-vX.Y -a required
``` ```
If patch results reveal that some steps are missing or outdated in this file, make sure to update the branching documentation on a "master" branch accordingly. If patch results reveal that some steps are missing or outdated in this file, make sure to update
the branching documentation on a "master" branch accordingly.
## 4. Merge changes ## 4. Merge changes


@ -1,8 +1,7 @@
# Building MongoDB # Building MongoDB
Please note that prebuilt binaries are available on Please note that prebuilt binaries are available on [mongodb.org](http://www.mongodb.org/downloads)
[mongodb.org](http://www.mongodb.org/downloads) and may be the easiest and may be the easiest way to get started, rather than building from source.
way to get started, rather than building from source.
To build MongoDB, you will need: To build MongoDB, you will need:
@ -20,13 +19,13 @@ To build MongoDB, you will need:
- On Ubuntu, the lzma library is required. Install `liblzma-dev` - On Ubuntu, the lzma library is required. Install `liblzma-dev`
- On Amazon Linux, the xz-devel library is required. `yum install xz-devel` - On Amazon Linux, the xz-devel library is required. `yum install xz-devel`
- Python 3.13 - Python 3.13
- About 13 GB of free disk space for the core binaries (`mongod`, - About 13 GB of free disk space for the core binaries (`mongod`, `mongos`, and `mongo`).
`mongos`, and `mongo`).
If using a newer version of a C++ compiler than listed above, it may work. However the versions listed above have been verified to work. If using a newer version of a C++ compiler than listed above, it may work. However the versions
listed above have been verified to work.
MongoDB supports the following architectures: arm64, ppc64le, s390x, MongoDB supports the following architectures: arm64, ppc64le, s390x, and x86-64. More detailed
and x86-64. More detailed platform instructions can be found below. platform instructions can be found below.
## Quick (re)Start ## Quick (re)Start
@ -45,23 +44,21 @@ If you only want to build the database server `mongod`:
$ bazel build install-mongod $ bazel build install-mongod
**_Note_**: For C++ compilers that are newer than the supported **_Note_**: For C++ compilers that are newer than the supported version, the compiler may issue new
version, the compiler may issue new warnings that cause MongoDB to warnings that cause MongoDB to fail to build since the build system treats compiler warnings as
fail to build since the build system treats compiler warnings as errors. To ignore the warnings, pass the switch `--disable_warnings_as_errors=True` to the bazel
errors. To ignore the warnings, pass the switch command.
`--disable_warnings_as_errors=True` to the bazel command.
$ bazel build install-mongod --disable_warnings_as_errors=True $ bazel build install-mongod --disable_warnings_as_errors=True
If you want to build absolutely everything (`mongod`, `mongo`, unit If you want to build absolutely everything (`mongod`, `mongo`, unit tests, etc):
tests, etc):
$ bazel build --build_tag_filters=mongo_binary //src/mongo/... $ bazel build --build_tag_filters=mongo_binary //src/mongo/...
## Bazel Targets ## Bazel Targets
The following targets can be named on the bazel command line to build and The following targets can be named on the bazel command line to build and install a subset of
install a subset of components: components:
- `install-mongod` - `install-mongod`
- `install-mongos` - `install-mongos`
@ -69,16 +66,15 @@ install a subset of components:
- `install-dist` (includes all server components) - `install-dist` (includes all server components)
- `install-devcore` (includes `mongod`, `mongos`, and `jstestshell` (formerly `mongo` shell)) - `install-devcore` (includes `mongod`, `mongos`, and `jstestshell` (formerly `mongo` shell))
**_NOTE_**: The `install-core` and `install-dist` targets are _not_ **_NOTE_**: The `install-core` and `install-dist` targets are _not_ guaranteed to be identical. The
guaranteed to be identical. The `install-core` target will only ever include a `install-core` target will only ever include a minimal set of "core" server components, while
minimal set of "core" server components, while `install-dist` is intended `install-dist` is intended for a functional end-user installation. If you are testing, you should
for a functional end-user installation. If you are testing, you should use the use the `install-devcore` or `install-dist` targets instead.
`install-devcore` or `install-dist` targets instead.
## Where to find Binaries ## Where to find Binaries
The build system will produce an installation tree into `bazel-bin/install`, as well as The build system will produce an installation tree into `bazel-bin/install`, as well as individual
individual install target trees like `bazel-bin/<install-target>`. install target trees like `bazel-bin/<install-target>`.
## Windows ## Windows
@ -97,8 +93,6 @@ To install dependencies on Debian or Ubuntu systems:
## OS X ## OS X
Install Xcode 16.4 or newer. Make sure macOS 15.5 platform Install Xcode 16.4 or newer. Make sure macOS 15.5 platform is installed.
is installed.
Install llvm and lld, version 19 from brew: Install llvm and lld, version 19 from brew: brew install llvm@19 lld@19
brew install llvm@19 lld@19


@ -5,25 +5,23 @@ current version of master, if not explicitly stated otherwise. Implementation de
versions may vary slightly. versions may vary slightly.
Change streams are a convenient way for an application to monitor changes made to the data in a Change streams are a convenient way for an application to monitor changes made to the data in a
deployment. deployment. The events produced by change streams are called "change events". The event data is
The events produced by change streams are called "change events". The event data is produced from produced from the oplog(s) of the deployment. The events that are emitted by change streams include
the oplog(s) of the deployment.
The events that are emitted by change streams include
- DML events: emitted for operations that insert, update, replace, or delete individual documents. - DML events: emitted for operations that insert, update, replace, or delete individual documents.
- DDL events: emitted for operations that create, drop, or modify collections, databases, or views. - DDL events: emitted for operations that create, drop, or modify collections, databases, or views.
- Data placement events: emitted for operations that define or modify the placement of data inside - Data placement events: emitted for operations that define or modify the placement of data inside a
a sharded cluster. sharded cluster.
- Cluster topology events: emitted for operations that add or remove shards in a sharded cluster. - Cluster topology events: emitted for operations that add or remove shards in a sharded cluster.
Which exact event types are emitted by a change stream depends on the change stream configuration Which exact event types are emitted by a change stream depends on the change stream configuration
and the deployment type. and the deployment type.
Change streams are mainly used by customer applications and tools to keep track of changes to the Change streams are mainly used by customer applications and tools to keep track of changes to the
data in a deployment, in order to relay these updates to external systems. data in a deployment, in order to relay these updates to external systems. Some of MongoDB's own
Some of MongoDB's own tools and components are also based on change streams, e.g. _mongosync_ (C2C), tools and components are also based on change streams, e.g. _mongosync_ (C2C), Atlas Search, Atlas
Atlas Search, Atlas Stream Processing, and the resharding process. Stream Processing, and the resharding process. The component that opens a change stream and pulls
The component that opens a change stream and pulls events from it is called the "consumer". events from it is called the "consumer".
## Change Stream Guarantees ## Change Stream Guarantees
@ -31,17 +29,16 @@ Change Streams provide various guarantees:
- Ordering: change streams deliver events in the order they originally occurred within the target - Ordering: change streams deliver events in the order they originally occurred within the target
namespace (e.g., collection, database, or entire cluster). The order is based on the sequence in namespace (e.g., collection, database, or entire cluster). The order is based on the sequence in
which the operations were applied to the oplog. which the operations were applied to the oplog. In a sharded cluster, the events from multiple
In a sharded cluster, the events from multiple oplogs will be merged deterministically into a oplogs will be merged deterministically into a single, ordered stream of change events.
single, ordered stream of change events.
- Durability and reproducibility: change streams are based on the internal oplog, which is part of - Durability and reproducibility: change streams are based on the internal oplog, which is part of
the deployment's replication mechanism. Change streams only deliver events after they have been the deployment's replication mechanism. Change streams only deliver events after they have been
committed to a majority of nodes and durably persisted, ensuring they will not be rolled back. committed to a majority of nodes and durably persisted, ensuring they will not be rolled back.
- Exactly-once delivery: every event in a change stream is emitted exactly once, and no event that - Exactly-once delivery: every event in a change stream is emitted exactly once, and no event that
matches the change stream filter is skipped. matches the change stream filter is skipped.
- Resumability: change stream consumption can be interrupted due to transient errors (e.g. network - Resumability: change stream consumption can be interrupted due to transient errors (e.g. network
issues, node failures, application errors), but it can be resumed from the exact point where issues, node failures, application errors), but it can be resumed from the exact point where the
the consumption stopped. This is made possible by the resume token (`_id` field) that accompanies consumption stopped. This is made possible by the resume token (`_id` field) that accompanies
every change event, which acts as a bookmark. This allows the consumer to continue processing every change event, which acts as a bookmark. This allows the consumer to continue processing
changes from the last known position without missing events. changes from the last known position without missing events.
@ -71,9 +68,8 @@ opened against standalone _mongod_ instances, as there is no oplog to generate t
standalone mode. standalone mode.
In replica set deployments, the change stream can be opened directly on any replica set member of In replica set deployments, the change stream can be opened directly on any replica set member of
the deployment. the deployment. In sharded cluster deployments, the change stream must be opened against any of the
In sharded cluster deployments, the change stream must be opened against any of the deployment's deployment's _mongos_ processes.
_mongos_ processes.
A change stream is opened by executing an `aggregate` command with a pipeline that contains at least A change stream is opened by executing an `aggregate` command with a pipeline that contains at least
the `$changeStream` pipeline stage. the `$changeStream` pipeline stage.
@ -115,9 +111,8 @@ db.getSiblingDB("testDB").runCommand({
``` ```
The `aggregate` parameter must be set to `1` for database-level change streams, and the command must The `aggregate` parameter must be set to `1` for database-level change streams, and the command must
be executed inside the desired database. be executed inside the desired database. The internal namespace that is used by database-level
The internal namespace that is used by database-level change streams is `<dbName>.$cmd.aggregate` change streams is `<dbName>.$cmd.aggregate` (where `<dbName>` is the actual name of the database).
(where `<dbName>` is the actual name of the database).
### Opening an All-Cluster Change Stream ### Opening an All-Cluster Change Stream
@ -161,9 +156,8 @@ into smaller fragments, in order to avoid running into `BSONObjectTooLarge` erro
### Change Stream Start Time ### Change Stream Start Time
When opening a change stream without specifying an explicit point in time, the change stream will be When opening a change stream without specifying an explicit point in time, the change stream will be
opened using the current time, and will report only change events that happened after that point opened using the current time, and will report only change events that happened after that point in
in time. time. The current time here is
The current time here is
- the time of the latest majority-committed operation for replica set change streams, or - the time of the latest majority-committed operation for replica set change streams, or
- the value of the cluster's vector clock for sharded cluster change streams. - the value of the cluster's vector clock for sharded cluster change streams.
@ -174,9 +168,8 @@ parameter is specified as a logical timestamp.
### Resuming Change Streams ### Resuming Change Streams
Change streams allow the consumer to resume the change stream after an error occurred. Change streams allow the consumer to resume the change stream after an error occurred. To support
To support resumability, change streams report a "resume token" inside the `_id` field of every resumability, change streams report a "resume token" inside the `_id` field of every emitted event.
emitted event.
To resume a change stream after an error occurred, the resume token of a previously consumed event To resume a change stream after an error occurred, the resume token of a previously consumed event
can be passed in one of the parameters `resumeAfter` or `startAfter` when opening a change stream. can be passed in one of the parameters `resumeAfter` or `startAfter` when opening a change stream.
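
As a rough illustration of the paragraph above, the following sketch builds (but does not send) the `aggregate` command document for resuming a collection-level change stream. The helper name `buildResumeCommand`, the collection name, and the placeholder token are hypothetical and only serve the example; a real token is taken unchanged from the `_id` field of a previously consumed event.

```javascript
// Hypothetical helper (not part of the server or any driver): builds the
// aggregate command that resumes a collection-level change stream.
function buildResumeCommand(collectionName, resumeToken, mode = "resumeAfter") {
  if (mode !== "resumeAfter" && mode !== "startAfter") {
    throw new Error(`unsupported resume mode: ${mode}`);
  }
  return {
    aggregate: collectionName,
    // The token is passed through as-is; consumers treat it as opaque.
    pipeline: [{ $changeStream: { [mode]: resumeToken } }],
    cursor: {},
  };
}

// Assumed example event; the `_data` value is a placeholder, not a real token.
const lastEvent = {
  _id: { _data: "<hex string from a previous event>" },
  operationType: "insert",
};

const resumeCommand = buildResumeCommand("testColl", lastEvent._id);
console.log(JSON.stringify(resumeCommand));
```

The resulting document could then be handed to `runCommand` in the same way as the other examples in this file.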
@ -198,8 +191,7 @@ with a different `$match` expression may lead to different events being returned
the event with the original resume token not being found in the new change stream. the event with the original resume token not being found in the new change stream.
The resume tokens that are emitted by change streams are string values that contain a hexadecimal The resume tokens that are emitted by change streams are string values that contain a hexadecimal
encoding of the internal resume token data. encoding of the internal resume token data. The internal resume token data contains
The internal resume token data contains
- the cluster time of an event. - the cluster time of an event.
- the version of the resume token format. - the version of the resume token format.
@ -212,11 +204,13 @@ The internal resume token data contains
Resume tokens are versioned. Currently only version 2 is supported. Resume tokens are versioned. Currently only version 2 is supported.
Future versions may introduce new resume token versions. Client applications should treat resume Future versions may introduce new resume token versions. Client applications should treat resume
tokens as opaque identifiers and should not make any assumptions about the format or internals tokens as opaque identifiers and should not make any assumptions about the format or internals of
of resume tokens, nor should they rely on the internal implementation details of resume tokens. resume tokens, nor should they rely on the internal implementation details of resume tokens.
Resume tokens are serialized and deserialized by the [ResumeToken](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/resume_token.h#L148) Resume tokens are serialized and deserialized by the
class. The resume token internal data is stored in [ResumeTokenData](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/resume_token.h#L51). [ResumeToken](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/resume_token.h#L148)
class. The resume token internal data is stored in
[ResumeTokenData](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/resume_token.h#L51).
#### Resume Token Types #### Resume Token Types
@ -225,12 +219,12 @@ There are two types of resume tokens:
- event resume tokens - event resume tokens
- high watermark resume tokens - high watermark resume tokens
The former stem from actual change events. The former stem from actual change events. High watermark tokens are a special kind of change stream
High watermark tokens are a special kind of change stream resume token that represents a logical resume token that represents a logical position in the global change stream ordered only by cluster
position in the global change stream ordered only by cluster time, not a specific event. time, not a specific event.
High watermark tokens sort strictly before any real event token at the same cluster time. High watermark tokens sort strictly before any real event token at the same cluster time. That is, a
That is, a high watermark token for time T sorts ahead of all events whose cluster time >= T. high watermark token for time T sorts ahead of all events whose cluster time >= T.
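
The ordering rule above can be sketched with a deliberately simplified token model. This is an assumption-laden illustration, not the comparison implemented by `ResumeTokenData`: the objects below carry only a cluster time (`t` seconds, `i` increment) and a hypothetical `highWatermark` flag, and all names are made up for the example.

```javascript
// Compares two simplified cluster times of the form { t: seconds, i: increment }.
function compareClusterTime(a, b) {
  return a.t !== b.t ? a.t - b.t : a.i - b.i;
}

// Orders simplified tokens by cluster time. At equal cluster time, a high
// watermark token sorts before any event token.
function compareTokens(a, b) {
  const byTime = compareClusterTime(a.clusterTime, b.clusterTime);
  if (byTime !== 0) return byTime;
  return Number(b.highWatermark) - Number(a.highWatermark);
}

// Hypothetical tokens: two at the same cluster time T, one at an earlier time.
const tokens = [
  { name: "event@T", clusterTime: { t: 100, i: 1 }, highWatermark: false },
  { name: "hwm@T", clusterTime: { t: 100, i: 1 }, highWatermark: true },
  { name: "event@T-1", clusterTime: { t: 99, i: 5 }, highWatermark: false },
];

const sortedNames = [...tokens].sort(compareTokens).map((tok) => tok.name);
console.log(sortedNames.join(", "));
```

In this model the earlier event comes first, followed by the high watermark token for T and then the event at T.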
#### Decoding Resume Tokens #### Decoding Resume Tokens
@ -267,43 +261,42 @@ by the consumer or the change stream runs into an error. Also, unused cursors ar
garbage-collected after a period of inactivity. garbage-collected after a period of inactivity.
When opening a change stream on a sharded cluster, the targeted `mongos` instance will open the When opening a change stream on a sharded cluster, the targeted `mongos` instance will open the
required cursors on the relevant shards of the cluster and also the config server. Here, the `mongos` required cursors on the relevant shards of the cluster and also the config server. Here, the
instance will also automatically open additional cursors in case new shards are added to the `mongos` instance will also automatically open additional cursors in case new shards are added to
cluster. All this is abstracted from the consumer of the change stream. The consumer of the change the cluster. All this is abstracted from the consumer of the change stream. The consumer of the
stream will only see a single cursor and interact with _mongos_, which handles the complexity of change stream will only see a single cursor and interact with _mongos_, which handles the complexity
managing the underlying shard cursors. of managing the underlying shard cursors.
If a change stream cursor can be successfully established, the cursor id is returned to the If a change stream cursor can be successfully established, the cursor id is returned to the
consumer. The consumer can then use the cursor id to pull change events from the change stream by consumer. The consumer can then use the cursor id to pull change events from the change stream by
issuing follow-up `getMore` commands to this cursor. issuing follow-up `getMore` commands to this cursor.
If a change stream cursor cannot be successfully opened, the initial `aggregate` command will If a change stream cursor cannot be successfully opened, the initial `aggregate` command will return
return an error, and the returned cursor id will be `0`. In this case, no events can be consumed an error, and the returned cursor id will be `0`. In this case, no events can be consumed from the
from the change stream, and the consumer needs to resolve the error. change stream, and the consumer needs to resolve the error.
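
A minimal sketch of how a consumer might act on the reply to the initial `aggregate` command is shown below. The helper name `nextGetMore` and the sample reply objects are hypothetical, and the reply shapes are simplified assumptions; only the `getMore` and `collection` field names are taken from the `getMore` command.

```javascript
// Hypothetical helper: given the reply to the initial aggregate command,
// returns the follow-up getMore command, or throws if no cursor was opened.
function nextGetMore(reply, collectionName) {
  // Cursor ids are 64-bit integers, so they are modeled as BigInt here.
  const cursorId = reply.cursor ? BigInt(reply.cursor.id) : 0n;
  if (reply.ok !== 1 || cursorId === 0n) {
    const reason = reply.errmsg ?? "unknown error";
    throw new Error(`change stream could not be opened: ${reason}`);
  }
  return { getMore: cursorId, collection: collectionName };
}

// Assumed reply for a successfully opened change stream.
const okReply = { ok: 1, cursor: { id: 42n, ns: "testDB.testColl", firstBatch: [] } };
const getMoreCommand = nextGetMore(okReply, "testColl");
console.log(getMoreCommand.getMore.toString(), getMoreCommand.collection);
```

A failed reply makes the helper throw, so the consumer has to resolve the error before retrying instead of issuing `getMore` against a cursor that does not exist.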
### Change Stream errors ### Change Stream errors
When a change stream is opened at a specific point in time, it is validated that the oplog of all When a change stream is opened at a specific point in time, it is validated that the oplog of all
participating nodes actually contains data for this point in time. participating nodes actually contains data for this point in time. If the oplog does not contain any
If the oplog does not contain any data for the exact point in time or before, it would be possible data for the exact point in time or before, it would be possible that the requested data has already
that the requested data has already fallen off the oplog. fallen off the oplog. In case no oplog entry can be found that is at least as old as the specified
In case no oplog entry can be found that is at least as old as the specified timestamp, opening the timestamp, opening the change stream will fail with error code `OplogQueryMinTsMissing`. This
change stream will fail with error code `OplogQueryMinTsMissing`. validation happens for all change streams, regardless if the start timestamp is specified via the
This validation happens for all change streams, regardless if the start timestamp is specified via `resumeAfter`, `startAfter` or `startAtOperationTime` parameters, or if the start time is implied
the `resumeAfter`, `startAfter` or `startAtOperationTime` parameters, or if the start time is from the current time. An exception in which opening a change stream at a later point in time than
implied from the current time. the timestamp of the first present oplog entry is permitted is for new shard primaries. New shard
An exception in which opening a change stream at a later point in time than the timestamp of the primary can be added to an existing cluster at any point in time. When a new shard primary is added,
first present oplog entry is permitted is for new shard primaries. its first oplog entry will be a no-op entry with `msg` == `initiating set` (on ASC) or `msg` ==
New shard primary can be added to an existing cluster at any point in time. When a new shard primary `new primary` (on DSC).
is added, its first oplog entry will be a no-op entry with `msg` == `initiating set` (on ASC) or
`msg` == `new primary` (on DSC).
The code for this can be found [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/classic/collection_scan.cpp#L195-L227). The code for this can be found
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/classic/collection_scan.cpp#L195-L227).
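The start-time check can be pictured with a small, self-contained model. This is only a sketch and
not the server code linked above: timestamps are plain numbers, oplog entries are reduced to a
hypothetical `{ts, op, msg}` shape, and the function name `validateStartTime` is made up for
illustration. It reads the new-shard-primary exception as "skip the check when the first oplog entry
is the initiation no-op", which is an assumption.

```js
// Simplified model of the start-time validation. Real oplog entries and
// timestamps are richer than the {ts, op, msg} objects assumed here.
const NEW_PRIMARY_MESSAGES = ["initiating set", "new primary"];

function validateStartTime(firstOplogEntry, startTs) {
  // An entry at least as old as the start time exists, so no requested
  // data can have fallen off the oplog.
  if (firstOplogEntry.ts <= startTs) {
    return { ok: true };
  }
  // Assumed reading of the exception: the oplog begins with the no-op entry
  // of a newly added shard primary, so nothing older ever existed.
  const startsWithInitNoop =
    firstOplogEntry.op === "n" && NEW_PRIMARY_MESSAGES.includes(firstOplogEntry.msg);
  if (startsWithInitNoop) {
    return { ok: true };
  }
  return { ok: false, code: "OplogQueryMinTsMissing" };
}
```

For example, `validateStartTime({ ts: 20, op: "i" }, 10)` reports `OplogQueryMinTsMissing` in this
model, while the same call with a first entry of `{ ts: 20, op: "n", msg: "new primary" }` passes.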
Another common error is `ChangeStreamHistoryLost`. This error is raised when a change stream is
opened with a resume token that cannot be found (anymore) in any of the participating nodes' oplogs.
This can happen either when the resume event has actually fallen off the oplog, or when a change
stream is resumed with the resume token from another change stream with a different `$match`
expression. In the latter case, the new change stream may filter out the resume event due to the
different `$match` expression, so it cannot be found anymore.
@ -342,9 +335,9 @@ request:
- `maxTimeMS`: maximum server-side waiting time for producing events.

The `getMore` command will fill the response with up to `batchSize` results if that many events are
available. A response can also contain fewer events than the specified `batchSize`. Regardless of
the specified batch size, the maximum response size limit of 16MB will be honored, in order to
prevent responses from getting too large.

A change stream response is returned to the consumer when
@ -353,14 +346,13 @@ A change stream response is returned to the consumer when
  would make it exceed the 16MB size limit.

In case the change stream cursor has reached the end of the oplog and there are currently no events
to return, the response will be returned immediately if it already contains at least one event. If
the response is empty, the change stream will wait for at most `maxTimeMS` for new oplog entries to
arrive. If no new oplog entries arrive within `maxTimeMS`, an empty response will be returned. If
new oplog entries arrive within `maxTimeMS` and at least one of them matches the change stream's
filter, the matching event will be returned immediately. If oplog entries arrive but do not match
the change stream's filter, the change stream will wait for matching oplog entries until `maxTimeMS`
has fully expired.
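The waiting rules above can be summarized in a short simulation. This is a sketch under stated
assumptions rather than the server's implementation: the function name `simulateGetMore`, the
`arrivals` list and its `atMS` / `matches` / `event` fields are all hypothetical and exist only to
make the decision order explicit.

```js
// Models what a getMore returns once the cursor has reached the end of the oplog.
// `batch`    : events already collected for this response
// `arrivals` : oplog entries arriving later, sorted by arrival time `atMS`
function simulateGetMore({ batch, arrivals, maxTimeMS }) {
  // A non-empty response is returned immediately, without waiting.
  if (batch.length > 0) {
    return { events: batch, returnedAfterMS: 0 };
  }
  for (const entry of arrivals) {
    if (entry.atMS > maxTimeMS) {
      break; // arrived too late for this getMore
    }
    if (entry.matches) {
      // The first matching entry ends the wait right away.
      return { events: [entry.event], returnedAfterMS: entry.atMS };
    }
    // Non-matching entries do not end the wait.
  }
  // Nothing matched in time: an empty response after the full wait.
  return { events: [], returnedAfterMS: maxTimeMS };
}
```

With `maxTimeMS: 1000`, a non-matching entry at 100ms followed by a matching one at 400ms yields a
one-event response after 400ms in this model, while only non-matching entries yield an empty
response after the full 1000ms.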
### Generic Event layout
@ -379,8 +371,8 @@ The following generic fields are added for change streams that were opened with
- `collectionUUID`: UUID of the collection for which the event occurred, if applicable.
- `operationDescription`: populated for DDL events.

Most other fields are event type-specific, so they are only present for specific events. A few such
fields include:

- `documentKey`: the `_id` value of the affected document, populated for DML events. May contain the
  shard key values for sharded collections.
@ -389,9 +381,11 @@ A few such fields include:
  value than `default`.
- `updateDescription` / `rawUpdateDescription`: contains details for "update" events.

The majority of change stream event fields are emitted by the
`ChangeStreamDefaultEventTransformation` object
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_event_transform.cpp#L321).
This object is called by the `ChangeStreamEventTransform` stage
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_transform_stage.cpp#L75).

A custom `$project` stage in the change stream pipeline can be used to suppress certain fields.
@ -401,8 +395,8 @@ Emitted change events can get large, especially if they contain pre- or post-ima
the events can exceed the maximum BSON object size of 16MB, which can lead to `BSONObjectTooLarge`
errors when trying to process these change stream events.

To split large change stream events into multiple smaller chunks, change stream consumers can add a
`$changeStreamSplitLargeEvent` stage as the last step of their change stream pipeline, e.g.
```js ```js
db.getSiblingDB("testDB").runCommand({ db.getSiblingDB("testDB").runCommand({
@ -419,8 +413,10 @@ db.getSiblingDB("testDB").runCommand({
}); });
``` ```
The splitting is performed by the `ChangeStreamSplitLargeEventStage` stage
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_split_large_event_stage.cpp#L72),
using
[this helper function](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_split_event_helpers.cpp#L63).

The change stream consumer is responsible for assembling the split event fragments into a single
event later.
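A consumer-side reassembly step might look roughly like the sketch below. It assumes that every
fragment carries a `splitEvent: { fragment, of }` marker (1-based fragment number and total count)
and that the remaining top-level fields are distributed across the fragments; the helper name
`assembleSplitEvent` is hypothetical and not part of any driver API.

```js
// Merges the fragments of one split change event back into a single object.
// Assumes each fragment looks like {_id, splitEvent: {fragment, of}, ...fields}.
function assembleSplitEvent(fragments) {
  if (fragments.length === 0) {
    throw new Error("no fragments given");
  }
  const total = fragments[0].splitEvent.of;
  // Sort a copy by fragment number so callers may pass fragments in any order.
  const ordered = [...fragments].sort((a, b) => a.splitEvent.fragment - b.splitEvent.fragment);
  if (ordered.length !== total) {
    throw new Error(`expected ${total} fragments, got ${ordered.length}`);
  }
  ordered.forEach((fragment, index) => {
    if (fragment.splitEvent.fragment !== index + 1 || fragment.splitEvent.of !== total) {
      throw new Error("fragments do not form one complete sequence");
    }
  });
  const event = {};
  for (const fragment of ordered) {
    const { splitEvent, ...fields } = fragment; // drop the split marker
    // Later fragments overwrite earlier ones, so `_id` ends up being the
    // value carried by the last fragment.
    Object.assign(event, fields);
  }
  return event;
}
```

Keeping the `_id` of the last fragment is a design choice of this sketch: resuming from it would
continue after the whole event instead of in the middle of it.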
@ -434,10 +430,9 @@ close the change stream cursor in specific situations:
- the target collection is renamed
- the parent database of the target collection is dropped
- in database-level change streams, the change stream is invalidated if the target database is
  dropped.

In case a change stream gets invalidated by any of the above situations, it will emit a special
"invalidate" event to inform the consumer that further processing is not possible. There are no
"invalidate" events in all-cluster change streams.

Issuing of change stream invalidate events is implemented in the `ChangeStreamCheckInvalidateStage`
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_invalidate_stage.cpp#L106-L157).
@ -445,12 +440,13 @@ Issuing of change stream invalidate events is implemented in the `ChangeStreamCh
## Change Stream Parameters

The behavior of change streams can be controlled via various parameters that can be passed with the
initial `aggregate` command used to open the change stream. The parameters are defined in an
[IDL file](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream.idl#L84).

The parameters that are provided when opening the change stream are automatically validated using
mechanisms provided by the IDL framework. Additional validation of the change stream parameters is
performed
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream.cpp#L391).
Invalid change stream parameters are immediately rejected with appropriate errors.

### `fullDocument`
@ -466,17 +462,16 @@ The following values are possible:
  may not be the same version of the document that was present when the "update" change event was
  originally recorded. If no document can be found by the lookup, the `fullDocument` field will
  contain `null`.
- `whenAvailable`: the `fullDocument` field will be populated with the post-image for the event. The
  post-image is generated on the fly by applying the delta update from the event on top of a stored
  pre-image. If no post-image is available, the `fullDocument` field will contain `null`.
- `required`: populates the `fullDocument` field with the post-image for the event. Post-images are
  generated in the same way as in `whenAvailable`. If no post-image can be generated, this will
  abort the change stream with a `NoMatchingDocument` error.

The latter two options rely on pre-images being enabled for the target collection(s). When
pre-images are enabled, they are written synchronously with the regular "update" oplog entry, and
change stream events are not returned until both have been majority-committed.

Post-images for "update" events are added to change events by the `ChangeStreamAddPostImage` stage
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_post_image_stage.cpp#L84).
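The difference between the three modes described above comes down to what happens when no document
is available. The following sketch is a simplified model, not the `ChangeStreamAddPostImage` stage
itself; the function `resolveFullDocument` and its `lookedUpDocument` / `postImage` inputs are
hypothetical stand-ins for the lookup result and the generated post-image.

```js
// Returns the value of the `fullDocument` field for an "update" event,
// or throws where the change stream would be aborted.
function resolveFullDocument(mode, { lookedUpDocument = null, postImage = null } = {}) {
  switch (mode) {
    case "updateLookup":
      // Current version of the document, or null if the lookup found nothing.
      return lookedUpDocument;
    case "whenAvailable":
      // Post-image if one could be generated, otherwise null.
      return postImage;
    case "required": {
      if (postImage === null) {
        const error = new Error("post-image could not be generated");
        error.codeName = "NoMatchingDocument";
        throw error;
      }
      return postImage;
    }
    default:
      throw new Error(`mode not covered by this sketch: ${mode}`);
  }
}
```

Only `required` turns a missing post-image into an error; the other two modes degrade to `null`.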
@ -506,29 +501,25 @@ parameters are:
#### `showExpandedEvents` (public)

The `showExpandedEvents` flag can be used to make a change stream return both additional event types
and additional fields. The flag defaults to `false`. In this mode, change streams will only return
DML events and no DDL events. When setting `showExpandedEvents` to `true`, change streams will also
emit events for various DDL operations. In addition, setting `showExpandedEvents` will make change
streams return the additional fields `collectionUUID` (for various change stream event types) and
`updateDescription.disambiguatedPaths` (for update events).
#### `matchCollectionUUIDForUpdateLookup` (public)

The `matchCollectionUUIDForUpdateLookup` field can be used to ensure that "updateLookup" operations
are performed on the correct collection in case multiple collections with the same name have existed
over time. This is relevant because change streams can be opened retroactively on collections that
were already dropped and may have been recreated with the same name but different contents
afterwards.

The flag defaults to `false`. In this case, "updateLookup" operations will not verify that the
looked-up document is actually from the same collection "generation" as the change event the
document was looked up for. If set to `true`, "updateLookup" operations will compare the collection
UUID of the change event with the UUID of the collection. If there is a UUID mismatch, the returned
`fullDocument` field of the event will be set to `null`.
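The effect of the flag can be illustrated with an in-memory model. This is a sketch only: the
`collection` object with `uuid` and `documents` fields and the function `updateLookupResult` are
hypothetical, and UUIDs are plain strings here.

```js
// Models an "updateLookup" against the collection that currently owns the name.
function updateLookupResult(event, collection, matchCollectionUUIDForUpdateLookup) {
  // With the flag set, a lookup against a different collection "generation"
  // yields null instead of a document from the recreated collection.
  if (matchCollectionUUIDForUpdateLookup && event.collectionUUID !== collection.uuid) {
    return null;
  }
  const found = collection.documents.find((doc) => doc._id === event.documentKey._id);
  return found === undefined ? null : found;
}

// The collection was dropped and recreated, so its UUID differs from the event's.
const recreated = { uuid: "uuid-new", documents: [{ _id: 1, v: "unrelated" }] };
const oldEvent = { collectionUUID: "uuid-old", documentKey: { _id: 1 } };
const withoutCheck = updateLookupResult(oldEvent, recreated, false); // unrelated document
const withCheck = updateLookupResult(oldEvent, recreated, true); // null
```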
#### `allChangesForCluster` (public)
@ -539,29 +530,28 @@ automatically when opening an all-cluster change stream.
The `showSystemEvents` flag can be used to make change streams return events for collections inside
the `system` namespace. These are not emitted by default. Setting `showSystemEvents` to `true` will
also include events related to system collections in the change stream. The flag defaults to `false`
and is internal.

#### `showMigrationEvents` (internal)

The `showMigrationEvents` flag can be used to make change streams return DML events that are
happening during chunk migrations. If set to `true`, insert and delete events related to chunk
migrations will be reported as if they were regular events. The flag defaults to `false` and is
internal.

#### `showCommitTimestamp` (internal)

The `showCommitTimestamp` flag can be used to include the transaction commit timestamp inside DML
events that were part of a prepared transaction. The flag defaults to `true` and is internal. It is
used by resharding.

#### `showRawUpdateDescription` (internal)

The `showRawUpdateDescription` flag can be used to make change streams emit the raw, internal format
used for "update" oplog entries. If set to `true`, emitted change stream "update" events will
contain a `rawUpdateDescription` field. The default is `false`. In this case, emitted change stream
"update" events will contain the regular `updateDescription` field.

#### `allowToRunOnConfigDB` (internal)
@ -572,9 +562,9 @@ server to keep track of shard additions and removals in the deployment.
#### `$_passthroughToShard` (internal)

In sharded cluster deployments, all change streams are supposed to be opened on _mongos_. _mongos_
will open the required cursors to the data shards and the config server on the consumer's behalf. If
the consumer only wants to target a specific shard of the cluster, they can use the
`$_passthroughToShard` aggregation parameter to limit the change stream to a single shard.

For example, to open a collection-level change stream targeting only one of the cluster's shards
(identified by the value in `shardId`), the following example code can be used:
@ -592,8 +582,8 @@ db.getSiblingDB("testDB").runCommand({
}); });
``` ```
Using `$_passthroughToShard` will bypass the regular cluster shard targeting for change streams and
open a replica set change stream pipeline (only) on the targeted shard. The change events that
_mongos_ retrieves from the single shard will be returned as is, without using a merge pipeline on
_mongos_.
@ -609,23 +599,26 @@ stream against a _mongos_ instance. The _mongos_ instance will then use the clus
information to open the cursors on the config server and the data shards on behalf of the consumer.

Because of the ordering guarantee provided by change streams, _mongos_ must wait until all cursors
have either responded with events, or run into a timeout and reported that currently no more events
are available for them. The latter is why change streams in a sharded cluster can have higher
latency than change streams in replica sets.

For sharded cluster change streams, the merging of the multiple streams of change events from the
different cursors is performed by the
[`AsyncResultsMerger`](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/s/query/exec/async_results_merger.h#L100).
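Why the merge has to wait for every cursor can be seen in a toy model of an ordered merge. This is
not the `AsyncResultsMerger` implementation: the `buffered` and `reportedUpTo` fields and the
function `releaseOrderedEvents` are assumptions made for illustration, and timestamps are plain
numbers. The idea is that an event may only be released once every cursor has reported progress up
to at least that event's timestamp, because a slower cursor could otherwise still deliver an
earlier event.

```js
// Each cursor: { buffered: [events sorted by ts], reportedUpTo: latest ts it has reported }.
// Returns the events that can be released without risking an ordering violation.
function releaseOrderedEvents(cursors) {
  // The safe horizon is the smallest timestamp that all cursors have reached.
  const horizon = Math.min(...cursors.map((cursor) => cursor.reportedUpTo));
  const ready = [];
  for (const cursor of cursors) {
    while (cursor.buffered.length > 0 && cursor.buffered[0].ts <= horizon) {
      ready.push(cursor.buffered.shift());
    }
  }
  // Interleave the released events from all cursors by timestamp.
  ready.sort((a, b) => a.ts - b.ts);
  return ready;
}

const shardA = { buffered: [{ ts: 1 }, { ts: 4 }], reportedUpTo: 4 };
const shardB = { buffered: [{ ts: 2 }], reportedUpTo: 2 };
const firstRound = releaseOrderedEvents([shardA, shardB]); // ts 1 and 2; ts 4 is held back
shardB.reportedUpTo = 5; // shard B reports "no more events" up to ts 5
const secondRound = releaseOrderedEvents([shardA, shardB]); // now ts 4 can be released
```

In this model the event with `ts: 4` is delayed until the quieter shard has reported in, which
mirrors the latency effect described above.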
## Change Stream Pipeline Building

A change stream pipeline issued by a consumer contains the `$changeStream` meta stage. This stage is
expanded internally into multiple `DocumentSource`s
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_pipeline_helpers.cpp#L171).

The change stream `DocumentSource`s are located in the `src/mongo/db/pipeline` directory
[here](https://github.com/mongodb/mongo/tree/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline),
among other `DocumentSource`s that are not related to change streams. The `DocumentSource`s are used
only for pipeline building and optimization; they are converted into execution `Stage`s later, when
the change stream is executed. These `Stage`s are located in the `src/mongo/db/exec/agg` directory
[here](https://github.com/mongodb/mongo/tree/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg).

### Replica Set Pipelines
@ -634,13 +627,14 @@ On a replica set, the `$changeStream` stage is expanded into the following inter
- `$_internalChangeStreamOplogMatch`
- `$_internalChangeStreamUnwindTransaction`
- `$_internalChangeStreamTransform`
- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level
  change streams)
- `$_internalChangeStreamCheckResumability`
- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to
  `off`)
- `$_internalChangeStreamAddPostImage` (only present if `fullDocument` is not set to `default`)
- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token
  is not a high water mark token)
- user-defined `$match` expression (only present if the user's change stream pipeline contains a
  `$match` stage)
- user-defined `$project` expression (only present if the user's change stream pipeline contains a
@ -648,8 +642,8 @@ On a replica set, the `$changeStream` stage is expanded into the following inter
- `$_internalChangeStreamSplitLargeEvent` (only present if the change stream is opened with the
  `$changeStreamSplitLargeEvent` pipeline step)

The change stream pipeline on replica sets will also contain a `$match` stage to filter out all
non-DML change events in case `showExpandedEvents` is not set.
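The conditional stages of the replica set pipeline can be written down as a small builder function.
This is a sketch of the list above and not the expansion code in
`change_stream_pipeline_helpers.cpp`: the option names `scope`, `resumeTokenIsHighWaterMark`,
`userStages` and `splitLargeEvent` are hypothetical. The extra `$match` stage that filters non-DML
events is left out because its position in the pipeline is not stated here.

```js
// Returns the names of the internal stages for a replica set change stream.
function buildReplicaSetStages({
  scope = "collection", // "collection", "database" or "cluster"
  fullDocument = "default",
  fullDocumentBeforeChange = "off",
  resumeTokenIsHighWaterMark = true,
  userStages = [], // e.g. ["$match", "$project"], in the user's order
  splitLargeEvent = false,
} = {}) {
  const stages = [
    "$_internalChangeStreamOplogMatch",
    "$_internalChangeStreamUnwindTransaction",
    "$_internalChangeStreamTransform",
  ];
  if (scope === "collection" || scope === "database") {
    stages.push("$_internalChangeStreamCheckInvalidate");
  }
  stages.push("$_internalChangeStreamCheckResumability");
  if (fullDocumentBeforeChange !== "off") {
    stages.push("$_internalChangeStreamAddPreImage");
  }
  if (fullDocument !== "default") {
    stages.push("$_internalChangeStreamAddPostImage");
  }
  if (!resumeTokenIsHighWaterMark) {
    stages.push("$_internalChangeStreamEnsureResumeTokenPresent");
  }
  stages.push(...userStages);
  if (splitLargeEvent) {
    stages.push("$_internalChangeStreamSplitLargeEvent");
  }
  return stages;
}
```

Calling `buildReplicaSetStages({ scope: "cluster" })` returns only the four unconditional stages in
this model, since all-cluster change streams have no invalidate check.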
### Sharded Cluster Pipelines
@ -659,10 +653,11 @@ following internal stages:
- `$_internalChangeStreamOplogMatch`
- `$_internalChangeStreamUnwindTransaction`
- `$_internalChangeStreamTransform`
- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level
  change streams)
- `$_internalChangeStreamCheckResumability`
- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to
  `off`)
- `$_internalChangeStreamAddPostImage` (only present if `fullDocument` is not set to `default`)
- user-defined `$match` expression (only present if the user's change stream pipeline contains a
  `$match` stage)
@ -674,8 +669,8 @@ following internal stages:
--- ---
- `$_internalChangeStreamHandleTopologyChange`
- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token
  is not a high water mark token)

Additionally, the change stream pipeline on a sharded cluster will contain a `$match` stage to
filter out all non-DML change events in case `showExpandedEvents` is not set.
@ -685,9 +680,9 @@ After building the initial pipeline stages, _mongos_ will split the pipeline int
- a part that is executed on data shards ("shard pipeline") and
- a part that is executed on _mongos_ ("merge pipeline").

The pipeline split point is above the `$_internalChangeStreamHandleTopologyChange` stage. _mongos_
will also add a `$mergeCursors` stage that aggregates the responses from different shards and the
config server into a single, sorted stream.
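The split can be sketched as a simple list operation. This is a simplified illustration and not the
pipeline-splitting code used by _mongos_; the function name `splitChangeStreamPipeline` is
hypothetical, and stages are represented by their names only.

```js
const SPLIT_STAGE = "$_internalChangeStreamHandleTopologyChange";

// Splits a list of stage names into the shard part and the merge part.
function splitChangeStreamPipeline(stages) {
  const splitIndex = stages.indexOf(SPLIT_STAGE);
  if (splitIndex === -1) {
    throw new Error(`pipeline does not contain ${SPLIT_STAGE}`);
  }
  return {
    // Everything above the split point runs on the data shards.
    shardPipeline: stages.slice(0, splitIndex),
    // The merge part starts with $mergeCursors, followed by the split stage
    // and everything after it.
    mergePipeline: ["$mergeCursors", ...stages.slice(splitIndex)],
  };
}

const { shardPipeline, mergePipeline } = splitChangeStreamPipeline([
  "$_internalChangeStreamOplogMatch",
  "$_internalChangeStreamUnwindTransaction",
  "$_internalChangeStreamTransform",
  "$_internalChangeStreamCheckResumability",
  "$_internalChangeStreamHandleTopologyChange",
  "$_internalChangeStreamEnsureResumeTokenPresent",
]);
```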
#### Data Shard Pipeline #### Data Shard Pipeline
@ -696,15 +691,16 @@ The shard pipeline will look like this:
- `$_internalChangeStreamOplogMatch` - `$_internalChangeStreamOplogMatch`
- `$_internalChangeStreamUnwindTransaction` - `$_internalChangeStreamUnwindTransaction`
- `$_internalChangeStreamTransform` - `$_internalChangeStreamTransform`
- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level change - `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level
streams) change streams)
- `$_internalChangeStreamCheckResumability` - `$_internalChangeStreamCheckResumability`
- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to `off`) - `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to
`off`)
- `$_internalChangeStreamAddPostImage` (only present if `fullDocument` is not set to `default`) - `$_internalChangeStreamAddPostImage` (only present if `fullDocument` is not set to `default`)
- user-defined `$match` expression (only present if the user's change stream pipeline contains a - user-defined `$match` expression (only present if the user's change stream pipeline contains a
`$match` stage) `$match` stage)
- user-defined `$project` expression (only present if the change stream pipeline contains a `$project` - user-defined `$project` expression (only present if the change stream pipeline contains a
stage) `$project` stage)
- `$_internalChangeStreamSplitLargeEvent` (only present if the change stream is opened with the - `$_internalChangeStreamSplitLargeEvent` (only present if the change stream is opened with the
`$changeStreamSplitLargeEvent` pipeline step) `$changeStreamSplitLargeEvent` pipeline step)
@ -714,16 +710,18 @@ The merge pipeline on _mongos_ will look like this:
- `$mergeCursors` - `$mergeCursors`
- `$_internalChangeStreamHandleTopologyChange` - `$_internalChangeStreamHandleTopologyChange`
- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token is - `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token
not a high water mark token) is not a high water mark token)
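The two stage lists above can be read as simple conditionals over the change stream options. The following Python sketch encodes exactly those conditions; the `ChangeStreamOptions` class and its field names are hypothetical and exist only for this illustration, they are not server types.

```python
from dataclasses import dataclass


@dataclass
class ChangeStreamOptions:
    # Hypothetical option names mirroring the conditions listed above.
    level: str = "collection"  # "collection", "database" or "cluster"
    full_document: str = "default"
    full_document_before_change: str = "off"
    has_user_match: bool = False
    has_user_project: bool = False
    split_large_event: bool = False
    resume_token_is_high_water_mark: bool = True


def shard_pipeline(opts):
    """Stage names for the part of the pipeline that runs on data shards."""
    stages = [
        "$_internalChangeStreamOplogMatch",
        "$_internalChangeStreamUnwindTransaction",
        "$_internalChangeStreamTransform",
    ]
    if opts.level in ("collection", "database"):
        stages.append("$_internalChangeStreamCheckInvalidate")
    stages.append("$_internalChangeStreamCheckResumability")
    if opts.full_document_before_change != "off":
        stages.append("$_internalChangeStreamAddPreImage")
    if opts.full_document != "default":
        stages.append("$_internalChangeStreamAddPostImage")
    if opts.has_user_match:
        stages.append("$match")
    if opts.has_user_project:
        stages.append("$project")
    if opts.split_large_event:
        stages.append("$_internalChangeStreamSplitLargeEvent")
    return stages


def merge_pipeline(opts):
    """Stage names for the part of the pipeline that runs on mongos."""
    stages = ["$mergeCursors", "$_internalChangeStreamHandleTopologyChange"]
    if not opts.resume_token_is_high_water_mark:
        stages.append("$_internalChangeStreamEnsureResumeTokenPresent")
    return stages
```

For example, `shard_pipeline(ChangeStreamOptions(level="cluster"))` omits the invalidate check, because that stage is only listed for collection-level and database-level change streams.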
### Details of individual Pipeline Stages ### Details of individual Pipeline Stages
#### `$_internalChangeStreamOplogMatch` #### `$_internalChangeStreamOplogMatch`
This stage is responsible for reading data from the oplog and filtering out irrelevant events. This stage is responsible for reading data from the oplog and filtering out irrelevant events. The
The `DocumentSourceChangeStreamOplogMatch` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_oplog_match.h#L61). `DocumentSourceChangeStreamOplogMatch` code is
The oplog filter for the stage is built [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_oplog_match.cpp#L79). [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_oplog_match.h#L61).
The oplog filter for the stage is built
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_oplog_match.cpp#L79).
There is no `Stage` equivalent for `DocumentSourceChangeStreamOplogMatch`, as it will be turned into There is no `Stage` equivalent for `DocumentSourceChangeStreamOplogMatch`, as it will be turned into
a `$cursor` stage for execution. a `$cursor` stage for execution.
@ -731,28 +729,35 @@ a `$cursor` stage for execution.
#### `$_internalChangeStreamUnwindTransaction` #### `$_internalChangeStreamUnwindTransaction`
This stage is responsible for "unwinding" (expanding) multiple operations that are contained in an This stage is responsible for "unwinding" (expanding) multiple operations that are contained in an
"applyOps" oplog entry into individual events. "applyOps" oplog entry into individual events. The `DocumentSourceChangeStreamUnwindTransaction`
The `DocumentSourceChangeStreamUnwindTransaction` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_unwind_transaction.h#L71). code is
The `ChangeStreamUnwindTransactionStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_unwind_transaction.cpp#L83). [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_unwind_transaction.h#L71).
The `ChangeStreamUnwindTransactionStage` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_unwind_transaction.cpp#L83).
#### `$_internalChangeStreamTransform` #### `$_internalChangeStreamTransform`
This stage is responsible for converting oplog entries into change events. It will build a change This stage is responsible for converting oplog entries into change events. It will build a change
event document for every oplog entry that enters this stage. event document for every oplog entry that enters this stage. Event fields are added based on the
Event fields are added based on the change stream configuration. change stream configuration. The `DocumentSourceChangeStreamTransform` code is
The `DocumentSourceChangeStreamTransform` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_transform.h#L60). [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_transform.h#L60).
The `ChangeStreamTransformStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_transform_stage.cpp#L75). The `ChangeStreamTransformStage` code is
The actual event transformation happens inside `ChangeStreamDefaultEventTransformation` [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_event_transform.cpp#L321). [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_transform_stage.cpp#L75).
The actual event transformation happens inside `ChangeStreamDefaultEventTransformation`
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_event_transform.cpp#L321).
#### `$_internalChangeStreamCheckInvalidate` #### `$_internalChangeStreamCheckInvalidate`
This stage is responsible for creating change stream "invalidate" events and is only added for This stage is responsible for creating change stream "invalidate" events and is only added for
collection-level and database-level change streams. collection-level and database-level change streams. The `DocumentSourceChangeStreamCheckInvalidate`
The `DocumentSourceChangeStreamCheckInvalidate` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_check_invalidate.h#L65). code is
The `ChangeStreamCheckInvalidate` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_invalidate_stage.cpp#L106). [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_check_invalidate.h#L65).
The `ChangeStreamCheckInvalidate` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_invalidate_stage.cpp#L106).
When an invalidate event is encountered, the stage will first emit an "invalidate" event, and then
throw a `ChangeStreamInvalidated` exception on the next call. The
[`ChangeStreamInvalidatedInfo`](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_invalidation_info.h#L47)
exception type contains the error code `ChangeStreamInvalidated`.
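The "emit once, then fail on the next call" behavior can be modeled with a small state machine. This is a simplified Python sketch, not the server code: the `is_invalidating` predicate is a hypothetical parameter, and the sketch deliberately does not model whether the triggering event itself is also forwarded downstream.

```python
class ChangeStreamInvalidated(Exception):
    """Stand-in for the server's ChangeStreamInvalidated error."""


class CheckInvalidateStage:
    def __init__(self, source, is_invalidating):
        self._source = iter(source)
        self._is_invalidating = is_invalidating  # hypothetical predicate
        self._invalidated = False

    def get_next(self):
        # Once an "invalidate" event has been returned, every later call fails.
        if self._invalidated:
            raise ChangeStreamInvalidated("change stream was invalidated")
        event = next(self._source)
        if self._is_invalidating(event):
            self._invalidated = True
            return {"operationType": "invalidate"}
        return event


stage = CheckInvalidateStage(
    [{"operationType": "insert"}, {"operationType": "drop"}],
    is_invalidating=lambda event: event["operationType"] == "drop",
)
first = stage.get_next()   # the "insert" event passes through
second = stage.get_next()  # the stage reports the invalidation
```

A third call to `stage.get_next()` raises `ChangeStreamInvalidated`, which is how the consumer learns that the stream cannot continue.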
#### `$_internalChangeStreamCheckResumability` #### `$_internalChangeStreamCheckResumability`
@ -761,18 +766,22 @@ This stage checks if the oplog has enough history to resume the change stream, a
events up to the given resume point. If no data for the resume point can be found in the oplog events up to the given resume point. If no data for the resume point can be found in the oplog
anymore, it will throw a `ChangeStreamHistoryLost` error. anymore, it will throw a `ChangeStreamHistoryLost` error.
The `DocumentSourceChangeStreamCheckResumability` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_check_resumability.h#L79). The `DocumentSourceChangeStreamCheckResumability` code is
The `ChangeStreamCheckResumabilityStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_resumability_stage.cpp#L68). [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_check_resumability.h#L79).
The `ChangeStreamCheckResumabilityStage` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_resumability_stage.cpp#L68).
#### `$_internalChangeStreamAddPreImage` #### `$_internalChangeStreamAddPreImage`
This stage is responsible for adding pre-image data to "update", "replace" and "delete" events. It
is only added to change stream pipelines if the `fullDocumentBeforeChange` parameter is not set to
`off`. If enabled, the stage relies on the pre-images stored in the pre-image system collection.
The `DocumentSourceChangeStreamAddPreImage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_add_pre_image.h#L67). The `DocumentSourceChangeStreamAddPreImage` code is
The `ChangeStreamAddPreImageStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_pre_image_stage.cpp#L67). [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_add_pre_image.h#L67).
The `ChangeStreamAddPreImageStage` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_pre_image_stage.cpp#L67).
#### `$_internalChangeStreamAddPostImage` #### `$_internalChangeStreamAddPostImage`
@ -780,23 +789,24 @@ This stage is responsible for adding post-image data to "update" events. It is o
stream pipelines if the `fullDocument` parameter is not set to `default`. stream pipelines if the `fullDocument` parameter is not set to `default`.
If `fullDocument` is set to `updateLookup`, the stage will perform a lookup for the current version If `fullDocument` is set to `updateLookup`, the stage will perform a lookup for the current version
of a document that was updated by an "update" event, and store it in the `fullDocument` field of of a document that was updated by an "update" event, and store it in the `fullDocument` field of the
the "update" event if present. The lookup is performed using the `_id` value of the document from "update" event if present. The lookup is performed using the `_id` value of the document from the
the change event. As the lookup is executed at a different point in time than when the change event change event. As the lookup is executed at a different point in time than when the change event was
was recorded, it is possible that the lookup finds a different version of the document than the one recorded, it is possible that the lookup finds a different version of the document than the one that
that was active when the change event was recorded. This can happen if the document was updated was active when the change event was recorded. This can happen if the document was updated again
again between the change event and the lookup. The lookup may also find no document at all if the between the change event and the lookup. The lookup may also find no document at all if the document
document was deleted after the "update" event, but before the lookup. was deleted after the "update" event, but before the lookup. In case the lookup cannot find a
In case the lookup cannot find a document with the requested `_id`, it will populate the document with the requested `_id`, it will populate the `fullDocument` field with a value of `null`.
`fullDocument` field with a value of `null`.
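The timing issue described for `updateLookup` can be reproduced with a plain dictionary standing in for the collection. This is a minimal Python sketch under that assumption; the helper name `add_post_image_update_lookup` is made up for the example, and the event shape is reduced to the fields the lookup needs.

```python
def add_post_image_update_lookup(event, collection):
    """Attach the current document for the event's _id, or None if it is gone.

    `collection` is a dict mapping _id to document; it stands in for the real
    collection and is read at lookup time, not at the time of the change.
    """
    if event["operationType"] != "update":
        return event
    doc_id = event["documentKey"]["_id"]
    return {**event, "fullDocument": collection.get(doc_id)}


collection = {1: {"_id": 1, "qty": 5}}
event = {"operationType": "update", "documentKey": {"_id": 1}}  # recorded when qty was set to 5

collection[1] = {"_id": 1, "qty": 9}  # a later update lands before the lookup runs
looked_up = add_post_image_update_lookup(event, collection)

del collection[1]  # the document is deleted before another lookup runs
after_delete = add_post_image_update_lookup(event, collection)
```

Here `looked_up["fullDocument"]` reflects the later write (`qty` is 9) rather than the state at the time of the event, and `after_delete["fullDocument"]` is `None`, the counterpart of the `null` value mentioned above.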
If `fullDocument` is set to `whenAvailable` or `required`, the stage will make use of the stored
pre-image of the document in the pre-image system collection. It will fetch the pre-image and then
apply the delta that is stored in the "update" change event on top of it, and store the result in
the `fullDocument` field.
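Conceptually, the post-image is "pre-image plus delta". The Python sketch below shows that idea for flat documents only. The delta shape (`updatedFields` / `removedFields`) is an assumption chosen for readability; the delta the server actually applies is more involved (nested paths, array changes) and is not reproduced here.

```python
def apply_delta(pre_image, delta):
    """Return a post-image built from a pre-image and a simplified delta."""
    post_image = dict(pre_image)  # never mutate the stored pre-image
    post_image.update(delta.get("updatedFields", {}))
    for field in delta.get("removedFields", []):
        post_image.pop(field, None)
    return post_image


pre_image = {"_id": 1, "qty": 5, "note": "temporary"}
delta = {"updatedFields": {"qty": 6}, "removedFields": ["note"]}
post_image = apply_delta(pre_image, delta)
```

Because the result is derived from the stored pre-image and the event's own delta, it does not depend on what the document looks like when the stage runs, unlike the `updateLookup` mode.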
The `DocumentSourceChangeStreamAddPostImage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_add_post_image.h#L63). The `DocumentSourceChangeStreamAddPostImage` code is
The `ChangeStreamAddPostImageStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_post_image_stage.cpp#L84). [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_add_post_image.h#L63).
The `ChangeStreamAddPostImageStage` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_post_image_stage.cpp#L84).
#### `$_internalChangeStreamEnsureResumeTokenPresent` #### `$_internalChangeStreamEnsureResumeTokenPresent`
@ -805,18 +815,22 @@ the change stream parameters is actually in the stream. The stage is only presen
stream resume token is not a high water mark token. If the resume token cannot be found in the stream resume token is not a high water mark token. If the resume token cannot be found in the
stream, it will throw a `ChangeStreamFatalError`. stream, it will throw a `ChangeStreamFatalError`.
The `DocumentSourceChangeStreamEnsureResumeTokenPresent` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_ensure_resume_token_present.h#L51). The `DocumentSourceChangeStreamEnsureResumeTokenPresent` code is
The `ChangeStreamEnsureResumeTokenPresent` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_ensure_resume_token_present_stage.cpp#L67). [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_ensure_resume_token_present.h#L51).
The `ChangeStreamEnsureResumeTokenPresent` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_ensure_resume_token_present_stage.cpp#L67).
#### `$_internalChangeStreamHandleTopologyChange` #### `$_internalChangeStreamHandleTopologyChange`
This stage is only present in sharded cluster change streams and is always part of the _mongos_ This stage is only present in sharded cluster change streams and is always part of the _mongos_
merge pipeline. The stage is responsible for opening additional cursors to shards that have been merge pipeline. The stage is responsible for opening additional cursors to shards that have been
added to the cluster. It will handle "insert" events into the `config.shards` collection that added to the cluster. It will handle "insert" events into the `config.shards` collection that were
were observed from the config server. observed from the config server.
The `DocumentSourceChangeStreamHandleTopologyChange` code can be found [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_handle_topology_change.h#L63). The `DocumentSourceChangeStreamHandleTopologyChange` code can be found
The `ChangeStreamHandleTopologyChangeStage` code can be found [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_handle_topology_change_stage.cpp#L121). [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_handle_topology_change.h#L63).
The `ChangeStreamHandleTopologyChangeStage` code can be found
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_handle_topology_change_stage.cpp#L121).
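The reaction to new shards described in this section can be sketched as follows. This is an illustrative Python model, not the server's logic: the `open_cursor` callback, the event shape, and the use of the inserted document's `_id` as the shard identifier are assumptions made for the example.

```python
class TopologyChangeHandler:
    def __init__(self, open_cursor):
        self._open_cursor = open_cursor  # hypothetical callback: shard id -> cursor
        self.cursors = {}

    def observe(self, event):
        """Open a cursor for a shard when its insert into config.shards is seen."""
        is_new_shard = (
            event.get("operationType") == "insert"
            and event.get("ns") == {"db": "config", "coll": "shards"}
        )
        if is_new_shard:
            shard_id = event["documentKey"]["_id"]
            if shard_id not in self.cursors:
                self.cursors[shard_id] = self._open_cursor(shard_id)
        return event


handler = TopologyChangeHandler(open_cursor=lambda shard_id: f"cursor-on-{shard_id}")
handler.observe({
    "operationType": "insert",
    "ns": {"db": "config", "coll": "shards"},
    "documentKey": {"_id": "shard02"},
})
handler.observe({
    "operationType": "insert",
    "ns": {"db": "test", "coll": "users"},
    "documentKey": {"_id": 7},
})
```

Only the first event opens a cursor; ordinary inserts into user collections leave `handler.cursors` unchanged.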
## Missing documentation (to be completed) ## Missing documentation (to be completed)


@ -1,75 +1,70 @@
# Command Dispatch # Command Dispatch
Command dispatch refers to the general process by which client requests are Command dispatch refers to the general process by which client requests are taken from the network,
taken from the network, parsed, sanitized, then finally run on databases. parsed, sanitized, then finally run on databases.
## Service Entry Points ## Service Entry Points
[Service entry points][service_entry_point_h] fulfill the transition from the transport layer into
command implementations. For each incoming connection from a client (in the form of a
[session][session_h] object), a new dedicated thread is spawned then detached, and is also assigned
a new [session workflow][session_workflow_h], responsible for maintaining the workflow of a single
client connection during its lifetime. Central to the entry point is the `handleRequest()` function,
which manages the server-side logic of processing requests and returns a response message indicating
the result of the corresponding request message. This function is currently implemented by several
subclasses of the parent `ServiceEntryPoint` in order to account for the differences in processing
requests between the shard and router roles -- these distinctions are reflected in the
`ServiceEntryPointRouterRole` and `ServiceEntryPointShardRole` subclasses (see
[here][service_entry_point_router_role_h] and [here][service_entry_point_shard_role.h]).
## Strategy ## Strategy
One area in which the _mongos_ entry point differs from its _mongod_ counterpart One area in which the _mongos_ entry point differs from its _mongod_ counterpart is in its usage of
is in its usage of the [Strategy class][strategy_h]. `Strategy` operates as a the [Strategy class][strategy_h]. `Strategy` operates as a legacy interface for processing client
legacy interface for processing client read, write, and command requests; there read, write, and command requests; there is a near 1-to-1 mapping between its constituent functions
is a near 1-to-1 mapping between its constituent functions and request types and request types (e.g. `writeOp()` for handling write operation requests, `getMore()` for a getMore
(e.g. `writeOp()` for handling write operation requests, `getMore()` for a request, etc.). These functions comprise the backbone of the _mongos_ entry point's
getMore request, etc.). These functions comprise the backbone of the _mongos_ `handleRequest()` -- that is to say, when a valid request is received, it is sieved and ultimately
entry point's `handleRequest()` -- that is to say, when a valid request is passed along to the appropriate Strategy class member function. The significance of using the
received, it is sieved and ultimately passed along to the appropriate Strategy Strategy class specifically with the _mongos_ entry point is that it [facilitates query routing to
class member function. The significance of using the Strategy class specifically shards][mongos_router] in _addition_ to running queries against targeted databases (see
with the _mongos_ entry point is that it [facilitates query routing to [s/transaction_router.h][transaction_router_h] for finer details).
shards][mongos_router] in _addition_ to running queries against targeted
databases (see [s/transaction_router.h][transaction_router_h] for finer
details).
## Commands ## Commands
The [Command class][commands_h] serves as a means of cataloging a server command The [Command class][commands_h] serves as a means of cataloging a server command as well as
as well as ascribing various attributes and behaviors to commands via the [type ascribing various attributes and behaviors to commands via the [type
system][template_method_pattern], that will likely be used during the lifespan system][template_method_pattern], that will likely be used during the lifespan of a particular
of a particular server. Construction of a Command should only occur during server. Construction of a Command should only occur during server startup. When a new Command is
server startup. When a new Command is constructed, that Command is stored in a constructed, that Command is stored in a global `CommandRegistry` object for future reference. There
global `CommandRegistry` object for future reference. There are two kinds of are two kinds of Command subclasses: `BasicCommand` and `TypedCommand`.
Command subclasses: `BasicCommand` and `TypedCommand`.
A major distinction between the two is in their implementation of the `parse()` A major distinction between the two is in their implementation of the `parse()` member function.
member function. `parse()` takes in a request and returns a handle to a single `parse()` takes in a request and returns a handle to a single invocation of a particular Command
invocation of a particular Command (represented by a `CommandInvocation`), that (represented by a `CommandInvocation`), that can then be used to run the Command. The
can then be used to run the Command. The `BasicCommand::parse()` is a naive `BasicCommand::parse()` is a naive implementation that merely forwards incoming requests to the
implementation that merely forwards incoming requests to the Invocation and Invocation and makes sure that the Command does not support document sequences. The implementation
makes sure that the Command does not support document sequences. The of `TypedCommand::parse()`, on the other hand, varies depending on the Request type parameter the
implementation of `TypedCommand::parse()`, on the other hand, varies depending Command takes in. Since the `TypedCommand` accepts requests generated by IDL, the parsing function
on the Request type parameter the Command takes in. Since the `TypedCommand` associated with a usable Request type must allow it to be parsed as an IDL command. In handling
accepts requests generated by IDL, the parsing function associated with a usable requests, both the _mongos_ and _mongod_ entry points interact with the Command subclasses through
Request type must allow it to be parsed as an IDL command. In handling requests, the `CommandHelpers` struct in order to parse requests and ultimately run them as Commands.
both the _mongos_ and _mongod_ entry points interact with the Command subclasses
through the `CommandHelpers` struct in order to parse requests and ultimately
run them as Commands.
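The relationship between the registry, the two `parse()` flavors, and the invocation handle can be sketched in a few lines of Python. This is a toy model of the pattern, not the C++ classes: `REGISTRY`, `EchoRequest`, `dispatch()` and the handler callables are hypothetical, and the assumption that the command name is the first key of the request document is stated only for the sake of the example.

```python
from dataclasses import dataclass


class CommandInvocation:
    """Handle to a single invocation of a command."""

    def __init__(self, handler, request):
        self._handler = handler
        self._request = request

    def run(self):
        return self._handler(self._request)


REGISTRY = {}  # toy stand-in for the global CommandRegistry


class Command:
    def __init__(self, name, handler):
        self.name = name
        self._handler = handler
        REGISTRY[name] = self  # registration happens once, at "startup"


class BasicCommand(Command):
    def parse(self, raw, sequences=()):
        # Forward the raw request untouched, but reject document sequences.
        if sequences:
            raise ValueError(f"{self.name} does not support document sequences")
        return CommandInvocation(self._handler, raw)


@dataclass
class EchoRequest:
    """Hypothetical typed request, standing in for an IDL-generated type."""

    text: str

    @classmethod
    def parse(cls, raw):
        return cls(text=str(raw["echo"]))


class TypedCommand(Command):
    def __init__(self, name, request_type, handler):
        super().__init__(name, handler)
        self._request_type = request_type

    def parse(self, raw, sequences=()):
        # Parsing is delegated to the request type, so it varies per command.
        return CommandInvocation(self._handler, self._request_type.parse(raw))


def dispatch(raw):
    # Assumption: the command name is the first key of the request document.
    command = REGISTRY[next(iter(raw))]
    return command.parse(raw).run()


BasicCommand("ping", lambda raw: {"ok": 1})
TypedCommand("echo", EchoRequest, lambda request: {"ok": 1, "text": request.text})
```

With these registrations, `dispatch({"echo": "hi"})` parses the raw document into an `EchoRequest` before the handler runs, while `dispatch({"ping": 1})` hands the raw document straight to its handler.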
## Admission control ## Admission control
To ensure stability of our servers, we have implemented different admission control mechanisms to
prevent data-nodes from becoming overloaded with operations. When implementing a new command, it's
important to decide whether the command will be subject to one of the admission controls in place
and understand the resulting outcomes.

For example, user commands may be subject to Ingress Admission Control, which happens in the
[ServiceEntryPoint][IngressControl]. For information on admission control and how to implement
admission control into a new command, please see [Admission Control README][ACReadMe].
## See Also ## See Also
For details on transport internals, including ingress networking, see [this document][transport_internals]. For details on transport internals, including ingress networking, see [this
document][transport_internals].
[service_entry_point_h]: ../src/mongo/transport/service_entry_point.h [service_entry_point_h]: ../src/mongo/transport/service_entry_point.h
[session_h]: ../src/mongo/transport/session.h [session_h]: ../src/mongo/transport/session.h
@ -85,4 +80,5 @@ For details on transport internals, including ingress networking, see [this docu
[template_method_pattern]: https://en.wikipedia.org/wiki/Template_method_pattern [template_method_pattern]: https://en.wikipedia.org/wiki/Template_method_pattern
[transport_internals]: ../src/mongo/transport/README.md [transport_internals]: ../src/mongo/transport/README.md
[ACReadMe]: ../src/mongo/db/admission/README.md [ACReadMe]: ../src/mongo/db/admission/README.md
[IngressControl]: https://github.com/mongodb/mongo/blob/a86c7f5de2a5de4d2f49e40e8970754ec6a5ba6c/src/mongo/db/service_entry_point_shard_role.cpp#L1803 [IngressControl]:
https://github.com/mongodb/mongo/blob/a86c7f5de2a5de4d2f49e40e8970754ec6a5ba6c/src/mongo/db/service_entry_point_shard_role.cpp#L1803


@ -14,9 +14,9 @@ dynamically extensible.
A `ServiceContext` represents all of the state of a single Mongo server process, which may be either A `ServiceContext` represents all of the state of a single Mongo server process, which may be either
a `mongod` or a `mongos`. It creates and manages the previously mentioned `Client`s and a `mongod` or a `mongos`. It creates and manages the previously mentioned `Client`s and
`OperationContext`s, as well as a `TransportLayer` for performing network operations, a `OperationContext`s, as well as a `TransportLayer` for performing network operations, a
`PeriodicRunner` for running housekeeping tasks periodically, a `StorageEngine` for interacting `PeriodicRunner` for running housekeeping tasks periodically, a `StorageEngine` for interacting with
with the actual database itself, and a set of time sources. In general, every Mongo server process the actual database itself, and a set of time sources. In general, every Mongo server process has a
has a single `ServiceContext`, known as the _global_ `ServiceContext`. Typical uses of the global single `ServiceContext`, known as the _global_ `ServiceContext`. Typical uses of the global
`ServiceContext` outside of server initialization and shutdown include looking up `Client` or `ServiceContext` outside of server initialization and shutdown include looking up `Client` or
`OperationContext` information for a particular thread or operation, or killing one or more running `OperationContext` information for a particular thread or operation, or killing one or more running
operations during, e.g., a primary replica step-down. The global `ServiceContext` is created during operations during, e.g., a primary replica step-down. The global `ServiceContext` is created during
@ -28,16 +28,16 @@ The `ServiceContext` associated with a given `Client` object can be fetched in a
using [`Client::getServiceContext()`][client-get-service-context-url] when possible. As of the time
of writing, every server process only maintains a single `ServiceContext`, but preferring
`Client::getServiceContext()` or `ServiceContext::getCurrentServiceContext()` over
[`ServiceContext::getGlobalServiceContext()`][get-global-service-context-url] will allow us to more
easily maintain multiple `ServiceContext`s per server process if desired in the future.
## [`Client`][client-url] ## [`Client`][client-url]
Each logical connection to a Mongo service is managed by a `Client` object, where a logical Each logical connection to a Mongo service is managed by a `Client` object, where a logical
connection may be a user or an internal process that needs to run a command or query on the database. connection may be a user or an internal process that needs to run a command or query on the
Construction of a `Client` object is typically performed with a call to `makeClient` on the global database. Construction of a `Client` object is typically performed with a call to `makeClient` on
`ServiceContext`, which can then be attached to any thread of execution, or with a call to the global `ServiceContext`, which can then be attached to any thread of execution, or with a call
[`Client::initThread`][client-init-thread-url] which constructs a `Client` on the global to [`Client::initThread`][client-init-thread-url] which constructs a `Client` on the global
`ServiceContext` and binds it to the current thread. All operations executed by the `Client` will `ServiceContext` and binds it to the current thread. All operations executed by the `Client` will
take place on that `Client`s associated thread serially over the network connection managed by the take place on that `Client`s associated thread serially over the network connection managed by the
`Session` object that was passed into the `Client`s constructor. If no `Session` is passed to the `Session` object that was passed into the `Client`s constructor. If no `Session` is passed to the
@ -70,13 +70,13 @@ operations. The semantics of the `Client` lock are summarized in the table below
[`Client::cc()`][client-cc-url] may be used to get the `Client` object associated with the currently [`Client::cc()`][client-cc-url] may be used to get the `Client` object associated with the currently
executing thread. Prefer passing `Client` objects as parameters over calls to `Client::cc()` when executing thread. Prefer passing `Client` objects as parameters over calls to `Client::cc()` when
possible. A [`ThreadClient`][thread-client-url] is an RAII-style class which may be used to construct possible. A [`ThreadClient`][thread-client-url] is an RAII-style class which may be used to
and bind a `Client` to the current running thread and automatically unbind it once the `ThreadClient` construct and bind a `Client` to the current running thread and automatically unbind it once the
goes out of scope. An [`AlternativeClientRegion`][acr-url] is another RAII-style class which may be `ThreadClient` goes out of scope. An [`AlternativeClientRegion`][acr-url] is another RAII-style
used to temporarily bind a `Client` object to the currently running thread (holding any currently class which may be used to temporarily bind a `Client` object to the currently running thread
bound `Client` in reserve), rebinding the current threads old `Client` to the current thread upon (holding any currently bound `Client` in reserve), rebinding the current threads old `Client` to
falling out of scope. [`ClientStrand`][client-strand-url] functions similarly, but also provides an the current thread upon falling out of scope. [`ClientStrand`][client-strand-url] functions
`Executor` interface for binding a `Client` to an arbitrary thread. similarly, but also provides an `Executor` interface for binding a `Client` to an arbitrary thread.
## [`OperationContext`][operation-context-url] ## [`OperationContext`][operation-context-url]
@ -92,23 +92,37 @@ performed asynchronously.
### Interruptibility ### Interruptibility
`OperationContext`s implement the [`Interruptible`][interruptible-url] interface, which allows them to `OperationContext`s implement the [`Interruptible`][interruptible-url] interface, which allows them
be killed by their associated `Client`s (or, by proxy, their owning `ServiceContext`). See to be killed by their associated `Client`s (or, by proxy, their owning `ServiceContext`). See [this
[this comment block][opctx-interruptible-comment-block-url] for more details on when and how comment block][opctx-interruptible-comment-block-url] for more details on when and how
`OperationContext`s are interrupted. `OperationContext`s are interrupted.
[service-context-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/service_context.h#L141 [service-context-url]:
[decorable-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/util/decorable.h https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/service_context.h#L141
[client-get-service-context-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L117 [decorable-url]:
[get-global-service-context-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/service_context.h#L755 https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/util/decorable.h
[client-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h [client-get-service-context-url]:
[client-init-thread-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L75 https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L117
[client-cc-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L372 [get-global-service-context-url]:
[thread-client-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L320 https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/service_context.h#L755
[acr-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L347 [client-url]:
[client-strand-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client_strand.h https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h
[operation-context-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/operation_context.h [client-init-thread-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L75
[client-cc-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L372
[thread-client-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L320
[acr-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L347
[client-strand-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client_strand.h
[operation-context-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/operation_context.h
[kill-op-url]: https://docs.mongodb.com/manual/reference/command/killOp/ [kill-op-url]: https://docs.mongodb.com/manual/reference/command/killOp/
[baton-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/baton.h [baton-url]:
[interruptible-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/util/interruptible.h https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/baton.h
[opctx-interruptible-comment-block-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/operation_context.cpp#L281 [interruptible-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/util/interruptible.h
[opctx-interruptible-comment-block-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/operation_context.cpp#L281

File diff suppressed because it is too large

@ -4,8 +4,10 @@
**👉 Please visit the new [Dev Container Documentation](./devcontainer/README.md) for:** **👉 Please visit the new [Dev Container Documentation](./devcontainer/README.md) for:**
- 📖 [**Getting Started Guide**](./devcontainer/getting-started.md) - Step-by-step setup instructions - 📖 [**Getting Started Guide**](./devcontainer/getting-started.md) - Step-by-step setup
- 🏗️ [**Architecture & Technical Details**](./devcontainer/architecture.md) - How everything works under the hood instructions
- 🏗️ [**Architecture & Technical Details**](./devcontainer/architecture.md) - How everything works
under the hood
- 🔧 [**Troubleshooting Guide**](./devcontainer/troubleshooting.md) - Solutions to common issues - 🔧 [**Troubleshooting Guide**](./devcontainer/troubleshooting.md) - Solutions to common issues
- 💡 [**Advanced Usage**](./devcontainer/advanced.md) - Customization and power user features - 💡 [**Advanced Usage**](./devcontainer/advanced.md) - Customization and power user features
- ❓ [**FAQ**](./devcontainer/faq.md) - Frequently asked questions - ❓ [**FAQ**](./devcontainer/faq.md) - Frequently asked questions

@ -1,10 +1,12 @@
# MongoDB Development with Dev Containers # MongoDB Development with Dev Containers
**⚠️ BETA:** The devcontainer setup is currently in Beta stage. Please report issues and feedback to the team. **⚠️ BETA:** The devcontainer setup is currently in Beta stage. Please report issues and feedback to
the team.
## 📚 Documentation Index ## 📚 Documentation Index
This is the comprehensive guide for developing MongoDB using Dev Containers. Choose the guide that best fits your needs: This is the comprehensive guide for developing MongoDB using Dev Containers. Choose the guide that
best fits your needs:
### 🚀 [Getting Started](./getting-started.md) ### 🚀 [Getting Started](./getting-started.md)
@ -80,7 +82,8 @@ This is the comprehensive guide for developing MongoDB using Dev Containers. Cho
## What are Dev Containers? ## What are Dev Containers?
Dev Containers provide a consistent, reproducible development environment using Docker containers. This ensures: Dev Containers provide a consistent, reproducible development environment using Docker containers.
This ensures:
- ✅ **Consistency**: Everyone works with identical tooling and dependencies - ✅ **Consistency**: Everyone works with identical tooling and dependencies
- ✅ **Isolation**: Your host system stays clean - ✅ **Isolation**: Your host system stays clean

@ -1,8 +1,10 @@
# Advanced Dev Container Usage # Advanced Dev Container Usage
This guide covers advanced workflows and power user features for managing multiple containers, backups, and complex development scenarios. This guide covers advanced workflows and power user features for managing multiple containers,
backups, and complex development scenarios.
**Looking to customize your devcontainer?** See the [Customization Guide](./customization.md) for dotfiles, VS Code settings, extensions, and performance tuning. **Looking to customize your devcontainer?** See the [Customization Guide](./customization.md) for
dotfiles, VS Code settings, extensions, and performance tuning.
## Table of Contents ## Table of Contents

@ -1,6 +1,7 @@
# Dev Container Architecture # Dev Container Architecture
This document provides a deep dive into how the MongoDB devcontainer is structured and how all the pieces work together. This document provides a deep dive into how the MongoDB devcontainer is structured and how all the
pieces work together.
## Table of Contents ## Table of Contents
@ -201,7 +202,8 @@ MongoDB requires specific compiler versions. The toolchain installation process
### Toolchain Configuration ### Toolchain Configuration
The `toolchain_config.env` file contains architecture-specific toolchain definitions for both ARM64 and AMD64: The `toolchain_config.env` file contains architecture-specific toolchain definitions for both ARM64
and AMD64:
```bash ```bash
# Generated by toolchain.py # Generated by toolchain.py
@ -289,7 +291,8 @@ The MongoDB toolchain includes:
### Toolchain Updates ### Toolchain Updates
The toolchain is managed by the MongoDB team. When updates are available, you'll get them automatically when you: The toolchain is managed by the MongoDB team. When updates are available, you'll get them
automatically when you:
- Pull the latest changes from the repository - Pull the latest changes from the repository
- Rebuild your devcontainer - Rebuild your devcontainer

@ -1,10 +1,14 @@
# Customizing Your Dev Container # Customizing Your Dev Container
This guide covers personal customizations you can make to your MongoDB devcontainer **without modifying the repository's devcontainer configuration**. These are user-level settings that only affect your development environment. This guide covers personal customizations you can make to your MongoDB devcontainer **without
modifying the repository's devcontainer configuration**. These are user-level settings that only
affect your development environment.
**Want to modify the devcontainer setup for everyone?** See [Contributing Customizations](#contributing-customizations) at the bottom. **Want to modify the devcontainer setup for everyone?** See
[Contributing Customizations](#contributing-customizations) at the bottom.
**For general VS Code settings** (themes, fonts, keybindings), see the [VS Code documentation](https://code.visualstudio.com/docs/getstarted/settings). **For general VS Code settings** (themes, fonts, keybindings), see the
[VS Code documentation](https://code.visualstudio.com/docs/getstarted/settings).
## Table of Contents ## Table of Contents
@ -76,7 +80,9 @@ This applies to all devcontainers you work with, not just MongoDB.
## Contributing Customizations ## Contributing Customizations
The customizations above are all user-level and don't require changes to the repository. If you want to modify the devcontainer setup itself to benefit all MongoDB developers, you'll need to submit a PR. The customizations above are all user-level and don't require changes to the repository. If you want
to modify the devcontainer setup itself to benefit all MongoDB developers, you'll need to submit a
PR.
**Examples of repository-level customizations:** **Examples of repository-level customizations:**
@ -108,4 +114,5 @@ The customizations above are all user-level and don't require changes to the rep
- [Architecture](./architecture.md) - How devcontainers work - [Architecture](./architecture.md) - How devcontainers work
- [Advanced Usage](./advanced.md) - Multiple containers, backups, workflows - [Advanced Usage](./advanced.md) - Multiple containers, backups, workflows
- [Troubleshooting](./troubleshooting.md) - Fix issues - [Troubleshooting](./troubleshooting.md) - Fix issues
- [VS Code Dev Containers Documentation](https://code.visualstudio.com/docs/devcontainers/containers) - General VS Code features - [VS Code Dev Containers Documentation](https://code.visualstudio.com/docs/devcontainers/containers) -
General VS Code features

@ -6,14 +6,16 @@ Frequently asked questions about MongoDB development with dev containers.
### What is a dev container? ### What is a dev container?
A dev container (development container) is a Docker container configured specifically for development. It includes: A dev container (development container) is a Docker container configured specifically for
development. It includes:
- All build tools and dependencies - All build tools and dependencies
- IDE configuration and extensions - IDE configuration and extensions
- Persistent storage for caches and settings - Persistent storage for caches and settings
- Consistent environment across all developers - Consistent environment across all developers
Think of it as a portable, reproducible development environment that runs on any machine with Docker. Think of it as a portable, reproducible development environment that runs on any machine with
Docker.
[Learn more about dev containers →](https://containers.dev/) [Learn more about dev containers →](https://containers.dev/)
@ -43,11 +45,14 @@ Report issues to help improve it for everyone!
- Pros: Works without SSH keys, simpler for read-only access - Pros: Works without SSH keys, simpler for read-only access
- Cons: May require password/token for push operations - Cons: May require password/token for push operations
See the [Getting Started guide SSH setup section](./getting-started.md#4-configure-ssh-keys-recommended) for details. See the
[Getting Started guide SSH setup section](./getting-started.md#4-configure-ssh-keys-recommended) for
details.
### How do SSH keys work with devcontainers? ### How do SSH keys work with devcontainers?
VS Code automatically forwards your SSH agent to the container, so you don't need to copy keys into the container. VS Code automatically forwards your SSH agent to the container, so you don't need to copy keys into
the container.
**Requirements:** **Requirements:**
@ -65,7 +70,8 @@ ssh-add -l
ssh -T git@github.com ssh -T git@github.com
``` ```
**Inside the container**, Git commands will automatically use your host's SSH keys through agent forwarding. **Inside the container**, Git commands will automatically use your host's SSH keys through agent
forwarding.
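
One way to confirm this from a terminal inside the container is to ask the forwarded agent which keys it holds. The sketch below is illustrative only: the function name `agent_status` is hypothetical, and it assumes the forwarded agent is exposed through the standard `SSH_AUTH_SOCK` variable.

```bash
# Hypothetical helper: report whether an SSH agent is reachable from this shell.
# Assumes the agent socket path is published in SSH_AUTH_SOCK.
agent_status() {
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "no agent socket visible (SSH_AUTH_SOCK is unset)"
        return 1
    fi
    # ssh-add -l exits 0 when keys are listed, 1 when the agent holds no keys,
    # and 2 when the agent cannot be contacted.
    ssh-add -l >/dev/null 2>&1
    case "$?" in
        0) echo "agent reachable, keys loaded" ;;
        1) echo "agent reachable, but no keys loaded"; return 1 ;;
        *) echo "agent socket set, but the agent cannot be contacted"; return 1 ;;
    esac
}
```

If the helper reports that no keys are loaded, running `ssh-add` on the host and reopening the container is usually the next thing to try.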
[Learn more about SSH agent forwarding →](https://code.visualstudio.com/remote/advancedcontainers/sharing-git-credentials) [Learn more about SSH agent forwarding →](https://code.visualstudio.com/remote/advancedcontainers/sharing-git-credentials)
@ -126,7 +132,8 @@ First-time setup includes:
- WSL2 installed and configured - WSL2 installed and configured
- Docker Desktop with WSL2 integration enabled - Docker Desktop with WSL2 integration enabled
**Important:** Clone repository in WSL2 filesystem (not `/mnt/c/`), not Windows filesystem, for best performance. **Important:** Clone repository in WSL2 filesystem (not `/mnt/c/`), not Windows filesystem, for best
performance.
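
A quick way to check where a clone lives is to test whether its path sits under a mounted Windows drive. This is a minimal sketch: the helper name `check_clone_location` is hypothetical, and it assumes the default WSL2 automount root (`/mnt/<drive letter>/`).

```bash
# Hypothetical helper: warn when a path lives on a Windows drive mounted into WSL2.
# Assumes the default automount root /mnt/<drive letter>/.
check_clone_location() {
    local dir="${1:-$PWD}"
    case "$dir" in
        /mnt/[a-z] | /mnt/[a-z]/*)
            echo "slow: $dir is on the Windows filesystem"
            return 1
            ;;
        *)
            echo "ok: $dir is on the WSL2 filesystem"
            return 0
            ;;
    esac
}
```

Called with no argument, the helper checks the current working directory, so it can be run from inside the clone.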
### Can I use this on Apple Silicon (M1/M2/M3)? ### Can I use this on Apple Silicon (M1/M2/M3)?
@ -161,7 +168,8 @@ docker cp <container_id>:/workspaces/mongo/file.txt ~/Downloads/
**Option 3: Use bind mount** (sacrifices performance) **Option 3: Use bind mount** (sacrifices performance)
Open your existing local repository in VS Code and use "Dev Containers: Reopen in Container". This uses a bind mount which allows direct host filesystem access but is slower, especially on macOS. Open your existing local repository in VS Code and use "Dev Containers: Reopen in Container". This
uses a bind mount which allows direct host filesystem access but is slower, especially on macOS.
### Can I use my existing local clone? ### Can I use my existing local clone?
@ -369,8 +377,7 @@ gcc --version # Should show the MongoDB toolchain GCC version
ls -la ~/.config/engflow_auth/ ls -la ~/.config/engflow_auth/
``` ```
**Re-authenticate:** **Re-authenticate:** Contact MongoDB team for authentication flow.
Contact MongoDB team for authentication flow.
**Build locally instead:** **Build locally instead:**
@ -406,13 +413,15 @@ Allocate as much disk space as you can comfortably spare. We recommend at least
**Allocate as much as possible** while leaving enough for your host OS to function (~4-8 GB). **Allocate as much as possible** while leaving enough for your host OS to function (~4-8 GB).
More RAM = faster builds with more parallel jobs. MongoDB builds are resource-intensive and benefit greatly from additional memory. More RAM = faster builds with more parallel jobs. MongoDB builds are resource-intensive and benefit
greatly from additional memory.
### How many CPU cores should I allocate? ### How many CPU cores should I allocate?
**Allocate as many cores as possible** while leaving a couple for your host OS (1-2 cores). **Allocate as many cores as possible** while leaving a couple for your host OS (1-2 cores).
Bazel parallelizes well; more cores = significantly faster builds. If you have 8+ cores available, MongoDB builds will complete much faster. Bazel parallelizes well; more cores = significantly faster builds. If you have 8+ cores available,
MongoDB builds will complete much faster.
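
The "leave 1-2 cores for the host" rule can be turned into a small calculation. The helper below is a sketch only; `suggest_docker_cpus` is a hypothetical name, and the default reserve of 2 cores is an assumption taken from the upper end of the range above.

```bash
# Hypothetical helper: suggest how many CPU cores to allocate to Docker,
# reserving some for the host OS (defaults to 2, per the 1-2 core guidance).
suggest_docker_cpus() {
    local total="$1"
    local reserve="${2:-2}"
    local cpus=$(( total - reserve ))
    # Never suggest fewer than one core.
    if [ "$cpus" -lt 1 ]; then
        cpus=1
    fi
    echo "$cpus"
}
```

On a Linux host the total can be passed as `suggest_docker_cpus "$(nproc)"`; on macOS, `sysctl -n hw.ncpu` reports the core count instead.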
### Can I reduce resource usage? ### Can I reduce resource usage?
@ -437,7 +446,8 @@ bazel clean # Clear build outputs
bazel clean --expunge # Clear everything (reclaim disk space) bazel clean --expunge # Clear everything (reclaim disk space)
``` ```
> **Note:** Reducing resources will make builds slower. If possible, it's better to allocate more resources to Docker instead. > **Note:** Reducing resources will make builds slower. If possible, it's better to allocate more
> resources to Docker instead.
### How do I monitor resource usage? ### How do I monitor resource usage?
@ -492,7 +502,8 @@ But you lose VS Code integration, extensions, and convenience features.
- **Architecture Details**: [architecture.md](./architecture.md) - **Architecture Details**: [architecture.md](./architecture.md)
- **Troubleshooting**: [troubleshooting.md](./troubleshooting.md) - **Troubleshooting**: [troubleshooting.md](./troubleshooting.md)
- **Advanced Topics**: [advanced.md](./advanced.md) - **Advanced Topics**: [advanced.md](./advanced.md)
- **VS Code Docs**: [code.visualstudio.com/docs/devcontainers](https://code.visualstudio.com/docs/devcontainers/containers) - **VS Code Docs**:
[code.visualstudio.com/docs/devcontainers](https://code.visualstudio.com/docs/devcontainers/containers)
### Who do I contact for help? ### Who do I contact for help?

@ -1,16 +1,19 @@
# Getting Started with MongoDB Dev Containers # Getting Started with MongoDB Dev Containers
This guide will walk you through setting up your MongoDB development environment using Dev Containers. This guide will walk you through setting up your MongoDB development environment using Dev
Containers.
## Prerequisites ## Prerequisites
### 1. Install Docker ### 1. Install Docker
Dev Containers require Docker to be installed and running on your system. Choose one of the following Docker providers: Dev Containers require Docker to be installed and running on your system. Choose one of the
following Docker providers:
#### Option A: Rancher Desktop (Recommended) #### Option A: Rancher Desktop (Recommended)
[Rancher Desktop](https://rancherdesktop.io/) is our recommended Docker provider for devcontainer development. [Rancher Desktop](https://rancherdesktop.io/) is our recommended Docker provider for devcontainer
development.
**Installation:** **Installation:**
@ -20,28 +23,34 @@ Dev Containers require Docker to be installed and running on your system. Choose
- **Container Engine**: Select `dockerd (moby)` ⚠️ **Important!** - **Container Engine**: Select `dockerd (moby)` ⚠️ **Important!**
- **Configure Path**: Select "Automatic" - **Configure Path**: Select "Automatic"
**Recommended Settings:** **Recommended Settings:** After installation, increase resources for better build performance:
After installation, increase resources for better build performance:
1. Open Rancher Desktop → Preferences → Virtual Machine 1. Open Rancher Desktop → Preferences → Virtual Machine
2. **Memory**: Allocate as much as your system allows (leave ~4-8 GB for your host OS) 2. **Memory**: Allocate as much as your system allows (leave ~4-8 GB for your host OS)
3. **CPUs**: Allocate as many cores as possible (leave 1-2 for your host OS) 3. **CPUs**: Allocate as many cores as possible (leave 1-2 for your host OS)
4. **Disk**: Rancher Desktop doesn't have a UI for disk size. To increase it, see [Troubleshooting - Increase Docker disk allocation](./troubleshooting.md#build-fails-with-no-space-left-on-device) for instructions. 4. **Disk**: Rancher Desktop doesn't have a UI for disk size. To increase it, see
[Troubleshooting - Increase Docker disk allocation](./troubleshooting.md#build-fails-with-no-space-left-on-device)
for instructions.
5. Apply changes and restart Rancher Desktop 5. Apply changes and restart Rancher Desktop
> **Tip:** More resources = faster builds. MongoDB builds benefit significantly from additional CPU cores and memory. > **Tip:** More resources = faster builds. MongoDB builds benefit significantly from additional CPU
> cores and memory.
**IMPORTANT!**: If you already have VSCode open when you install Rancher Desktop, make sure to restart VSCode otherwise it may not find the Docker socket and VSCode will prompt you to install Docker Desktop instead. **IMPORTANT!**: If you already have VSCode open when you install Rancher Desktop, make sure to
restart VSCode otherwise it may not find the Docker socket and VSCode will prompt you to install
Docker Desktop instead.
#### Option B: Docker Desktop #### Option B: Docker Desktop
[Docker Desktop](https://www.docker.com/products/docker-desktop/) is a popular alternative. [Docker Desktop](https://www.docker.com/products/docker-desktop/) is a popular alternative.
> **Note on Licensing**: Docker Desktop may require a paid license for commercial use. Please review the licensing terms to ensure compliance with your use case. > **Note on Licensing**: Docker Desktop may require a paid license for commercial use. Please review
> the licensing terms to ensure compliance with your use case.
**Installation:** **Installation:**
1. Download from [docker.com/products/docker-desktop](https://www.docker.com/products/docker-desktop/) 1. Download from
[docker.com/products/docker-desktop](https://www.docker.com/products/docker-desktop/)
2. Install and start Docker Desktop 2. Install and start Docker Desktop
3. Go to Settings → Resources and allocate generously: 3. Go to Settings → Resources and allocate generously:
- **Memory**: Allocate as much as possible (leave ~4-8 GB for your host OS) - **Memory**: Allocate as much as possible (leave ~4-8 GB for your host OS)
@ -52,7 +61,8 @@ After installation, increase resources for better build performance:
[OrbStack](https://orbstack.dev/) is a lightweight, fast Docker alternative for macOS. [OrbStack](https://orbstack.dev/) is a lightweight, fast Docker alternative for macOS.
> **Note on Licensing**: OrbStack may require a paid license for commercial use. Please review the licensing terms to ensure compliance with your use case. > **Note on Licensing**: OrbStack may require a paid license for commercial use. Please review the
> licensing terms to ensure compliance with your use case.
**Installation:** **Installation:**
@ -64,12 +74,14 @@ After installation, increase resources for better build performance:
For Linux users, you can use Docker Engine directly. For Linux users, you can use Docker Engine directly.
**Installation:** **Installation:** Follow the official guide:
Follow the official guide: [docs.docker.com/engine/install](https://docs.docker.com/engine/install/) [docs.docker.com/engine/install](https://docs.docker.com/engine/install/)
### 2. Create SSH Directory (Required) ### 2. Create SSH Directory (Required)
> **⚠️ Critical:** You **must** have a `~/.ssh` directory on your host machine before building the devcontainer. The devcontainer requires this directory to exist, regardless of whether you use SSH or HTTPS to clone the repository. > **⚠️ Critical:** You **must** have a `~/.ssh` directory on your host machine before building the
> devcontainer. The devcontainer requires this directory to exist, regardless of whether you use SSH
> or HTTPS to clone the repository.
```bash ```bash
# On your HOST machine (not inside the container) # On your HOST machine (not inside the container)
@ -87,13 +99,17 @@ Download and install VS Code from [code.visualstudio.com](https://code.visualstu
1. Open VS Code 1. Open VS Code
2. Go to Extensions (⌘/Ctrl+Shift+X) 2. Go to Extensions (⌘/Ctrl+Shift+X)
3. Search for "Dev Containers" 3. Search for "Dev Containers"
4. Install the [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension by Microsoft 4. Install the
[Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers)
extension by Microsoft
### 5. Configure SSH Keys (Recommended) ### 5. Configure SSH Keys (Recommended)
To clone the repository using SSH (recommended for contributors), you'll need SSH keys configured with GitHub. To clone the repository using SSH (recommended for contributors), you'll need SSH keys configured
with GitHub.
> **⚠️ Important:** Run all commands in this section on your **host machine** (not inside the container). SSH keys need to be set up before cloning the repository into the container. > **⚠️ Important:** Run all commands in this section on your **host machine** (not inside the
> container). SSH keys need to be set up before cloning the repository into the container.
#### Check if you have SSH keys #### Check if you have SSH keys
@ -183,7 +199,8 @@ Get-Service ssh-agent | Set-Service -StartupType Automatic
Start-Service ssh-agent Start-Service ssh-agent
``` ```
> **Note:** VS Code automatically forwards your SSH agent to the container, so your keys will be available inside the devcontainer. > **Note:** VS Code automatically forwards your SSH agent to the container, so your keys will be
> available inside the devcontainer.
[Learn more about using SSH keys with GitHub →](https://docs.github.com/en/authentication/connecting-to-github-with-ssh) [Learn more about using SSH keys with GitHub →](https://docs.github.com/en/authentication/connecting-to-github-with-ssh)
@ -191,7 +208,8 @@ Start-Service ssh-agent
### Step 1: Clone Repository in Named Container Volume ### Step 1: Clone Repository in Named Container Volume
For **optimal performance**, especially on macOS, clone the repository directly into a Docker volume rather than your local filesystem. This is crucial for Bazel performance. For **optimal performance**, especially on macOS, clone the repository directly into a Docker volume
rather than your local filesystem. This is crucial for Bazel performance.
#### Why Named Volumes? #### Why Named Volumes?
@ -397,7 +415,8 @@ ssh-add ~/.ssh/id_ed25519
# Command Palette → "Dev Containers: Rebuild Container" # Command Palette → "Dev Containers: Rebuild Container"
``` ```
**VS Code SSH Agent Forwarding**: The Dev Containers extension automatically forwards your SSH agent, but this requires: **VS Code SSH Agent Forwarding**: The Dev Containers extension automatically forwards your SSH
agent, but this requires:
- SSH agent running on host with keys loaded - SSH agent running on host with keys loaded
- SSH key files in default location (`~/.ssh/`) - SSH key files in default location (`~/.ssh/`)

@ -28,7 +28,8 @@ Docker version <version> or later is required
**Solution** **Solution**
Restart VSCode. If you install Rancher Desktop while you already have VSCode open, it doesn't properly detect the Docker socket and prompts you to install Docker Desktop by mistake. Restart VSCode. If you install Rancher Desktop while you already have VSCode open, it doesn't
properly detect the Docker socket and prompts you to install Docker Desktop by mistake.
## Container Build Issues ## Container Build Issues
@ -48,7 +49,9 @@ Error response from daemon: invalid mount config for type "bind": bind source pa
**Root Cause:** **Root Cause:**
The devcontainer configuration mounts your `~/.ssh` directory to enable Git operations over SSH. If this directory doesn't exist on your host machine, the container fails to start. **This directory is required even if you plan to use HTTPS instead of SSH for cloning.** The devcontainer configuration mounts your `~/.ssh` directory to enable Git operations over SSH. If
this directory doesn't exist on your host machine, the container fails to start. **This directory is
required even if you plan to use HTTPS instead of SSH for cloning.**
**Solutions:** **Solutions:**
@ -73,7 +76,8 @@ SSH agent forwarding behavior varies by Docker provider on macOS:
- With dockerd runtime: Automatic agent forwarding - With dockerd runtime: Automatic agent forwarding
- With containerd runtime: Agent forwarding requires additional setup - With containerd runtime: Agent forwarding requires additional setup
To use SSH agent forwarding, ensure your SSH keys are added to your host's SSH agent before starting the container: To use SSH agent forwarding, ensure your SSH keys are added to your host's SSH agent before starting
the container:
```bash ```bash
ssh-add ~/.ssh/id_ed25519 # or your key name ssh-add ~/.ssh/id_ed25519 # or your key name
@ -117,7 +121,8 @@ Error: failed to solve: write /var/lib/docker/...: no space left on device
disk: 100GB disk: 100GB
``` ```
4. Start Rancher Desktop
5. If Rancher Desktop was previously initialized, you may need to perform a factory reset
   (Preferences → Troubleshooting → Reset Kubernetes) for the disk size change to take effect.

**On Windows (WSL2):**
@ -125,7 +130,8 @@ Error: failed to solve: write /var/lib/docker/...: no space left on device
1. Stop Rancher Desktop
2. Run: `wsl --shutdown`
3. Follow Microsoft's guide to increase WSL2 disk size:
   https://learn.microsoft.com/en-us/windows/wsl/disk-space

**Docker Desktop:**
@ -174,7 +180,8 @@ Error: Failed to download toolchain
curl -I "$(grep TOOLCHAIN_URL .devcontainer/toolchain_config.env | cut -d'"' -f2)" curl -I "$(grep TOOLCHAIN_URL .devcontainer/toolchain_config.env | cut -d'"' -f2)"
``` ```
3. **If the toolchain URL is broken**, report it to the MongoDB team. This is a devcontainer
   configuration issue that needs to be fixed upstream.

### Build Fails with Checksum Mismatch
@ -203,7 +210,8 @@ Got: def456...
# Command Palette → "Dev Containers: Rebuild Container Without Cache" # Command Palette → "Dev Containers: Rebuild Container Without Cache"
``` ```
3. **If the problem persists**, this is likely a devcontainer configuration issue; report it to the
   MongoDB team.

### Container Fails to Start
@ -288,11 +296,9 @@ Got: def456...
- File save is delayed
- Terminal autocomplete is slow

**Root Cause:** Bind mounts on macOS use osxfs, which has high latency for filesystem operations.

**Solution:** ✅ **Use named volumes instead of bind mounts** (see Getting Started guide)

### High CPU Usage
@ -517,7 +523,8 @@ fatal: Could not read from remote repository.
ssh-add ~/.ssh/id_ed25519 # or id_rsa ssh-add ~/.ssh/id_ed25519 # or id_rsa
``` ```
See [Getting Started - SSH Setup](./getting-started.md#4-configure-ssh-keys-recommended) for
detailed instructions.

### SSH Works on Host But Not in Container
@ -527,8 +534,7 @@ See [Getting Started - SSH Setup](./getting-started.md#4-configure-ssh-keys-reco
- Same operations fail inside devcontainer
- "Permission denied" or asks for password

**Root Cause:** SSH agent forwarding isn't working properly.

**Solutions:**
@ -633,8 +639,7 @@ git config --global credential.helper store
# Next time you enter credentials, they'll be saved # Next time you enter credentials, they'll be saved
``` ```
**Option 3: Fix SSH agent forwarding**: See the "SSH Works on Host But Not in Container" section
above.

### Multiple SSH Keys (Personal + Work)
@ -868,8 +873,7 @@ ModuleNotFoundError: No module named 'pymongo'
- History cleared
- Python venv empty

**Root Cause:** Volumes are not mounting correctly.

**Solutions:**
@ -917,8 +921,8 @@ docker cp <container_id>:/workspaces/mongo/file.txt ~/Downloads/
# Right-click file → Download... # Right-click file → Download...
``` ```
**To edit with external tools:** Use bind mounts instead of named volumes (at the cost of
performance).

### Volume Fills Up Disk
@ -1070,8 +1074,7 @@ permission denied while trying to connect to Docker daemon
- Slow builds
- Out of memory errors

**Solution:** Go to Docker Desktop → Settings → Resources and allocate generously:

- **CPUs**: Allocate as many as possible (leave 1-2 for host OS)
- **Memory**: Allocate as much as possible (leave ~4-8 GB for host OS)
@ -1087,8 +1090,7 @@ Go to Docker Desktop → Settings → Resources and allocate generously:
- Docker-outside-of-docker doesn't work
- Volume mounts fail

**Solution:** OrbStack has some limitations with devcontainer features. Try:

1. Update to the latest OrbStack version
2. Check the OrbStack documentation for devcontainer compatibility
@ -1177,7 +1179,8 @@ cd mongo
If your issue isn't covered here:

1. **Check VS Code Docs**:
   [code.visualstudio.com/docs/devcontainers](https://code.visualstudio.com/docs/devcontainers/containers)
2. **Search Issues**: MongoDB GitHub repository issues
3. **Ask the Team**: MongoDB developers Slack/chat
4. **File a Bug**: Include:

View File

@ -1,26 +1,95 @@
# Egress Networking

Egress networking entails outbound communication (i.e. requests) from a client process to a server
process (e.g. _mongod_), as well as inbound communication (i.e. responses) from such a server
process back to a client process.

## Remote Commands

A remote command represents an exchange of data between a client and a server. A remote command
consists of two steps: a request, which the client sends to the server, and a response, which the
client receives from the server. These elements are represented by the
[request][remote_command_request_h] and [response][remote_command_response_h] objects; each wraps
the BSON that represents the on-wire transacted data and metadata that describes the context of the
command, such as the host that the command targets. Each object also contains metadata that
corresponds to its half of the command lifecycle. For example, the request object notes the timeout
of the command and the operation's unique identifier, among other fields, and the response object
notes the final disposition of the command's data exchange as a `Status` object (which takes no
position on the success of the command's semantics at the remote) and the time that the command
actually took to execute, among other fields. In the case of an exhaust command, there may be
multiple responses for a single request.

## Connection Pooling

The [executor::ConnectionPool][connection_pool_h] class is responsible for pooling connections to
any number of hosts. It contains zero or more `ConnectionPool::SpecificPool` objects, each of which
pools connections for a unique host, and exactly one `ConnectionPool::ControllerInterface` object,
which is responsible for the addition, removal, and updating of `SpecificPool`s to, from, and in its
owning `ConnectionPool`. When a caller requests a connection to a host from the `ConnectionPool`,
the `ConnectionPool` creates a new `SpecificPool` to pool connections for that host if one does not
exist already, and then the `ConnectionPool` forwards the request to the `SpecificPool`. A
`SpecificPool` expires when its `hostTimeout` has passed without any connection requests, after
which time it becomes unusable; further requests for connections to that host will trigger the
creation of a fresh `SpecificPool`.

The final result of a successful connection request made through `ConnectionPool::getConnection` is
a `ConnectionPool::ConnectionInterface`, which represents a connection ready for use. Externally,
the `ConnectionInterface` is primarily used by the caller to exchange data with its remote host.
Callers return `ConnectionInterface`s to the pool by allowing them to destruct, but they must first
signal to the pool the final disposition of the connection through the `indicate*` family of
methods. `ConnectionInterface`s also support setting timers to schedule future activities.
Internally, the `ConnectionInterface` is used to prepare the connection for data exchange before
transferring ownership to the caller and to refresh the health of a connection when the caller
returns the connection to the pool. `ConnectionInterface` also maintains a notion of generation,
which is implemented as a monotonically incrementing counter. When a caller returns a
`ConnectionInterface` to a `ConnectionPool` from a generation prior to the current generation of the
corresponding `SpecificPool`, the connection is dropped. The current generation of a `SpecificPool`
is incremented when the pool experiences certain failures (e.g., when it fails to establish a new
connection). `ConnectionPool` also drops a connection if the caller called `indicateFailure` on the
connection before returning it. `ConnectionPool` uses a global mutex for access to `SpecificPool`s
as well as generation counters.
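
The generation rule described above can be sketched in a few lines. This is a minimal illustration
in Python, not the C++ implementation: `Connection`, `SpecificPool.on_failure`, and
`return_connection` are hypothetical names, and discarding idle connections on failure is an
assumption made to keep the sketch small.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical stand-in for ConnectionPool::ConnectionInterface."""

    host: str
    generation: int
    failed: bool = False

    def indicate_failure(self) -> None:
        # Plays the role of the `indicate*` methods: record the final disposition.
        self.failed = True


@dataclass
class SpecificPool:
    """Hypothetical per-host pool that tracks a generation counter."""

    host: str
    generation: int = 0
    idle: list = field(default_factory=list)

    def get_connection(self) -> Connection:
        # Reuse an idle connection if one exists, else create one at the current generation.
        if self.idle:
            return self.idle.pop()
        return Connection(self.host, self.generation)

    def on_failure(self) -> None:
        # Bumping the generation invalidates every connection handed out earlier.
        # Assumption: idle connections from the old generation are discarded as well.
        self.generation += 1
        self.idle.clear()

    def return_connection(self, conn: Connection) -> bool:
        """Return True if the connection was kept, False if it was dropped."""
        if conn.failed or conn.generation < self.generation:
            return False
        self.idle.append(conn)
        return True
```

In this sketch a connection handed out before `on_failure()` is dropped on return because its
generation is stale, and a connection marked with `indicate_failure()` is dropped regardless of its
generation.
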

`ConnectionPool` uses its single instance of `EgressConnectionCloserManager` to determine when hosts
should be dropped. The manager consists of multiple `EgressConnectionClosers`, which are used to
determine whether hosts should be dropped. In the context of the `ConnectionPool`, the manager's
purpose is to drop _connections_ to hosts based on whether they have been marked as keep-open or
not.

## Internal Network Clients

Client-side outbound communication in egress networking is primarily handled by the [AsyncDBClient
class][async_client_h]. The async client is responsible for initializing a connection to a
particular host as well as initializing the [wire protocol][wire_protocol] for client-server
communication, after which remote requests can be sent by the client and corresponding remote
responses from a database can subsequently be received. In setting up the wire protocol, the async
client sends an [isMaster][is_master] request to the server and parses the server's isMaster
response to ensure that the status of the connection is OK. An initial isMaster request is
constructed in the legacy OP_QUERY protocol, so that clients can still communicate with servers that
may not support other protocols. The async client also supports client authentication functionality
(i.e. authenticating a user's credentials, client host, remote host, etc.).

The scheduling of requests is managed by the [task executor][task_executor_h], which maintains the
notion of **events** and **callbacks**. Callbacks represent work (e.g. remote requests) that is to
be executed by the executor, and are scheduled by client threads as well as other callbacks. There
are several variations of work scheduling methods, which include: immediate scheduling, scheduling
no earlier than a specified time, and scheduling if and only if a specified event has been
signalled. These methods return a handle that can be used while the executor is still in scope for
either waiting on or cancelling the scheduled callback in question. If a scheduled callback is
cancelled, it remains on the work queue and is technically still run, but is labeled as having been
'cancelled' beforehand. Once a given callback/request is scheduled, the task executor is then able
to execute such requests via a [network interface][network_interface_h]. The network interface,
connected to a particular host/server, begins the asynchronous execution of commands specified via a
request bundled in the aforementioned callback handle. The interface is capable of blocking threads
until its associated task executor has work that needs to be performed, and is likewise able to
return from an idle state when it receives a signal that the executor has new work to process.
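
A toy model of the cancellation behaviour described above, written in Python for brevity.
`ToyExecutor`, `CallbackHandle`, and `schedule_work` are hypothetical names rather than the real
task executor API; the only point illustrated is that a cancelled callback stays queued and still
runs, but is told that it was cancelled.

```python
from collections import deque
from typing import Callable


class CallbackHandle:
    """Hypothetical handle returned when work is scheduled."""

    def __init__(self, fn: Callable[[bool], None]) -> None:
        self.fn = fn
        self.cancelled = False

    def cancel(self) -> None:
        # Cancelling only flags the callback; it is not removed from the work queue.
        self.cancelled = True


class ToyExecutor:
    """Hypothetical single-threaded executor with a FIFO work queue."""

    def __init__(self) -> None:
        self.queue: deque = deque()

    def schedule_work(self, fn: Callable[[bool], None]) -> CallbackHandle:
        handle = CallbackHandle(fn)
        self.queue.append(handle)
        return handle

    def run_all(self) -> None:
        # Every queued callback runs and receives its cancellation flag as an argument.
        while self.queue:
            handle = self.queue.popleft()
            handle.fn(handle.cancelled)
```

A callback written against this sketch would typically check the flag first and return early, which
is one way to read "technically still run" in the paragraph above.
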

Client-side legacy networking draws upon the `DBClientBase` class, of which there are multiple
subclasses residing in the `src/mongo/client` folder. The [replica set DBClient][dbclient_rs_h]
discerns which one of multiple servers in a replica set is the primary at construction time, and
establishes a connection (using the `DBClientConnection` wrapper class, also extended from
`DBClientBase`) with the replica set via the primary. In cases where the primary server is
unresponsive within a specified time range, the RS DBClient will automatically attempt to establish
a secondary server as the new primary (see [automatic failover][automatic_failover]).

## See Also

View File

@ -3,26 +3,26 @@
## What it is

Similar to [burn_in_tests](burn_in_tests.md), `burn_in_tags` also detects the JavaScript tests
(under the [jstests directory](https://github.com/mongodb/mongo/tree/master/jstests)) that are new
or have changed since the last git command and then runs those tests in repeated mode to validate
their stability. But instead of running the tests on their original build variants, `burn_in_tags`
runs them on the burn_in build variants that are generated separately.

## How to use it

You can use `burn_in_tags` on Evergreen by selecting the `burn_in_tags_gen` task when creating a
patch. The burn_in build variants, i.e., `enterprise-rhel-8-64-bit-inmem` and
`enterprise-rhel-8-64-bit-multiversion`, will be generated, each of which will have a
`burn_in_tests` task generated by the
[mongo-task-generator](https://github.com/mongodb/mongo-task-generator). The `burn_in_tests` task, a
[generated task](task_generation.md), may have multiple sub-tasks which run the test suites only for
the new or changed JavaScript tests (note that a JavaScript test can be included in multiple test
suites). Each of those tests will be run at least 2 times, and at most 1000 times or for 10 minutes,
whichever is reached first.

## ! Run All Affected JStests

The `! Run All Affected JStests` variant has a single `burn_in_tags_gen` task. This task will create
and activate [`burn_in_tests`](burn_in_tests.md) tasks for all required and suggested variants. The
end result is that any jstests that have been modified in the patch will run on all required and
suggested variants. This should give users a clear signal on whether their jstests changes have
introduced a failure that could potentially lead to a revert or follow-up bug fix commit.

View File

@ -3,19 +3,21 @@
## What it is

`burn_in_tests` detects the JavaScript tests (under the
[jstests directory](https://github.com/mongodb/mongo/tree/master/jstests)) that are new or have
changed since the last git command and then runs those tests in repeated mode to validate their
stability.

## How to use it

You can use `burn_in_tests` on Evergreen by selecting the `burn_in_tests_gen` task when creating a
patch, since the `burn_in_tests` task is a [generated task](task_generation.md) generated by the
[mongo-task-generator](https://github.com/mongodb/mongo-task-generator). The `burn_in_tests` task
will be generated on each of the applicable build variants, and may have multiple sub-tasks which
run the test suites only for the new or changed JavaScript tests (note that a JavaScript test can be
included in multiple test suites). Each of those tests will be run at least 2 times, and at most
1000 times or for 10 minutes, whichever is reached first.
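
The stopping rule can be read as the following loop. This is only a sketch of one plausible
interpretation, not the code in `buildscripts/burn_in_tests.py`: the function name, its parameters,
and the choice to let the minimum run count take precedence over the time budget are all
assumptions.

```python
import time
from typing import Callable


def run_repeated(
    test: Callable[[], None],
    min_runs: int = 2,
    max_runs: int = 1000,
    max_seconds: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run `test` repeatedly and return how many times it ran."""
    start = clock()
    runs = 0
    while runs < max_runs:
        # Assumption: the time budget is only checked once the minimum is met.
        if runs >= min_runs and clock() - start >= max_seconds:
            break
        test()
        runs += 1
    return runs
```

Under these assumptions a slow test still runs twice, while a fast test stops at 1000 runs or after
10 minutes, whichever comes first.
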

You can also use `burn_in_tests` locally from within the
[mongo repo](https://github.com/mongodb/mongo) by running the script
`python buildscripts/burn_in_tests.py`. For more information about this usage, you can run
`python buildscripts/burn_in_tests.py --help`.

View File

@ -34,37 +34,37 @@ For some of the versions we are using such generic names as `latest`, `last-lts`
- `latest` - the current version. In Evergreen, the version that was compiled in the current build.
- `last-lts` - the latest LTS (Long Term Support) Major release version. In Evergreen, the version
  that was downloaded from the last LTS release branch project. It resolves to an entry in
  `longTermSupportReleases` of [releases.yml](../../src/mongo/util/version/releases.yml).
- `last-continuous` - the latest Rapid release version. In Evergreen, the version that was
  downloaded from the Rapid release branch project. It resolves to the entry in
  `featureCompatibilityVersions` of [releases.yml](../../src/mongo/util/version/releases.yml) that
  looks older than the output of `git describe`. It will not be tested against if it is listed in
  `eolVersions` as being end of life.

Note: The latest releases.yml file from master is always used; it is fetched remotely when on
another branch.
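
One way to picture the `last-continuous` resolution is the sketch below. It is a hedged reading of
the description above, not the actual multiversion tooling: the function names are hypothetical, the
inputs are assumed to be plain `major.minor` strings taken from `featureCompatibilityVersions` and
`eolVersions`, and only the major and minor components of the `git describe` output are compared.

```python
from typing import Optional, Sequence, Tuple


def parse(version: str) -> Tuple[int, int]:
    # "r8.2.0-alpha-100-gdeadbee" or "8.1" -> (8, 2) or (8, 1).
    major, minor = version.lstrip("r").split(".")[:2]
    return int(major), int(minor.split("-")[0])


def resolve_last_continuous(
    fcvs: Sequence[str], eol_versions: Sequence[str], git_describe: str
) -> Optional[str]:
    """Pick the newest FCV entry older than the current version, if it is not EOL."""
    current = parse(git_describe)
    older = [v for v in fcvs if parse(v) < current]
    if not older:
        return None
    candidate = max(older, key=parse)
    # Assumption: an end-of-life candidate means last-continuous is simply not tested.
    return None if candidate in eol_versions else candidate
```

The version strings used to exercise this sketch would be made up for illustration; the real values
live in releases.yml.
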
### Old vs new

Many multiversion tasks are running tests against `latest`/`last-lts` or `latest`/`last-continuous`
versions. In such context we refer to `last-lts` and `last-continuous` versions as the `old` version
and to `latest` as a `new` version.

A `new` version is compiled in the same way as for non-multiversion tasks. The `old` versions of
compiled binaries are downloaded from the old branch projects with
[`db-contrib-tool`](https://github.com/10gen/db-contrib-tool). `db-contrib-tool` searches for the
latest available compiled binaries on the old branch projects in Evergreen.

### Explicit and Implicit multiversion suites

Multiversion suites can be explicit or implicit.

- Explicit - JS tests are aware of the binary versions they are running, e.g.
  [multiversion.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/multiversion.yml).
  The version of binaries is explicitly set in JS tests, e.g.
  [jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js](https://github.com/mongodb/mongo/blob/397c8da541940b3fbe6257243f97a342fe7e0d3b/jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js#L33-L44):
```js ```js
const versions = [ const versions = [
@ -101,8 +101,8 @@ const versions = [
]; ];
``` ```
- Implicit - JS tests know nothing about the binary versions they are running, e.g.
  [retryable_writes_downgrade.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/retryable_writes_downgrade.yml).
  Most of the implicit multiversion suites are using matrix suites, e.g. `replica_sets_last_lts`:
```bash ```bash
@ -134,7 +134,8 @@ test_kind: js_test
In implicit multiversion suites the version of binaries is defined on the resmoke fixture level.

The
[example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L5-L8)
of replica set fixture configuration override:
```yaml ```yaml
@ -144,7 +145,8 @@ fixture:
mixed_bin_versions: new_new_old mixed_bin_versions: new_new_old
``` ```
The [example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L53-L57) The
[example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L53-L57)
of sharded cluster fixture configuration override: of sharded cluster fixture configuration override:
```yaml ```yaml
@ -155,7 +157,8 @@ fixture:
mixed_bin_versions: new_old_old_new mixed_bin_versions: new_old_old_new
``` ```
The [example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L139-L145) The
[example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L139-L145)
of shell fixture configuration override: of shell fixture configuration override:
```yaml ```yaml
@ -171,20 +174,25 @@ value:
### Version combinations ### Version combinations
In implicit multiversion suites the same set of tests may run in similar suites that are using In implicit multiversion suites the same set of tests may run in similar suites that are using
various mixed version combinations. Those version combinations depend on the type of resmoke various mixed version combinations. Those version combinations depend on the type of resmoke fixture
fixture the suite is running with. These are the recommended version combinations to test against based on the suite fixtures: the suite is running with. These are the recommended version combinations to test against based on
the suite fixtures:
- Replica set fixture combinations: - Replica set fixture combinations:
- `last-lts new-new-old` (i.e. suite runs the replica set fixture that spins up the `latest` and - `last-lts new-new-old` (i.e. suite runs the replica set fixture that spins up the `latest` and
the `last-lts` versions in a 3-node replica set where the 1st node is the `latest`, 2nd - `latest`, the `last-lts` versions in a 3-node replica set where the 1st node is the `latest`, 2nd -
3rd - `last-lts`, etc.) `latest`, 3rd - `last-lts`, etc.)
- `last-lts new-old-new` - `last-lts new-old-new`
- `last-lts old-new-new` - `last-lts old-new-new`
- `last-continuous new-new-old` - `last-continuous new-new-old`
- `last-continuous new-old-new` - `last-continuous new-old-new`
- `last-continuous old-new-new` - `last-continuous old-new-new`
- Ex: [change_streams](https://github.com/mongodb/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml) uses a [`ReplicaSetFixture`](https://github.com/mongodb/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml#L50) so the corresponding multiversion suites are - Ex:
[change_streams](https://github.com/mongodb/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml)
uses a
[`ReplicaSetFixture`](https://github.com/mongodb/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml#L50)
so the corresponding multiversion suites are
- [`change_streams_last_continuous_new_new_old`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_new_new_old.yml) - [`change_streams_last_continuous_new_new_old`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_new_new_old.yml)
- [`change_streams_last_continuous_new_old_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_new_old_new.yml) - [`change_streams_last_continuous_new_old_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_new_old_new.yml)
- [`change_streams_last_continuous_old_new_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_old_new_new.yml) - [`change_streams_last_continuous_old_new_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_old_new_new.yml)
@ -199,7 +207,11 @@ fixture the suite is running with. These are the recommended version combination
replica sets per shard where the 1st node of the 1st shard is the `latest`, 2nd node of 1st replica sets per shard where the 1st node of the 1st shard is the `latest`, 2nd node of 1st
shard - `last-lts`, 1st node of 2nd shard - `last-lts`, 2nd node of 2nd shard - `latest`, etc.) shard - `last-lts`, 1st node of 2nd shard - `last-lts`, 2nd node of 2nd shard - `latest`, etc.)
- `last-continuous new-old-old-new` - `last-continuous new-old-old-new`
- Ex: [change_streams_downgrade](https://github.com/mongodb/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml) uses a [`ShardedClusterFixture`](https://github.com/mongodb/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml#L408) so the corresponding multiversion suites are - Ex:
[change_streams_downgrade](https://github.com/mongodb/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml)
uses a
[`ShardedClusterFixture`](https://github.com/mongodb/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml#L408)
so the corresponding multiversion suites are
- [`change_streams_downgrade_last_continuous_new_old_old_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_downgrade_last_continuous_new_old_old_new.yml) - [`change_streams_downgrade_last_continuous_new_old_old_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_downgrade_last_continuous_new_old_old_new.yml)
- [`change_streams_downgrade_last_lts_new_old_old_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_downgrade_last_lts_new_old_old_new.yml) - [`change_streams_downgrade_last_lts_new_old_old_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_downgrade_last_lts_new_old_old_new.yml)
@ -207,18 +219,21 @@ fixture the suite is running with. These are the recommended version combination
- `last-lts` (i.e. suite runs the shell fixture that spins up `last-lts` as the `old` versions, - `last-lts` (i.e. suite runs the shell fixture that spins up `last-lts` as the `old` versions,
etc.) etc.)
- `last-continuous` - `last-continuous`
- Ex: [initial_sync_fuzzer](https://github.com/mongodb/mongo/blob/908625ffdec050a71aa2ce47c35788739f629c60/buildscripts/resmokeconfig/suites/initial_sync_fuzzer.yml) uses a Shell Fixture, so the corresponding multiversion suites are - Ex:
[initial_sync_fuzzer](https://github.com/mongodb/mongo/blob/908625ffdec050a71aa2ce47c35788739f629c60/buildscripts/resmokeconfig/suites/initial_sync_fuzzer.yml)
uses a Shell Fixture, so the corresponding multiversion suites are
- [`initial_sync_fuzzer_last_lts`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/initial_sync_fuzzer_last_lts.yml) - [`initial_sync_fuzzer_last_lts`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/initial_sync_fuzzer_last_lts.yml)
- [`initial_sync_fuzzer_last_continuous`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/initial_sync_fuzzer_last_continuous.yml) - [`initial_sync_fuzzer_last_continuous`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/initial_sync_fuzzer_last_continuous.yml)
If `last-lts` and `last-continuous` versions happen to be the same, or last-continuous is EOL, we skip `last-continuous` If `last-lts` and `last-continuous` versions happen to be the same, or last-continuous is EOL, we
and run multiversion suites with only `last-lts` combinations in Evergreen. skip `last-continuous` and run multiversion suites with only `last-lts` combinations in Evergreen.
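
The combinations above can be enumerated with a short sketch. This is not resmoke code: the function name, the fixture labels and the `last_continuous_eol` parameter are hypothetical, and only the suite-name pattern (`<suite>_<old version>_<topology>`) is taken from the examples listed above.

```python
# Hypothetical helper, not part of resmoke: enumerates the recommended
# multiversion suite names for a base suite, following the lists above.
REPLICA_SET_TOPOLOGIES = ["new_new_old", "new_old_new", "old_new_new"]
SHARDED_CLUSTER_TOPOLOGIES = ["new_old_old_new"]


def multiversion_suite_names(base_suite, fixture, last_lts, last_continuous,
                             last_continuous_eol=False):
    """Return the suite names for every recommended version combination."""
    old_versions = ["last_lts"]
    # last-continuous is skipped when it equals last-lts or is EOL.
    if last_continuous != last_lts and not last_continuous_eol:
        old_versions.append("last_continuous")

    if fixture == "replica_set":
        topologies = REPLICA_SET_TOPOLOGIES
    elif fixture == "sharded_cluster":
        topologies = SHARDED_CLUSTER_TOPOLOGIES
    elif fixture == "shell":
        topologies = [None]  # shell fixture suites carry no topology suffix
    else:
        raise ValueError(f"unknown fixture: {fixture}")

    names = []
    for old in old_versions:
        for topology in topologies:
            suffix = old if topology is None else f"{old}_{topology}"
            names.append(f"{base_suite}_{suffix}")
    return names
```

For example, calling it with `"change_streams"`, `"replica_set"` and two different old versions yields six names, while passing the same value for `last_lts` and `last_continuous` leaves only the three `last_lts` names.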
## Working with multiversion tasks in Evergreen ## Working with multiversion tasks in Evergreen
### Multiversion task generation ### Multiversion task generation
Please refer to mongo-task-generator [documentation](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md#multiversion-testing) Please refer to mongo-task-generator
[documentation](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md#multiversion-testing)
for generating multiversion tasks in Evergreen. for generating multiversion tasks in Evergreen.
### Exclude tests from multiversion testing ### Exclude tests from multiversion testing
@ -240,20 +255,21 @@ multiversion where `XX` is the version number, e.g. `requires_fcv_70` stands for
``` ```
Tests with `requires_fcv_XX` tags are excluded from multiversion tasks that may run the versions Tests with `requires_fcv_XX` tags are excluded from multiversion tasks that may run the versions
below the specified FCV version, e.g. when the `latest` version is `6.2`, `last-continuous` is below the specified FCV version, e.g. when the `latest` version is `6.2`, `last-continuous` is `6.1`
`6.1` and `last-lts` is `6.0`, tests tagged with `requires_fcv_61` will NOT run in multiversion and `last-lts` is `6.0`, tests tagged with `requires_fcv_61` will NOT run in multiversion tasks that
tasks that run `latest` with `last-lts`, but will run in multiversion tasks that run `latest` with run `latest` with `last-lts`, but will run in multiversion tasks that run `latest` with
`last-continuous`. `last-continuous`.
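
The exclusion rule can be illustrated with a minimal sketch. The helper names are hypothetical and this is not how resmoke implements the check; it also assumes the last digit of the tag is the minor version, which fits `requires_fcv_61` and `requires_fcv_70`.

```python
# Illustrative only: hypothetical helpers, not resmoke's implementation.
FCV_TAG_PREFIX = "requires_fcv_"


def parse_fcv_tag(tag):
    """Turn "requires_fcv_61" into (6, 1); return None for other tags.

    Assumption: the last digit is the minor version, the rest is the major.
    """
    if not tag.startswith(FCV_TAG_PREFIX):
        return None
    digits = tag[len(FCV_TAG_PREFIX):]
    return int(digits[:-1]), int(digits[-1])


def runs_in_multiversion(test_tags, old_version):
    """A test is excluded when the old binary is below any required FCV."""
    for tag in test_tags:
        required = parse_fcv_tag(tag)
        if required is not None and old_version < required:
            return False
    return True
```

With `last-lts` at `6.0` and `last-continuous` at `6.1`, `runs_in_multiversion(["requires_fcv_61"], (6, 0))` is `False` and `runs_in_multiversion(["requires_fcv_61"], (6, 1))` is `True`, matching the example above.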
In addition to disabling multiversion tests based on FCV, there is no need to run in-development `featureFlagXYZ` tests In addition to disabling multiversion tests based on FCV, there is no need to run in-development
(featureFlags that have `default: false`) because these tests will most likely fail on older versions that `featureFlagXYZ` tests (featureFlags that have `default: false`) because these tests will most
have not implemented this feature. For multiversion tasks, we pass the `--runNoFeatureFlagTests` flag to avoid these likely fail on older versions that have not implemented this feature. For multiversion tasks, we
failures on `all feature flag` variants. pass the `--runNoFeatureFlagTests` flag to avoid these failures on `all feature flag` variants.
For more info on FCV, take a look at [FCV_AND_FEATURE_FLAG_README.md](https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/FCV_AND_FEATURE_FLAG_README.md). For more info on FCV, take a look at
[FCV_AND_FEATURE_FLAG_README.md](https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/FCV_AND_FEATURE_FLAG_README.md).
Another common case could be that the changes on master branch are breaking multiversion tests, Another common case could be that the changes on master branch are breaking multiversion tests, but
but with those changes backported to the older branches the multiversion tests should work. with those changes backported to the older branches the multiversion tests should work. In order to
In order to temporarily disable the test from running in multiversion it can be added to the temporarily disable the test from running in multiversion it can be added to the
[etc/backports_required_for_multiversion_tests.yml](https://github.com/mongodb/mongo/blob/fcdfe29cee066278b94ea2749456fc433cc398c6/etc/backports_required_for_multiversion_tests.yml#L1-L19). [etc/backports_required_for_multiversion_tests.yml](https://github.com/mongodb/mongo/blob/fcdfe29cee066278b94ea2749456fc433cc398c6/etc/backports_required_for_multiversion_tests.yml#L1-L19).
Please follow the instructions described in the file. Please follow the instructions described in the file.


@ -7,21 +7,22 @@ evergreen command.
Task generation allows us to do things like dynamically split a task into sub-tasks that can be run Task generation allows us to do things like dynamically split a task into sub-tasks that can be run
in parallel, or generate sub-tasks to run against different mongodb versions. in parallel, or generate sub-tasks to run against different mongodb versions.
Task generation is typically done with the [mongo-task-generator](https://github.com/mongodb/mongo-task-generator) Task generation is typically done with the
tool. Refer to its [documentation](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md) [mongo-task-generator](https://github.com/mongodb/mongo-task-generator) tool. Refer to its
[documentation](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md)
for details on how it works. for details on how it works.
## Configuring a task to be generated ## Configuring a task to be generated
In order to generate a task, we typically create a placeholder task. By convention the name of In order to generate a task, we typically create a placeholder task. By convention the name of these
these tasks should end in "\_gen". Most of the time, generated tasks should inherit the tasks should end in "\_gen". Most of the time, generated tasks should inherit the
[gen_task_template](https://github.com/mongodb/mongo/blob/31864e3866ce9cc54c08463019846ded2ad9e6e5/etc/evergreen_yml_components/definitions.yml#L99-L107) [gen_task_template](https://github.com/mongodb/mongo/blob/31864e3866ce9cc54c08463019846ded2ad9e6e5/etc/evergreen_yml_components/definitions.yml#L99-L107)
which configures the required dependencies. which configures the required dependencies.
The placeholder task needs to have the "generate resmoke tasks" function as one of its `commands`. The placeholder task needs to have the "generate resmoke tasks" function as one of its `commands`.
This is how the `mongo-task-generator` knows that the task needs to be generated. You can also This is how the `mongo-task-generator` knows that the task needs to be generated. You can also add
add `vars` to the function call to configure how the task will be generated. You can refer to `vars` to the function call to configure how the task will be generated. You can refer to the
the [mongo-task-generator](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md#use-cases) [mongo-task-generator](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md#use-cases)
documentation for details on what options are available. documentation for details on what options are available.
Once a placeholder task is defined, you can reference it just like a normal task. Once a placeholder task is defined, you can reference it just like a normal task.
@ -40,15 +41,15 @@ Task generation is performed as a 2-step process.
additional tasks in the future, they will exist to be run. additional tasks in the future, they will exist to be run.
This step will also hide all the placeholder tasks into a display task called `generator_tasks` This step will also hide all the placeholder tasks into a display task called `generator_tasks`
in each build variant. Once task generation is completed, the user should perform actions on in each build variant. Once task generation is completed, the user should perform actions on the
the generated tasks instead of the placeholder tasks, we encourage this by hiding the generated tasks instead of the placeholder tasks, we encourage this by hiding the placeholder
placeholder tasks from view. tasks from view.
2. After the tasks have been generated, the placeholder tasks are free to run. The placeholder tasks 2. After the tasks have been generated, the placeholder tasks are free to run. The placeholder tasks
simply find the task generated for them and mark it activated. Since generated tasks are simply find the task generated for them and mark it activated. Since generated tasks are created
created in the "inactive" state, this will activate any generated tasks whose placeholder task in the "inactive" state, this will activate any generated tasks whose placeholder task runs. This
runs. This enables users to select tasks to run on the initial task selection page even though enables users to select tasks to run on the initial task selection page even though the tasks
the tasks have not yet been generated. have not yet been generated.
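
The two steps can be modelled with a toy sketch. It does not call Evergreen or `mongo-task-generator`; the class, the function names and the sub-task naming (`<task>_<index>`) are made up for illustration.

```python
# Toy model of the 2-step process; all names here are hypothetical.
from dataclasses import dataclass


@dataclass
class GeneratedTask:
    name: str
    activated: bool = False  # generated tasks start out "inactive"


def generate_all(placeholder_names, sub_tasks_per_placeholder=2):
    """Step 1: create inactive generated tasks for every placeholder."""
    generated = {}
    for placeholder in placeholder_names:
        if placeholder.endswith("_gen"):
            base = placeholder[: -len("_gen")]
        else:
            base = placeholder
        generated[placeholder] = [
            GeneratedTask(f"{base}_{i}") for i in range(sub_tasks_per_placeholder)
        ]
    return generated


def run_placeholder(placeholder, generated):
    """Step 2: a running placeholder activates the tasks generated for it."""
    for task in generated[placeholder]:
        task.activated = True
```

Running only one placeholder activates only its own generated tasks; tasks generated for placeholders that never run stay inactive.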
**Note**: While this 2-step process allows a similar user experience to working with normal tasks, **Note**: While this 2-step process allows a similar user experience to working with normal tasks,
it does create a few UI quirks. For example, evergreen will hide "inactive" tasks in the UI, as a it does create a few UI quirks. For example, evergreen will hide "inactive" tasks in the UI, as a


@ -2,10 +2,15 @@
## Types of timeouts ## Types of timeouts
There are two types of timeouts that [Evergreen supports](https://github.com/evergreen-ci/evergreen/wiki/Project-Commands#timeoutupdate): There are two types of timeouts that
[Evergreen supports](https://github.com/evergreen-ci/evergreen/wiki/Project-Commands#timeoutupdate):
- **Exec Timeout**: The _exec timeout_ is the overall timeout for a task. Once the total runtime for a test exceeds this value, the timeout logic will be triggered. This value is specified by `exec_timeout_secs` in the Evergreen configuration. - **Exec Timeout**: The _exec timeout_ is the overall timeout for a task. Once the total runtime for
- **Idle Timeout**: The _idle timeout_ is the amount of time Evergreen will wait for output to be generated before considering the task hung and triggering the timeout logic. This value is specified by `timeout_secs` in the Evergreen configuration. a test exceeds this value, the timeout logic will be triggered. This value is specified by
`exec_timeout_secs` in the Evergreen configuration.
- **Idle Timeout**: The _idle timeout_ is the amount of time Evergreen will wait for output to be
generated before considering the task hung and triggering the timeout logic. This value is
specified by `timeout_secs` in the Evergreen configuration.
**Note**: In most cases, the **exec timeout** is the more useful of the two timeouts. **Note**: In most cases, the **exec timeout** is the more useful of the two timeouts.
@ -15,15 +20,27 @@ There are several ways to set the timeout for a task running in Evergreen.
### Specifying timeouts in the Evergreen YAML configuration ### Specifying timeouts in the Evergreen YAML configuration
Timeouts can be specified directly in the `evergreen.yml` (and related) files, both for tasks and build variants. This approach is useful for setting default timeout values but is limited because different build variants often have varying runtime characteristics. This means it is not possible to set timeouts for a specific task running on a specific build variant using only this method. Timeouts can be specified directly in the `evergreen.yml` (and related) files, both for tasks and
build variants. This approach is useful for setting default timeout values but is limited because
different build variants often have varying runtime characteristics. This means it is not possible
to set timeouts for a specific task running on a specific build variant using only this method.
### Overrides: [etc/evergreen_timeouts.yml](../../etc/evergreen_timeouts.yml) ### Overrides: [etc/evergreen_timeouts.yml](../../etc/evergreen_timeouts.yml)
The `etc/evergreen_timeouts.yml` file allows overriding timeouts for specific tasks on specific build variants. This workaround helps address the limitations of directly specifying timeouts in `evergreen.yml`. To use this method, the task must include the `determine task timeout` and `update task timeout expansions` functions at the beginning of its Evergreen definition. Many Resmoke tasks already incorporate these functions. The `etc/evergreen_timeouts.yml` file allows overriding timeouts for specific tasks on specific
build variants. This workaround helps address the limitations of directly specifying timeouts in
`evergreen.yml`. To use this method, the task must include the `determine task timeout` and
`update task timeout expansions` functions at the beginning of its Evergreen definition. Many
Resmoke tasks already incorporate these functions.
### Resmoke tasks: [buildscripts/evergreen_task_timeout.py](../../buildscripts/evergreen_task_timeout.py) ### Resmoke tasks: [buildscripts/evergreen_task_timeout.py](../../buildscripts/evergreen_task_timeout.py)
This script reads the `etc/evergreen_timeouts.yml` file to calculate the appropriate timeout settings. Additionally, it checks historical test results for the task being run to determine if enough information is available to calculate timeouts based on past data. The script also supports more advanced methods of determining timeouts, such as applying aggressive timeout measures for tasks executed in the commit queue or on required build variants. In cases of conflict, the commit queue and required build variant limits take precedence over the previous two methods. This script reads the `etc/evergreen_timeouts.yml` file to calculate the appropriate timeout
settings. Additionally, it checks historical test results for the task being run to determine if
enough information is available to calculate timeouts based on past data. The script also supports
more advanced methods of determining timeouts, such as applying aggressive timeout measures for
tasks executed in the commit queue or on required build variants. In cases of conflict, the commit
queue and required build variant limits take precedence over the previous two methods.
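
The precedence described above can be sketched as follows. This is not the code in `buildscripts/evergreen_task_timeout.py`: the function and parameter names are hypothetical, the ordering between the override and the historic value is an assumption, and "take precedence" is read here as a cap on the result.

```python
# Hedged sketch of the precedence; not the actual script's logic.
def resolve_exec_timeout_secs(default_secs, override_secs=None,
                              historic_secs=None, aggressive_limit_secs=None):
    """Pick a timeout in seconds from the available sources.

    Assumptions: an override from etc/evergreen_timeouts.yml beats a value
    derived from historic data, and a commit queue / required variant limit
    acts as an upper bound on whatever was picked.
    """
    if override_secs is not None:
        timeout = override_secs
    elif historic_secs is not None:
        timeout = historic_secs
    else:
        timeout = default_secs

    if aggressive_limit_secs is not None:
        timeout = min(timeout, aggressive_limit_secs)
    return timeout
```

For instance, a two-hour override combined with a 30-minute limit resolves to 30 minutes under these assumptions.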
The timeout that was calculated by the script can be retrieved from the logs: The timeout that was calculated by the script can be retrieved from the logs:
@ -38,4 +55,8 @@ The timeout that was calculated by the script can be retrieved from the logs:
### Compile tasks: [evergreen/generate_override_timeout.py](../../evergreen/generate_override_timeout.py) ### Compile tasks: [evergreen/generate_override_timeout.py](../../evergreen/generate_override_timeout.py)
This script is used for compile tasks defined in files such as `etc/evergreen_yml_components/tasks/compile_tasks.yml` and `etc/evergreen_yml_components/tasks/compile_tasks_shared.yml`. The script reads the `etc/evergreen_timeouts.yml` file and calculates appropriate timeouts. The Evergreen function `override task timeout` then runs this script to update the timeouts accordingly. This script is used for compile tasks defined in files such as
`etc/evergreen_yml_components/tasks/compile_tasks.yml` and
`etc/evergreen_yml_components/tasks/compile_tasks_shared.yml`. The script reads the
`etc/evergreen_timeouts.yml` file and calculates appropriate timeouts. The Evergreen function
`override task timeout` then runs this script to update the timeouts accordingly.


@ -1,37 +1,47 @@
# Build Variants # Build Variants
This document describes build variants (a.k.a. variants, or builds, or buildvariants) that are used in `mongodb-mongo-*` projects. This document describes build variants (a.k.a. variants, or builds, or buildvariants) that are used
To know more about build variants, please refer to the [Build Variants](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#build-variants) section of the Evergreen wiki. in `mongodb-mongo-*` projects. To know more about build variants, please refer to the
[Build Variants](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#build-variants)
section of the Evergreen wiki.
## YAML files structure ## YAML files structure
Build variant configuration files are in `etc/evergreen_yml_components/variants` directory. Build variant configuration files are in `etc/evergreen_yml_components/variants` directory. They are
They are merged into `etc/evergreen.yml` and `etc/evergreen_nightly.yml` with Evergreen's [include](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#include) feature. merged into `etc/evergreen.yml` and `etc/evergreen_nightly.yml` with Evergreen's
[include](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#include)
feature.
Inside `etc/evergreen_yml_components/variants` directory there are more directories, Inside `etc/evergreen_yml_components/variants` directory there are more directories, which are in
which are in most cases platform names (e.g. amazon, rhel etc.) or build variant group names (e.g. sanitizer etc.). most cases platform names (e.g. amazon, rhel etc.) or build variant group names (e.g. sanitizer
etc.).
Be aware that some of these files could be also used or re-used to be merged into `etc/system_perf.yml` which is used for `sys-perf` project. Be aware that some of these files could be also used or re-used to be merged into
`etc/system_perf.yml` which is used for `sys-perf` project.
## Build Variants in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` ## Build Variants in `mongodb-mongo-master` and `mongodb-mongo-master-nightly`
`mongodb-mongo-master` evergreen project uses `etc/evergreen.yml` and contains all build variants for development, including all feature-specific, patch build required, and suggested variants. `mongodb-mongo-master` evergreen project uses `etc/evergreen.yml` and contains all build variants
for development, including all feature-specific, patch build required, and suggested variants.
`mongodb-mongo-master-nightly` evergreen project uses `etc/evergreen_nightly.yml` and contains build variants for public nightly builds. `mongodb-mongo-master-nightly` evergreen project uses `etc/evergreen_nightly.yml` and contains build
variants for public nightly builds.
## Required and Suggested Build Variants ## Required and Suggested Build Variants
"Required" build variants are defined as any build variant with a `!` at the front of its display name in Evergreen. "Required" build variants are defined as any build variant with a `!` at the front of its display
These build variants also have `required` tag. name in Evergreen. These build variants also have `required` tag.
[Required Patch Builds Policy](https://wiki.corp.mongodb.com/display/KERNEL/Required+Patch+Builds+Policy) [Required Patch Builds Policy](https://wiki.corp.mongodb.com/display/KERNEL/Required+Patch+Builds+Policy)
"Suggested" build variants are defined as any build variant with a `*` at the front of its display name in Evergreen. "Suggested" build variants are defined as any build variant with a `*` at the front of its display
These build variants also have `suggested` tag. name in Evergreen. These build variants also have `suggested` tag.
## Build Variants with forbid_tasks_tagged_with_experimental ## Build Variants with forbid_tasks_tagged_with_experimental
Build variants with the `forbid_tasks_tagged_with_experimental` tag indicate that they do not allow tasks tagged as `experimental` to run. This tag is used in conjunction with the `forbid-tasks-with-tag-on-variants` evergreen lint rule to enforce this restriction. Build variants with the `forbid_tasks_tagged_with_experimental` tag indicate that they do not allow
tasks tagged as `experimental` to run. This tag is used in conjunction with the
`forbid-tasks-with-tag-on-variants` evergreen lint rule to enforce this restriction.
## Build Variants after branching ## Build Variants after branching
@ -39,34 +49,48 @@ In each of platform or build variant group directory there can be these files:
- `test_dev.yml` - `test_dev.yml`
- these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project on master branch - these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project
- after branching on all new branches these files are merged into `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project on master branch
- after branching on all new branches these files are merged into `etc/evergreen_nightly.yml`
which is used for a new branch `mongodb-mongo-vX.Y` project
- `test_dev_master_and_lts_branches_only.yml` - `test_dev_master_and_lts_branches_only.yml`
- these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project on master branch - these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project
- after branching for LTS release (v7.0, v8.0 etc.) on a new branch these files are merged into `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project on master branch
- **important**: all tests that are running on these build variants will NOT run on a new Rapid release (v7.1, v7.2, v7.3, v8.1, v8.2, v8.3 etc.) branch projects - after branching for LTS release (v7.0, v8.0 etc.) on a new branch these files are merged into
`etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project
- **important**: all tests that are running on these build variants will NOT run on a new Rapid
release (v7.1, v7.2, v7.3, v8.1, v8.2, v8.3 etc.) branch projects
- `test_dev_master_branch_only.yml` - `test_dev_master_branch_only.yml`
- these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project on master branch - these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project
on master branch
- after branching on all new branches these files are NOT used - after branching on all new branches these files are NOT used
- **important**: all tests that are running on these build variants will NOT run on a new branch `mongodb-mongo-vX.Y` project - **important**: all tests that are running on these build variants will NOT run on a new branch
`mongodb-mongo-vX.Y` project
- `test_release.yml` - `test_release.yml`
- these files are merged into `etc/evergreen_nightly.yml` which is used for `mongodb-mongo-master-nightly` project on master branch - these files are merged into `etc/evergreen_nightly.yml` which is used for
- after branching on all new branches these files are merged into `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project `mongodb-mongo-master-nightly` project on master branch
- after branching on all new branches these files are merged into `etc/evergreen_nightly.yml`
which is used for a new branch `mongodb-mongo-vX.Y` project
- `test_release_master_and_lts_branches_only.yml` - `test_release_master_and_lts_branches_only.yml`
- these files are merged into `etc/evergreen_nightly.yml` which is used for `mongodb-mongo-master-nightly` project on master branch - these files are merged into `etc/evergreen_nightly.yml` which is used for
- after branching for LTS release (v7.0, v8.0 etc.) on a new branch these files are merged into `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project `mongodb-mongo-master-nightly` project on master branch
- **important**: all tests that are running on these build variants will NOT run on a new Rapid release (v7.1, v7.2, v7.3, v8.1, v8.2, v8.3 etc.) branch projects - after branching for LTS release (v7.0, v8.0 etc.) on a new branch these files are merged into
`etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project
- **important**: all tests that are running on these build variants will NOT run on a new Rapid
release (v7.1, v7.2, v7.3, v8.1, v8.2, v8.3 etc.) branch projects
- `test_release_master_branch_only.yml` - `test_release_master_branch_only.yml`
- these files are merged into `etc/evergreen_nightly.yml` which is used for `mongodb-mongo-master-nightly` project on master branch - these files are merged into `etc/evergreen_nightly.yml` which is used for
`mongodb-mongo-master-nightly` project on master branch
- after branching on all new branches these files are NOT used - after branching on all new branches these files are NOT used
- **important**: all tests that are running on these build variants will NOT run on a new branch `mongodb-mongo-vX.Y` project - **important**: all tests that are running on these build variants will NOT run on a new branch
`mongodb-mongo-vX.Y` project
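
The branching rules for the three `test_release*` files above can be summarized as a small lookup. The following is only a sketch: the `Branch` enum and the `isMergedOnBranch` function are hypothetical names made up for illustration, and the real behavior is determined by the Evergreen project configuration, not by code like this.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical classification of a branch; not an Evergreen concept.
enum class Branch { Master, Lts, Rapid };

// Returns true if the given components file is merged into
// `etc/evergreen_nightly.yml` for the given kind of branch, following the
// rules listed above. Files not covered by this sketch are rejected.
inline bool isMergedOnBranch(const std::string& fileName, Branch branch) {
    if (fileName == "test_release.yml") {
        return true;  // used on master and on every new branch
    }
    if (fileName == "test_release_master_and_lts_branches_only.yml") {
        return branch != Branch::Rapid;  // dropped on Rapid release branches
    }
    if (fileName == "test_release_master_branch_only.yml") {
        return branch == Branch::Master;  // dropped on every new branch
    }
    throw std::invalid_argument("file not covered by this sketch: " + fileName);
}
```

Under these assumptions, a build variant defined in `test_release_master_and_lts_branches_only.yml` keeps running on an LTS branch but stops running on a Rapid release branch.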

@ -11,14 +11,14 @@ section of the Evergreen wiki.
### `mongodb-mongo-master` ### `mongodb-mongo-master`
The main project for testing MongoDB's dev environments with a number of build variants, The main project for testing MongoDB's dev environments with a number of build variants, each one
each one corresponding to a particular compile or testing environment to support development. corresponding to a particular compile or testing environment to support development. Each build
Each build variant runs a set of tasks; each task usually runs one or more tests. variant runs a set of tasks; each task usually runs one or more tests.
### `mongodb-mongo-master-nightly` ### `mongodb-mongo-master-nightly`
Tracks the same branch as `mongodb-mongo-master`, each build variant corresponds to a Tracks the same branch as `mongodb-mongo-master`, each build variant corresponds to a (version, OS,
(version, OS, architecture) triplet for a supported MongoDB nightly release. architecture) triplet for a supported MongoDB nightly release.
### `sys_perf` ### `sys_perf`
@ -28,22 +28,23 @@ The system performance project.
The above Evergreen projects are defined in the following files: The above Evergreen projects are defined in the following files:
- `etc/evergreen_yml_components/**.yml`. YAML files containing definitions for tasks, functions, buildvariants, etc. - `etc/evergreen_yml_components/**.yml`. YAML files containing definitions for tasks, functions,
They are copied from the existing evergreen.yml file. buildvariants, etc. They are copied from the existing evergreen.yml file.
- `etc/evergreen.yml`. Imports components from above and serves as the project config for mongodb-mongo-master, - `etc/evergreen.yml`. Imports components from above and serves as the project config for
containing all build variants for development, including all feature-specific, patch build required, and suggested mongodb-mongo-master, containing all build variants for development, including all
variants. feature-specific, patch build required, and suggested variants.
- `etc/evergreen_nightly.yml`. The project configuration for mongodb-mongo-master-nightly, containing only build - `etc/evergreen_nightly.yml`. The project configuration for mongodb-mongo-master-nightly,
variants for public nightly builds, imports similar components as evergreen.yml to ensure consistency. containing only build variants for public nightly builds, imports similar components as
evergreen.yml to ensure consistency.
- `etc/sys_perf.yml`. Configuration file for the system performance project. - `etc/sys_perf.yml`. Configuration file for the system performance project.
## Release Branching Process ## Release Branching Process
Only the `mongodb-mongo-master-nightly` project will be branched with required and other Only the `mongodb-mongo-master-nightly` project will be branched with required and other necessary
necessary variants (e.g. sanitizers) added back in. Most variants in `mongodb-mongo-master` variants (e.g. sanitizers) added back in. Most variants in `mongodb-mongo-master` would be dropped
would be dropped by default but can be re-introduced to the release branches manually on an by default but can be re-introduced to the release branches manually on an as-needed basis. For
as-needed basis. For Rapid releases, all but the variants relevant to Atlas in Rapid releases, all but the variants relevant to Atlas in `mongodb-mongo-master-nightly` may be
`mongodb-mongo-master-nightly` may be dropped as well. dropped as well.

@ -1,11 +1,15 @@
# Task ownership tags # Task ownership tags
This document describes task ownership tags that are used in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` projects. This document describes task ownership tags that are used in `mongodb-mongo-master` and
`mongodb-mongo-master-nightly` projects.
Every task in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` projects should be tagged with exactly one `assigned_to_jira_team_.+` tag. Every task in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` projects should be tagged
Team names (the part after `assigned_to_jira_team_`) should match `evergreen_tag_name` from team configurations in [mothra](https://github.com/10gen/mothra/tree/main/mothra/teams). with exactly one `assigned_to_jira_team_.+` tag. Team names (the part after
`assigned_to_jira_team_`) should match `evergreen_tag_name` from team configurations in
[mothra](https://github.com/10gen/mothra/tree/main/mothra/teams).
This is enforced by linter. YAML linter configuration could be found [here](../../../etc/evergreen_lint.yml). This is enforced by linter. YAML linter configuration could be found
[here](../../../etc/evergreen_lint.yml).
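
The "exactly one ownership tag" rule can be illustrated with a short check. This is a minimal sketch and not the linter's actual implementation: `hasExactlyOneOwnershipTag` is a hypothetical helper, and the real rule is defined by the YAML linter configuration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical check for illustration only. Returns true when exactly one of
// the task's tags fully matches the `assigned_to_jira_team_.+` pattern.
inline bool hasExactlyOneOwnershipTag(const std::vector<std::string>& tags) {
    static const std::regex kOwnershipTag("assigned_to_jira_team_.+");
    int matches = 0;
    for (const auto& tag : tags) {
        // regex_match requires the whole tag to match, so a bare
        // "assigned_to_jira_team_" prefix with no team name is rejected.
        if (std::regex_match(tag, kOwnershipTag)) {
            ++matches;
        }
    }
    return matches == 1;
}
```

A task carrying one ownership tag plus any number of other tags passes this check. A task with no ownership tag, or with two, does not.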
If the linter configuration is missing your team: If the linter configuration is missing your team:
@ -13,4 +17,7 @@ If the linter configuration is missing your team:
2. Make sure that your team configuration in mothra has `evergreen_tag_name` 2. Make sure that your team configuration in mothra has `evergreen_tag_name`
3. Update the tag list with `assigned_to_jira_team_{evergreen_tag_name}` tag for your team 3. Update the tag list with `assigned_to_jira_team_{evergreen_tag_name}` tag for your team
Dynamically generated tasks for resmoke suites (i.e. the ones named like `//buildscripts/resmokeconfig:core`) will set the ownership tag based on a best effort lookup from the codeowner of the test's definition to a team name from mothra, picking the first encountered in case of multiple possible assignments. Dynamically generated tasks for resmoke suites (i.e. the ones named like
`//buildscripts/resmokeconfig:core`) will set the ownership tag based on a best effort lookup from
the codeowner of the test's definition to a team name from mothra, picking the first encountered in
case of multiple possible assignments.

@ -1,49 +1,58 @@
# Task selection tags # Task selection tags
This document describes task selection tags that are used in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` projects. This document describes task selection tags that are used in `mongodb-mongo-master` and
To know more about task tags, please refer to the [Task and Variant Tags](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#task-and-variant-tags) section of the Evergreen wiki. `mongodb-mongo-master-nightly` projects. To know more about task tags, please refer to the
[Task and Variant Tags](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#task-and-variant-tags)
section of the Evergreen wiki.
The majority of variants in `mongodb-mongo-master-nightly` project and the most significant variants in `mongodb-mongo-master` project are using required and optional groups of task selection tags. The majority of variants in `mongodb-mongo-master-nightly` project and the most significant variants
In order to add tasks to those variants, please use them as described in the following sections. in `mongodb-mongo-master` project are using required and optional groups of task selection tags. In
order to add tasks to those variants, please use them as described in the following sections.
## Required task selection tags ## Required task selection tags
Every task in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` project must be tagged with exactly one required selection tag. Every task in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` project must be tagged with
This is enforced by linter. YAML linter configuration could be found [here](../../../etc/evergreen_lint.yml). exactly one required selection tag. This is enforced by linter. YAML linter configuration could be
found [here](../../../etc/evergreen_lint.yml).
- `development_critical` - these tasks should be green prior to the merge and will block merging if failing, e.g. jsCore. - `development_critical` - these tasks should be green prior to the merge and will block merging if
We run these tasks on all variants and in the commit-queue. failing, e.g. jsCore. We run these tasks on all variants and in the commit-queue.
- `development_critical_single_variant` - the same as `development_critical` but these tasks do not require to run on multiple variants, e.g. clang-tidy, formatters, linters etc. - `development_critical_single_variant` - the same as `development_critical` but these tasks do not
We run these tasks on the required variant and in the commit-queue. require to run on multiple variants, e.g. clang-tidy, formatters, linters etc. We run these tasks
on the required variant and in the commit-queue.
- `no_commit_queue` - add this to tasks in development_critical that you do not want in the commit-queue - `no_commit_queue` - add this to tasks in development_critical that you do not want in the
commit-queue
- `release_critical` - these tasks should be green prior to the release. - `release_critical` - these tasks should be green prior to the release. We run these tasks on all
We run these tasks on all release and development (required and suggested) variants. release and development (required and suggested) variants. It should be uncommon to add tasks to
It should be uncommon to add tasks to this tag but if your task needs to run on many different OSes and it is extremely broad in coverage then you can add it to this tag. this tag but if your task needs to run on many different OSes and it is extremely broad in
coverage then you can add it to this tag.
- `default` - these tasks are running as part of a required patch build. - `default` - these tasks are running as part of a required patch build. We run these tasks on the
We run these tasks on the most significant development variants (required patches, tsan, aubsan, etc.). most significant development variants (required patches, tsan, aubsan, etc.). Use this tag if you
Use this tag if you are not sure which tag to use for your new task. are not sure which tag to use for your new task.
- `non_deterministic` - these tasks depend significantly on randomization and we expect to see some unique failures, e.g. fuzzers etc. - `non_deterministic` - these tasks depend significantly on randomization and we expect to see some
We run these tasks on non-required development variants. unique failures, e.g. fuzzers etc. We run these tasks on non-required development variants.
- `experimental` - these tasks are not running anywhere regularly. - `experimental` - these tasks are not running anywhere regularly. We do not use this tag for
We do not use this tag for selecting tasks to run on variants. selecting tasks to run on variants. This tag could be used for tasks that you would like to run on
This tag could be used for tasks that you would like to run on your own custom variants. your own custom variants.
- `auxiliary` - these are various setup, helper, etc. tasks and should be mostly owned by infrastructure team. - `auxiliary` - these are various setup, helper, etc. tasks and should be mostly owned by
You should almost never use this tag. infrastructure team. You should almost never use this tag. Please reach out to
Please reach out to [#ask-devprod-build](https://mongodb.enterprise.slack.com/archives/CR8SNBY0N) before adding tasks with this tag. [#ask-devprod-build](https://mongodb.enterprise.slack.com/archives/CR8SNBY0N) before adding tasks
with this tag.
**Important**: Do not change anything in this list without talking to [#ask-devprod-build](https://mongodb.enterprise.slack.com/archives/CR8SNBY0N). **Important**: Do not change anything in this list without talking to
[#ask-devprod-build](https://mongodb.enterprise.slack.com/archives/CR8SNBY0N).
## Optional task selection tags ## Optional task selection tags
In addition to the required task selection tags there is a list of optional selection tags. In addition to the required task selection tags there is a list of optional selection tags. Every
Every task could be tagged with any number of the following tags: task could be tagged with any number of the following tags:
- `incompatible_community` - the task should be excluded from the community variants. - `incompatible_community` - the task should be excluded from the community variants.
- `incompatible_windows` - the task should be excluded from Windows variants. - `incompatible_windows` - the task should be excluded from Windows variants.
@ -55,16 +64,20 @@ Every task could be tagged with any number of the following tags:
- `incompatible_aubsan` - the task should be excluded from {A,UB}SAN variants. - `incompatible_aubsan` - the task should be excluded from {A,UB}SAN variants.
- `incompatible_tsan` - the task should be excluded from TSAN variants. - `incompatible_tsan` - the task should be excluded from TSAN variants.
- `incompatible_debug_mode` - the task should be excluded from Debug Mode variants. - `incompatible_debug_mode` - the task should be excluded from Debug Mode variants.
- `incompatible_system_allocator` - the task should be excluded from variants that use the system allocator. - `incompatible_system_allocator` - the task should be excluded from variants that use the system
allocator.
- `incompatible_all_feature_flags` - the task should be excluded from all-feature-flags variants. - `incompatible_all_feature_flags` - the task should be excluded from all-feature-flags variants.
- `incompatible_development_variant` - the task should be excluded from the development variants. - `incompatible_development_variant` - the task should be excluded from the development variants.
- `incompatible_oscrypto` - the task should be excluded from variants unsupported by oscrypto. - `incompatible_oscrypto` - the task should be excluded from variants unsupported by oscrypto.
- `requires_compile_variant` - the task can (or should) only run on variants that have compile related expansions. - `requires_compile_variant` - the task can (or should) only run on variants that have compile
related expansions.
- `requires_large_host` - the task requires a large host to run. - `requires_large_host` - the task requires a large host to run.
- `requires_large_host_aubsan` - the task requires a large host to run on {A,UB}SAN variants. - `requires_large_host_aubsan` - the task requires a large host to run on {A,UB}SAN variants.
- `requires_large_host_tsan` - the task requires a large host to run on TSAN variants. - `requires_large_host_tsan` - the task requires a large host to run on TSAN variants.
- `requires_large_host_debug_mode` - the task requires a large host to run on Debug Mode variants. - `requires_large_host_debug_mode` - the task requires a large host to run on Debug Mode variants.
- `requires_large_host_commit_queue` - the task requires a large host to run in the commit-queue. - `requires_large_host_commit_queue` - the task requires a large host to run in the commit-queue.
- `requires_all_feature_flags` - the task can only run on variants that have all-feature-flags configuration. - `requires_all_feature_flags` - the task can only run on variants that have all-feature-flags
- `requires_execution_on_windows_patch_build` - the task should be run on the required Windows build variant on each patch configuration.
build. See [SERVER-79037](https://jira.mongodb.org/browse/SERVER-79037) for how this was calculated. - `requires_execution_on_windows_patch_build` - the task should be run on the required Windows build
variant on each patch build. See [SERVER-79037](https://jira.mongodb.org/browse/SERVER-79037) for
how this was calculated.

@ -5,16 +5,16 @@ MongoDB code uses the following types of assertions that are available for use:
- `uassert` and `iassert` - `uassert` and `iassert`
- Checks for per-operation user errors. Operation-fatal. - Checks for per-operation user errors. Operation-fatal.
- `tassert` - `tassert`
- Like uassert in that it checks for per-operation user errors, but inhibits clean shutdown - Like uassert in that it checks for per-operation user errors, but inhibits clean shutdown in
in tests. Operation-fatal, but process-fatal in testing environments during shutdown. tests. Operation-fatal, but process-fatal in testing environments during shutdown.
- `massert` - `massert`
- Checks per-operation invariants. Operation-fatal. - Checks per-operation invariants. Operation-fatal.
- `fassert` - `fassert`
- Checks fatal process invariants. Process-fatal. Use to detect unexpected situations (such - Checks fatal process invariants. Process-fatal. Use to detect unexpected situations (such as a
as a system function returning an unexpected error status). system function returning an unexpected error status).
- `invariant` - `invariant`
- Checks process invariant. Process-fatal. Use to detect code logic errors ("pointer should - Checks process invariant. Process-fatal. Use to detect code logic errors ("pointer should never
never be null", "we should always be locked"). be null", "we should always be locked").
**Note**: Calling C function `assert` is not allowed. Use one of the above instead. **Note**: Calling C function `assert` is not allowed. Use one of the above instead.
@ -50,8 +50,8 @@ Some assertions will increment an assertion counter. The `serverStatus` command
- `tripwire` - `tripwire`
- Incremented by `tassert`. - Incremented by `tassert`.
- `rollovers` - `rollovers`
- When any counter reaches a value of `1 << 30`, all of the counters are reset and - When any counter reaches a value of `1 << 30`, all of the counters are reset and the "rollovers"
the "rollovers" counter is incremented. counter is incremented.
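
The rollover rule can be sketched as follows. This is an illustrative stand-in under stated assumptions, not the server's implementation: the `AssertionCounters` struct, its `bump` method, and the reduced set of counters shown here are hypothetical.

```cpp
#include <cstdint>

// Hypothetical stand-in for the assertion counters reported by `serverStatus`.
// Only a few counters are modeled to keep the sketch short.
struct AssertionCounters {
    static constexpr std::int32_t kRolloverThreshold = 1 << 30;

    std::int32_t regular = 0;
    std::int32_t user = 0;
    std::int32_t tripwire = 0;
    std::int32_t rollovers = 0;

    // Increments one counter. If it reaches the threshold, every counter is
    // reset to zero and `rollovers` is incremented instead.
    void bump(std::int32_t& counter) {
        if (++counter >= kRolloverThreshold) {
            regular = 0;
            user = 0;
            tripwire = 0;
            ++rollovers;
        }
    }
};
```

One consequence, visible in the sketch, is that a reset clears all counters, not just the one that hit the threshold, so totals across a rollover have to account for the `rollovers` value.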
## Considerations ## Considerations
@ -61,52 +61,53 @@ terminate the current operation, not the whole process. Be careful not to corrup
mistakenly using these assertions midway through mutating process state. mistakenly using these assertions midway through mutating process state.
`fassert` failures will terminate the entire process; this is used for low-level checks where `fassert` failures will terminate the entire process; this is used for low-level checks where
continuing might lead to corrupt data or loss of data on disk. Additionally, `fassert` will log continuing might lead to corrupt data or loss of data on disk. Additionally, `fassert` will log a
a generic assertion message with fatal severity and add a breakpoint before terminating. generic assertion message with fatal severity and add a breakpoint before terminating.
To log a custom assertion message and terminate the server, use `LOGV2_FATAL`. To log a custom assertion message and terminate the server, use `LOGV2_FATAL`. To avoid printing a
To avoid printing a stacktrace on failure use `fassertNoTrace` or `LOGV2_FATAL_NO_TRACE`. stacktrace on failure use `fassertNoTrace` or `LOGV2_FATAL_NO_TRACE`. Consider using them if there
Consider using them if there is only one way to reach this fatal point in code. is only one way to reach this fatal point in code.
`tassert` will fail the operation like `uassert`, but also triggers a "deferred-fatality tripwire `tassert` will fail the operation like `uassert`, but also triggers a "deferred-fatality tripwire
flag". In testing environments, if the tripwire flag is set during shutdown, the process will flag". In testing environments, if the tripwire flag is set during shutdown, the process will invoke
invoke the tripwire fatal assertion. In non-testing environments, there will only be a warning the tripwire fatal assertion. In non-testing environments, there will only be a warning during
during shutdown that tripwire assertions have failed. shutdown that tripwire assertions have failed.
`tassert` presents more diagnostics than `uassert`. `tassert` will log the assertion as an error, `tassert` presents more diagnostics than `uassert`. `tassert` will log the assertion as an error,
log scoped debug info (for more info, see ScopedDebugInfoStack defined in log scoped debug info (for more info, see ScopedDebugInfoStack defined in
[mongo/util/assert_util.h][assert_util_h]), print the stack trace, and add a breakpoint. [mongo/util/assert_util.h][assert_util_h]), print the stack trace, and add a breakpoint. The purpose
The purpose of `tassert` is to ensure that operation failures will cause a test suite to fail of `tassert` is to ensure that operation failures will cause a test suite to fail without resorting
without resorting to different behavior during testing. `tassert` should only be used to check to different behavior during testing. `tassert` should only be used to check for unexpected values
for unexpected values produced by defined behavior. produced by defined behavior.
Both `massert` and `uassert` take error codes, so that all assertions have codes associated with Both `massert` and `uassert` take error codes, so that all assertions have codes associated with
them. Currently, programmers are free to provide the error code by either [using a unique location them. Currently, programmers are free to provide the error code by either
number](#choosing-a-unique-location-number) or choosing a named code from `ErrorCodes`. Unique location [using a unique location number](#choosing-a-unique-location-number) or choosing a named code from
numbers have no meaning other than a way to associate a log message with a line of code. `ErrorCodes`. Unique location numbers have no meaning other than a way to associate a log message
with a line of code.
`massert` will log the assertion message as an error, while `uassert` will log the message with `massert` will log the assertion message as an error, while `uassert` will log the message with
debug level of 1 (for more info about log debug level, see [docs/logging.md][logging_md]). debug level of 1 (for more info about log debug level, see [docs/logging.md][logging_md]).
`iassert` provides similar functionality to `uassert`, but it logs at a debug level of 3 and `iassert` provides similar functionality to `uassert`, but it logs at a debug level of 3 and does
does not increment user assertion counters. We should always choose `iassert` over `uassert` not increment user assertion counters. We should always choose `iassert` over `uassert` when we
when we expect a failure, a failure might be recoverable, or failure accounting is not interesting. expect a failure, a failure might be recoverable, or failure accounting is not interesting.
### Choosing a unique location number ### Choosing a unique location number
The current convention for choosing a unique location number is to use the 5 or 6 digit SERVER ticket number The current convention for choosing a unique location number is to use the 5 or 6 digit SERVER
for the ticket being addressed when the assertion is added, followed by a two digit counter to distinguish ticket number for the ticket being addressed when the assertion is added, followed by a two digit
between codes added as part of the same ticket. For example, if you're working on SERVER-12345, the first counter to distinguish between codes added as part of the same ticket. For example, if you're
error code would be 1234500, the second would be 1234501, etc. This convention can also be used for LOGV2 working on SERVER-12345, the first error code would be 1234500, the second would be 1234501, etc.
logging id numbers. This convention can also be used for LOGV2 logging id numbers.
The only real constraint for unique location numbers is that they must be unique across the codebase. This is The only real constraint for unique location numbers is that they must be unique across the
verified at compile time with a [python script][errorcodes_py]. codebase. This is verified at compile time with a [python script][errorcodes_py].
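
The numbering convention is simple arithmetic, which the following sketch makes explicit. `makeLocationNumber` is a hypothetical helper written for illustration; it does not exist in the codebase, where location numbers are written as plain integer literals.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper for illustration only. Builds a location number from a
// 5 or 6 digit SERVER ticket number followed by a two digit counter.
inline std::int64_t makeLocationNumber(std::int64_t ticket, int counter) {
    if (ticket < 10000 || ticket > 999999) {
        throw std::invalid_argument("expected a 5 or 6 digit ticket number");
    }
    if (counter < 0 || counter > 99) {
        throw std::invalid_argument("counter must fit in two digits");
    }
    // Appending two digits is the same as multiplying by 100 and adding.
    return ticket * 100 + counter;
}
```

For SERVER-12345 this yields 1234500 for the first code and 1234501 for the second, matching the example in the text. The two digit counter also implies a practical limit of 100 codes per ticket under this convention.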
## Exception ## Exception
A failed operation-fatal assertion throws an `AssertionException` or a child of that. A failed operation-fatal assertion throws an `AssertionException` or a child of that. The
The inheritance hierarchy resembles: inheritance hierarchy resembles:
- `std::exception` - `std::exception`
- `mongo::DBException` - `mongo::DBException`
@ -123,14 +124,14 @@ upwards harmlessly. The code should also expect, and properly handle, `UserExcep
## ErrorCodes and Status ## ErrorCodes and Status
MongoDB uses `ErrorCodes` both internally and externally: a subset of error codes (e.g., MongoDB uses `ErrorCodes` both internally and externally: a subset of error codes (e.g., `BadValue`)
`BadValue`) are used externally to pass errors over the wire and to clients. These error codes are are used externally to pass errors over the wire and to clients. These error codes are the means for
the means for MongoDB processes (e.g., _mongod_ and _mongo_) to communicate errors, and are visible MongoDB processes (e.g., _mongod_ and _mongo_) to communicate errors, and are visible to client
to client applications. Other error codes are used internally to indicate the underlying reason for applications. Other error codes are used internally to indicate the underlying reason for a failed
a failed operation. For instance, `PeriodicJobIsStopped` is an internal error code that is passed operation. For instance, `PeriodicJobIsStopped` is an internal error code that is passed to callback
to callback functions running inside a [`PeriodicRunner`][periodic_runner_h] once the runner is functions running inside a [`PeriodicRunner`][periodic_runner_h] once the runner is stopped. The
stopped. The internal error codes are for internal use only and must never be returned to clients internal error codes are for internal use only and must never be returned to clients (i.e., in a
(i.e., in a network response). network response).
Zero or more error categories can be assigned to `ErrorCodes`, which allows a single handler to Zero or more error categories can be assigned to `ErrorCodes`, which allows a single handler to
serve a group of `ErrorCodes`. `RetriableError`, for instance, is an `ErrorCategory` that includes serve a group of `ErrorCodes`. `RetriableError`, for instance, is an `ErrorCategory` that includes
@ -140,10 +141,10 @@ operation that fails with any error code in this category can be safely retried.
we can use `ErrorCodes::is${category}(${error})` to check error categories. Both methods provide we can use `ErrorCodes::is${category}(${error})` to check error categories. Both methods provide
similar functionality. similar functionality.
To represent the status of an executed operation (e.g., a command or a function invocation), we To represent the status of an executed operation (e.g., a command or a function invocation), we use
use `Status` objects, which represent an error state or the absence thereof. A `Status` uses the `Status` objects, which represent an error state or the absence thereof. A `Status` uses the
standardized `ErrorCodes` to determine the underlying cause of an error. It also allows assigning standardized `ErrorCodes` to determine the underlying cause of an error. It also allows assigning a
a textual description, as well as code-specific extra info, to the error code for further textual description, as well as code-specific extra info, to the error code for further
clarification. The extra info is a subclass of `ErrorExtraInfo` and specific to `ErrorCodes`. Look clarification. The extra info is a subclass of `ErrorExtraInfo` and specific to `ErrorCodes`. Look
for `extra` in [here][error_codes_yml] for reference. for `extra` in [here][error_codes_yml] for reference.
@ -153,28 +154,26 @@ functions with multiple out parameters. We can either pass an error code or an a
`StatusWith` object, indicating failure or success of the operation. For examples of the proper `StatusWith` object, indicating failure or success of the operation. For examples of the proper
usage of `StatusWith`, see [mongo/base/status_with.h][status_with_h] and usage of `StatusWith`, see [mongo/base/status_with.h][status_with_h] and
[mongo/base/status_with_test.cpp][status_with_test_cpp]. It is highly recommended to use `uassert` [mongo/base/status_with_test.cpp][status_with_test_cpp]. It is highly recommended to use `uassert`
or `iassert` over `StatusWith`, and catch exceptions instead of checking `Status` objects or `iassert` over `StatusWith`, and catch exceptions instead of checking `Status` objects returned
returned from functions. Using `StatusWith` to indicate exceptions, instead of throwing via from functions. Using `StatusWith` to indicate exceptions, instead of throwing via `uassert` and
`uassert` and `iassert`, makes it very difficult to identify that an error has occurred, and `iassert`, makes it very difficult to identify that an error has occurred, and could lead to the
could lead to the wrong error being propagated. wrong error being propagated.
## Using noexcept ## Using noexcept
Server code should generally be written to be exception safe. Historically, Server code should generally be written to be exception safe. Historically, we've had bugs due to
we've had bugs due to code being overzealously marked `noexcept`. In such code being overzealously marked `noexcept`. In such contexts, throwing an exception crashes the
contexts, throwing an exception crashes the server, which can compromise server, which can compromise availability. However, _just_ removing `noexcept` from such code is not
availability. However, _just_ removing `noexcept` from such code is not a viable a viable solution \- exception unsafe code may _need_ to crash in order to avoid causing an even
solution \- exception unsafe code may _need_ to crash in order to avoid causing worse failure. We want to work towards ensuring that functions that ought to be are in fact
an even worse failure. We want to work towards ensuring that functions that exception safe, and remove `noexcept` usage where it's not warranted. Here, we outline guidelines
ought to be are in fact exception safe, and remove `noexcept` usage where it's for doing so.
not warranted. Here, we outline guidelines for doing so.
Noexcept is a runtime check that terminates the process rather than allowing Noexcept is a runtime check that terminates the process rather than allowing the function to exit
the function to exit because of a throw. Noexcept may be used when it can be because of a throw. Noexcept may be used when it can be thought of as a bug for any uncaught
thought of as a bug for any uncaught exception to be thrown. There is no exception to be thrown. There is no compile-time check that exceptions will not be thrown within a
compile-time check that exceptions will not be thrown within a `noexcept` `noexcept` function. Instead, putting `noexcept` on a function may be thought of as similar to using
function. Instead, putting `noexcept` on a function may be thought of as similar invariant in the following way:
to using invariant in the following way:
```c ```c
// Example noexcept code. // Example noexcept code.
@ -190,92 +189,80 @@ void func() try {
} }
``` ```
**As with invariant, be very careful when putting `noexcept` on a function that **As with invariant, be very careful when putting `noexcept` on a function that interacts with
interacts with untrusted input.** This has been the root cause of serious past untrusted input.** This has been the root cause of serious past bugs.
bugs.
### Adding or Removing noexcept ### Adding or Removing noexcept
When considering removing `noexcept` from a function, the author of that change When considering removing `noexcept` from a function, the author of that change must ensure that the
must ensure that the function's implementation and its callsites are not function's implementation and its callsites are not relying on the function not throwing for
relying on the function not throwing for correctness. Because of this, **be correctness. Because of this, **be careful putting `noexcept` on a function** if there's a chance it
careful putting `noexcept` on a function** if there's a chance it may need to be may need to be removed later. `noexcept` generally **should not be used** solely for reasons of
removed later. `noexcept` generally **should not be used** solely for reasons of performance optimization. Aside from the cases listed in the next section, it should not be assumed
performance optimization. Aside from the cases listed in the next section, it to improve performance without solid evidence.
should not be assumed to improve performance without solid evidence.
If a part of the implementation would benefit from relying on not throwing, but If a part of the implementation would benefit from relying on not throwing, but `noexcept` is not
`noexcept` is not meant to be a part of the function's contract, it is acceptable meant to be a part of the function's contract, it is acceptable to use a try/catch/invariant
to use a try/catch/invariant construction similar to the example above or an construction similar to the example above or an internal `noexcept` helper function.
internal `noexcept` helper function.
When adding or removing `noexcept`, also consider what types of exceptions are When adding or removing `noexcept`, also consider what types of exceptions are possible in that
possible in that context and in our codebase. Refer to the “Where Exceptions context and in our codebase. Refer to the “Where Exceptions are Possible” section for more details.
are Possible” section for more details.
If you are uncertain about adding or removing `noexcept` in a given situation, If you are uncertain about adding or removing `noexcept` in a given situation, reach out to
reach out to \#server-programmability on slack. \#server-programmability on slack.
### Cases Where noexcept is Encouraged ### Cases Where noexcept is Encouraged
This list is not exhaustive and there are cases not enumerated here that are This list is not exhaustive and there are cases not enumerated here that are valid uses of
valid uses of `noexcept`. `noexcept`.
#### Move operations #### Move operations
Using `noexcept` with move operations allows operations to skip generating Using `noexcept` with move operations allows operations to skip generating exception handling code.
exception handling code. If a type's move operation will not throw exceptions, If a type's move operation will not throw exceptions, it is strictly worse not to use `noexcept`.
it is strictly worse not to use `noexcept`. For instance, std::vector\<T\> can For instance, std::vector\<T\> can use optimized versions of certain operations when T has
use optimized versions of certain operations when T has `noexcept` move `noexcept` move operations. In these cases, **`noexcept` can be considered a requirement**. Of
operations. In these cases, **`noexcept` can be considered a requirement**. Of course, if a move operation genuinely needs to throw exceptions, then don't mark it `noexcept`. This
course, if a move operation genuinely needs to throw exceptions, then don't should be very rare \- moves should be non-throwing in almost all cases.
mark it `noexcept`. This should be very rare \- moves should be non-throwing in
almost all cases.
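
The effect on `std::vector` can be sketched with two hypothetical types (illustrative only, not part of the codebase) that differ solely in whether the move constructor is `noexcept`. When the vector reallocates, common standard library implementations move the elements only if the move cannot throw, and otherwise fall back to copying to preserve the strong exception guarantee.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical type with a noexcept move constructor.
struct NoexceptMove {
    static inline int copies = 0;
    static inline int moves = 0;
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++moves; }
};

// Same shape, but the move constructor is not marked noexcept.
struct ThrowingMove {
    static inline int copies = 0;
    static inline int moves = 0;
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) { ++copies; }
    ThrowingMove(ThrowingMove&&) { ++moves; }
};

static_assert(std::is_nothrow_move_constructible_v<NoexceptMove>);
static_assert(!std::is_nothrow_move_constructible_v<ThrowingMove>);

// Places one element in a vector, then forces a reallocation so the
// element has to be transferred to the new storage.
template <typename T>
void relocateOneElement() {
    std::vector<T> v;
    v.emplace_back();             // Constructed in place: no copy or move.
    v.reserve(v.capacity() + 1);  // Reallocates and relocates the element.
}

inline void exampleRelocation() {
    relocateOneElement<NoexceptMove>();
    relocateOneElement<ThrowingMove>();
    assert(NoexceptMove::moves == 1 && NoexceptMove::copies == 0);
    assert(ThrowingMove::copies == 1 && ThrowingMove::moves == 0);
}
```

For types that own heap memory, the copy taken in the `ThrowingMove` case is typically far more expensive than the move it replaces.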
#### Swap operations #### Swap operations
Allows callers to optimize for an exception-free pathway. **Swap operations Allows callers to optimize for an exception-free pathway. **Swap operations should follow the same
should follow the same `noexcept` guidelines as move operations**. `noexcept` guidelines as move operations**.
#### Hash functions #### Hash functions
Allows some hashing library types to optimize for an exception-free pathway. Allows some hashing library types to optimize for an exception-free pathway. This can even affect
This can even affect the behavior, performance, and even layout of certain the behavior, performance, and even layout of certain container types (such as libstdc++'s
container types (such as libstdc++'s [unordered_map](https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_associative.html)). **Hash
[unordered_map](https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_associative.html)). functions should follow the same `noexcept` guidelines as move operations.**
**Hash functions should follow the same `noexcept` guidelines as move operations.**
#### Destructors and “Destructor-Safe” Functions #### Destructors and “Destructor-Safe” Functions
Destructors are generally implicitly `noexcept`, and are encouraged to remain Destructors are generally implicitly `noexcept`, and are encouraged to remain implicitly `noexcept`
implicitly `noexcept` \- that is, by not marking them with `noexcept(false)`. \- that is, by not marking them with `noexcept(false)`. Functions where “destructor safety” is a
Functions where “destructor safety” is a core part of their functionality **may core part of their functionality **may be marked `noexcept`**. This is not a requirement \-
be marked `noexcept`**. This is not a requirement \- destructors are allowed to destructors are allowed to call potentially-throwing functions. It is also not a blanket
call potentially-throwing functions. It is also not a blanket recommendation to recommendation to consider `noexcept` for all functions called from destructors. When calling a
consider `noexcept` for all functions called from destructors. When calling a potentially-throwing function from a destructor, think about whether or not it can indeed throw in
potentially-throwing function from a destructor, think about whether or not it that context, and if exceptions need to be handled. If it can indeed throw in that context,
can indeed throw in that context, and if exceptions need to be handled. If it exceptions almost certainly need to be handled \- otherwise the server will crash.
can indeed throw in that context, exceptions almost certainly need to be
handled \- otherwise the server will crash.
The lambda passed to `ON_BLOCK_EXIT()` and `ScopeGuard()` should be treated The lambda passed to `ON_BLOCK_EXIT()` and `ScopeGuard()` should be treated similarly to
similarly to destructors: it is executed in a `noexcept` context (a destructor) destructors: it is executed in a `noexcept` context (a destructor) and marking it as such is
and marking it as such is discouraged as being noisy. But code intended to be discouraged as being noisy. But code intended to be called from them can be.
called from them can be.
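
A minimal sketch of the destructor guidance, using hypothetical names (`flushBuffers`, `BufferOwner`) that are not from the codebase: the destructor carries no annotation, is still implicitly `noexcept`, and handles the exception from a potentially-throwing call locally so that nothing escapes.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical cleanup step that can fail by throwing.
inline void flushBuffers(bool shouldFail) {
    if (shouldFail) {
        throw std::runtime_error("flush failed");
    }
}

// Hypothetical RAII type whose destructor calls a potentially-throwing function.
class BufferOwner {
public:
    BufferOwner(bool failOnFlush, bool* sawFailure)
        : _failOnFlush(failOnFlush), _sawFailure(sawFailure) {}

    // No noexcept annotation: destructors are implicitly noexcept.
    ~BufferOwner() {
        try {
            flushBuffers(_failOnFlush);
        } catch (const std::exception&) {
            // Handle the error locally (real code might log it) so that
            // nothing escapes the destructor and terminates the process.
            *_sawFailure = true;
        }
    }

private:
    bool _failOnFlush;
    bool* _sawFailure;
};

static_assert(std::is_nothrow_destructible_v<BufferOwner>);

// Returns true if the destructor observed and handled the failure.
inline bool destructorHandlesFailure() {
    bool sawFailure = false;
    {
        BufferOwner owner(/*failOnFlush=*/true, &sawFailure);
    }  // Destructor runs here and catches the exception.
    assert(sawFailure);
    return sawFailure;
}
```

Without the `try`/`catch`, the exception from `flushBuffers()` would reach the implicit `noexcept` boundary of the destructor and crash the process.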
### Where Exceptions are Possible ### Where Exceptions are Possible
In our codebase, generally DBException is the only type of exception that In our codebase, generally DBException is the only type of exception that should be crossing API
should be crossing API boundaries. If an exception other than a DBException boundaries. If an exception other than a DBException does cross an API boundary, it should be
does cross an API boundary, it should be considered a bug. Whichever component considered a bug. Whichever component throws the exception should handle it locally, even if only by
throws the exception should handle it locally, even if only by translating it translating it to a DBException. Generally any caller you would consider to be an external caller
to a DBException. Generally any caller you would consider to be an external should be able to rely on DBException being the only exception type your function will throw.
caller should be able to rely on DBException being the only exception type your
function will throw.
Allocations using the global new allocator or std::allocator in our codebase do Allocations using the global new allocator or std::allocator in our codebase do not throw, instead
not throw, instead terminating the process directly when OOM conditions are terminating the process directly when OOM conditions are encountered. As such, there is no need to
encountered. As such, there is no need to handle exceptions from these sources. handle exceptions from these sources.
## Gotchas ## Gotchas
@ -284,10 +271,10 @@ Gotchas to watch out for:
- Generally, do not throw an `AssertionException` directly. Functions like `uasserted()` do work - Generally, do not throw an `AssertionException` directly. Functions like `uasserted()` do work
beyond just that. In particular, it makes sure that the `getLastError` structures are set up beyond just that. In particular, it makes sure that the `getLastError` structures are set up
properly. properly.
- Think about the location of your asserts in constructors, as the destructor would not be - Think about the location of your asserts in constructors, as the destructor would not be called.
called. But at a minimum, use `wassert` a lot therein, we want to know if something is wrong. But at a minimum, use `wassert` a lot therein, we want to know if something is wrong.
- Do **not** throw in destructors or allow exceptions to leak out (if you call a function that - Do **not** throw in destructors or allow exceptions to leak out (if you call a function that may
may throw). throw).
[raii]: https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization [raii]: https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization
[error_codes_yml]: ../src/mongo/base/error_codes.yml [error_codes_yml]: ../src/mongo/base/error_codes.yml


@ -6,18 +6,17 @@ branches, enhance diagnostics, or achieve any number of other aims. Fail points
configured, and disabled via command request to a remote process or via an API within the same configured, and disabled via command request to a remote process or via an API within the same
process. process.
For more on what test-only means and how to enable the `configureFailPoint` command, see [test_commands][test_only]. For more on what test-only means and how to enable the `configureFailPoint` command, see
[test_commands][test_only].
## Using Fail Points ## Using Fail Points
A fail point must first be defined using `MONGO_FAIL_POINT_DEFINE(myFailPoint)`. This statement A fail point must first be defined using `MONGO_FAIL_POINT_DEFINE(myFailPoint)`. This statement adds
adds the fail point to a registry and allows it to be evaluated in code. There are three common the fail point to a registry and allows it to be evaluated in code. There are three common patterns
patterns for evaluating a fail point: for evaluating a fail point:
- Exercise a rarely used branch: - Exercise a rarely used branch: `if (whenPigsFly || myFailPoint.shouldFail()) { ... }`
`if (whenPigsFly || myFailPoint.shouldFail()) { ... }` - Block until the fail point is unset: `myFailPoint.pauseWhileSet();`
- Block until the fail point is unset:
`myFailPoint.pauseWhileSet();`
- Use the fail point's payload to perform custom behavior: - Use the fail point's payload to perform custom behavior:
  `myFailPoint.execute([](const BSONObj& data) { useMyPayload(data); });` `myFailPoint.execute([](const BSONObj& data) { useMyPayload(data); });`
@ -30,9 +29,9 @@ Fail point configuration involves choosing a "mode" for activation (e.g., "alway
providing additional data in the form of a BSON object. For the vast majority of cases, this is done providing additional data in the form of a BSON object. For the vast majority of cases, this is done
by issuing a `configureFailPoint` command request. This is made easier in JavaScript using the by issuing a `configureFailPoint` command request. This is made easier in JavaScript using the
`configureFailPoint` helper from [fail_point_util.js][fail_point_util]. Fail points can also be `configureFailPoint` helper from [fail_point_util.js][fail_point_util]. Fail points can also be
useful in C++ unit tests and integration tests. To configure fail points on the local process, use useful in C++ unit tests and integration tests. To configure fail points on the local process, use a
a `FailPointEnableBlock` to enable and configure the fail point for a given block scope. Finally, `FailPointEnableBlock` to enable and configure the fail point for a given block scope. Finally, a
a fail point can also be set via setParameter by its name prefixed with "failpoint." (e.g., fail point can also be set via setParameter by its name prefixed with "failpoint." (e.g.,
"failpoint.myFailPoint"). "failpoint.myFailPoint").
Users can also wait until a fail point has been evaluated a certain number of times **_over its Users can also wait until a fail point has been evaluated a certain number of times **_over its
@ -50,8 +49,8 @@ command implementations, see [here][fail_point_commands].
The `failCommand` fail point is a special fail point used to mock arbitrary response behaviors to The `failCommand` fail point is a special fail point used to mock arbitrary response behaviors to
requests filtered by command, appName, etc. It is most often used to simulate specific conditions requests filtered by command, appName, etc. It is most often used to simulate specific conditions
between nodes like invalid replica set configurations. For examples of use, see the between nodes like invalid replica set configurations. For examples of use, see the [failCommand
[failCommand JavaScript tests][fail_command_javascript_test]. JavaScript tests][fail_command_javascript_test].
[fail_point]: ../src/mongo/util/fail_point.h [fail_point]: ../src/mongo/util/fail_point.h
[fail_point_test]: ../src/mongo/util/fail_point_test.cpp [fail_point_test]: ../src/mongo/util/fail_point_test.cpp


@ -68,11 +68,11 @@ Future<Message> call(Message& toSend) {
First, notice that our calls to `TransportSession::sourceMessage` and First, notice that our calls to `TransportSession::sourceMessage` and
`TransportSession::sinkMessage` have been replaced with calls to asynchronous versions of those `TransportSession::sinkMessage` have been replaced with calls to asynchronous versions of those
functions. These asynchronous versions are future-returning; they don't block, but also don't return functions. These asynchronous versions are future-returning; they don't block, but also don't return
a result right away. Instead, they return a future that we can chain continuations onto; `then`, a result right away. Instead, they return a future that we can chain continuations onto;
`onError` and `onCompletion` are all member functions of `Future<T>` that take a callable as argument `then`, `onError` and `onCompletion` are all member functions of `Future<T>` that take a callable as
and invoke that callable when the chained-to future is ready. Unsurprisingly, continuations chained argument and invoke that callable when the chained-to future is ready. Unsurprisingly, continuations
with `.then` are run when the future is readied successfully with a `T`, and therefore callables chained with `.then` are run when the future is readied successfully with a `T`, and therefore
chained with `.then` should take a `T` as argument. Mirroring this behavior, `.onError` callables chained with `.then` should take a `T` as argument. Mirroring this behavior, `.onError`
continuations are run only when the future is readied with an error, and continuations chained this continuations are run only when the future is readied with an error, and continuations chained this
way take a `Status` as argument which they can inspect to discover the error explaining why a `T` way take a `Status` as argument which they can inspect to discover the error explaining why a `T`
could not be delivered. Continuations chained with `.onCompletion` are run when the future resolves, could not be delivered. Continuations chained with `.onCompletion` are run when the future resolves,
@ -107,18 +107,17 @@ associated Futures exactly one time, and must do so before being destroyed (othe
will be set with the `ErrorCodes::BrokenPromise` error, which is considered a programmer error and will be set with the `ErrorCodes::BrokenPromise` error, which is considered a programmer error and
may crash debug builds of the server in the future). may crash debug builds of the server in the future).
To create a `Promise` that has a Future, you may use the [`PromiseAndFuture<T>`][pf] To create a `Promise` that has a Future, you may use the [`PromiseAndFuture<T>`][pf] utility type.
utility type. Upon construction, it contains a created `Promise<T>` and its Upon construction, it contains a created `Promise<T>` and its corresponding `Future<T>`. The
corresponding `Future<T>`. The perhaps-familiar `makePromiseFuture<T>` factory perhaps-familiar `makePromiseFuture<T>` factory function now simply returns `PromiseAndFuture<T>{}`.
function now simply returns `PromiseAndFuture<T>{}`.
As was previously alluded to, it's As was previously alluded to, it's also possible to make a "ready future" - one that has no
also possible to make a "ready future" - one that has no associated promise and is already filled associated promise and is already filled with a value or error. These might be useful in cases where
with a value or error. These might be useful in cases where the code that produces values in a way the code that produces values in a way that's normally asynchronous happens to have one available
that's normally asynchronous happens to have one available already when a request comes in, and already when a request comes in, and would like to return it right away. To create such a ready
would like to return it right away. To create such a ready future, use `Future<T>::makeReady()`, or future, use `Future<T>::makeReady()`, or the helper function [makeReadyFutureWith(Func&&
the helper function [makeReadyFutureWith(Func&& func)][mrfw] which will call the specified `func` func)][mrfw] which will call the specified `func` and create a ready `Future` from its returned
and create a ready `Future` from its returned value. value.
Lastly, there might be occasions when multiple futures should be fulfilled with the same value, at Lastly, there might be occasions when multiple futures should be fulfilled with the same value, at
the same time. This use case is best served by `SharedPromise` and the associated `SharedSemiFuture` the same time. This use case is best served by `SharedPromise` and the associated `SharedSemiFuture`
@ -144,8 +143,8 @@ calling threads, and return `Future<T>`s to those threads that will be readied o
available. The service may have its own internal threads it uses to produce `T`s, and doesn't want available. The service may have its own internal threads it uses to produce `T`s, and doesn't want
to lend out its internal threads to do the work chained via continuations to the `Future<T>`s it's to lend out its internal threads to do the work chained via continuations to the `Future<T>`s it's
given to calling threads. Instead, it needs to insist that continuations are not chained onto the given to calling threads. Instead, it needs to insist that continuations are not chained onto the
futures it gives out, or that the caller receiving the future futures it gives out, or that the caller receiving the future arranges for some _other_ thread to
arranges for some _other_ thread to run continuations. run continuations.
Fortunately, the service can enforce these guarantees using two types closely related to Fortunately, the service can enforce these guarantees using two types closely related to
`Future<T>`: the types `SemiFuture<T>` and `ExecutorFuture<T>`. `Future<T>`: the types `SemiFuture<T>` and `ExecutorFuture<T>`.
@ -270,33 +269,32 @@ will traverse the remaining continuation chain, and find the continuation chaine
is run. is run.
Note that all of the continuation-chaining functions we've discussed, like `.then()`, return future- Note that all of the continuation-chaining functions we've discussed, like `.then()`, return future-
like types themselves (i.e. `Future<T>`, `SemiFuture<T>`, and the like). When we chain like types themselves (i.e. `Future<T>`, `SemiFuture<T>`, and the like). When we chain continuations
continuations in the manner we've been discussing here, subsequent continuations run when the future in the manner we've been discussing here, subsequent continuations run when the future returned by
returned by the previous continuation is ready, and the future-like type is "unwrapped" such that the previous continuation is ready, and the future-like type is "unwrapped" such that the type
the type wrapped by the future (or, in the case of failure, the error) is passed directly to the wrapped by the future (or, in the case of failure, the error) is passed directly to the subsequent
subsequent continuation. For more detail on this topic, see the block comment above the continuation. For more detail on this topic, see the block comment above the continuation-chaining
continuation-chaining member functions in [future.h][future], starting above the definition for member functions in [future.h][future], starting above the definition for `then()`.
`then()`.
At some point, we may have no more continuations to add to a future chain, and will want to either At some point, we may have no more continuations to add to a future chain, and will want to either
synchronously extract the value or error held in the last future of the chain, or add a callback to synchronously extract the value or error held in the last future of the chain, or add a callback to
asynchronously consume this value. The `.get()` and `.getAsync()` members of future-like types asynchronously consume this value. The `.get()` and `.getAsync()` members of future-like types
provide these facilities for terminating a future chain by extracting or asynchronously provide these facilities for terminating a future chain by extracting or asynchronously consuming
consuming the result of the chain. The `.getAsync()` function works much like `.onCompletion()`, the result of the chain. The `.getAsync()` function works much like `.onCompletion()`, taking a
taking a `Status` or `StatusWith<T>` and running regardless of whether or not the previous link in `Status` or `StatusWith<T>` and running regardless of whether or not the previous link in the chain
the chain resolved with error or success, and running asynchronously when the previous results are resolved with error or success, and running asynchronously when the previous results are ready (to
ready (to determine what thread `.getAsync()` will run on, follow the rules laid out in the previous determine what thread `.getAsync()` will run on, follow the rules laid out in the previous "Where Do
"Where Do Continuations Run?" section.) Conversely, `.get()` takes no arguments, and blocks when it Continuations Run?" section.) Conversely, `.get()` takes no arguments, and blocks when it is called
is called until the entirety of the continuation chain is resolved, with the final result given back until the entirety of the continuation chain is resolved, with the final result given back to the
to the blocking caller. Note that if the final result of the chain was an error that can be blocking caller. Note that if the final result of the chain was an error that can be converted to a
converted to a MongoDB `Status` type (i.e. either a `Status`-family type or `DBException`), it will MongoDB `Status` type (i.e. either a `Status`-family type or `DBException`), it will be re-thrown as
be re-thrown as a `DBException` at the site where `.get()` is called when it is available. If the a `DBException` at the site where `.get()` is called when it is available. If the code calling
code calling `.get()` is not capable of handling an exception, use `.getNoThrow()` instead to `.get()` is not capable of handling an exception, use `.getNoThrow()` instead to extract the same
extract the same error in the form of a `Status`. In the case of `.getAsync()`, all errors are error in the form of a `Status`. In the case of `.getAsync()`, all errors are converted to `Status`,
converted to `Status`, and crucially, callables chained as continuations via `.getAsync()` cannot and crucially, callables chained as continuations via `.getAsync()` cannot throw any exceptions, as
throw any exceptions, as there is no appropriate context with which to handle an asynchronous there is no appropriate context with which to handle an asynchronous exception. If an exception is
exception. If an exception is thrown from a continuation chained via `.getAsync()`, the entire thrown from a continuation chained via `.getAsync()`, the entire process will be terminated (i.e.
process will be terminated (i.e. the program will crash). the program will crash).
## Notes and Links ## Notes and Links


@ -2,31 +2,27 @@
title: FuzzTest title: FuzzTest
--- ---
FuzzTest is a coverage-guided fuzzing framework for C++ that integrates FuzzTest is a coverage-guided fuzzing framework for C++ that integrates directly with GoogleTest.
directly with GoogleTest. FuzzTest lets you write _property-based tests_: you FuzzTest lets you write _property-based tests_: you describe the shape of your inputs using typed
describe the shape of your inputs using typed _domains_, and the framework _domains_, and the framework generates and mutates values that satisfy those constraints. FuzzTest
generates and mutates values that satisfy those constraints. FuzzTest uses Centipede as its fuzzing engine and AUBSAN to surface undefined behavior.
uses Centipede as its fuzzing engine and AUBSAN to surface undefined
behavior.
# When to use FuzzTest # When to use FuzzTest
- Your function under test accepts structured inputs (integers, strings, - Your function under test accepts structured inputs (integers, strings, custom types, BSON objects,
custom types, BSON objects, etc.) rather than an opaque byte blob. etc.) rather than an opaque byte blob.
- You want to express correctness properties beyond "does not crash", such - You want to express correctness properties beyond "does not crash", such as API invariants,
as API invariants, differential equivalence, or roundtrip symmetry. differential equivalence, or roundtrip symmetry.
- You want a fuzz test that also runs cleanly as a unit test in normal CI, - You want a fuzz test that also runs cleanly as a unit test in normal CI, without needing a special
without needing a special fuzzer build variant. fuzzer build variant.
# How to use FuzzTest # How to use FuzzTest
## The property function and FUZZ_TEST macro ## The property function and FUZZ_TEST macro
A FuzzTest consists of a _property function_ and a registration macro. A FuzzTest consists of a _property function_ and a registration macro. The property function is a
The property function is a plain C++ function whose parameters define the plain C++ function whose parameters define the inputs to fuzz. The framework calls it repeatedly
inputs to fuzz. The framework calls it repeatedly with generated values, with generated values, looking for any call that triggers an assertion failure or sanitizer error.
looking for any call that triggers an assertion failure or sanitizer
error.
```cpp ```cpp
#include "fuzztest/fuzztest.h" #include "fuzztest/fuzztest.h"
@ -38,14 +34,16 @@ void MyFunctionFuzzer(const std::string& input) {
FUZZ_TEST(MyTestSuite, MyFunctionFuzzer); FUZZ_TEST(MyTestSuite, MyFunctionFuzzer);
``` ```
When no `.WithDomains()` clause is provided, each parameter defaults to When no `.WithDomains()` clause is provided, each parameter defaults to `fuzztest::Arbitrary<T>()`,
`fuzztest::Arbitrary<T>()`, which covers most standard library types. which covers most standard library types.
## Specifying input domains ## Specifying input domains
Use `.WithDomains()` to constrain the generated inputs: Use `.WithDomains()` to constrain the generated inputs:
> ⚠️ **Warning:** Never initialize input domains with global objects initialized in other compilation units. For more information see [Fuzz_Test Macro](https://github.com/google/fuzztest/blob/main/doc/fuzz-test-macro.md) > ⚠️ **Warning:** Never initialize input domains with global objects initialized in other
> compilation units. For more information see
> [Fuzz_Test Macro](https://github.com/google/fuzztest/blob/main/doc/fuzz-test-macro.md)
```cpp ```cpp
void ProcessRequestFuzzer(int opcode, const std::string& payload) { void ProcessRequestFuzzer(int opcode, const std::string& payload) {
@ -56,14 +54,18 @@ FUZZ_TEST(MyTestSuite, ProcessRequestFuzzer)
/*payload=*/fuzztest::Arbitrary<std::string>()); /*payload=*/fuzztest::Arbitrary<std::string>());
``` ```
FuzzTest ships with a rich set of built-in domains. A complete list of default types implemented in fuzztest can be found in the [Fuzztest Domain Reference](https://github.com/google/fuzztest/blob/main/doc/domains-reference.md). Also see [BSON Fuzzing](#fuzzing-bson). FuzzTest ships with a rich set of built-in domains. A complete list of default types implemented in
fuzztest can be found in the
[Fuzztest Domain Reference](https://github.com/google/fuzztest/blob/main/doc/domains-reference.md).
Also see [BSON Fuzzing](#fuzzing-bson).
## Providing seeds ## Providing seeds
Seed values give the fuzzer a head start by providing known-interesting Seed values give the fuzzer a head start by providing known-interesting inputs to mutate:
inputs to mutate:
> ⚠️ **Warning:** Never initialize seeds with global objects initialized in other compilation units. For more information see [Fuzz_Test Macro](https://github.com/google/fuzztest/blob/main/doc/fuzz-test-macro.md) > ⚠️ **Warning:** Never initialize seeds with global objects initialized in other compilation units.
> For more information see
> [Fuzz_Test Macro](https://github.com/google/fuzztest/blob/main/doc/fuzz-test-macro.md)
```cpp ```cpp
FUZZ_TEST(MyTestSuite, ProcessRequestFuzzer) FUZZ_TEST(MyTestSuite, ProcessRequestFuzzer)
@ -82,11 +84,9 @@ FUZZ_TEST(MyTestSuite, ProcessRequestFuzzer)
## Common correctness patterns ## Common correctness patterns
Beyond "does not crash", FuzzTest makes it easy to assert higher-level Beyond "does not crash", FuzzTest makes it easy to assert higher-level properties.
properties.
**Roundtrip**: verify that encode→decode (or serialize→parse) is the **Roundtrip**: verify that encode→decode (or serialize→parse) is the identity:
identity:
```cpp ```cpp
void SerializeRoundtrips(const MyMessage& msg) { void SerializeRoundtrips(const MyMessage& msg) {
@ -97,8 +97,7 @@ void SerializeRoundtrips(const MyMessage& msg) {
FUZZ_TEST(MyTestSuite, SerializeRoundtrips); FUZZ_TEST(MyTestSuite, SerializeRoundtrips);
``` ```
**Differential fuzzing**: compare two implementations of the same **Differential fuzzing**: compare two implementations of the same operation:
operation:
```cpp ```cpp
void ImplementationsAgree(const std::string& input) { void ImplementationsAgree(const std::string& input) {
@ -109,10 +108,11 @@ FUZZ_TEST(MyTestSuite, ImplementationsAgree);
## Using fixtures ## Using fixtures
If your test requires expensive one-time setup (e.g. starting a service), If your test requires expensive one-time setup (e.g. starting a service), use a fixture with
use a fixture with `FUZZ_TEST_F`. Any default-constructible class can be `FUZZ_TEST_F`. Any default-constructible class can be a fixture; the constructor and destructor run
a fixture; the constructor and destructor run once for the whole fuzz test, once for the whole fuzz test, not once per iteration. When using fixtures, care should be taken to
not once per iteration. When using fixtures, care should be taken to ensure that only the initial fixture state is retained. Program state created during a test _**must**_ not affect or be affected by subsequent iterations. ensure that only the initial fixture state is retained. Program state created during a test
_**must**_ not affect or be affected by subsequent iterations.
```cpp ```cpp
class MyServiceFuzzTest { class MyServiceFuzzTest {
@ -132,10 +132,10 @@ FUZZ_TEST_F(MyServiceFuzzTest, RequestFuzzer);
## Fuzzing BSON ## Fuzzing BSON
MongoDB provides a custom FuzzTest domain for generating valid BSON MongoDB provides a custom FuzzTest domain for generating valid BSON objects:
objects: `mongo::bson_mutator::BSONObjImpl`. It is registered as the `mongo::bson_mutator::BSONObjImpl`. It is registered as the `Arbitrary<ConstSharedBuffer>`
`Arbitrary<ConstSharedBuffer>` specialization, so any fuzz test that specialization, so any fuzz test that accepts a `ConstSharedBuffer` will automatically receive
accepts a `ConstSharedBuffer` will automatically receive well-formed BSON. well-formed BSON.
```cpp ```cpp
#include "mongo/bson/bson_mutator/bson_mutator.h" #include "mongo/bson/bson_mutator/bson_mutator.h"
@ -147,8 +147,7 @@ void MyCommandFuzzer(ConstSharedBuffer input) {
FUZZ_TEST(MyCommandFuzzTest, MyCommandFuzzer); FUZZ_TEST(MyCommandFuzzTest, MyCommandFuzzer);
``` ```
To constrain which fields are present and their types, use the To constrain which fields are present and their types, use the `.With<Type>()` builders:
`.With<Type>()` builders:
```cpp ```cpp
FUZZ_TEST(MyCommandFuzzTest, MyCommandFuzzer) FUZZ_TEST(MyCommandFuzzTest, MyCommandFuzzer)
@ -158,8 +157,8 @@ FUZZ_TEST(MyCommandFuzzTest, MyCommandFuzzer)
.WithLong("limit", fuzztest::InRange(0LL, 1000LL))); .WithLong("limit", fuzztest::InRange(0LL, 1000LL)));
``` ```
Fields added via `.With<Type>()` are not guaranteed to appear in every Fields added via `.With<Type>()` are not guaranteed to appear in every generated object, which
generated object, which exercises missing-field error handling as well. exercises missing-field error handling as well.
Use `.WithVariant()` when a field may legally hold more than one type: Use `.WithVariant()` when a field may legally hold more than one type:
@ -171,8 +170,7 @@ fuzztest::Arbitrary<mongo::ConstSharedBuffer>()
}); });
``` ```
Use `.WithAny()` when a key should be present but its type is Use `.WithAny()` when a key should be present but its type is unconstrained:
unconstrained:
```cpp ```cpp
fuzztest::Arbitrary<mongo::ConstSharedBuffer>().WithAny("filter"); fuzztest::Arbitrary<mongo::ConstSharedBuffer>().WithAny("filter");
@ -180,8 +178,8 @@ fuzztest::Arbitrary<mongo::ConstSharedBuffer>().WithAny("filter");
## Bazel target ## Bazel target
Use `mongo_cc_fuzztest` (from `//bazel:mongo_src_rules.bzl`) to declare a Use `mongo_cc_fuzztest` (from `//bazel:mongo_src_rules.bzl`) to declare a fuzz test target. It links
fuzz test target. It links in FuzzTest and GoogleTest automatically: in FuzzTest and GoogleTest automatically:
```python ```python
mongo_cc_fuzztest( mongo_cc_fuzztest(
@ -198,8 +196,8 @@ mongo_cc_fuzztest(
## Unit test mode ## Unit test mode
Every `FUZZ_TEST` is also a regular GoogleTest test. In unit test mode, Every `FUZZ_TEST` is also a regular GoogleTest test. In unit test mode, the property function is
the property function is called a small number of times with minimal inputs. This lets fuzz tests run in ordinary CI called a small number of times with minimal inputs. This lets fuzz tests run in ordinary CI
alongside unit tests: alongside unit tests:
``` ```
@ -208,10 +206,9 @@ bazel test --compiler_type=clang --config=fuzztest --fsan --opt=debug --allocato
## Fuzzing mode ## Fuzzing mode
Fuzzing mode enables sanitizer and coverage instrumentation and runs the Fuzzing mode enables sanitizer and coverage instrumentation and runs the test indefinitely (or until
test indefinitely (or until a crash is found). It requires the `fsan` a crash is found). It requires the `fsan` build configuration. Check our Evergreen configuration for
build configuration. Check our Evergreen configuration for the current the current bazel arguments, or run:
bazel arguments, or run:
``` ```
bazel run --compiler_type=clang --config=fuzztest --fsan --opt=debug --allocator=system +my_command_fuzztest -- \ bazel run --compiler_type=clang --config=fuzztest --fsan --opt=debug --allocator=system +my_command_fuzztest -- \
@ -226,7 +223,9 @@ bazel run --compiler_type=clang --config=fuzztest --fsan --opt=debug --allocator
## Evergreen ## Evergreen
Fuzz tests defined in bazel using `mongo_cc_fuzztest` will periodically run on the master branch in evergreen. The compiled tests and their associated corpus are saved to S3 and can be downloaded for debugging issues. The corpus is reused between evergreen runs in order to increase fuzzing coverage. Fuzz tests defined in bazel using `mongo_cc_fuzztest` will periodically run on the master branch in
evergreen. The compiled tests and their associated corpus are saved to S3 and can be downloaded for
debugging issues. The corpus is reused between evergreen runs in order to increase fuzzing coverage.
## Useful flags ## Useful flags


@ -33,24 +33,24 @@ outputs.
code changes. code changes.
- Multiple test variations MAY be bundled into a single test. Recommended when testing same feature - Multiple test variations MAY be bundled into a single test. Recommended when testing same feature
with different inputs. This helps reviewing the outputs by grouping similar tests together, and also with different inputs. This helps reviewing the outputs by grouping similar tests together, and
reduces the number of output files. also reduces the number of output files.
- Changes to test fixture or test code that affect a non-trivial number of test outputs MUST BE done in a - Changes to test fixture or test code that affect a non-trivial number of test outputs MUST BE done in a
separate pull request from production code changes: separate pull request from production code changes:
- Pull request for test code only changes can be easily reviewed, even if large number of test - Pull request for test code only changes can be easily reviewed, even if large number of test
outputs are modified. While such changes can still introduce merge conflicts, they don't introduce outputs are modified. While such changes can still introduce merge conflicts, they don't
risk of regression (if outputs were valid introduce risk of regression (if outputs were valid
- Pull requests with mixed production - Pull requests with mixed production
- Tests in the same suite SHOULD share the fixtures when appropriate. This reduces cost of adding - Tests in the same suite SHOULD share the fixtures when appropriate. This reduces cost of adding
new tests to the suite. Changes to the fixture may only affect expected outputs from that fixture, new tests to the suite. Changes to the fixture may only affect expected outputs from that
and those outputs can be updated in bulk. fixture, and those outputs can be updated in bulk.
- Tests in different suites SHOULD NOT reuse/share fixtures. Changes to the fixture can affect large - Tests in different suites SHOULD NOT reuse/share fixtures. Changes to the fixture can affect large
number of expected outputs. number of expected outputs. There are exceptions to that rule, and tests in different suites MAY
There are exceptions to that rule, and tests in different suites MAY reuse/share fixtures if: reuse/share fixtures if:
- Test fixture is considered stable and changes rarely. - Test fixture is considered stable and changes rarely.
- Test suites are related, either by sharing tests, or testing similar components. - Test suites are related, either by sharing tests, or testing similar components.
@ -59,9 +59,8 @@ outputs.
- Tests SHOULD print both inputs and outputs of the tested code. This makes it easy for reviewers to - Tests SHOULD print both inputs and outputs of the tested code. This makes it easy for reviewers to
verify that the expected outputs are indeed correct by having both input and output next to each verify that the expected outputs are indeed correct by having both input and output next to each
other. other. Otherwise finding the input used to produce the new output may not be practical, and might
Otherwise finding the input used to produce the new output may not be practical, and might not even not even be included in the diff.
be included in the diff.
- When resolving merge conflicts on the expected output files, one of the approaches below SHOULD be - When resolving merge conflicts on the expected output files, one of the approaches below SHOULD be
used: used:
@ -71,8 +70,8 @@ outputs.
hanges done by local branch. hanges done by local branch.
- "Accept yours", rerun the tests and verify the new outputs. This approach requires knowledge of - "Accept yours", rerun the tests and verify the new outputs. This approach requires knowledge of
production/test code changes in "theirs" branch. However, if such changes resulted in production/test code changes in "theirs" branch. However, if such changes resulted in
straightforward and repetitive output changes, like due to printing code change or fixture change, straightforward and repetitive output changes, like due to printing code change or fixture
it may be easier to verify than reinspecting local changes. change, it may be easier to verify than reinspecting local changes.
- Expected test outputs SHOULD be reused across tightly-coupled test suites. The suites are - Expected test outputs SHOULD be reused across tightly-coupled test suites. The suites are
tightly-coupled if: tightly-coupled if:
@ -92,8 +91,8 @@ outputs.
- Versioned tests, where expected behavior is the same for majority of test inputs/scenarios. - Versioned tests, where expected behavior is the same for majority of test inputs/scenarios.
- AVOID manually modifying expected output files. Those files are considered to be auto generated. - AVOID manually modifying expected output files. Those files are considered to be auto generated.
Instead, run the tests and then copy the generated output as a new expected output file. See "How to Instead, run the tests and then copy the generated output as a new expected output file. See "How
diff and accept new test outputs" section for instructions. to diff and accept new test outputs" section for instructions.
# How to write Golden Data tests? # How to write Golden Data tests?
@ -121,9 +120,10 @@ outputs. Verifies the output with the expected output that is in the source repo
See: [golden_test.h](../src/mongo/unittest/golden_test.h) See: [golden_test.h](../src/mongo/unittest/golden_test.h)
Before running `bazel test`, set up the golden test framework as described in the `Setup` section below. Before running `bazel test`, set up the golden test framework as described in the `Setup` section
This will ensure that the C++ test outputs are written to a location where `buildscripts/golden_test.py` below. This will ensure that the C++ test outputs are written to a location where
can find them so that the `diff` and `accept` functions work as expected. `buildscripts/golden_test.py` can find them so that the `diff` and `accept` functions work as
expected.
**Example:** **Example:**
@ -160,8 +160,7 @@ TEST_F(MySuiteFixture, MyFeatureBTest) {
} }
``` ```
Also see self-test: Also see self-test: [golden_test_test.cpp](../src/mongo/unittest/golden_test_test.cpp)
[golden_test_test.cpp](../src/mongo/unittest/golden_test_test.cpp)
# How to diff and accept new test outputs on a workstation # How to diff and accept new test outputs on a workstation
@ -177,13 +176,15 @@ buildscripts/golden_test.py requires a one-time workstation setup.
Note: this setup is only required to use buildscripts/golden_test.py itself. It is NOT required to Note: this setup is only required to use buildscripts/golden_test.py itself. It is NOT required to
just run the Golden Data tests when not using buildscripts/golden_test.py. just run the Golden Data tests when not using buildscripts/golden_test.py.
1. Create a yaml config file, as described by [Appendix - Config file reference](#appendix---config-file-reference). 1. Create a yaml config file, as described by
[Appendix - Config file reference](#appendix---config-file-reference).
2. Set GOLDEN_TEST_CONFIG_PATH environment variable to config file location, so that it is available 2. Set GOLDEN_TEST_CONFIG_PATH environment variable to config file location, so that it is available
when running tests and when running buildscripts/golden_test.py tool. when running tests and when running buildscripts/golden_test.py tool.
### Automatic Setup ### Automatic Setup
Use buildscripts/golden_test.py builtin setup to initialize default config for your current platform. Use buildscripts/golden_test.py builtin setup to initialize default config for your current
platform.
**Instructions for Linux** **Instructions for Linux**
@ -195,8 +196,8 @@ buildscripts/golden_test.py setup
**Instructions for Windows** **Instructions for Windows**
Run buildscripts/golden_test.py setup utility. Run buildscripts/golden_test.py setup utility. You may be asked for a password, when not running in
You may be asked for a password, when not running in "Run as administrator" shell. "Run as administrator" shell.
```cmd ```cmd
c:\python\python310\python.exe buildscripts/golden_test.py setup c:\python\python310\python.exe buildscripts/golden_test.py setup
@ -295,7 +296,8 @@ $> buildscripts/golden_test.py --help
### Update multiple expected files at once ### Update multiple expected files at once
Some tests will run in multiple passthroughs or build variants, so they have multiple expected files. Some tests will run in multiple passthroughs or build variants, so they have multiple expected
files.
Whenever the test is updated, all the expected files should be updated together as well. Whenever the test is updated, all the expected files should be updated together as well.
@ -306,8 +308,8 @@ buildscripts/golden_test.py --verbose clean-run-accept jstests/query_golden/NAME
This option uses `resmoke.py find-suites` to determine the passthrough suites a test belongs to and This option uses `resmoke.py find-suites` to determine the passthrough suites a test belongs to and
runs them. runs them.
If the test is found to only belong to the `query_golden_classic` passthrough, it is assumed that If the test is found to only belong to the `query_golden_classic` passthrough, it is assumed that it
it can have multiple expected results due to being run under multiple build variants with different can have multiple expected results due to being run under multiple build variants with different
`internalQueryFrameworkControl` settings. So the test will be run with various values for `internalQueryFrameworkControl` settings. So the test will be run with various values for
`internalQueryFrameworkControl`. `internalQueryFrameworkControl`.
@ -348,22 +350,21 @@ outputRootPattern:
type: String type: String
optional: true optional: true
description: description:
Root path pattern that will be used to write expected and actual test outputs for all tests Root path pattern that will be used to write expected and actual test outputs for all tests in
in the test run. the test run. If not specified a temporary folder location will be used. Path pattern string may
If not specified a temporary folder location will be used. use '%' characters in the last part of the path. '%' characters in the last part of the path
Path pattern string may use '%' characters in the last part of the path. '%' characters in will be replaced with random lowercase hexadecimal digits.
the last part of the path will be replaced with random lowercase hexadecimal digits. examples: /var/tmp/test_output/out-%%%%-%%%%-%%%%-%%%% /var/tmp/test_output
examples: /var/tmp/test_output/out-%%%%-%%%%-%%%%-%%%%
/var/tmp/test_output
diffCmd: diffCmd:
type: String type: String
optional: true optional: true
description: Shell command to diff a single golden test run output. description:
{{expected}} and {{actual}} variables should be used and will be replaced with expected and Shell command to diff a single golden test run output. {{expected}} and {{actual}} variables
actual output folder paths respectively. should be used and will be replaced with expected and actual output folder paths respectively.
This property is not used to decide whether the test passes or fails; it is only used to This property is not used to decide whether the test passes or fails; it is only used to display
display differences once we've decided that a test failed. differences once we've decided that a test failed.
examples: git diff --no-index "{{expected}}" "{{actual}}" examples:
diff -ruN --unidirectional-new-file --color=always "{{expected}}" "{{actual}}" git diff --no-index "{{expected}}" "{{actual}}" diff -ruN --unidirectional-new-file
--color=always "{{expected}}" "{{actual}}"
``` ```
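The two substitution rules described in the config reference above can be illustrated with a small,
self-contained sketch. This is not the golden test framework's implementation: the helper names
`expandOutputRootPattern`, `replaceAll` and `buildDiffCommand` are hypothetical, and the sketch
assumes `/` is the path separator.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Hypothetical helper (not framework code): replaces every '%' in the LAST
// path component with a random lowercase hexadecimal digit.
std::string expandOutputRootPattern(const std::string& pattern, std::mt19937& rng) {
    static constexpr char kHexDigits[] = "0123456789abcdef";
    std::uniform_int_distribution<int> pick(0, 15);

    // Only characters after the final '/' are eligible for substitution.
    const std::size_t lastSlash = pattern.find_last_of('/');
    const std::size_t start = (lastSlash == std::string::npos) ? 0 : lastSlash + 1;

    std::string result = pattern;
    for (std::size_t i = start; i < result.size(); ++i) {
        if (result[i] == '%') {
            result[i] = kHexDigits[pick(rng)];
        }
    }
    return result;
}

// Hypothetical helper: replaces every occurrence of `placeholder` with `value`.
std::string replaceAll(std::string text, const std::string& placeholder, const std::string& value) {
    if (placeholder.empty()) {
        return text;
    }
    for (std::size_t pos = text.find(placeholder); pos != std::string::npos;
         pos = text.find(placeholder, pos + value.size())) {
        text.replace(pos, placeholder.size(), value);
    }
    return text;
}

// Fills the {{expected}} and {{actual}} variables of a diffCmd template.
std::string buildDiffCommand(const std::string& diffCmd,
                             const std::string& expectedDir,
                             const std::string& actualDir) {
    return replaceAll(replaceAll(diffCmd, "{{expected}}", expectedDir), "{{actual}}", actualDir);
}

inline void usageExample() {
    std::mt19937 rng(std::random_device{}());
    const std::string root = expandOutputRootPattern("/var/tmp/test_output/out-%%%%-%%%%", rng);
    assert(root.find('%') == std::string::npos);  // every '%' was replaced
    const std::string cmd =
        buildDiffCommand("git diff --no-index \"{{expected}}\" \"{{actual}}\"", "/e", "/a");
    assert(cmd == "git diff --no-index \"/e\" \"/a\"");
}
```

A '%' that appears in an earlier path component is left untouched in this sketch, matching the
"last part of the path" wording of the description.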


@ -142,8 +142,8 @@ mongo_idl_library(
``` ```
Bazel knows how to invoke the IDL compiler and generate files in the build directory with the C++ Bazel knows how to invoke the IDL compiler and generate files in the build directory with the C++
code. This code can also be generated by `--build_tag_filters=gen_source` tag in bazel which is useful for code. This code can also be generated by `--build_tag_filters=gen_source` tag in bazel which is
code navigation. useful for code navigation.
The generated IDL code looks something like the simplified code below. The generated IDL code looks something like the simplified code below.
@ -206,17 +206,17 @@ fields on the `commands` object.
The special features/requirements of commands: The special features/requirements of commands:
1. First element must match the name of the command, and the parsing rules of this element 1. First element must match the name of the command, and the parsing rules of this element can be
can be customized via the `namespace` field. customized via the `namespace` field.
2. In `OP_MSG`, `$db` must be present or defaults to `admin` 2. In `OP_MSG`, `$db` must be present or defaults to `admin`
3. Commands may have a `struct` as a reply 3. Commands may have a `struct` as a reply
4. Commands may be a part of API Version 1 4. Commands may be a part of API Version 1
5. Any structs marked with `is_generic_cmd_list: "arg"` that are in imported IDL files 5. Any structs marked with `is_generic_cmd_list: "arg"` that are in imported IDL files will
will automatically be chained to all commands. The IDL compiler imports automatically be chained to all commands. The IDL compiler imports
[`generic_argument.idl`](generic_argument.idl) by default, so any generic argument struct [`generic_argument.idl`](generic_argument.idl) by default, so any generic argument struct defined
defined in that file will be chained to all commands by default. in that file will be chained to all commands by default.
6. Command replies ignore the generic arguments fields like `$clusterTime`, `ok`, etc 6. Command replies ignore the generic arguments fields like `$clusterTime`, `ok`, etc during
during parsing. The list of these fields is in [`generic_argument.idl`](generic_argument.idl). parsing. The list of these fields is in [`generic_argument.idl`](generic_argument.idl).
Example Command: Example Command:
@ -388,7 +388,8 @@ void idlDeserialize(StringEnumEnum& en, ::mongo::StringData value, const IDLPars
constexpr ::mongo::StringData idlGetDefaultParserFieldName(StringEnumEnum) { return "StringEnumEnum"; } constexpr ::mongo::StringData idlGetDefaultParserFieldName(StringEnumEnum) { return "StringEnumEnum"; }
``` ```
These ADL hooks are not intended to be used directly by user code. See [Serialization/Deserialization API](#serializationdeserialization-api). These ADL hooks are not intended to be used directly by user code. See
[Serialization/Deserialization API](#serializationdeserialization-api).
### Integer Enums ### Integer Enums
@ -420,7 +421,8 @@ std::int32_t idlSerialize(IntEnum value);
constexpr ::mongo::StringData idlGetDefaultParserFieldName(IntEnum) { return "IntEnum"; } constexpr ::mongo::StringData idlGetDefaultParserFieldName(IntEnum) { return "IntEnum"; }
``` ```
These ADL hooks are not intended to be used directly by user code. See [Serialization/Deserialization API](#serializationdeserialization-api). These ADL hooks are not intended to be used directly by user code. See
[Serialization/Deserialization API](#serializationdeserialization-api).
### Serialization/Deserialization API ### Serialization/Deserialization API
@ -432,9 +434,9 @@ The public API to serialize and deserialize IDL-generated enums is defined in
auto parsedEnum = idl::deserialize<IdlEnum>(value); auto parsedEnum = idl::deserialize<IdlEnum>(value);
``` ```
The definitions of `idl::serialize()` and `idl::deserialize()` rely on the autogenerated ADL hooks to The definitions of `idl::serialize()` and `idl::deserialize()` rely on the autogenerated ADL hooks
find the serializer/deserializer implementations for each enum. User code should use this public API to find the serializer/deserializer implementations for each enum. User code should use this public
and not the ADL hooks directly. API and not the ADL hooks directly.
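To make the ADL dispatch concrete, here is a minimal, self-contained sketch of the pattern. It is
not the real header: the `demo` and `sketch_idl` namespaces and the `Color` enum are hypothetical,
`std::string_view` stands in for `StringData`, and the `IDLParserContext` parameter of the generated
deserialize hook is omitted for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string_view>

namespace demo {
// Hypothetical enum standing in for an IDL-generated string enum.
enum class Color : std::int32_t { kRed, kGreen };

// Stand-ins for the generated hooks; they live in the enum's namespace so
// that argument-dependent lookup (ADL) can find them.
inline std::string_view idlSerialize(Color value) {
    switch (value) {
        case Color::kRed:
            return "red";
        case Color::kGreen:
            return "green";
    }
    throw std::invalid_argument("unknown Color value");
}

inline void idlDeserialize(Color& out, std::string_view value) {
    if (value == "red") {
        out = Color::kRed;
    } else if (value == "green") {
        out = Color::kGreen;
    } else {
        throw std::invalid_argument("unknown Color name");
    }
}
}  // namespace demo

namespace sketch_idl {
// Public entry points: the unqualified calls below are resolved by ADL at
// instantiation time, based on the namespace of `Enum`.
template <typename Enum>
auto serialize(Enum value) {
    return idlSerialize(value);
}

template <typename Enum>
Enum deserialize(std::string_view value) {
    Enum out{};
    idlDeserialize(out, value);
    return out;
}
}  // namespace sketch_idl

inline void usageExample() {
    assert(sketch_idl::serialize(demo::Color::kGreen) == "green");
    assert(sketch_idl::deserialize<demo::Color>("red") == demo::Color::kRed);
}
```

Callers only name the public functions and the enum type; they never need to know which namespace
the hooks live in, which is why user code is steered toward the public API.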
### Reference ### Reference
@ -482,8 +484,8 @@ types allow users to customize IDL parsing for their own unique needs.
A field in a struct or command can be defined as a type but a field can also be an array, enum, A field in a struct or command can be defined as a type but a field can also be an array, enum,
struct or variant. Declaring a field as something other than a type is preferred to using types since struct or variant. Declaring a field as something other than a type is preferred to using types since
it allows more type information to be represented in IDL over C++. See `type` in the [field it allows more type information to be represented in IDL over C++. See `type` in the
reference](#struct-fields-attribute-reference) for more information. [field reference](#struct-fields-attribute-reference) for more information.
Type supports builtin BSON types like int32, int64, and string. These are types built into Type supports builtin BSON types like int32, int64, and string. These are types built into
`BSONElement`/`BSONObjBuilder`. It also supports custom types to give the code full control of `BSONElement`/`BSONObjBuilder`. It also supports custom types to give the code full control of
@ -529,11 +531,11 @@ The five key things to note in this example:
`BSONElement` as a parameter. The IDL generator has custom rules for `BSONElement`. `BSONElement` as a parameter. The IDL generator has custom rules for `BSONElement`.
- `serializer` - omitted in this example because `BSONObjBuilder` has builtin support for - `serializer` - omitted in this example because `BSONObjBuilder` has builtin support for
`std::string` `std::string`
- `is_view` - indicates whether the type is a view or not. If the type is a view, then it's - `is_view` - indicates whether the type is a view or not. If the type is a view, then it's possible
possible that objects of the type will not own all of its members. If the type is not a view, that objects of the type will not own all of its members. If the type is not a view, then objects
then objects of the type are guaranteed to own all of its members. This field is optional and of the type are guaranteed to own all of its members. This field is optional and defaults to True.
defaults to True. To reduce the size of the C++ representation of structs including this type, To reduce the size of the C++ representation of structs including this type, you can specify this
you can specify this field as False if the type is not a view type. field as False if the type is not a view type.
### Custom Types ### Custom Types
@ -590,22 +592,29 @@ IDLAnyType:
- `std::vector<_>` - When using `std::vector<_>`, the getters/setters use - `std::vector<_>` - When using `std::vector<_>`, the getters/setters use
`mongo::ConstDataRange` instead `mongo::ConstDataRange` instead
- `deserializer` - string - a method name to call to deserialize the type. Typically this is a function - `deserializer` - string - a method name to call to deserialize the type. Typically this is a function
that takes `BSONElement` as a parameter. The IDL generator has custom rules for `BSONElement`. - By default, IDL assumes it is a instance methods of `cpp_type`. - If prefixed with `::`, assumes the function is a global static function - By default, the deserializer's function signature is `<function_name>(<cpp_type>)`. - For `object` types, the deserializer's function signature is `<function_name>(const BSONObj& that takes `BSONElement` as a parameter. The IDL generator has custom rules for `BSONElement`. -
obj)` - For `any` types, the deserializer's function signature is `<function_name>(BSONElement By default, IDL assumes it is a instance methods of `cpp_type`. - If prefixed with `::`, assumes
element)`. the function is a global static function - By default, the deserializer's function signature is
- `serializer` - string -a method name to all serialize the type. - By default, IDL assumes it is a instance methods of `cpp_type`. - If prefixed with `::`, assumes the function is a global static function - By default, the deserializer's function signature is `<type_append> <function_name>(const `<function_name>(<cpp_type>)`. - For `object` types, the deserializer's function signature is
<cpp_type>&)` where `type_append` is a type `BSONObjBuilder` understands. - For `object` types, the deserializer's function signature is `<function_name>(const BSONObj& `<function_name>(const BSONObj& obj)` - For `any` types, the deserializer's function signature is
obj)` - For `any` types that are not in an array, the serializer's function signature is `<function_name>(BSONElement element)`.
`<function_name>(StringData fieldName, BSONObjBuilder* builder)`. - For `any` types that are in an array, the serializer's function signature is - `serializer` - string -a method name to all serialize the type. - By default, IDL assumes it is a
instance methods of `cpp_type`. - If prefixed with `::`, assumes the function is a global static
function - By default, the deserializer's function signature is
`<type_append> <function_name>(const <cpp_type>&)` where `type_append` is a type `BSONObjBuilder`
understands. - For `object` types, the deserializer's function signature is
`<function_name>(const BSONObj& obj)` - For `any` types that are not in an array, the serializer's
function signature is `<function_name>(StringData fieldName, BSONObjBuilder* builder)`. - For
`any` types that are in an array, the serializer's function signature is
`<function_name>(BSONArrayBuilder* builder)`. `<function_name>(BSONArrayBuilder* builder)`.
- `deserialize_with_tenant` - bool - if set, adds `TenantId` as the first parameter to - `deserialize_with_tenant` - bool - if set, adds `TenantId` as the first parameter to
`deserializer` `deserializer`
- `internal_only` - bool - undocumented, DO NOT USE - `internal_only` - bool - undocumented, DO NOT USE
- `default` - string - default value for a type. A field in a struct inherits this value if a field - `default` - string - default value for a type. A field in a struct inherits this value if a field
does not set a default. See struct's `default` rules for more information. does not set a default. See struct's `default` rules for more information.
- `is_view` - indicates whether the type is a view or not. If the type is a view, then it's - `is_view` - indicates whether the type is a view or not. If the type is a view, then it's possible
possible that objects of the type will not own all of its members. If the type is not a view, that objects of the type will not own all of its members. If the type is not a view, then objects
then objects of the type are guaranteed to own all of its members. of the type are guaranteed to own all of its members.
## Structs ## Structs
@ -638,9 +647,8 @@ exampleStruct:
optional: true optional: true
defaultedField: defaultedField:
description: >- description: >-
Most callers should rely on 42 Most callers should rely on 42 as it is the answer to the question of life the universe and
as it is the answer to the question everything.
of life the universe and everything.
type: long type: long
validator: validator:
gt: 0 gt: 0
@ -762,8 +770,8 @@ multi level chained structs.
- `is_command_reply` - bool - if true, marks the struct as a command reply. A struct marked as - `is_command_reply` - bool - if true, marks the struct as a command reply. A struct marked as
`is_command_reply` generates a parser that ignores known generic or common fields across all `is_command_reply` generates a parser that ignores known generic or common fields across all
replies when parsing replies (i.e. `ok`, `errmsg`, etc) replies when parsing replies (i.e. `ok`, `errmsg`, etc)
- `is_generic_cmd_list` - string - choice [`arg`, `reply`], if set, generates functions `bool - `is_generic_cmd_list` - string - choice [`arg`, `reply`], if set, generates functions
hasField(StringData)` and `bool shouldForwardToShards(StringData)` for each field in the `bool hasField(StringData)` and `bool shouldForwardToShards(StringData)` for each field in the
struct. If set to `arg`, the struct will automatically be chained to every `command`. struct. If set to `arg`, the struct will automatically be chained to every `command`.
- `query_shape_component` - bool - true indicates this special serialization code will be generated - `query_shape_component` - bool - true indicates this special serialization code will be generated
to serialize as a query shape to serialize as a query shape
@ -784,10 +792,10 @@ hasField(StringData)` and `bool shouldForwardToShards(StringData)` for each fiel
have a variant of strings and structs. have a variant of strings and structs.
- Variant string support differentiates the type to choose based on the BSON type. - Variant string support differentiates the type to choose based on the BSON type.
- Variant struct support differentiates the type to choose based on the _first_ field of the - Variant struct support differentiates the type to choose based on the _first_ field of the
struct. The first field must be unique in each struct across the structs. When parsing a struct. The first field must be unique in each struct across the structs. When parsing a BSON
BSON object as a variant of multiple structs, the parser assumes that the first field object as a variant of multiple structs, the parser assumes that the first field declared in
declared in the IDL struct is always the first field in its BSON representation. the IDL struct is always the first field in its BSON representation. See `bulkWrite` for an
See `bulkWrite` for an example. example.
- `ignore` - bool - true means field generates no code but is ignored by the generated deserializer. - `ignore` - bool - true means field generates no code but is ignored by the generated deserializer.
Used to deprecate fields that no longer have an effect but allow strict parsers to ignore them. Used to deprecate fields that no longer have an effect but allow strict parsers to ignore them.
- `optional` - bool - true means the field is optional. Generated C++ type is - `optional` - bool - true means the field is optional. Generated C++ type is
@ -819,8 +827,9 @@ Comparisons are generated with C++ operators for these comparisons
- `lt` - string - Validates field is less than `string` - `lt` - string - Validates field is less than `string`
- `gte` - string - Validates field is greater than or equal to `string` - `gte` - string - Validates field is greater than or equal to `string`
- `lte` - string - Validates field is less than or equal to `string` - `lte` - string - Validates field is less than or equal to `string`
- `callback` - string - A static function to call of the shape `Status <function_name>(const - `callback` - string - A static function to call of the shape
<cpp_type> value)`. For non-simple types, `value` is passed by const-reference. `Status <function_name>(const <cpp_type> value)`. For non-simple types, `value` is passed by
const-reference.
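As a rough illustration of how these validators compose, the sketch below hand-writes the checks
that a validator such as `gt: 0`, `lte: 1000` plus a `callback` would express. It is only a sketch
under stated assumptions: the `Status` struct is a minimal stand-in for `mongo::Status`, the
function names are hypothetical, and the real generated code may report failures differently.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Minimal stand-in for mongo::Status so the sketch compiles on its own.
struct Status {
    bool ok;
    std::string reason;
    static Status OK() {
        return {true, ""};
    }
    static Status Error(std::string why) {
        return {false, std::move(why)};
    }
};

// Hypothetical `callback` validator of the shape
// `Status <function_name>(const <cpp_type> value)`.
Status validateIsEven(const long long value) {
    if (value % 2 != 0) {
        return Status::Error("value must be even");
    }
    return Status::OK();
}

// Hand-written equivalent of { gt: 0, lte: 1000, callback: validateIsEven }.
// Each comparison maps directly onto the matching C++ operator.
Status validateLimit(const long long value) {
    if (!(value > 0)) {
        return Status::Error("value must be greater than 0");
    }
    if (!(value <= 1000)) {
        return Status::Error("value must be less than or equal to 1000");
    }
    return validateIsEven(value);
}

inline void usageExample() {
    assert(validateLimit(42).ok);     // passes gt, lte and the callback
    assert(!validateLimit(0).ok);     // fails gt: 0 (strict comparison)
    assert(validateLimit(1000).ok);   // passes lte: 1000 (inclusive bound)
    assert(!validateLimit(7).ok);     // rejected by the callback
}
```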
## Commands ## Commands
@ -830,24 +839,24 @@ the `command` object when compared to `struct`.
The special features:

1. First element must match the name of the command, and the parsing rules of this element can be
   customized via the `namespace` field.
2. In `OP_MSG`, `$db` must be present or defaults to `admin`
3. Commands may have a `struct` as a reply
4. Commands may be a part of API Version 1
5. Any structs marked with `is_generic_cmd_list: "arg"` that are in imported IDL files will
   automatically be chained to all commands. The IDL compiler imports
   [`generic_argument.idl`](generic_argument.idl) by default, so any generic argument struct defined
   in that file will be chained to all commands by default.
6. Command replies ignore the generic arguments fields like `$clusterTime`, `ok`, etc during
   parsing. The list of these fields is in [`generic_argument.idl`](generic_argument.idl).

The `namespace` field describes the kind of first parameter a command takes.

1. `concatenate_with_db` - takes a collection name. Generates a method
   `const NamespaceString getNamespace()`. Examples: `insert`, `update`, `delete`
2. `concatenate_with_db_or_uuid` - takes a collection name or UUID. Generates a method
   `const NamespaceStringOrUUID& getNamespaceOrUUID()`. Examples: `find`, `count`
3. `ignored` - ignores the first argument entirely. Examples: `hello`, `setParameter`, `ping`
4. `type` - takes a struct as the first argument. Examples: `getLog`, `clearLog`, `renameCollection`
@ -866,15 +875,16 @@ Commands can also specify their replies that they return. Replies are regular `s
- `immutable` - [see structs](#struct-reference)
- `non_const_getter` - [see structs](#struct-reference)
- `namespace` - string - choice of a string [`concatenate_with_db`, `concatenate_with_db_or_uuid`,
  `ignored`, `type`]. Instructs how the value of command field should be parsed:
  - `concatenate_with_db` - Indicates the command field is a string and should be treated as a
    collection name. Typically used by commands that deal with collections. Automatically
    concatenated with `$db` by the IDL parser. Adds a method `const NamespaceString getNamespace()`
    to the generated class.
  - `concatenate_with_db_or_uuid` - Indicates the command field is a string or uuid, and should be
    treated as a collection name. Typically used by commands that deal with collections.
    Automatically concatenated with `$db` by the IDL parser. Adds a method
    `const NamespaceStringOrUUID& getNamespaceOrUUID()` to the generated class.
  - `ignored` - Ignores the value of the command field. Used by commands that ignore their command
    argument entirely.
  - `type` - Indicates the command takes a custom type for the first field. `type` field must be
    set.
- `type` - string - name of IDL type or struct to parse the command field as
- `command_name` - string - IDL generated parser expects the command to be named the name of YAML
  map. This can be overwritten with `command_name`. Commands should be `camelCase`
@ -893,8 +903,8 @@ NamespaceStringOrUUID& getNamespaceOrUUID()` to the generated class. - `ignored`
### Access Check Reference

A list of privileges the command checks. Only applicable for commands that are a part of API
Version 1. Checked at runtime when test commands are enabled.

- `none` - bool - No privileges required
- `simple` - mapping - single [check or privilege](#check-or-privilege)
@ -1002,28 +1012,29 @@ unit tests exercise all features and combinations IDL can handle.
#### BSONObj Anchor

The parsing method a struct is initialized with indicates what type of ownership the constructed
object has on the `BSONObj` parameter. An internal `BSONObj` anchor ensures that the lifetime of the
`BSONObj` matches the lifetime of the object in the cases that the `BSONObj` parameter is owned or
shared.

#### View Types

If the struct is a view, then it's possible that objects of the type will not own all of its
members. If the struct is not a view, then objects of the type are guaranteed to own all of its
members. This is determined by recursively checking the fields of a struct. This info is used during
generation to determine whether or not a struct will need a `BSONObj` anchor.
## Best Practices

IDL has been in use since 2017. Since then, a few best practices have emerged:

1. strict or non-strict parsers - Structs that are persisted to disk should set `strict: false`.
   It's better for upgrade/downgrade. Commands should set `strict: true` or omit it as
   `strict: true` is the default.
   1. For persistence: For upgrade/downgrade, if a persisted document with a strict parser has a
      field added in new version N+1 and then the user downgrades to old version N, the strict
      parser will throw an exception and reject the document. If this document was part of the
      storage catalog for instance, the server would fail to start.
   2. For commands: Using strict parsers gives the server the ability to add fields without the
      risk of clients accidentally sending fields with the same name that had been ignored.
2. Extending existing structs/commands - all new fields in a struct/command must be marked optional
   to support backwards compatibility. For new structs/commands, there should be some required
   fields. It does not matter if the struct is not persisted, non-optional fields break backwards


@ -2,28 +2,26 @@
title: LibFuzzer
---

> **!!NOTE!!**: LibFuzzer is deprecated and should not be used for new fuzz tests. See
> [FuzzTest](fuzztest.md) for new fuzzing implementations

LibFuzzer is a tool for performing coverage guided fuzzing of C/C++ code. LibFuzzer will try to
trigger AUBSAN failures in a function you provide, by repeatedly calling it with a carefully crafted
byte array as input. Each input will be assigned a "score". Byte arrays which exercise new or more
regions of code will score better. LibFuzzer will merge and mutate high scoring inputs in order to
gradually cover more and more possible behavior.

# When to use LibFuzzer

> **!!NOTE!!**: LibFuzzer is deprecated and should not be used for new fuzz tests. See
> [FuzzTest](fuzztest.md) for new fuzzing implementations

LibFuzzer is great for testing functions which accept an opaque blob of untrusted user-provided
data.

# How to use LibFuzzer

LibFuzzer implements `int main`, and expects to be linked with an object file which provides the
function under test. You will achieve this by writing a cpp file which implements
```cpp
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
@ -31,26 +29,22 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
}
```
`LLVMFuzzerTestOneInput` will be called repeatedly, with fuzzer generated bytes in `Data`. `Size`
will always truthfully tell your implementation how many bytes are in `Data`. If your function
crashes or induces an AUBSAN fault, LibFuzzer will consider that to be a finding worth reporting.

Keep in mind that your function will often "just" be adapting `Data` to whatever format our internal
C++ functions require. However, you have a lot of freedom in exactly what you choose to do. Just
make sure your function crashes or fails an invariant when something interesting happens! As just
a few ideas:

- You might choose to call multiple implementations of a single operation, and validate that they
  produce the same output when presented the same input.
- You could tease out individual bytes from `Data` and provide them as different arguments to the
  function under test.
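
The first idea above (comparing multiple implementations of one operation) could look roughly like the sketch below. The two checksum functions are hypothetical stand-ins for "two implementations of the same operation"; only `LLVMFuzzerTestOneInput` is the real LibFuzzer entry point. The sketch calls `std::abort()` on a mismatch so that the fuzzer sees a crash.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical implementation A: 8-bit additive checksum, wrapping on every step.
static uint8_t checksumNarrow(const uint8_t* data, size_t size) {
    uint8_t sum = 0;
    for (size_t i = 0; i < size; ++i) {
        sum = static_cast<uint8_t>(sum + data[i]);
    }
    return sum;
}

// Hypothetical implementation B: accumulate in 64 bits, then truncate to 8 bits.
static uint8_t checksumWide(const uint8_t* data, size_t size) {
    uint64_t sum = 0;
    for (size_t i = 0; i < size; ++i) {
        sum += data[i];
    }
    return static_cast<uint8_t>(sum & 0xFF);
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* Data, size_t Size) {
    // Differential check: both implementations must agree on every input.
    if (checksumNarrow(Data, Size) != checksumWide(Data, Size)) {
        std::abort();  // A crash is what LibFuzzer reports as a finding.
    }
    return 0;
}
```

In server code the mismatch would more likely be reported through an invariant than a bare `std::abort()`; the plain call is used here only to keep the sketch free of server headers.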
Finally, your cpp file will need a bazel target. There is a method which defines fuzzer targets,
much like how we define unittests. For example:
```python ```python
mongo_cc_fuzzer_test( mongo_cc_fuzzer_test(
@ -70,25 +64,21 @@ defines fuzzer targets, much like how we define unittests. For example:
# Running LibFuzzer

Your test's object file and **all** of its dependencies must be compiled with the "fuzzer"
sanitizer, plus a set of sanitizers which might produce interesting runtime errors like AUBSAN.
Evergreen has a build variant, whose name will include the string "FUZZER", which will compile and
run all of the fuzzer tests.

The fuzzers can be built locally, for development and debugging. Check our Evergreen configuration
for the current bazel arguments.

LibFuzzer binaries will accept a path to a directory containing its "corpus". A corpus is a list of
examples known to produce interesting outputs. LibFuzzer will start producing interesting results
more quickly if it starts off with a set of inputs which it can begin mutating. When it's done, it
will write down any new inputs it discovered into its corpus. Re-using a corpus across executions is
a good way to make LibFuzzer return more results in less time. Our Evergreen tasks will try to
acquire and re-use a corpus from an earlier commit, if they can.

# References

- [LibFuzzer's official documentation](https://llvm.org/docs/LibFuzzer.html)


@ -60,9 +60,8 @@ Ex: `bash buildscripts/yamllinters.sh`
## Python Linters

The `bazel run lint` command runs all Python linters as well as several other linters in our code
base. You can run auto-remediations via: `bazel run lint --fix`.

Ex: `bazel run lint`


@ -1,18 +1,18 @@
# Proxy protocol support

`mongod` and `mongos` have built-in support for connections made via L4 load balancers using the
[proxy protocol][proxy-protocol-url] header. Placing `mongos` or `mongod` behind load balancers
requires proper configuration of the load balancers, `mongos`, and `mongod`.

# Configuring mongod

To use `mongod` with an L4 load balancer (or reverse proxy) it _must_ be configured with the
`proxyPort` config option whose value can be specified at program start in any of the ways mentioned
in the server config documentation. This config option opens a new port to which the L4 load
balancer _must_ connect.

The L4 load balancer (or reverse proxy) _must_ emit a [proxy protocol][proxy-protocol-url] header at
the start of its connection stream. `mongod` supports both version 1 and version 2 of the proxy
standard.
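
To make the header requirement concrete, the sketch below builds the human-readable version 1 header that a load balancer would send as the very first bytes of a TCP/IPv4 connection, following the layout in the proxy protocol specification. The helper name `makeProxyV1HeaderTcp4` and the addresses in the usage note are illustrative assumptions; this is not code from the server.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: formats a proxy protocol version 1 header for TCP over IPv4.
// Layout per the specification: "PROXY TCP4 <src addr> <dst addr> <src port> <dst port>\r\n".
std::string makeProxyV1HeaderTcp4(const std::string& srcAddr,
                                  const std::string& dstAddr,
                                  uint16_t srcPort,
                                  uint16_t dstPort) {
    return "PROXY TCP4 " + srcAddr + " " + dstAddr + " " + std::to_string(srcPort) + " " +
        std::to_string(dstPort) + "\r\n";
}
```

For example, `makeProxyV1HeaderTcp4("192.0.2.10", "192.0.2.20", 56324, 27018)` yields the line `PROXY TCP4 192.0.2.10 192.0.2.20 56324 27018` followed by CRLF. Version 2 of the protocol uses a binary layout instead and is not shown here.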
# Reverse proxy vs load balancer
@ -20,8 +20,8 @@ standard.
Sharded clusters might be configured to work with either an L4 load balancer or a reverse proxy. In
both cases the proxy or load balancer _must_ connect to the `mongos`'s load-balancer port.

Placing `mongos` behind a reverse proxy does not hide the list of `mongos`. The driver will choose a
specific `mongos` to connect to via the reverse proxy.

Placing `mongos` behind an L4 load balancer hides the list of `mongos`. The driver only sees the
load balancer, and the connections it makes are routed by the load balancer to a `mongos`. There is
@ -33,11 +33,18 @@ that connections from a driver are distributed among multiple `mongos`.
When a sharded cluster is deployed with a reverse proxy, there are two conditions that must be
fulfilled:

- `mongos` must be configured with the
  [MongoDB Server Parameter](https://docs.mongodb.com/manual/reference/parameters/)
  `loadBalancerPort` whose value can be specified at program start in any of the ways mentioned in
  the server parameter documentation. This option causes `mongos` to open a second port. All
  connections made from the reverse proxy _must_ be made over this port, and no regular connections
  (without HAProxy protocol header) may be made over this port.
- The reverse proxy _must_ be configured to emit a [proxy protocol][proxy-protocol-url] header at
  the
  [start of its connection stream](https://github.com/mongodb/mongo/commit/3a18d295d22b377cc7bc4c97bd3b6884d065bb85).
  `mongos`
  [supports](https://github.com/mongodb/mongo/commit/786482da93c3e5e58b1c690cb060f00c60864f69) both
  version 1 and version 2 of the proxy protocol standard.

The driver does not require any configuration change compared to a cluster without a reverse proxy.
@ -46,22 +53,32 @@ The driver does not require any configuration change compared to a cluster witho
When a sharded cluster is deployed with an L4 load balancer there are three conditions that must be
fulfilled:

- `mongos` must be configured with the
  [MongoDB Server Parameter](https://docs.mongodb.com/manual/reference/parameters/)
  `loadBalancerPort` whose value can be specified at program start in any of the ways mentioned in
  the server parameter documentation. This option causes `mongos` to open a second port. All
  connections made from load balancers _must_ be made over this port, and no regular connections
  (without HAProxy protocol header) may be made over this port.
- The L4 load balancer _must_ be configured to emit a [proxy protocol][proxy-protocol-url] header at
  the
  [start of its connection stream](https://github.com/mongodb/mongo/commit/3a18d295d22b377cc7bc4c97bd3b6884d065bb85).
  `mongos`
  [supports](https://github.com/mongodb/mongo/commit/786482da93c3e5e58b1c690cb060f00c60864f69) both
  version 1 and version 2 of the proxy protocol standard.
- Clients (drivers or shells) connecting to a `mongos` through the load balancer must set the
  `loadBalanced` option, e.g., when connecting to a local `mongos` instance through the load
  balancer, if the `loadBalancerPort` server parameter was set to 20100, the connection string must
  be of the form `"mongodb://localhost:20100/?loadBalanced=true"`.
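
The third condition can be illustrated with a small helper that assembles the connection string from a host and the `loadBalancerPort` value. This is only a sketch: `makeLoadBalancedUri` is a hypothetical name, and real clients would normally pass the string straight to their driver.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: builds a connection string for a client that reaches `mongos`
// through a load balancer, with the required `loadBalanced=true` option set.
std::string makeLoadBalancedUri(const std::string& host, uint16_t port) {
    return "mongodb://" + host + ":" + std::to_string(port) + "/?loadBalanced=true";
}
```

With the values from the text, `makeLoadBalancedUri("localhost", 20100)` produces `mongodb://localhost:20100/?loadBalanced=true`.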
There are some subtle behavioral differences that the load balancer options enable, chief of which
is how `mongos` deals with open cursors on client disconnection. Over a normal connection, `mongos`
will keep open cursors alive for a short while after client disconnection in case the client
reconnects and continues to request more from the given cursor. Since client reconnections aren't
expected behind a load balancer (as the load balancer will likely redirect a given client to a
different `mongos` instance upon reconnection), we eagerly
[close cursors](https://github.com/mongodb/mongo/commit/b429d5dda98bbe18ab0851ffd1729d3b57fc8a4e) on
load balanced client disconnects. We also
[abort any in-progress transactions](https://github.com/mongodb/mongo/commit/74628ed4e314dfe0fd69d3fbae1411981a869f6b)
that were initiated by the load balanced client.

[proxy-protocol-url]: https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt


@ -1,9 +1,9 @@
# Log System Overview

The new log system adds capability to produce structured logs in the [Relaxed Extended JSON
2.0.0][relaxed_json_2] format. The new API requires names to be given to variables, forming field
names for the variables in structured JSON logs. Named variables are called attributes in the log
system.

# Style guide
@ -13,43 +13,38 @@ Log lines are composed primarily of a message (`msg`) and attributes (`attr` fie
## Philosophy

As you write log messages, keep the following in mind: A big thing that makes JSON and BSON useful
as data formats is the ability to provide rich field names.

What makes logv2 machine readable is that we write an intact Extended BSON format.

But, what makes these lines human readable is that the `msg` provides a simple, clear context for
interpreting well-formed field names and values in the `attr` subdocument.

## Specific Guidance

For maximum readability, a log message additionally has the least amount of repetition possible, and
shares attribute names with other related log lines.

### Message (the msg field)

The `msg` field predicates a reader's interpretation of the log line. It should be crafted with care
and attention.

- Concisely describe what the log line is reporting, providing enough context necessary for
  interpreting attribute field names and values
- Capitalize the first letter, as in a sentence
- Avoid unnecessary punctuation, but punctuate between sentences if using multiple sentences
- Do not conclude with punctuation
- You may occasionally encounter `msg` strings containing fmt-style `{expr}` braces. These are
  legacy artifacts and should be rephrased according to these guidelines.

### Attributes (fields in the attr subdocument)

The `attr` subdocument includes important metrics/statistics about the logged event for the purposes
of debugging or performance analysis. These variables should be named very well, as though intended
for a very human-readable portion of the codebase (like config variable declaration, abstract class
definitions, etc.)

For `attr` field names, do the following:
@ -57,40 +52,38 @@ For `attr` field names, do the following:
The bar for understanding should be: The bar for understanding should be:
- Someone with reasonable understanding of mongod behavior should understand - Someone with reasonable understanding of mongod behavior should understand immediately what is
immediately what is being logged being logged
- Someone with reasonable troubleshooting skill should be able to extract doc- - Someone with reasonable troubleshooting skill should be able to extract doc- or code-searchable
or code-searchable phrases to learn about what is being logged phrases to learn about what is being logged
#### Precisely describe values and units #### Precisely describe values and units
Exception: Do not add a unit suffix when logging a Duration type. The system Exception: Do not add a unit suffix when logging a Duration type. The system automatically adds this
automatically adds this unit. unit.
#### When providing an execution time attribute, ensure it is named "durationMillis" #### When providing an execution time attribute, ensure it is named "durationMillis"
To describe the execution time of an operation using our preferred method: To describe the execution time of an operation using our preferred method: Specify an `attr` name of
Specify an `attr` name of “duration” and provide a value using the Milliseconds “duration” and provide a value using the Milliseconds Duration type. The log system will
Duration type. The log system will automatically append "Millis" to the automatically append "Millis" to the attribute name.
attribute name.
Alternatively, specify an `attr` name of “durationMillis” and provide the Alternatively, specify an `attr` name of “durationMillis” and provide the number of milliseconds as
number of milliseconds as an integer type. an integer type.
**Importantly**: downstream analysis tools will rely on this convention, as a **Importantly**: downstream analysis tools will rely on this convention, as a replacement for the
replacement for the "[0-9]+ms$" format of prior logs. "[0-9]+ms$" format of prior logs.
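The naming rule above can be sketched in a few lines. This is only an illustration under stated assumptions: `std::chrono::milliseconds` stands in for the server's Milliseconds Duration type, and `durationField` is a hypothetical helper, not part of the log system.

```cpp
#include <chrono>
#include <string>
#include <utility>

// Hypothetical helper, not part of logv2: mimics how an attribute named
// "duration" with a millisecond value ends up as the field "durationMillis".
// std::chrono::milliseconds is a stand-in for the server's Duration type.
inline std::pair<std::string, long long> durationField(const std::string& attrName,
                                                       std::chrono::milliseconds value) {
    // The unit suffix goes on the field name; the value is the plain count.
    return {attrName + "Millis", static_cast<long long>(value.count())};
}
```

Under this sketch, `durationField("duration", std::chrono::milliseconds(42))` yields the field name `durationMillis` with the value `42`, which is the shape downstream tools are said to rely on.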
#### Use certain specific terms whenever possible #### Use certain specific terms whenever possible
When logging the below information, do so with these specific terms: When logging the below information, do so with these specific terms:
- **namespace** - when logging a value of the form - **namespace** - when logging a value of the form "\<db name\>.\<collection name\>". Do not use
"\<db name\>.\<collection name\>". Do not use "collection" or abbreviate to "ns" "collection" or abbreviate to "ns"
- **db** - instead of "database" - **db** - instead of "database"
- **error** - when an error occurs, instead of "status". Use this for objects - **error** - when an error occurs, instead of "status". Use this for objects of type Status and
of type Status and DBException DBException
- **reason** - to provide rationale for an event/action when "error" isn't - **reason** - to provide rationale for an event/action when "error" isn't appropriate
appropriate
### Examples ### Examples
@ -122,11 +115,10 @@ The log system is made available with the following header:
#include "mongo/logv2/log.h" #include "mongo/logv2/log.h"
The macro `MONGO_LOGV2_DEFAULT_COMPONENT` is expanded by all logging macros. The macro `MONGO_LOGV2_DEFAULT_COMPONENT` is expanded by all logging macros. This configuration
This configuration macro must expand at their point of use to a `LogComponent` macro must expand at their point of use to a `LogComponent` expression, which is implicitly attached
expression, which is implicitly attached to the emitted message. It is to the emitted message. It is conventionally defined near the top of a `.cpp` file after headers are
conventionally defined near the top of a `.cpp` file after headers are included, included, and before any logging macros are invoked. Example:
and before any logging macros are invoked. Example:
#define MONGO_LOGV2_DEFAULT_COMPONENT ::mongo::logv2::LogComponent::kDefault #define MONGO_LOGV2_DEFAULT_COMPONENT ::mongo::logv2::LogComponent::kDefault
@ -138,22 +130,19 @@ Logging is performed using function style macros:
..., ...,
"nameN"_attr = varN); "nameN"_attr = varN);
The ID is a signed 32bit integer in the same number space as the error code The ID is a signed 32bit integer in the same number space as the error code numbers. It is used to
numbers. It is used to uniquely identify a log statement. If changing existing uniquely identify a log statement. If changing existing code, using a new ID is strongly advised to
code, using a new ID is strongly advised to avoid any parsing ambiguity. When avoid any parsing ambiguity. When selecting ID during work on JIRA ticket `SERVER-ABCDE` you can use
selecting ID during work on JIRA ticket `SERVER-ABCDE` you can use the JIRA the JIRA ticket number to avoid ID collisions with other engineers by taking ID from the range
ticket number to avoid ID collisions with other engineers by taking ID from the `ABCDE00` - `ABCDE99`.
range `ABCDE00` - `ABCDE99`.
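As a rough sketch of the ID convention described above, the reserved range for a ticket can be computed by appending two digits to the ticket number. `logIdRangeForTicket` is a hypothetical helper used only for illustration; nothing like it is claimed to exist in the code base.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>

// Hypothetical helper, not part of logv2: maps the numeric part of a JIRA
// ticket (SERVER-ABCDE) to the inclusive log ID range ABCDE00 through ABCDE99.
inline std::pair<std::int32_t, std::int32_t> logIdRangeForTicket(std::int32_t ticket) {
    // The resulting IDs must still fit in a signed 32-bit integer.
    assert(ticket > 0 && ticket <= (std::numeric_limits<std::int32_t>::max() - 99) / 100);
    return {ticket * 100, ticket * 100 + 99};
}
```

For example, a ticket numbered 12345 would map to the range 1234500 through 1234599.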
Attributes are created with the `_attr` user-defined literal. The intermediate Attributes are created with the `_attr` user-defined literal. The intermediate object that gets
object that gets instantiated provides the assignment operator `=` for instantiated provides the assignment operator `=` for assigning a value to the attribute.
assigning a value to the attribute.
The message string must be a compile time constant. The message string must be a compile time constant. This is to avoid dynamic attribute names in the
This is to avoid dynamic attribute names in the log output and to be able to log output and to be able to add compile time verification of log statements in the future. If the
add compile time verification of log statements in the future. If the string string needs to be shared with anything else (like constructing a Status object) you can use this
needs to be shared with anything else (like constructing a Status object) you pattern:
can use this pattern:
static constexpr char str[] = "the string"; static constexpr char str[] = "the string";
@ -172,13 +161,12 @@ can use this pattern:
### Log Component ### Log Component
To override the default component, a separate logging API can be used that To override the default component, a separate logging API can be used that takes a `LogOptions`
takes a `LogOptions` structure: structure:
LOGV2_OPTIONS(options, message-string, attr0, ...); LOGV2_OPTIONS(options, message-string, attr0, ...);
`LogOptions` can be constructed with a `LogComponent` to avoid verbosity in the `LogOptions` can be constructed with a `LogComponent` to avoid verbosity in the log statement.
log statement.
##### Example ##### Example
@ -186,9 +174,8 @@ log statement.
### Log Severity ### Log Severity
`LOGV2` is the logging macro for the default informational (0) severity. To log `LOGV2` is the logging macro for the default informational (0) severity. To log to different
to different severities there are separate logging macros to be used, they all severities there are separate logging macros to be used, they all take parameters like `LOGV2`:
take parameters like `LOGV2`:
- `LOGV2_WARNING` - `LOGV2_WARNING`
- `LOGV2_ERROR` - `LOGV2_ERROR`
@ -202,18 +189,17 @@ There is also variations that take `LogOptions` if needed:
- `LOGV2_ERROR_OPTIONS` - `LOGV2_ERROR_OPTIONS`
- `LOGV2_FATAL_OPTIONS` - `LOGV2_FATAL_OPTIONS`
Fatal level log statements using `LOGV2_FATAL` perform `fassert` after logging, Fatal level log statements using `LOGV2_FATAL` perform `fassert` after logging, using the provided
using the provided ID as assert id. `LOGV2_FATAL_NOTRACE` performs ID as assert id. `LOGV2_FATAL_NOTRACE` performs `fassertNoTrace` and `LOGV2_FATAL_CONTINUE` does not
`fassertNoTrace` and `LOGV2_FATAL_CONTINUE` does not `fassert` allowing for `fassert` allowing for continued execution. `LOGV2_FATAL_CONTINUE` is meant to be used when a fatal
continued execution. `LOGV2_FATAL_CONTINUE` is meant to be used when a fatal error has occurred but a different way of halting execution is desired such as `std::terminate` or
error has occurred but a different way of halting execution is desired such as `fassertFailedWithStatus`.
`std::terminate` or `fassertFailedWithStatus`.
`LOGV2_FATAL_OPTIONS` performs `fassert` by default like `LOGV2_FATAL` but this `LOGV2_FATAL_OPTIONS` performs `fassert` by default like `LOGV2_FATAL` but this can be changed by
can be changed by setting the `FatalMode` on the `LogOptions`. setting the `FatalMode` on the `LogOptions`.
Debug-level logging is slightly different where an additional parameter (as Debug-level logging is slightly different where an additional parameter (as integer) is required to
integer) is required to indicate the desired debug level: indicate the desired debug level:
LOGV2_DEBUG(ID, debug-level, message-string, attr0, ...); LOGV2_DEBUG(ID, debug-level, message-string, attr0, ...);
@ -224,17 +210,15 @@ integer) required to indicate the desired debug level:
message-string, message-string,
attr0, ...); attr0, ...);
`LOGV2_PROD_ONLY` logs like a default `LOGV2` log in production, but debug-1 log `LOGV2_PROD_ONLY` logs like a default `LOGV2` log in production, but debug-1 log in internal
in internal testing. It accepts the same arguments as `LOGV2`. This log level is testing. It accepts the same arguments as `LOGV2`. This log level is for log lines that may be
for log lines that may be spammy in testing but are more rare in production. As spammy in testing but are more rare in production. As such, they may be useful in investigations.
such, they may be useful in investigations. This level also preserves backwards This level also preserves backwards compatibility for logs that are no longer as useful as when they
compatibility for logs that are no longer as useful as when they were introduced. were introduced. To determine whether to log, this macro uses the `LogSeverity::ProdOnly()` level,
To determine whether to log, this macro uses the `LogSeverity::ProdOnly()` which returns level `LogSeverity::Debug(1)` when in a testing environment and `LogSeverity::Log()`
level, which returns level `LogSeverity::Debug(1)` when in a testing environment otherwise. Whether the server is in a testing environment is determined using the
and `LogSeverity::Log()` otherwise. Whether the server is in a testing `enableTestCommands` server parameter. It is preferred to use other macros over this one as it
environment is determined using the `enableTestCommands` server parameter. introduces a difference between testing and production. There is also the `LOGV2_PROD_ONLY_OPTIONS`
It is preferred to use other macros over this one as it introduces a difference
between testing and production. There is also the `LOGV2_PROD_ONLY_OPTIONS`
variation that takes `LogOptions`. variation that takes `LogOptions`.
##### Example ##### Example
@ -248,15 +232,13 @@ variation that takes `LogOptions`.
### Log Tags ### Log Tags
Log tags are replacing the Tee from the old log system as the way to indicate Log tags are replacing the Tee from the old log system as the way to indicate that the log should
that the log should also be written to a `RamLog` (accessible with the `getLog` also be written to a `RamLog` (accessible with the `getLog` command).
command).
Tags are added to a log statement with the options API similarly to how Tags are added to a log statement with the options API similarly to how non-default components are
non-default components are specified by constructing a `LogOptions`. specified by constructing a `LogOptions`.
Multiple tags can be attached to a log statement using the bitwise or operator Multiple tags can be attached to a log statement using the bitwise or operator `|`.
`|`.
##### Example ##### Example
@ -267,19 +249,18 @@ Multiple tags can be attached to a log statement using the bitwise or operator
### Dynamic attributes ### Dynamic attributes
Sometimes there is a need to add attributes depending on runtime conditionals. Sometimes there is a need to add attributes depending on runtime conditionals. To support this there
To support this there is the `DynamicAttributes` class that has an `add` method is the `DynamicAttributes` class that has an `add` method to add named attributes one by one. This
to add named attributes one by one. This class is meant to be used when you class is meant to be used when you have this specific requirement and is not the general logging
have this specific requirement and is not the general logging API. API.
When finished, it is logged using the regular logging API but the When finished, it is logged using the regular logging API but the `DynamicAttributes` instance is
`DynamicAttributes` instance is passed as the first attribute parameter. Mixing passed as the first attribute parameter. Mixing `_attr` literals with the `DynamicAttributes` is not
`_attr` literals with the `DynamicAttributes` is not supported. supported.
When using the `DynamicAttributes` you need to be careful about parameter When using the `DynamicAttributes` you need to be careful about parameter lifetimes. The
lifetimes. The `DynamicAttributes` binds attributes _by reference_ and the `DynamicAttributes` binds attributes _by reference_ and the reference must be valid when passing the
reference must be valid when passing the `DynamicAttributes` to the log `DynamicAttributes` to the log statement.
statement.
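The lifetime rule above can be illustrated with a small self-contained stand-in. `RefAttrs` below is NOT the real `DynamicAttributes` class; it is a hypothetical type that only copies the one property that matters here, namely that values are bound by reference and read later.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for illustration only; this is NOT logv2's DynamicAttributes.
// It binds values by reference, as the real class is described to do.
class RefAttrs {
public:
    void add(std::string name, const std::string& value) {
        _attrs.emplace_back(std::move(name), std::cref(value));
    }

    // Reads through the stored references, so every referent must still be alive.
    std::string format() const {
        std::string out;
        for (const auto& [name, value] : _attrs) {
            if (!out.empty()) {
                out += ", ";
            }
            out += name + "=" + value.get();
        }
        return out;
    }

private:
    std::vector<std::pair<std::string, std::reference_wrapper<const std::string>>> _attrs;
};

inline std::string describeOperation(bool includeReason) {
    // Both strings are declared before attrs and outlive the format() call.
    const std::string ns = "test.coll";
    const std::string reason = "timeout";

    RefAttrs attrs;
    attrs.add("namespace", ns);
    if (includeReason) {
        attrs.add("reason", reason);
    }
    // Unsafe variant (do not do this): passing the result of a function that
    // returns by value would bind to a temporary that is gone before format().
    return attrs.format();
}
```

The safe pattern is to keep every bound value in a variable whose scope encloses the point where the attributes are consumed.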
##### Example ##### Example
@ -321,11 +302,11 @@ Many basic types have built in support:
### User-defined types ### User-defined types
To make a user-defined type loggable it needs a serialization member function To make a user-defined type loggable it needs a serialization member function that the log system
that the log system can bind to. can bind to.
The system binds and uses serialization functions by looking for functions in The system binds and uses serialization functions by looking for functions in the following priority
the following priority order: order:
- Structured serialization functions - Structured serialization functions
- `void x.serialize(BSONObjBuilder*) const` (member) - `void x.serialize(BSONObjBuilder*) const` (member)
@ -338,19 +319,18 @@ the following priority order:
- `x.toString() ` (member) - `x.toString() ` (member)
- `toString(x)` (non-member) - `toString(x)` (non-member)
Enums cannot have member functions, but they will still try to bind to the Enums cannot have member functions, but they will still try to bind to the `toStringForLogging(e)`
`toStringForLogging(e)` or `toString(e)` non-members. If neither is available, or `toString(e)` non-members. If neither is available, the enum value will be logged as its
the enum value will be logged as its underlying integral type. underlying integral type.
In order to offer structured serialization and output, a type would need to In order to offer structured serialization and output, a type would need to supply a structured
supply a structured serialization function. Otherwise, if only stringification serialization function. Otherwise, if only stringification is provided, the output will be an
is provided, the output will be an escaped string. escaped string.
The `toStringForLogging` non-member is an ADL customization hook used to The `toStringForLogging` non-member is an ADL customization hook used to override `toString` for
override `toString` for very rare cases where `toString` is inappropriate for very rare cases where `toString` is inappropriate for logging perhaps because it's needed for other
logging perhaps because it's needed for other non-logging formatting. Usually a non-logging formatting. Usually a `toString` (member or nonmember) is a sufficient customization
`toString` (member or nonmember) is a sufficient customization point and should point and should be preferred as a canonical stringification of the object.
be preferred as a canonical stringification of the object.
_NOTE: No `operator<<` overload is used even if available_ _NOTE: No `operator<<` overload is used even if available_
@ -370,20 +350,19 @@ _NOTE: No `operator<<` overload is used even if available_
### Container support ### Container support
STL containers and data structures that have STL like interfaces are loggable STL containers and data structures that have STL like interfaces are loggable as long as they
as long as they contain loggable elements (built-in, user-defined or other contain loggable elements (built-in, user-defined or other containers).
containers).
#### Sequential containers #### Sequential containers
Sequential containers like `std::vector`, `std::deque` and `std::list` are Sequential containers like `std::vector`, `std::deque` and `std::list` are loggable and the elements
loggable and the elements get formatted as JSON array in structured output. get formatted as JSON array in structured output.
#### Associative containers #### Associative containers
Associative containers such as `std::map` and `stdx::unordered_map` are loggable Associative containers such as `std::map` and `stdx::unordered_map` are loggable with the requirement
with the requirement that the key is of a string type. The structured format that the key is of a string type. The structured format is a JSON object where the field names are
is a JSON object where the field names are the key. the key.
#### Ranges #### Ranges
@ -392,11 +371,10 @@ Ranges is loggable via helpers to indicate what type of range it is
- `seqLog(begin, end)` - `seqLog(begin, end)`
- `mapLog(begin, end)` - `mapLog(begin, end)`
seqLog indicates that it is a sequential range where the iterators point to seqLog indicates that it is a sequential range where the iterators point to loggable value directly.
loggable value directly.
mapLog indicates that it is a range coming from an associative container where mapLog indicates that it is a range coming from an associative container where the iterators point
the iterators point to a key-value pair. to a key-value pair.
##### Examples ##### Examples
@ -425,10 +403,9 @@ the iterators point to a key-value pair.
#### Containers and `uint64_t` #### Containers and `uint64_t`
Logging of containers uses `BSONObj` as an internal representation and Logging of containers uses `BSONObj` as an internal representation and `uint64_t` is not a supported
`uint64_t` is not a supported type with `BSONObjBuilder::append()`. As a user type with `BSONObjBuilder::append()`. As a user you can use `boost::transform_iterator` to cast the
you can use `boost::transform_iterator` to cast the `uint64_t` to a supported `uint64_t` to a supported type.
type.
##### Example ##### Example
@ -448,17 +425,14 @@ type.
### Duration types ### Duration types
Duration types have special formatting to match existing practices in the Duration types have special formatting to match existing practices in the server code base. Their
server code base. Their resulting format depends on the context they are resulting format depends on the context they are logged.
logged.
When durations are formatted as JSON or BSON a unit suffix is added to the When durations are formatted as JSON or BSON a unit suffix is added to the attribute name when
attribute name when building the field name. The value will be count of the building the field name. The value will be count of the duration as a number.
duration as a number.
When logging containers with durations there is no attribute per duration When logging containers with durations there is no attribute per duration instance that can have the
instance that can have the suffix added. In this case durations are instead suffix added. In this case durations are instead formatted as a BSON object.
formatted as a BSON object.
##### Examples ##### Examples
@ -485,9 +459,9 @@ formatted as a BSON object.
# Attribute naming abstraction # Attribute naming abstraction
The style guide contains recommendations for attribute naming in certain cases. The style guide contains recommendations for attribute naming in certain cases. To make abstraction
To make abstraction of attribute naming possible a `logAttrs` function can be of attribute naming possible a `logAttrs` function can be implemented as a friend function in a
implemented as a friend function in a class with the following signature: class with the following signature:
class AnyUserType { class AnyUserType {
public: public:
@ -505,15 +479,13 @@ implemented as a friend function in a class with the following signature:
## Multiple attributes ## Multiple attributes
In some cases a loggable type might be composed as a hierarchy in the C++ type In some cases a loggable type might be composed as a hierarchy in the C++ type system which would
system which would lead to a very verbose structured log output as every level lead to a very verbose structured log output as every level in the hierarchy needs a name when
in the hierarchy needs a name when outputted as JSON. The attribute naming outputted as JSON. The attribute naming abstraction system can also be used to collapse such
abstraction system can also be used to collapse such hierarchies. Instead of hierarchies. Instead of making a type loggable it can instead return one or more attributes from its
making a type loggable it can instead return one or more attributes from its
members by using `multipleAttrs` in `logAttrs` functions. members by using `multipleAttrs` in `logAttrs` functions.
`multipleAttrs(...)` accepts attributes or instances of types with `logAttrs` `multipleAttrs(...)` accepts attributes or instances of types with `logAttrs` functions implemented.
functions implemented.
##### Examples ##### Examples
@ -535,12 +507,11 @@ functions implemented.
## Handling temporary lifetime with multiple attributes ## Handling temporary lifetime with multiple attributes
To avoid lifetime issues (log attributes bind their values by reference) it is To avoid lifetime issues (log attributes bind their values by reference) it is recommended to
recommended to **not** create attributes when using `multipleAttrs` unless **not** create attributes when using `multipleAttrs` unless attributes are created for members
attributes are created for members directly. If `logAttrs` or `""_attr=` is directly. If `logAttrs` or `""_attr=` is used inside a `logAttrs` function on the return of a
used inside a `logAttrs` function on the return of a function returning by function returning by value it will result in a dangling reference. The following example
value it will result in a dangling reference. The following example illustrates illustrates the problem:
the problem:
class SomeSubType { class SomeSubType {
public: public:
@ -566,10 +537,9 @@ the problem:
std::string name_; std::string name_;
}; };
The better implementation would be to let the log system control the The better implementation would be to let the log system control the lifetime by passing the
lifetime by passing the instance to `multipleAttrs` without creating the instance to `multipleAttrs` without creating the attribute. The log system will detect that it is
attribute. The log system will detect that it is not an attribute and will not an attribute and will attempt to create attributes by calling `logAttrs`:
attempt to create attributes by calling `logAttrs`:
friend auto logAttrs(const SomeType& type) { friend auto logAttrs(const SomeType& type) {
return logv2::multipleAttrs("name"_attr=type.name(), type.sub()); return logv2::multipleAttrs("name"_attr=type.name(), type.sub());
@ -579,11 +549,10 @@ attempt to create attributes by calling `logAttrs`:
## Combining uassert with log statement ## Combining uassert with log statement
Code that emits a high severity log statement may also need to emit a `uassert` Code that emits a high severity log statement may also need to emit a `uassert` after the log. There
after the log. There is the `UserAssertAfterLog` logging option that allows you is the `UserAssertAfterLog` logging option that allows you to re-use the log statement to do the
to re-use the log statement to do the formatting required for the `uassert`. formatting required for the `uassert`. The assertion id can be either the logging ID by passing
The assertion id can be either the logging ID by passing `UserAssertAfterLog` `UserAssertAfterLog` with no arguments or the assertion id can be set by constructing
with no arguments or the assertion id can be set by constructing
`UserAssertAfterLog` with an `ErrorCodes::Error`. `UserAssertAfterLog` with an `ErrorCodes::Error`.
The assertion reason string will be a plain text log and can be provided with additional attribute The assertion reason string will be a plain text log and can be provided with additional attribute
@ -614,26 +583,23 @@ Would emit a `uassert` after performing the log that is equivalent to:
## Unstructured logging for local development ## Unstructured logging for local development
To make it easier to use the log system for tracing in local development, there To make it easier to use the log system for tracing in local development, there is a special API
is a special API that does not use IDs or attribute names: that does not use IDs or attribute names:
logd(format-string, value0, ..., valueN); logd(format-string, value0, ..., valueN);
It formats the string using libfmt similarly to what It formats the string using libfmt similarly to what
`fmt::format(format-string, value0, ..., valueN)` would produce but using the `fmt::format(format-string, value0, ..., valueN)` would produce but using the regular log system
regular log system type support on how types are made loggable. The formatted type support on how types are made loggable. The formatted string is logged as the `msg` field in
string is logged as the `msg` field in the JSON output, with no `attr` the JSON output, with no `attr` subobject.
subobject.
When using `logd` the log will be emitted with standard severity and the default When using `logd` the log will be emitted with standard severity and the default component.
component.
A difference from regular logging, `logd` is allowed to be used in header files A difference from regular logging, `logd` is allowed to be used in header files by including
by including `logv2/log_debug.h`. `logv2/log_debug.h`.
Unstructured logging is not allowed to be used in code committed to master, Unstructured logging is not allowed to be used in code committed to master, there is a lint check to
there is a lint check to validate this. It is however allowed to be used in validate this. It is however allowed to be used in Evergreen patch builds.
Evergreen patch builds.
##### Examples ##### Examples
@ -642,8 +608,8 @@ Evergreen patch builds.
## Rate limiting ## Rate limiting
Rate limiting logs is useful to reduce the impact of logging on database throughput. At high Rate limiting logs is useful to reduce the impact of logging on database throughput. At high rate
rate and concurrency, logging can be expensive and reduce performance. Attention should be paid and concurrency, logging can be expensive and reduce performance. Attention should be paid
specifically to logs that can occur on every operation, whether they fail or succeed. specifically to logs that can occur on every operation, whether they fail or succeed.
The rate limiting feature is implemented by `SeveritySuppressor` (see The rate limiting feature is implemented by `SeveritySuppressor` (see
@ -653,8 +619,8 @@ severity; subsequent logs within that interval are emitted at a "quiet" severity
level). This ensures logs are not always written unless the logging level is increased for the level). This ensures logs are not always written unless the logging level is increased for the
component. component.
`SeveritySuppressor` is typically used with `StaticImmortal` for static storage. The interval can `SeveritySuppressor` is typically used with `StaticImmortal` for static storage. The interval can be
be configured with a server parameter when constructing SeveritySuppressor. configured with a server parameter when constructing SeveritySuppressor.
##### Example ##### Example
@ -666,18 +632,17 @@ be configured with a server parameter when constructing SeveritySuppressor.
"Slow network response send time", "Slow network response send time",
"elapsed"_attr = bob.obj()); "elapsed"_attr = bob.obj());
In this example, the first log within each gSlowNetworkLogRate-second window is emitted at Info level; In this example, the first log within each gSlowNetworkLogRate-second window is emitted at Info
subsequent logs within that window are emitted at Debug(2), which requires increasing the component's level; subsequent logs within that window are emitted at Debug(2), which requires increasing the
log level to be visible. component's log level to be visible.
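The behaviour described above (first log in a window at the normal severity, later ones at a quiet severity) can be sketched as follows. `IntervalSuppressor` and `Severity` are hypothetical stand-ins and NOT the real `SeveritySuppressor`; the sketch assumes windows start at the first log after the previous window expired, and it is not thread-safe.

```cpp
#include <chrono>
#include <optional>

// Illustrative stand-ins only; these are NOT the real logv2 types.
enum class Severity { kInfo, kDebug2 };

class IntervalSuppressor {
public:
    using Clock = std::chrono::steady_clock;

    IntervalSuppressor(std::chrono::seconds interval, Severity normal, Severity quiet)
        : _interval(interval), _normal(normal), _quiet(quiet) {}

    // Returns the severity to use for an event observed at `now`.
    Severity severityAt(Clock::time_point now) {
        if (!_windowStart || now - *_windowStart >= _interval) {
            _windowStart = now;  // first log of a new window
            return _normal;
        }
        return _quiet;  // still inside the current window
    }

private:
    std::chrono::seconds _interval;
    Severity _normal;
    Severity _quiet;
    std::optional<Clock::time_point> _windowStart;
};
```

Taking the time point as a parameter keeps the sketch deterministic; a production class would presumably read a clock itself and synchronize access.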
For per-key rate limiting (e.g., one log per key per interval), use `KeyedSeveritySuppressor` For per-key rate limiting (e.g., one log per key per interval), use `KeyedSeveritySuppressor`
instead. instead.
# JSON output format # JSON output format
Produces structured logs of the [Relaxed Extended JSON 2.0.0][relaxed_json_2] Produces structured logs of the [Relaxed Extended JSON 2.0.0][relaxed_json_2] format. Below is an
format. Below is an example of a log statement in C++ and a pretty-printed JSON example of a log statement in C++ and a pretty-printed JSON output:
output:
C++ statement: C++ statement:
@ -717,5 +682,7 @@ Output:
--- ---
[relaxed_json_2]: https://github.com/mongodb/specifications/blob/master/source/extended-json.rst [relaxed_json_2]: https://github.com/mongodb/specifications/blob/master/source/extended-json.rst
[_lastOplogEntryFetcherCallbackForStopTimestamp]: https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/initial_syncer.cpp#L1500-L1512 [_lastOplogEntryFetcherCallbackForStopTimestamp]:
[_summarizeRollback]: https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/rollback_impl.cpp#L1263-L1305 https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/initial_syncer.cpp#L1500-L1512
[_summarizeRollback]:
https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/rollback_impl.cpp#L1263-L1305


@ -2,5 +2,5 @@
- Avoid using bare pointers for dynamically allocated objects. Prefer `std::unique_ptr`, - Avoid using bare pointers for dynamically allocated objects. Prefer `std::unique_ptr`,
`std::shared_ptr`, or another RAII class such as `BSONObj`. `std::shared_ptr`, or another RAII class such as `BSONObj`.
- If you assign the output of `new/malloc()` directly to a bare pointer you should document where - If you assign the output of `new/malloc()` directly to a bare pointer you should document where it
it gets deleted/freed, who owns it along the way, and how exception safety is ensured. gets deleted/freed, who owns it along the way, and how exception safety is ensured.


@ -15,86 +15,87 @@ TODO
## Why are we doing this? ## Why are we doing this?
Having a clear delineation between public and private APIs for each module will improve the Having a clear delineation between public and private APIs for each module will improve the
maintainability and velocity of our codebase. Teams will have more freedom to evolve their maintainability and velocity of our codebase. Teams will have more freedom to evolve their internal
internal implementation details without affecting consumers. Consumers will benefit from implementation details without affecting consumers. Consumers will benefit from knowing what APIs
knowing what APIs are intended for their consumption. are intended for their consumption.
## Assigning files to modules ## Assigning files to modules
The file `modules_poc/modules.yaml` contains a list of modules, each containing The file `modules_poc/modules.yaml` contains a list of modules, each containing a list of files.
a list of files. Each file must be contained in only one module. Note that Each file must be contained in only one module. Note that module assignment is not required to map
module assignment is not required to map neatly to team ownership. neatly to team ownership.
In cases where multiple globs match a file, the current rule is that the In cases where multiple globs match a file, the current rule is that the longest glob wins. This is
longest glob wins. This is used as a simpler-to-implement version of used as a simpler-to-implement version of most-specific glob wins, which we may switch to in the
most-specific glob wins, which we may switch to in the future. future.
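A minimal sketch of the "longest glob wins" rule, under loud assumptions: the matcher below only understands `*` (matching any run of characters, including `/`), which may differ from the glob semantics the real tooling uses, and the function names, globs, and module names are all hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified wildcard matcher for this sketch: '*' matches any run of
// characters. Real glob semantics in the modules tooling may differ.
inline bool globMatches(const std::string& glob, const std::string& path) {
    std::size_t g = 0, p = 0, star = std::string::npos, mark = 0;
    while (p < path.size()) {
        if (g < glob.size() && glob[g] == '*') {
            star = g++;  // remember the wildcard and try matching zero characters
            mark = p;
        } else if (g < glob.size() && glob[g] == path[p]) {
            ++g;
            ++p;
        } else if (star != std::string::npos) {
            g = star + 1;  // backtrack: let the last '*' absorb one more character
            p = ++mark;
        } else {
            return false;
        }
    }
    while (g < glob.size() && glob[g] == '*') {
        ++g;
    }
    return g == glob.size();
}

// Returns the module whose matching glob is the longest string, or "" if no
// glob matches. Each entry pairs a glob with a (hypothetical) module name.
inline std::string moduleForFile(
    const std::vector<std::pair<std::string, std::string>>& globToModule,
    const std::string& path) {
    std::string best;
    std::size_t bestLen = 0;
    for (const auto& [glob, moduleName] : globToModule) {
        if (globMatches(glob, path) && glob.size() > bestLen) {
            bestLen = glob.size();
            best = moduleName;
        }
    }
    return best;
}
```

With the made-up rules `src/mongo/db/*` and `src/mongo/db/repl/*`, a file under `src/mongo/db/repl/` matches both globs and is assigned to the module of the longer one.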
## How do I mark API visibility? ## How do I mark API visibility?
This section will just describe the basic process. Later sections will cover the tooling This section will just describe the basic process. Later sections will cover the tooling available
available to help, along with caveats to be aware of. to help, along with caveats to be aware of.
First read the documentation in [src/mongo/util/modules.h](https://github.com/mongodb/mongo/blob/master/src/mongo/util/modules.h) First read the documentation in
for the canonical list and description of visibility levels. As a brief overview of the main [src/mongo/util/modules.h](https://github.com/mongodb/mongo/blob/master/src/mongo/util/modules.h) for
levels from least to most restrictive: the canonical list and description of visibility levels. As a brief overview of the main levels from
least to most restrictive:
- `OPEN`: This is available for usage _and inheritance_ from anywhere in the codebase - `OPEN`: This is available for usage _and inheritance_ from anywhere in the codebase
- `PUBLIC`: This is available for usage from anywhere in the codebase. For types, subclasses may - `PUBLIC`: This is available for usage from anywhere in the codebase. For types, subclasses may
only be defined in the same module. only be defined in the same module.
- `NEEDS_REPLACEMENT` and `USE_REPLACEMENT(...)`: These are collectively considered - `NEEDS_REPLACEMENT` and `USE_REPLACEMENT(...)`: These are collectively considered "unfortunately
"unfortunately public" and are available for use, but should be avoided public" and are available for use, but should be avoided
- `PARENT_PRIVATE`: This is similar to `PRIVATE`, but allows usage from any file in the parent - `PARENT_PRIVATE`: This is similar to `PRIVATE`, but allows usage from any file in the parent
module, including other submodules module, including other submodules
- `PRIVATE`: This may only be used from the current module or one of its submodules - `PRIVATE`: This may only be used from the current module or one of its submodules
- `FILE_PRIVATE`: This may only be used from the current "file family" (roughly, header \+ cpp - `FILE_PRIVATE`: This may only be used from the current "file family" (roughly, header \+ cpp \+
\+ tests). It may not be used by other files, even from the same module. tests). It may not be used by other files, even from the same module.
You can think of public vs private similarly to how you would the sections of a `class`: they You can think of public vs private similarly to how you would the sections of a `class`: they
indicate whether something is intended to be part of the API or an implementation detail. The indicate whether something is intended to be part of the API or an implementation detail. The
difference is that they apply at a wider granularity of code than a single class, with difference is that they apply at a wider granularity of code than a single class, with
implementation details available to either the full module (and its submodules) for `PRIVATE` implementation details available to either the full module (and its submodules) for `PRIVATE` or the
or the file family for `FILE_PRIVATE`. file family for `FILE_PRIVATE`.
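As a rough mental model of the levels above, here is a small Python sketch of who may use a declaration. It is an approximation under stated assumptions, not the checker the merger runs: it assumes submodules are written with dotted names (`query.optimizer` is a submodule of `query`), that a top-level module acts as its own parent, and that the caller already knows whether two files are in the same file family. All names are hypothetical.

```python
ANYWHERE = {"OPEN", "PUBLIC", "NEEDS_REPLACEMENT", "USE_REPLACEMENT"}


def is_within(module, ancestor):
    """True if `module` is `ancestor` or one of its (dotted) submodules."""
    return module == ancestor or module.startswith(ancestor + ".")


def parent_of(module):
    """Parent module name; a top-level module is treated as its own parent (assumption)."""
    return module.rpartition(".")[0] or module


def may_use(visibility, owner_mod, user_mod, same_file_family=False):
    """Return whether code in `user_mod` may use a declaration owned by `owner_mod`."""
    if visibility in ANYWHERE:
        return True
    if visibility == "PARENT_PRIVATE":
        # Usable from the parent module, including the owner's sibling submodules.
        return is_within(user_mod, parent_of(owner_mod))
    if visibility == "PRIVATE":
        # Usable from the owning module or any of its submodules.
        return is_within(user_mod, owner_mod)
    if visibility == "FILE_PRIVATE":
        # Usable only from the header + cpp + tests family, even within the module.
        return same_file_family
    raise ValueError(f"unknown visibility: {visibility}")
```

For example, `may_use("PARENT_PRIVATE", "query.optimizer", "query.exec")` is allowed because both live under `query`, while the same call with `"PRIVATE"` is not.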
The macros in that header file are attached to declarations and set the visibility level for that
declaration and all of its "semantic children"[^1]. The macros are C++ attributes, which means that
they need to go in specific places that differ based on what is being marked (for templates, the
location does not change and is always somewhere after the `template <...>` part):
- `MONGO_MOD_PUBLIC;` by itself as the first line after includes in a header sets the default - `MONGO_MOD_PUBLIC;` by itself as the first line after includes in a header sets the default for
for that header (only `PUBLIC`, `PARENT_PRIVATE`, and `FILE_PRIVATE` are allowed here) that header (only `PUBLIC`, `PARENT_PRIVATE`, and `FILE_PRIVATE` are allowed here)
- `namespace MONGO_MOD mongo {` (this does not work with nested namespaces in a single - `namespace MONGO_MOD mongo {` (this does not work with nested namespaces in a single declaration
declaration like `namespace mongo::repl`) like `namespace mongo::repl`)
- `class MONGO_MOD Foo {` (Ditto for `enum`, `struct`, and `union`) - `class MONGO_MOD Foo {` (Ditto for `enum`, `struct`, and `union`)
- `MONGO_MOD void func(...);` - `MONGO_MOD void func(...);`
- `MONGO_MOD int var;` - `MONGO_MOD int var;`
- `concept isFooable MONGO_MOD {` - `concept isFooable MONGO_MOD {`
For the cases where it goes at the beginning of the line, if clang-format chooses an unfortunate For the cases where it goes at the beginning of the line, if clang-format chooses an unfortunate
place to break the line, it usually helps to undo the formatting then put the macro on its own place to break the line, it usually helps to undo the formatting then put the macro on its own line
line above the declaration. above the declaration.
APIs are marked one header at a time, by including `"mongo/util/modules.h"` in the header. This
causes the header to be treated as "modularized", which has the following effects:
- All declarations in that header (not transitive includes) default to `PRIVATE`, meaning that - All declarations in that header (not transitive includes) default to `PRIVATE`, meaning that the
the public API is what must be marked. public API is what must be marked.
- Members in `private:` sections in classes default to `PRIVATE`, regardless of the visibility - Members in `private:` sections in classes default to `PRIVATE`, regardless of the visibility of
of the class. The only way the language would allow them to be used from outside of the module the class. The only way the language would allow them to be used from outside of the module is if
is if you have cross-module friendships, which should generally be avoided. If needed you have cross-module friendships, which should generally be avoided. If needed temporarily, favor
temporarily, favor `NEEDS_REPLACEMENT` over `PUBLIC` for these declarations. `NEEDS_REPLACEMENT` over `PUBLIC` for these declarations.
- Declarations ending in `_forTest` default to `FILE_PRIVATE` to support the common case where - Declarations ending in `_forTest` default to `FILE_PRIVATE` to support the common case where they
they are only intended for testing that class. If they are actually intended to support testing are only intended for testing that class. If they are actually intended to support testing of
of consumers, not just the type they are defined on, they can be explicitly given `PUBLIC` or consumers, not just the type they are defined on, they can be explicitly given `PUBLIC` or
`PRIVATE` visibility. `PRIVATE` visibility.
- Internal and detail namespaces default to `PRIVATE` and cannot be made less restricted, but - Internal and detail namespaces default to `PRIVATE` and cannot be made less restricted, but can
can still be marked as `FILE_PRIVATE`. Individual declarations within the namespace can be still be marked as `FILE_PRIVATE`. Individual declarations within the namespace can be exposed as
exposed as necessary, but they cannot be exposed in bulk without changing the name of the necessary, but they cannot be exposed in bulk without changing the name of the namespace to
namespace to something that doesn't imply private. something that doesn't imply private.
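To check whether a namespace name would be treated as internal, you can try it against the regex quoted in the caveats section of this document, `(detail|internal)s?$`. The sketch below assumes the pattern is searched against the namespace name (so a suffix match counts); that detail and the helper name are assumptions, not something taken from the tooling.

```python
import re

# Pattern as quoted in this document's caveats section.
INTERNAL_NAMESPACE = re.compile(r"(detail|internal)s?$")


def is_internal_namespace(name):
    """True if `name` ends in detail, details, internal, or internals."""
    return INTERNAL_NAMESPACE.search(name) is not None


for name in ["detail", "details", "internal", "future_details", "detailed", "repl"]:
    print(name, is_internal_namespace(name))
```

Because the pattern is anchored only at the end, a name like `future_details` matches, while `detailed` does not.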
For internal headers of a module which do not contribute to its public API, simply including For internal headers of a module which do not contribute to its public API, simply including
`modules.h` is sufficient. There is a [tool](#the-private-header-marker) to automate this `modules.h` is sufficient. There is a [tool](#the-private-header-marker) to automate this process.
process. You may additionally want to consider whether any APIs should be marked `FILE_PRIVATE`, You may additionally want to consider whether any APIs should be marked `FILE_PRIVATE`, but that is
but that is optional. optional.
For IDL files, you mark visibility of whole types (`struct`, `enum`, and `command`) with the For IDL files, you mark visibility of whole types (`struct`, `enum`, and `command`) with the
`mod_visibility` option. The value should be the same as one of the `MONGO_MOD` macros, but `mod_visibility` option. The value should be the same as one of the `MONGO_MOD` macros, but
@ -105,17 +106,17 @@ compelling use case for this.
## What tooling exists to help me? ## What tooling exists to help me?
Note that all tooling should be run from within a properly set-up python virtual environment. Note that all tooling should be run from within a properly set-up python virtual environment. This
This includes running `buildscripts/poetry_sync.sh` to ensure you have the correct dependencies. includes running `buildscripts/poetry_sync.sh` to ensure you have the correct dependencies.
### The scanner and merger ### The scanner and merger
The merger generates a cross reference of all first-party usages of first-party code and stores The merger generates a cross reference of all first-party usages of first-party code and stores it
it in `merged_decls.json`, which is used by the rest of our tooling. It is also where we validate in `merged_decls.json`, which is used by the rest of our tooling. It is also where we validate that
that there are no disallowed accesses. It will be invoked for you by the browser when you ask it there are no disallowed accesses. It will be invoked for you by the browser when you ask it to
to rescan, or you can also manually run it as `modules_poc/merge_decls.py`. If you are interested rescan, or you can also manually run it as `modules_poc/merge_decls.py`. If you are interested in
in analyzing that file, [`jq`](https://jqlang.org/) is a powerful tool, or you can just write analyzing that file, [`jq`](https://jqlang.org/) is a powerful tool, or you can just write some
some python. python.
As a rather extreme example of what you can do with `jq`, here is how the progress reports are As a rather extreme example of what you can do with `jq`, here is how the progress reports are
generated: generated:
@ -129,43 +130,43 @@ generated:
jq 'map(., .mod = "TOTAL") | group_by(.mod)[] | group_by(.loc | split(":")[0]) | {mod: .[0].[0].mod, total: length, marked: map(select(any(.visibility == "UNKNOWN") | not)) | length} | .done = (1000 * .marked / .total | round) / 10 | "\(.mod): \(" " * (.mod | 40-length)) \(.done)% (\(.marked) / \(.total))"' -r merged_decls.json jq 'map(., .mod = "TOTAL") | group_by(.mod)[] | group_by(.loc | split(":")[0]) | {mod: .[0].[0].mod, total: length, marked: map(select(any(.visibility == "UNKNOWN") | not)) | length} | .done = (1000 * .marked / .total | round) / 10 | "\(.mod): \(" " * (.mod | 40-length)) \(.done)% (\(.marked) / \(.total))"' -r merged_decls.json
``` ```
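If you prefer Python to `jq`, the per-module part of that report can be approximated as follows. This is a sketch that assumes `merged_decls.json` is a JSON array of objects with `mod`, `loc`, and `visibility` fields and that `loc` starts with the file path followed by `:`, all of which is inferred from the `jq` command above rather than from a documented schema. The sample data is made up, and the `TOTAL` row is omitted.

```python
import json
from collections import defaultdict


def file_progress(decls):
    """Map each module to (files with no UNKNOWN declarations, total files)."""
    files_by_mod = defaultdict(dict)  # module -> {file path: fully marked so far}
    for decl in decls:
        path = decl["loc"].split(":")[0]
        marked = decl["visibility"] != "UNKNOWN"
        files = files_by_mod[decl["mod"]]
        # A file counts as marked only if every declaration seen in it is marked.
        files[path] = files.get(path, True) and marked
    return {mod: (sum(files.values()), len(files)) for mod, files in files_by_mod.items()}


def load_decls(path="merged_decls.json"):
    """Load the merger output; the default path assumes the current directory."""
    with open(path) as f:
        return json.load(f)


# Made-up sample in the assumed shape, for illustration only.
SAMPLE = [
    {"mod": "repl", "loc": "src/a.h:10:1", "visibility": "PUBLIC"},
    {"mod": "repl", "loc": "src/a.h:20:1", "visibility": "UNKNOWN"},
    {"mod": "repl", "loc": "src/b.h:5:1", "visibility": "PRIVATE"},
    {"mod": "query", "loc": "src/c.h:7:1", "visibility": "PUBLIC"},
]

for mod, (marked, total) in sorted(file_progress(SAMPLE).items()):
    print(f"{mod}: {round(1000 * marked / total) / 10}% ({marked} / {total})")
```

To run it on real data, replace `SAMPLE` with `load_decls()` after a scan has produced the file.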
Internally, the merger invokes `bazel build --config=mod-scanner //src/mongo/...` to run the
scanner over the whole codebase (or the parts that have changed since the last scan), taking
advantage of bazel remote execution to achieve very high levels of parallelism.
### The browser ### The browser
The main piece of tooling to run is the browser, which is launched by running
`modules_poc/browse.py`. If you haven't scanned the codebase recently, it will offer to run the scan
for you, which will take a few minutes. After modifying the source code, you can rescan at any time
by pressing `r`. It will only rescan files that have been modified or that transitively include
modified headers.
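The rescan rule (modified files plus everything that transitively includes a modified header) can be illustrated with a small Python sketch. In practice this bookkeeping is presumably handled by the build system rather than by code like this; the include graph and file names below are hypothetical.

```python
def files_to_rescan(includes, modified):
    """Return modified files plus every file that transitively includes one of them.

    `includes` maps each file to the set of headers it includes directly.
    """
    dirty = set(modified)
    changed = True
    while changed:
        changed = False
        for path, direct in includes.items():
            # A file becomes dirty as soon as any header it includes is dirty.
            if path not in dirty and direct & dirty:
                dirty.add(path)
                changed = True
    return dirty


# Hypothetical include graph, for illustration only.
INCLUDES = {
    "a.cpp": {"a.h"},
    "a.h": {"base.h"},
    "b.cpp": {"b.h"},
    "b.h": set(),
    "base.h": set(),
}

print(sorted(files_to_rescan(INCLUDES, {"base.h"})))
```

Editing `base.h` dirties `a.h` and then `a.cpp`, while `b.cpp` and `b.h` are left alone.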
The browser is primarily intended to assist in labeling public APIs, so the files are sorted with
the largest number of unlabeled declarations ("unknowns") first. You can search for a file by
pressing `f` or press `m` to filter the files by module.
The list of available key bindings is shown on the right. You can toggle that by pressing `?`. Other
key bindings of note: press `g` to go to the currently highlighted declaration or location in your
editor (only when running in the vscode or nvim terminal), and `p` to toggle an inline preview of
the location within the browser. You can press `Tab ↹` to toggle between the tree and the code
preview. The mouse is fully supported for scrolling and expanding rows in the tree, and there are
aliases for some basic vim keybinds (`hjkl/`).
### The private header marker ### The private header marker
Once you have scanned the codebase and produced a `merged_decls.json`, Once you have scanned the codebase and produced a `merged_decls.json`,
`modules_poc/private_headers.py` can be used to find all header and IDL files where there are `modules_poc/private_headers.py` can be used to find all header and IDL files where there are no
no currently detected external usages and automatically mark them as fully private to the currently detected external usages and automatically mark them as fully private to the module. This
module. This does not necessarily mean that all automatically marked headers are intended to does not necessarily mean that all automatically marked headers are intended to be private. A human
be private. A human should review to ensure that the marked headers match intent. You can pass should review to ensure that the marked headers match intent. You can pass flags to filter on
flags to filter on any/all of module, owning team, or path glob. For headers matching the filter, any/all of module, owning team, or path glob. For headers matching the filter, the script will also
the script will also warn of usages of `_forTest` external to the file family that may need to warn of usages of `_forTest` external to the file family that may need to be marked `PRIVATE` to
be marked `PRIVATE` to make them available to the whole module since they default to only being make them available to the whole module since they default to only being available to the file
available to the file family for marked headers. family for marked headers.
Make sure to run `buildscripts/clang_format.py format-my` or `bazel run format` after using it Make sure to run `buildscripts/clang_format.py format-my` or `bazel run format` after using it to
to modify any C++ files. modify any C++ files.
Example usage: Example usage:
@ -178,13 +179,12 @@ Example usage:
### The PR comment generator ### The PR comment generator
You can run `modules_poc/mod_diff.py` to output a brief summary of all of the API (including
visibility levels and usage counts) for each file modified in your branch. When putting up a PR to
mark API visibility, you should add a comment with its output to the PR as an aid to reviewers. The
output is intended to be close enough to C++ that you should put it in a ` ```cpp ` block when
making your PR comment to make it more readable. You can also pipe it through `bat -lcpp` to make it
colorful locally. Note that it will use the last scan output, so if you've modified any headers, you
should run a rescan prior to running this tool.
## Workflow ## Workflow
@ -198,24 +198,23 @@ The general workflow for each PR will generally be the same:
5. Run [the pr comment generator](#the-pr-comment-generator) to show the APIs that you have marked 5. Run [the pr comment generator](#the-pr-comment-generator) to show the APIs that you have marked
- Look through this to ensure that everything is as you expect. - Look through this to ensure that everything is as you expect.
6. Put up a PR and include the generated comment in a ` ```cpp ` block 6. Put up a PR and include the generated comment in a ` ```cpp ` block
- I suggest keeping PRs small (say, no more than 10 files at a time) so that they are - I suggest keeping PRs small (say, no more than 10 files at a time) so that they are manageable
manageable by reviewers. As an exception it seems reasonable to auto-mark many headers as by reviewers. As an exception it seems reasonable to auto-mark many headers as private in a
private in a single PR, as long as those PRs are separate from those containing any manual single PR, as long as those PRs are separate from those containing any manual marking.
marking.
When first starting to mark a module, I suggest running the
[`modules_poc/private_headers.py`](#the-private-header-marker) script with `--dry-run` (or `-n`) and
`--module=YOUR_MODULE`. For larger modules (in particular, the `query` mega module) you may want to
pass a `--glob` so that you can focus on a smaller subset of the code initially. That will give you
an overview of the files that are used from outside your module (which contain de facto public APIs
today) and those that do not (which can automatically be marked as private implementation details).
If all of the de facto private headers seem like they should be private, you can remove the dry-run
flag to have it automatically mark them as private. Be sure to validate that their contents are
actually intended to be private. Remember that the point of having a human doing the marking is to
ensure that we correctly capture intent. You can optionally mark implementation details within each
header as `FILE_PRIVATE`, if you would like to prevent them from being used elsewhere even within
the module.
You can then open [the browser](#the-browser) (`modules_poc/browse.py`) to look at the remaining You can then open [the browser](#the-browser) (`modules_poc/browse.py`) to look at the remaining
headers. It will show you what is used and from where. It will be particularly useful for things headers. It will show you what is used and from where. It will be particularly useful for things
@ -229,137 +228,136 @@ that seem like they should be private, but are being used externally.
`modules_poc/modules.yaml` to move them. `modules_poc/modules.yaml` to move them.
2. If there is already a public API that callers should use instead, mark it as 2. If there is already a public API that callers should use instead, mark it as
`USE_REPLACEMENT(better_api)`. The argument accepts any C++ tokens, but the intent is where `USE_REPLACEMENT(better_api)`. The argument accepts any C++ tokens, but the intent is where
possible to use the name of the replacement. This will generate a ticket for all teams using possible to use the name of the replacement. This will generate a ticket for all teams using that
that code. code.
1. If there are very few users, consider just cleaning them up. 1. If there are very few users, consider just cleaning them up.
3. Reconsider making this API public if other modules need its functionality, and this is 3. Reconsider making this API public if other modules need its functionality, and this is the only
the only way to get it. way to get it.
4. Otherwise, if there is no public API that fulfills the needs of the callers, but you 4. Otherwise, if there is no public API that fulfills the needs of the callers, but you don't want
don't want the current API to remain public long-term, use `NEEDS_REPLACEMENT`. This will the current API to remain public long-term, use `NEEDS_REPLACEMENT`. This will generate a ticket
generate a ticket for the team that owns that code. for the team that owns that code.
1. If the API was "obviously" intended to be private (eg it is in a `details` namespace) 1. If the API was "obviously" intended to be private (eg it is in a `details` namespace) and
and callers would be reasonably able to implement the functionality themselves, possibly callers would be reasonably able to implement the functionality themselves, possibly by
by writing their own version, it seems acceptable to use writing their own version, it seems acceptable to use
`USE_REPLACEMENT(do not use internal details)` `USE_REPLACEMENT(do not use internal details)`
## Caveats and Limitations ## Caveats and Limitations
**OVERARCHING GUIDELINE**: Always try to mark declarations correctly according to intent, even if it
will not be enforced by the current tooling. This both provides the correct information to human
readers and avoids issues if we improve the tooling in the future to eliminate these limitations.
The rest of this section is fairly technical and probably not necessary for most readers unless The rest of this section is fairly technical and probably not necessary for most readers unless they
they notice something "weird" going on and want to dive into why. Most of these limitations are notice something "weird" going on and want to dive into why. Most of these limitations are more
more likely to affect the core modules since most of the rest of our code does not expose APIs likely to affect the core modules since most of the rest of our code does not expose APIs via macros
via macros and templates or have APIs only consumed by templates, and those are where most of and templates or have APIs only consumed by templates, and those are where most of these issues come
these issues come up. up.
- We do not track usages of namespaces at all, only the declarations within namespaces. When - We do not track usages of namespaces at all, only the declarations within namespaces. When a
a namespace is marked with a visibility, it does not affect the visibility of the namespace namespace is marked with a visibility, it does not affect the visibility of the namespace itself
itself (since it doesn't have one), it sets the default visibility for all declarations within (since it doesn't have one), it sets the default visibility for all declarations within **that
**that namespace block**. Each time a namespace is reopened it is a separate block and the namespace block**. Each time a namespace is reopened it is a separate block and the visibility
visibility markers on other blocks of the same namespace do not apply. markers on other blocks of the same namespace do not apply.
- The scanner only knows about declarations that it sees being used. For implementation reasons, - The scanner only knows about declarations that it sees being used. For implementation reasons, it
it only discovers declarations by seeing what every usage is using. This can either cause or be only discovers declarations by seeing what every usage is using. This can either cause or be
caused by other limitations. caused by other limitations.
- Usages in templates may not be seen. This is especially the case for "dependent types and - Usages in templates may not be seen. This is especially the case for "dependent types and values"
values" which are things that are not known by the compiler before the template is instantiated. which are things that are not known by the compiler before the template is instantiated.
- This is a problem for functions where any arguments are dependent if it can't figure out - This is a problem for functions where any arguments are dependent if it can't figure out which
which overload will be selected. It is even worse for free-functions called unqualified overload will be selected. It is even worse for free-functions called unqualified (`f(blah)`
(`f(blah)` rather than `ns::f(blah)` or `x.f(blah)`) since due to ADL, overload resolution rather than `ns::f(blah)` or `x.f(blah)`) since due to ADL, overload resolution is _always_
is _always_ delayed for them. delayed for them.
- Everything that results from a macro expansion is treated as if it were written at the point of
  expansion. This applies to both declarations and usages. If you have an API that should only be
  used via the defined macros, mark it as `MOD_PUBLIC_FOR_TECHNICAL_REASONS` to signal to readers
  that they should avoid direct usage, even if the tooling won't prevent it. We may improve this in
  the future.
- Template variables are completely ignored due to some unfortunate clang bugs. Still, try - Template variables are completely ignored due to some unfortunate clang bugs. Still, try to mark
to mark them correctly since we may change this in the future. them correctly since we may change this in the future.
- Method calls are assigned to the static type at the call site. This has two important effects: - Method calls are assigned to the static type at the call site. This has two important effects:
- A subclass's overridden method may seem unused if it is only used via calls through a base - A subclass's overridden method may seem unused if it is only used via calls through a base class
class pointer/reference pointer/reference
- Calls through a base class pointer/reference count as calls of that class's method, not of - Calls through a base class pointer/reference count as calls of that class's method, not of the
the interface's interface's
- Defaulted members (methods, ctors, dtors) are treated as usages of the class itself, regardless of
  whether they are implicitly or explicitly defaulted. This is because clang does not provide an API
  to distinguish between those cases.
- Template normalization woes: we try really hard to report declarations as the template - Template normalization woes: we try really hard to report declarations as the template `foo<T>`
`foo<T>` rather than separate instantiations like `foo<int>`, `foo<string>`, etc, **unless** rather than separate instantiations like `foo<int>`, `foo<string>`, etc, **unless** they are
they are explicitly specialized, meaning that the instantiation has its own definition different explicitly specialized, meaning that the instantiation has its own definition different from the
from the main template. Unfortunately, clang does a bad job at this and we have a number of main template. Unfortunately, clang does a bad job at this and we have a number of kludgy
kludgy workarounds. The most important effects: workarounds. The most important effects:
- Explicit specializations of function and variable templates are ignored and always converted - Explicit specializations of function and variable templates are ignored and always converted to
to the primary template. the primary template.
- We do treat explicit specializations of types as separate (using the heuristic of having a - We do treat explicit specializations of types as separate (using the heuristic of having a
separate location than the main template), because they can have a different shape and API than separate location than the main template), because they can have a different shape and API than
the main template. In general they should probably have the same visibility though, unless the the main template. In general they should probably have the same visibility though, unless the
instantiation is using a private type which should be unavailable to consumers anyway. instantiation is using a private type which should be unavailable to consumers anyway.
- Clang assigns many locations to the site of explicit template instantiations and extern - Clang assigns many locations to the site of explicit template instantiations and extern template
template declarations, even when there is a better location that it can see. Luckily these declarations, even when there is a better location that it can see. Luckily these are fairly
are fairly rare. rare.
- Sometimes clang reports the resolved destination of `using` declarations and type aliases, but
  usually it reports the `using` declaration itself. A few notable cases (these are trends and may
  not be absolute\!):
- `using Base::foo;` to expose a member of a base class is resolved as a usage of `Base::foo` - `using Base::foo;` to expose a member of a base class is resolved as a usage of `Base::foo`
rather than `Derived::foo`. This is especially notable when the `Base` class is intended to be rather than `Derived::foo`. This is especially notable when the `Base` class is intended to be a
a private implementation detail. You will need to mark all exposed methods as public. private implementation detail. You will need to mark all exposed methods as public.
- `using Base::Base;` to pull in the base constructors is the opposite and is recorded as a - `using Base::Base;` to pull in the base constructors is the opposite and is recorded as a usage
usage of `Derived::Base(args)`, which is odd because such a declaration doesn't actually exist. of `Derived::Base(args)`, which is odd because such a declaration doesn't actually exist.
- Internal/details namespaces (currently defined as matching the regex `(detail|internal)s?$`) - Internal/details namespaces (currently defined as matching the regex `(detail|internal)s?$`)
have an implicit default visibility of private if `modules.h` is included. It is not have an implicit default visibility of private if `modules.h` is included. It is not
possible to give the namespace a public visibility, but you can restrict it further with possible to give the namespace a public visibility, but you can restrict it further with
`FILE_PRIVATE`. If you want declarations inside it to be usable from outside your module you `FILE_PRIVATE`. If you want declarations inside it to be usable from outside your module you must
must mark children of the namespace explicitly, or rename it to not use a name that implies mark children of the namespace explicitly, or rename it to not use a name that implies that it is
that it is for internal usage only. A somewhat common case will be marking internal declarations for internal usage only. A somewhat common case will be marking internal declarations that are
that are only intended to be used via macros with `PUBLIC_FOR_TECHNICAL_REASONS`. only intended to be used via macros with `PUBLIC_FOR_TECHNICAL_REASONS`.
- Be very careful with forward declarations. Try to avoid them wherever possible (unless there - Be very careful with forward declarations. Try to avoid them wherever possible (unless there is a
is a significant benefit). Especially avoid forward declaring anything from another module\! significant benefit). Especially avoid forward declaring anything from another module\! Where
Where forward declarations must be used, make sure that they have the same visibility as the forward declarations must be used, make sure that they have the same visibility as the real
real definition. As an exception, if every TU that sees the forward declaration will also see definition. As an exception, if every TU that sees the forward declaration will also see the
the definition it is OK to omit marking the forward declaration. This may happen when they are definition it is OK to omit marking the forward declaration. This may happen when they are both in
both in the same header, or the forward declaration is in a private implementation detail header the same header, or the forward declaration is in a private implementation detail header which is
which is included by the defining header. Be aware of the implicit visibility marking which also included by the defining header. Be aware of the implicit visibility marking which also applies to
applies to forward declaration, if they are the only declaration seen in the TU. forward declaration, if they are the only declaration seen in the TU.
- Never forward declare functions to avoid including a header. They are much more problematic - Never forward declare functions to avoid including a header. They are much more problematic than
than types, both in general in C++ and specifically for this tooling. types, both in general in C++ and specifically for this tooling.
- We try to use the definition location for types defined in headers, but the "canonical" - We try to use the definition location for types defined in headers, but the "canonical" location
location (clang's term for the first declaration seen in the current TU) for everything else. (clang's term for the first declaration seen in the current TU) for everything else. If the type
If the type is defined in a .cpp, we use the canonical location. is defined in a .cpp, we use the canonical location.
- We only consider declarations in headers, never in .cpp files. - We only consider declarations in headers, never in .cpp files.
- Be mindful of `_forTest` functions. They default to `FILE_PRIVATE` since they are typically - Be mindful of `_forTest` functions. They default to `FILE_PRIVATE` since they are typically
intended only for use when testing the type they are defined on, not when testing consumers. intended only for use when testing the type they are defined on, not when testing consumers. In
In the cases where they _are_ intended as part of the API for testing consumers, you can the cases where they _are_ intended as part of the API for testing consumers, you can explicitly
explicitly mark them `PUBLIC` or `PRIVATE` depending on whether they should be usable from mark them `PUBLIC` or `PRIVATE` depending on whether they should be usable from outside your
outside your module or not. module or not.
- Things used implicitly (eg implicit conversion operators) are still counted as usages even - Things used implicitly (eg implicit conversion operators) are still counted as usages even if they
if they are not specifically named at the call site are not specifically named at the call site
- When merging information from multiple TUs, definitions always replace the metadata gathered - When merging information from multiple TUs, definitions always replace the metadata gathered from
from TUs that only saw a declaration. TUs that only saw a declaration.
- Note that we aren't guaranteed to see every definition, in particular for functions that - Note that we aren't guaranteed to see every definition, in particular for functions that are not
are not called from the TU that they are defined in. So this cannot be used to find places called from the TU that they are defined in. So this cannot be used to find places where we
where we deleted the definition but forgot to delete the declaration (we wouldn't see them deleted the definition but forgot to delete the declaration (we wouldn't see them anyway, since
anyway, since we only track things that are used, and undefined things can't really be used, we only track things that are used, and undefined things can't really be used, except trivially,
except trivially, without breaking the build). without breaking the build).
- `private` members of classes are implicitly `PRIVATE`, and must be explicitly marked otherwise - `private` members of classes are implicitly `PRIVATE`, and must be explicitly marked otherwise if
if desired. They should probably never be made `PUBLIC` since that implies cross-module desired. They should probably never be made `PUBLIC` since that implies cross-module friendship.
friendship. In the few places where we have that today, they have been made one of the flavors In the few places where we have that today, they have been made one of the flavors of
of unfortunately public: `NEEDS_REPLACEMENT` or `USE_INSTEAD`. unfortunately public: `NEEDS_REPLACEMENT` or `USE_INSTEAD`.
- `public` members of `private` types do not inherit the implicit `PRIVATE` and follow the - `public` members of `private` types do not inherit the implicit `PRIVATE` and follow the normal
normal rule of looking for their nearest semantic parent with an explicit marker. That means rule of looking for their nearest semantic parent with an explicit marker. That means that they
that they may be `PUBLIC`. However, the language rules still apply and as long as an may be `PUBLIC`. However, the language rules still apply and as long as an instance of the type
instance of the type is never handed to consumers they will have no way of accessing those is never handed to consumers they will have no way of accessing those members.
members.
- `protected` members do not default to `PRIVATE`, but because we only allow subclassing from - `protected` members do not default to `PRIVATE`, but because we only allow subclassing from
`OPEN` classes, the language visibility rules will disallow access from outside the module `OPEN` classes, the language visibility rules will disallow access from outside the module
unless you choose to allow it by using `OPEN` classes or `friend`s. Note that making any unless you choose to allow it by using `OPEN` classes or `friend`s. Note that making any subclass
subclass `OPEN` exposes all `protected` members of parents unless they are marked `PRIVATE`. `OPEN` exposes all `protected` members of parents unless they are marked `PRIVATE`.
- `friend` declarations are mostly ignored, except when they are a definition. So the - `friend` declarations are mostly ignored, except when they are a definition. So the definitions
definitions using the "hidden friend" pattern are tracked, but we ignore it if the definition using the "hidden friend" pattern are tracked, but we ignore it if the definition is in a cpp
is in a cpp file. file.
[^1]: [^1]:
Clang distinguishes between "semantic" and "lexical" parents. The primary differences Clang distinguishes between "semantic" and "lexical" parents. The primary differences are that
are that members of classes (including member types) are semantic children of the class even members of classes (including member types) are semantic children of the class even when defined
when defined out of line, and conversely `friend` declarations are not, and instead are out of line, and conversely `friend` declarations are not, and instead are considered semantic
considered semantic children of the nearest namespace. children of the nearest namespace.
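
The internal-namespace rule in the list above can be illustrated with a minimal sketch. It uses the regex exactly as quoted, `(detail|internal)s?$`; applying it to a bare namespace name with `re.search` is an assumption of this sketch, and `looks_internal` is a hypothetical helper, not part of the tooling.

```python
import re

# Pattern quoted in the text above. How the real tool applies it (for example,
# to the last component of a namespace path) is an assumption of this sketch.
INTERNAL_NAMESPACE_RE = re.compile(r"(detail|internal)s?$")


def looks_internal(namespace_name: str) -> bool:
    """Return True if the name ends in detail, details, internal, or internals."""
    return INTERNAL_NAMESPACE_RE.search(namespace_name) is not None


if __name__ == "__main__":
    for name in ["detail", "details", "repl_internal", "detailed", "internal_api"]:
        print(f"{name}: {looks_internal(name)}")
```

Because the pattern is anchored only at the end, a name such as `repl_internal` matches while `detailed` and `internal_api` do not.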

@ -2,15 +2,20 @@
## ALLOWED_UNOWNED_FILES.yml File Format ## ALLOWED_UNOWNED_FILES.yml File Format
This file is for repos that require all files be owned. Some files may be listed here as an exception and will be added to the end of the CODEOWNERS. This file is for repos that require all files be owned. Some files may be listed here as an
exception and will be added to the end of the CODEOWNERS.
`version` is the current version of the `ALLOWED_UNOWNED_FILES.yml` file format. The only version is `1.0.0`. `version` is the current version of the `ALLOWED_UNOWNED_FILES.yml` file format. The only version is
`1.0.0`.
`filters` are a list of filters that each have a `filter` and `justification` field. `filters` are a list of filters that each have a `filter` and `justification` field.
`filter` is a file path. This file path must start with a `/` and is relative to the root repo directory. Directories or globs are not supported at the moment to ensure careful selection of files allowed to be unowned. This can be reconsidered if proper usecases appear. `filter` is a file path. This file path must start with a `/` and is relative to the root repo
directory. Directories or globs are not supported at the moment to ensure careful selection of files
allowed to be unowned. This can be reconsidered if proper usecases appear.
`justification` is the reason why this file should be unowned. A common case is that this is a generated file that has checks in CI to ensure it is in the correct format. `justification` is the reason why this file should be unowned. A common case is that this is a
generated file that has checks in CI to ensure it is in the correct format.
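
As a rough illustration of the `filter` rules described above (leading `/`, no directories, no globs), here is a minimal sketch. `validate_filter` is a hypothetical helper written for this example; the actual checks performed by the codeowners tooling are not shown in this document and may differ.

```python
def validate_filter(path: str) -> list[str]:
    """Return a list of problems with a `filter` entry (empty if it looks valid).

    Hypothetical check based only on the prose rules above, not on the real tool.
    """
    errors = []
    if not path.startswith("/"):
        errors.append("filter must start with '/'")
    if path.endswith("/"):
        errors.append("directories are not supported")
    if any(ch in path for ch in "*?["):
        errors.append("globs are not supported")
    return errors


if __name__ == "__main__":
    for candidate in ["/docs/generated.md", "docs/generated.md", "/docs/", "/docs/*.md"]:
        print(candidate, validate_filter(candidate))
```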
### Example file ### Example file
@ -23,7 +28,8 @@ filters: # List of all filters
### Configuration ### Configuration
This can be configured in any repo with `bazel_rules_mongo` by putting the following lines in your `.bazelrc` file: This can be configured in any repo with `bazel_rules_mongo` by putting the following lines in your
`.bazelrc` file:
``` ```
common --define codeowners_have_allowed_unowned_files=True common --define codeowners_have_allowed_unowned_files=True

@ -15,7 +15,8 @@ Banned owners should be separated by newlines. Empty lines and lines starting wi
### Configuration ### Configuration
This can be configured in any repo with `bazel_rules_mongo` by putting the following lines in your `.bazelrc` file: This can be configured in any repo with `bazel_rules_mongo` by putting the following lines in your
`.bazelrc` file:
``` ```
common --define codeowners_have_banned_codeowners=True common --define codeowners_have_banned_codeowners=True

@ -1,23 +1,40 @@
# Code Owners # Code Owners
After modifying any OWNERS files, the overall ownership database (`.github/CODEOWNERS`) must be rebuilt. After modifying any OWNERS files, the overall ownership database (`.github/CODEOWNERS`) must be
This is done by running `bazel run codeowners`. rebuilt. This is done by running `bazel run codeowners`.
## OWNERS.yml File Format ## OWNERS.yml File Format
This is loosely based on [kubernetes](https://www.kubernetes.dev/docs/guide/owners/) and [chromium](https://chromium.googlesource.com/chromium/src/+/HEAD/docs/code_reviews.md) OWNERS files. This is loosely based on [kubernetes](https://www.kubernetes.dev/docs/guide/owners/) and
[chromium](https://chromium.googlesource.com/chromium/src/+/HEAD/docs/code_reviews.md) OWNERS files.
`version` is the current version of the `OWNERS.yml` file format. The latest version is `2.0.0`. For previous versions, see the [changelog](#owners-changelog). `version` is the current version of the `OWNERS.yml` file format. The latest version is `2.0.0`. For
previous versions, see the [changelog](#owners-changelog).
`aliases` point to yaml files that list aliases that can be used in this OWNERS.yml file. `aliases` point to yaml files that list aliases that can be used in this OWNERS.yml file.
`filters` are a list of globs that match [gitignore syntax](https://git-scm.com/docs/gitignore#_pattern_format). The filter must match at least one file and be unique to the file. Each filter must have a list of `approvers`. An approval from any single approver will allow the code to be merged. `NOOWNER` can be specified to mark a filter as unowned. Each filter can optionally have a `metadata` tag. Inside that tag a user can put whatever tags they want. We have reserved two meaningful tags `emeritus_approvers` and `owning_team`. This is not an exhaustive list and more documented and undocumented options can be added later. There is no linting done on the metadata tag. `filters` are a list of globs that match
[gitignore syntax](https://git-scm.com/docs/gitignore#_pattern_format). The filter must match at
least one file and be unique to the file. Each filter must have a list of `approvers`. An approval
from any single approver will allow the code to be merged. `NOOWNER` can be specified to mark a
filter as unowned. Each filter can optionally have a `metadata` tag. Inside that tag a user can put
whatever tags they want. We have reserved two meaningful tags `emeritus_approvers` and
`owning_team`. This is not an exhaustive list and more documented and undocumented options can be
added later. There is no linting done on the metadata tag.
`emeritus_approvers` are folks that used to be approvers that no longer have approver privileges. This allows us to keep track of folks who built up a knowledge base of this code that might need to be consulted in a critical situation. Both `approvers` and `emeritus_approvers` should be either github usernames, emails, or aliases. `emeritus_approvers` are folks that used to be approvers that no longer have approver privileges.
This allows us to keep track of folks who built up a knowledge base of this code that might need to
be consulted in a critical situation. Both `approvers` and `emeritus_approvers` should be either
github usernames, emails, or aliases.
`owning_team` is a team that owns the files, however this team does not have approval privileges. Instead this team should be looked to for asking questions. This metadata can also be used programmatically to, for example, generate a report of all the files owned by a particular team, even though that team has nominated specific engineers as approvers. `owning_team` is a team that owns the files, however this team does not have approval privileges.
Instead this team should be looked to for asking questions. This metadata can also be used
programmatically to, for example, generate a report of all the files owned by a particular team,
even though that team has nominated specific engineers as approvers.
`options` are not required and are various options about how to use this OWNERS.yml file. Currently there is only a single option `no_parent_owners` which is defaulted to false. If this option is set to true it will stop upwards OWNERS resolution. `options` are not required and are various options about how to use this OWNERS.yml file. Currently
there is only a single option `no_parent_owners` which is defaulted to false. If this option is set
to true it will stop upwards OWNERS resolution.
### Example file ### Example file
@ -70,7 +87,8 @@ options: # All options for this file
`version` is the current version of the aliases file format. This should always be `1.0.0`. `version` is the current version of the aliases file format. This should always be `1.0.0`.
`aliases` are a list of group names. Each group name must have one or more reviewers. Reviewers should be github usernames. `aliases` are a list of group names. Each group name must have one or more reviewers. Reviewers
should be github usernames.
## Example File ## Example File
@ -133,18 +151,26 @@ filters:
### Example 1 ### Example 1
If someone changes `a/b/c/file.py` the owner resolution will select teamC since the first file searched is `a/b/c/OWNERS.yml` First we compare if `file.py` matches `*.md`. It does not so we now check if `file.py` matches `*`. It does match so teamC is selected for review. If someone changes `a/b/c/file.py` the owner resolution will select teamC since the first file
searched is `a/b/c/OWNERS.yml` First we compare if `file.py` matches `*.md`. It does not so we now
check if `file.py` matches `*`. It does match so teamC is selected for review.
### Example 2 ### Example 2
If someone changes `a/b/c/file.yaml` the owner resolution will not find a team. The first file searched is `a/b/c/OWNERS.yml`. No filters match file.yaml. Next we search in `a/b/OWNERS.yml`. No filters match there either. We stop searching up because `no_parent_owners` is set to true. If someone changes `a/b/c/file.yaml` the owner resolution will not find a team. The first file
searched is `a/b/c/OWNERS.yml`. No filters match file.yaml. Next we search in `a/b/OWNERS.yml`. No
filters match there either. We stop searching up because `no_parent_owners` is set to true.
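
The upward search described in the two examples can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `OWNERS` dictionary is hypothetical data (not the example files from this document), patterns are matched against the file name with `fnmatch` rather than full gitignore syntax, filters are checked in listed order, and `resolve_approvers` is an invented name.

```python
import fnmatch
import posixpath

# Hypothetical stand-in for parsed OWNERS.yml files, keyed by directory.
OWNERS = {
    "a": {"filters": [("*", ["teamA"])]},
    "a/b": {"filters": [("*.py", ["teamB"])], "no_parent_owners": True},
    "a/b/c": {"filters": [("*.md", ["docsTeam"])]},
}


def resolve_approvers(path, owners=OWNERS):
    """Walk from the file's directory toward the repo root and return the
    approvers of the first matching filter, or None if nothing matches."""
    directory, filename = posixpath.split(path)
    while True:
        entry = owners.get(directory)
        if entry is not None:
            for pattern, approvers in entry["filters"]:
                # Simplification: real filters use gitignore syntax.
                if fnmatch.fnmatchcase(filename, pattern):
                    return approvers
            if entry.get("no_parent_owners", False):
                return None  # stop upward OWNERS resolution here
        if not directory:
            return None  # reached the repo root without a match
        directory = posixpath.dirname(directory)


if __name__ == "__main__":
    for changed in ["a/b/c/README.md", "a/b/c/file.py", "a/b/c/file.yaml", "a/x/file.yaml"]:
        print(changed, resolve_approvers(changed))
```

With this sample data, `a/b/c/file.yaml` resolves to no owner because `a/b` sets `no_parent_owners`, while `a/x/file.yaml` falls through to the `*` filter in `a`.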
## OWNERS Changelog ## OWNERS Changelog
### v2.0.0 ### v2.0.0
See the [previous version](https://github.com/mongodb/mongo/blob/79590effe86c471cc15d91c6785599ec2085d7c0/docs/owners/owners_format.md) of this documentation for details on v1.0.0. See the
[previous version](https://github.com/mongodb/mongo/blob/79590effe86c471cc15d91c6785599ec2085d7c0/docs/owners/owners_format.md)
of this documentation for details on v1.0.0.
Patterns without a slash are no longer prepended with `**/` to make them apply recursively. If you want your pattern to apply recursively you must add the `**/` yourself now. Patterns without a slash are no longer prepended with `**/` to make them apply recursively. If you
want your pattern to apply recursively you must add the `**/` yourself now.
The `*` pattern is now resolved as the directory name to ensure it applies recursively by default. You can use the `/*` pattern to only match inside the current directory. The `*` pattern is now resolved as the directory name to ensure it applies recursively by default.
You can use the `/*` pattern to only match inside the current directory.

@ -12,16 +12,16 @@ To find the correct binary for a specific log you need to:
curl -O http://s3.amazonaws.com/downloads.mongodb.org/linux/mongodb-linux-x86_64-debugsymbols-1.x.x.tgz curl -O http://s3.amazonaws.com/downloads.mongodb.org/linux/mongodb-linux-x86_64-debugsymbols-1.x.x.tgz
``` ```
You can also get the debugsymbols archive for official builds through [the Downloads page][1]. In the You can also get the debugsymbols archive for official builds through [the Downloads page][1]. In
Archived Releases section, click on the appropriate platform link to view the available archives. the Archived Releases section, click on the appropriate platform link to view the available
Select the appropriate debug symbols archive. archives. Select the appropriate debug symbols archive.
## Using mongosymb.py to get file and line numbers ## Using mongosymb.py to get file and line numbers
Stacktraces are logged on a line with `msg` `BACKTRACE`. The full backtrace contents are available in Stacktraces are logged on a line with `msg` `BACKTRACE`. The full backtrace contents are available
an attribute named `bt`. To convert this into a list of source locations with file and line numbers, in an attribute named `bt`. To convert this into a list of source locations with file and line
copy the contents of the `bt` JSON blob into a file, then direct the contents of that file into numbers, copy the contents of the `bt` JSON blob into a file, then direct the contents of that file
the standard input of `buildscripts/mongosymb.py`: into the standard input of `buildscripts/mongosymb.py`:
``` ```
cat bt | buildscripts/mongosymb.py --debug-file-resolver=path path/to/debug/symbols/file cat bt | buildscripts/mongosymb.py --debug-file-resolver=path path/to/debug/symbols/file
@ -55,8 +55,8 @@ $ cat bt | buildscripts/mongosymb.py --debug-file-resolver=path bazel-bin/instal
## Stack Trace Schema ## Stack Trace Schema
Stack traces are typically logged as log message 31380, having a `bt` attribute Stack traces are typically logged as log message 31380, having a `bt` attribute that holds a JSON
that holds a JSON object value: object value:
```json ```json
"bt": { "bt": {
@ -86,10 +86,9 @@ that holds a JSON object value:
} }
``` ```
The "processInfo" subobject has other information about the process, but The "processInfo" subobject has other information about the process, but the most important thing
the most important thing for the stack trace is the "somap", which is an for the stack trace is the "somap", which is an array of all dynamically linked ELF files, including
array of all dynamically linked ELF files, including the main executable, the main executable, and where in memory they were loaded.
and where in memory they were loaded.
Partial example showing a few typical frames: Partial example showing a few typical frames:

@ -2,27 +2,55 @@
## Project Impetus ## Project Impetus
We frequently encounter Python errors that are caused by a python dependency author updating their package that is backward breaking. The following tickets are a few examples of this happening: We frequently encounter Python errors that are caused by a python dependency author updating their
[SERVER-79126](https://jira.mongodb.org/browse/SERVER-79126), [SERVER-79798](https://jira.mongodb.org/browse/SERVER-79798), [SERVER-53348](https://jira.mongodb.org/browse/SERVER-53348), [SERVER-57036](https://jira.mongodb.org/browse/SERVER-57036), [SERVER-44579](https://jira.mongodb.org/browse/SERVER-44579), [SERVER-70845](https://jira.mongodb.org/browse/SERVER-70845), [SERVER-63974](https://jira.mongodb.org/browse/SERVER-63974), [SERVER-61791](https://jira.mongodb.org/browse/SERVER-61791), and [SERVER-60950](https://jira.mongodb.org/browse/SERVER-60950). We have always known this was a problem and have known there was a way to fix it. We finally had the bandwidth to tackle this problem. package that is backward breaking. The following tickets are a few examples of this happening:
[SERVER-79126](https://jira.mongodb.org/browse/SERVER-79126),
[SERVER-79798](https://jira.mongodb.org/browse/SERVER-79798),
[SERVER-53348](https://jira.mongodb.org/browse/SERVER-53348),
[SERVER-57036](https://jira.mongodb.org/browse/SERVER-57036),
[SERVER-44579](https://jira.mongodb.org/browse/SERVER-44579),
[SERVER-70845](https://jira.mongodb.org/browse/SERVER-70845),
[SERVER-63974](https://jira.mongodb.org/browse/SERVER-63974),
[SERVER-61791](https://jira.mongodb.org/browse/SERVER-61791), and
[SERVER-60950](https://jira.mongodb.org/browse/SERVER-60950). We have always known this was a
problem and have known there was a way to fix it. We finally had the bandwidth to tackle this
problem.
## Project Prework ## Project Prework
First, we wanted to test out using poetry so we converted mongo-container project to use poetry [SERVER-76974](https://jira.mongodb.org/browse/SERVER-76974). This showed promise and we considered this a green light to move forward on converting the server python to use poetry. First, we wanted to test out using poetry so we converted mongo-container project to use poetry
[SERVER-76974](https://jira.mongodb.org/browse/SERVER-76974). This showed promise and we considered
this a green light to move forward on converting the server python to use poetry.
Before we could start the project we had to upgrade python to a version that was not EoL. This work is captured in [SERVER-72262](https://jira.mongodb.org/browse/SERVER-72262). We upgraded python to 3.10 on every system except windows. Windows could not be upgraded due to a test problem relating to some cipher suites [SERVER-79172](https://jira.mongodb.org/browse/SERVER-79172). Before we could start the project we had to upgrade python to a version that was not EoL. This work
is captured in [SERVER-72262](https://jira.mongodb.org/browse/SERVER-72262). We upgraded python to
3.10 on every system except windows. Windows could not be upgraded due to a test problem relating to
some cipher suites [SERVER-79172](https://jira.mongodb.org/browse/SERVER-79172).
## Conversion to Poetry ## Conversion to Poetry
After the prework was done we wrote, tested, and merged [SERVER-76751](https://jira.mongodb.org/browse/SERVER-76751) which is converting the mongo python dependencies to poetry. This ticket had an absurd amount of dependencies and required a significant amount of patch builds. The total number of changes was pretty small but it affected a lot of different projects. After the prework was done we wrote, tested, and merged
[SERVER-76751](https://jira.mongodb.org/browse/SERVER-76751) which is converting the mongo python
dependencies to poetry. This ticket had an absurd amount of dependencies and required a significant
amount of patch builds. The total number of changes was pretty small but it affected a lot of
different projects.
Knowing there was a lot this touched we expected to see some bugs and were quick to try to fix them. Some of these were caught before merge and some were caught after. Knowing there was a lot this touched we expected to see some bugs and were quick to try to fix them.
Some of these were caught before merge and some were caught after.
[BUILD-17860](https://jira.mongodb.org/browse/BUILD-17860) required the build team to rebuild python on macosx arm. This was caught before merging. [BUILD-17860](https://jira.mongodb.org/browse/BUILD-17860) required the build team to rebuild python
on macosx arm. This was caught before merging.
[SERVER-81122](https://jira.mongodb.org/browse/SERVER-81122) found that poetry broke the spawnhost script. This was caught after merge. [SERVER-81122](https://jira.mongodb.org/browse/SERVER-81122) found that poetry broke the spawnhost
script. This was caught after merge.
[SERVER-81061](https://jira.mongodb.org/browse/SERVER-81061) and [BF-29909](https://jira.mongodb.org/browse/BF-29909) were found by sys-perf since they run their own build and do not use the standard build process. Therefore it was very hard to test for this one. This was caught post merge. [SERVER-81061](https://jira.mongodb.org/browse/SERVER-81061) and
[BF-29909](https://jira.mongodb.org/browse/BF-29909) were found by sys-perf since they run their own
build and do not use the standard build process. Therefore it was very hard to test for this one.
This was caught post merge.
[SERVER-80799](https://jira.mongodb.org/browse/SERVER-80799) found that poetry broke mongo tooling metrics collection (not OTel). This was only found since an engineer on the team saw this bug in the code. This was caught post merge. [SERVER-80799](https://jira.mongodb.org/browse/SERVER-80799) found that poetry broke mongo tooling
metrics collection (not OTel). This was only found since an engineer on the team saw this bug in the
code. This was caught post merge.
Overall, when changing something so foundational it is inevitable that some things will break. Overall, when changing something so foundational it is inevitable that some things will break.

@ -1,10 +1,10 @@
# PrimaryOnlyService # PrimaryOnlyService
The PrimaryOnlyService machinery provides a way to register tasks that should run only when current The PrimaryOnlyService machinery provides a way to register tasks that should run only when current
node is Primary, and should be driven to completion across replica set failovers on the new node is Primary, and should be driven to completion across replica set failovers on the new Primary.
Primary. It is intended to be used by tasks that can be modeled as a state machine with a single It is intended to be used by tasks that can be modeled as a state machine with a single MongoDB
MongoDB document containing the current state, which newly-elected Primaries can use to rebuild the document containing the current state, which newly-elected Primaries can use to rebuild the state of
state of the task after failover and pick up where the old Primary left off. the task after failover and pick up where the old Primary left off.
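
The idea of a task modeled as a state machine with a single state document can be sketched in a few lines. This is only a conceptual illustration, not the PrimaryOnlyService API: the real machinery is C++, and the `State` values, the `run_to_completion` function, and the `persist` callback are all hypothetical names chosen for this example.

```python
from enum import Enum


class State(Enum):
    # Hypothetical states; a real service defines its own.
    STARTED = 1
    COPYING = 2
    DONE = 3


def run_to_completion(state_doc, persist):
    """Drive the task forward from whatever state the document records.

    `persist` stands in for writing the updated state document durably
    before moving on, so a newly elected primary can resume from it.
    """
    state = State[state_doc["state"]]
    while state is not State.DONE:
        state = State(state.value + 1)
        state_doc["state"] = state.name
        persist(dict(state_doc))
    return state_doc


if __name__ == "__main__":
    # The old primary got as far as COPYING; the new primary resumes from there.
    writes = []
    print(run_to_completion({"_id": 1, "state": "COPYING"}, writes.append))
    print(writes)
```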
## Classes ## Classes
@ -62,16 +62,17 @@ what state it is in and thus what work still needs to be performed, and what wor
completed by the previous Primary. completed by the previous Primary.
To see an example bare-bones PrimaryOnlyService implementation to use as a reference, check out the
TestService defined in this unit test:
https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/primary_only_service_test.cpp

## Behavior during state transitions

At stepUp, each PrimaryOnlyService queries its state document collection, and for each document
found, creates and launches a PrimaryOnlyService::Instance initialized off of the state document.
This happens asynchronously relative to the core replication stepUp process - there is no guarantee
that when stepUp completes and the RSTL lock is dropped that the PrimaryOnlyServices have finished
rebuilding all their Instances. At stepDown all Instances are interrupted, but the threads running
their work are not joined, and the Instance objects containing their in-memory state are not
released, until the next stepUp. This is done to reduce the likelihood of blocking within the state
transition process and delaying it for the entire node. This behavior does, however, guarantee that
there will never be two Instances of the same PrimaryOnlyService with the same InstanceID running at


@ -1,11 +1,14 @@
# Priority port support

`mongod` and `mongos` support a dedicated **priority port** intended for **internal, high-priority
operations** such as automation monitoring, MongoTune, and critical intra-cluster replication
traffic.

With a priority port configured:

- The database listens on a second TCP port in addition to the main port.
- Connections accepted on the priority port are exempt from connection limits, connection
  establishment rate limiting, and ingress request rate limiting.
- gRPC is not supported.

The feature is **disabled by default**.
@ -35,7 +38,8 @@ net:
When the transport layer starts:

- A **separate listener thread** is created for the priority port in the ASIO transport layer.
- Sessions created from the priority port are tagged so downstream code can distinguish them from
  main-port sessions (similar to the load balancer port implementation).

---
@ -47,27 +51,33 @@ Priority-port connections differ from normal connections in several ways.
When a new connection is accepted:

- Connections from the priority port are treated as **limit-exempt** in the session manager, reusing
  the existing exemption machinery used for CIDR-based exemptions.
- These connections can continue to be created even when the normal connection limit is reached.

Metrics:

- `serverStatus.connections.priority` counts current connections on the priority port only.
- These connections are also included in `connections.limitExempt` (along with CIDR-based
  exemptions).
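
The accounting described above can be illustrated with a small, self-contained Python model. It is
only a sketch: `ConnectionCounters` and `try_accept` are hypothetical names, not the server's
classes, and the assumption that exempt connections still count toward the overall total is not
stated in this document.

```python
from dataclasses import dataclass


@dataclass
class ConnectionCounters:
    """Toy model of the connection counters (hypothetical, not the server's real types)."""

    max_connections: int
    current: int = 0       # all open connections (assumed to include exempt ones)
    limit_exempt: int = 0  # CIDR-exempt plus priority-port connections
    priority: int = 0      # priority-port connections only

    def try_accept(self, *, from_priority_port: bool = False, cidr_exempt: bool = False) -> bool:
        exempt = from_priority_port or cidr_exempt
        # Only non-exempt connections are checked against the limit.
        if not exempt and self.current >= self.max_connections:
            return False
        self.current += 1
        if exempt:
            self.limit_exempt += 1
        if from_priority_port:
            self.priority += 1
        return True
```

In this model a priority-port connection increments both `priority` and `limit_exempt`, mirroring
the relationship between `connections.priority` and `connections.limitExempt` described in the
metrics list.
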

## Rate limiters

Two ingress-side rate limiters recognize priority-port exemptions:

- [**SessionEstablishmentRateLimiter**](../src/mongo/db/admission/README.md#session-establishment-rate-limiter)
  (connection establishment)
- [**IngressRequestRateLimiter**](../src/mongo/db/admission/README.md#ingress-request-rate-limiting)
  (request rate limiting)
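
As a rough illustration of what an exemption means for a rate limiter, the Python sketch below uses
a plain token bucket. The token-bucket algorithm, the `TokenBucket` class, and the `exempt` flag are
assumptions made for this example; this document does not describe how the two limiters are
implemented internally.

```python
class TokenBucket:
    """Minimal token bucket with an exemption bypass (illustrative only)."""

    def __init__(self, rate_per_sec: float, burst: float):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def admit(self, now: float, *, exempt: bool = False) -> bool:
        if exempt:
            # Exempt traffic (e.g. priority-port sessions) never consumes tokens.
            return True
        # Refill according to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The clock is passed in explicitly so the behavior is deterministic: a bucket with a burst of one
rejects a second non-exempt request at the same instant but still admits an exempt one.
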

## Logging and profiling

For observability and debugging, the server records whether an operation came through the priority
port:

- `CurOp` / currentOp output includes a flag indicating the connection is from the priority port.
- Slow query log and profiler entries include whether the operation was executed via a priority-port
  connection.
- Client summary reports also distinguish clients on the main vs priority port.

---
@ -79,7 +89,8 @@ For observability and debugging, the server records whether an operation came th
To connect to a replica set via the priority port, a user must:

- Use a connection string that points directly at a specific host and priority port.
- Set `directConnection=true` to disable SDAM and prevent the driver from using hello-based host
  discovery, which currently does not advertise the priority port.

Example:
@ -92,11 +103,14 @@ mongodb://hostA:27018/?directConnection=true
For `mongos`:

- You may connect directly to the `mongos` priority port.
- `directConnection=true` is **not required** for `mongos` connections, since SDAM is not used in
  the same way.

Important limitation:

- **Priority does not automatically propagate**:
  - If a client connects to a `mongos` via the priority port and `mongos` forwards a command to
    shards, those shard-side connections still use the main ports and do **not** inherit
    priority-port behavior in the current implementation.
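
The connection-string guidance above can be sketched as a small Python helper. The function name
`priority_port_uri`, the host names, and port `27018` are illustrative assumptions; only the
`directConnection=true` option and the direct host-and-port form come from the text.

```python
from urllib.parse import urlencode


def priority_port_uri(host: str, priority_port: int, *, is_mongos: bool = False) -> str:
    """Build a URI that targets one host's priority port directly (hypothetical helper)."""
    # Replica set members need directConnection=true so the driver skips host discovery,
    # which does not advertise the priority port. mongos does not require it.
    options = {} if is_mongos else {"directConnection": "true"}
    query = f"?{urlencode(options)}" if options else ""
    return f"mongodb://{host}:{priority_port}/{query}"
```

For example, `priority_port_uri("hostA", 27018)` yields
`mongodb://hostA:27018/?directConnection=true`, while passing `is_mongos=True` omits the option.
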

---


@ -37,9 +37,9 @@ Users can set or modify a server parameter at startup and/or runtime, depending
specified for `set_at`. For instance, `logLevel` may be set at both startup and runtime, as
indicated by `set_at` (see the above code snippet).

At startup, server parameters may be set using the `--setParameter` command line option. At runtime,
the `setParameter` command may be used to modify server parameters. See the [`setParameter`
documentation][set-parameter] for details.

## How to get the value provided for a parameter
@ -99,27 +99,28 @@ must be unique across the server instance. More information on the specific fiel
- `set_at` (required): Must contain the value `startup`, `runtime`, [`startup`, `runtime`], or
  `cluster`. If `runtime` is specified along with `cpp_varname`, then `decltype(cpp_varname)` must
  refer to a thread-safe storage type, specifically: `Atomic<T>`, `std::atomic<T>`, or
  `boost::synchronized<T>`. Parameters declared as `cluster` can only be set at runtime and exhibit
  numerous differences. See [Cluster Server Parameters](#cluster-server-parameters) below.

- `description` (required): Free-form text field currently used only for commenting the generated
  C++ code. Future uses may preserve this value for a possible `{listSetParameters:1}` command or
  other programmatic and potentially user-facing purposes.

- `cpp_vartype`: Declares the full storage type. If `cpp_vartype` is not defined, it may be inferred
  from the C++ variable referenced by `cpp_varname`.

- `cpp_varname`: Declares the underlying variable or C++ `struct` member to use when setting or
  reading the server parameter. If defined together with `cpp_vartype`, the storage will be declared
  as a global variable, and externed in the generated header file. If defined alone, a variable of
  this name is assumed to have been declared and defined by the implementer, and its type will be
  automatically inferred at compile time. If `cpp_varname` is not defined, then `cpp_class` must be
  specified.

- `cpp_class`: Declares a custom `ServerParameter` class in the generated header using the provided
  string, or the name field in the associated map. The declared class will require an implementation
  of `setFromString()`, and optionally `set()`, `append()`, and a constructor. See
  [Specialized Server Parameters](#specialized-server-parameters) below.

- `default`: String or expression map representation of the initial value.
@ -127,10 +128,10 @@ must be unique across the server instance. More information on the specific fiel
  This is a required field and must be explicitly set to `false` to disable redaction.

- `omit_in_ftdc`: Only applies to cluster parameters. If set to `true`, then the cluster parameter
  will be omitted when `getClusterParameter` is invoked with `omitInFTDC: true`. In practice, FTDC
  runs `getClusterParameter` with this option periodically to collect configuration metadata about
  the server and setting this flag to true for a cluster parameter ensures that its value(s) will
  not be exposed in FTDC.

- `test_only`: Set to `true` to disable this set parameter if `enableTestCommands` is not specified.
@ -141,26 +142,27 @@ must be unique across the server instance. More information on the specific fiel
  new value has been stored. Prototype: `Status(const cpp_vartype&);`

- `condition`: Up to five conditional rules for deciding whether or not to apply this server
  parameter. `preprocessor` will be evaluated first, followed by `constexpr`, then finally `expr`.
  If no provided setting evaluates to `false`, the server parameter will be registered.
  `feature_flag` and `min_fcv` are evaluated after the parameter is registered, and instead affect
  whether the parameter is enabled. `min_fcv` is a string of the form `X.Y`, representing the
  minimum FCV version for which this parameter should be enabled. `feature_flag` is the name of a
  feature flag variable upon which this server parameter depends -- if the feature flag is disabled,
  this parameter will be disabled. `feature_flag` should be removed when all other instances of that
  feature flag are deleted, which typically is done after the next LTS version of the server is
  branched. `min_fcv` should be removed after it is no longer possible to downgrade to an FCV lower
  than that version - this occurs when the next LTS version of the server is branched.

- `validator`: Zero or more validation rules to impose on the setting. All specified rules must pass
  to consider the new setting valid. `lt`, `gt`, `lte`, `gte` fields provide for simple numeric
  limits or expression maps which evaluate to numeric values. For all other validation cases,
  specify `callback` as a C++ function or static method. Note that validation rules (including
  `callback`) may run in any order. To perform an action after all validation rules have completed,
  `on_update` should be preferred instead. Callback prototype:
  `Status(const cpp_vartype&, const boost::optional<TenantId>&);`

- `is_deprecated`: Mark the server parameter as deprecated. Warns users if the server parameter is
  ever used. Defaults to false.
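
The `validator` semantics described above (every bound must hold, plus an optional callback) can be
modeled in a few lines of Python. This is a sketch only: the real validators are generated C++ that
returns `Status`, whereas the hypothetical `validate` helper below returns a list of error strings.

```python
import operator
from typing import Callable, Optional

# Map the IDL bound names to comparison functions.
_BOUNDS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def validate(value, rules: dict, callback: Optional[Callable[[object], Optional[str]]] = None):
    """Return a list of error strings; an empty list means the value is accepted."""
    errors = []
    for name, limit in rules.items():
        if not _BOUNDS[name](value, limit):
            errors.append(f"value {value!r} must be {name} {limit!r}")
    if callback is not None:
        message = callback(value)  # callback returns None on success
        if message is not None:
            errors.append(message)
    return errors
```

Because the result does not depend on which rule is checked first, the sketch is consistent with the
note that validation rules may run in any order.
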

Any symbols such as global variables or callbacks used by a server parameter must be imported using
the usual IDL machinery via `globals.cpp_includes`. Similarly, all generated code will be nested
@ -240,9 +242,8 @@ to any other work, this custom constructor must invoke its parent's constructor.
Status {name}::set(const BSONElement& val, const boost::optional<TenantId>& tenantId); Status {name}::set(const BSONElement& val, const boost::optional<TenantId>& tenantId);
``` ```

Otherwise the base class implementation `ServerParameter::set` is used. It invokes `setFromString`
using a string representation of `val`, if the `val` is holding one of the supported types.

`override_validate`: If `true`, the implementer must provide a `validate` member function as:
@ -261,8 +262,8 @@ must be provided with the following signature:
Status {name}::append(OperationContext*, BSONObjBuilder*, StringData, const boost::optional<TenantId>& tenantId); Status {name}::append(OperationContext*, BSONObjBuilder*, StringData, const boost::optional<TenantId>& tenantId);
``` ```

`override_warn_if_deprecated`: If `true`, allows a custom warnIfDeprecated() method to be defined,
defaults to `false`.

Lastly, a `setFromString` method must always be provided with the following signature:
@ -318,17 +319,17 @@ preferred to implementing custom parameter propagation whenever possible.
`setClusterParameter` persists the new value of the indicated cluster server parameter onto a
majority of nodes on non-sharded replica sets. On sharded clusters, it majority-writes the new value
onto every shard and the config server. This ensures that every **mongod** in the cluster will be
able to recover the most recently written value for all cluster server parameters on restart.

Additionally, `setClusterParameter` blocks until the majority write succeeds in a replica set
deployment, which guarantees that the parameter value will not be rolled back after being set. In a
sharded cluster deployment, the new value has to be majority-committed on the config shard and
locally-committed on all other shards.

The cluster parameters are persisted in the `config.clusterParameters` collections and cached in
memory on every **mongod**. The cache updates are done by the `ClusterServerParameterOpObserver`
class. Every **mongos** also maintains an in-memory cache by polling the config server for updated
cluster server parameter values every `clusterServerParameterRefreshIntervalSecs` using the
`ClusterParameterRefresher` periodic job.

`getClusterParameter` returns the cached value of the requested cluster server parameter on the node
@ -347,10 +348,10 @@ following members to the resulting type:
  was updated; used by runtime audit configuration, and to prevent concurrent and redundant cluster
  parameter updates.

It is highly recommended to specify validation rules or a callback function via the
`param.validator` field. These validators are called before the new value of the cluster server
parameter is written to disk during `setClusterParameter`. See
[server_parameter_with_storage_test.idl][cluster-server-param-with-storage-test] and
[server_parameter_with_storage_test_structs.idl][cluster-server-param-with-storage-test-structs] for
examples.
@ -394,21 +395,21 @@ Tue `reset()` method must be implemented and should update the cluster server pa
default value.

All cluster server parameters are tenant-aware, meaning that on serverless clusters, each tenant has
an isolated set of parameters. The `setClusterParameter` and `getClusterParameter` commands will
pass the `tenantId` on the command request to the `ServerParameter`'s methods. On dedicated
(non-serverless) clusters, `boost::none` will be passed. IDL-defined cluster server parameters will
handle the passed-in `tenantId` automatically and store separate parameter values per-tenant.
Specialized server parameters will have to take care to correctly handle the passed-in `tenantId`
and to enforce tenant isolation.

Like normal server parameters, cluster server parameters can be defined to be dependent on a minimum
FCV version or a specific feature flag using the `condition.min_fcv` and `condition.feature_flag`
syntax discussed above. During FCV downgrade, the cluster parameter value stored on disk will be
deleted if either: (1) The downgraded FCV is lower than the cluster parameter's `min_fcv`, or (2)
The cluster parameter's `feature_flag` is disabled on the downgraded FCV. While a cluster server
parameter is disabled due to either of these conditions, `setClusterParameter` on it will always
fail, and `getClusterParameter` will fail on **mongod**, and return the default value on **mongos**
-- this difference in behavior is due to **mongos** being unaware of the current FCV.
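
The downgrade and lookup rules in the preceding paragraph can be summarized with a short Python
model. All function names here are hypothetical, FCV strings are assumed to have the `X.Y` form
described earlier, and a `None` argument stands for "no such condition declared".

```python
from typing import Optional, Tuple


def parse_fcv(version: str) -> Tuple[int, int]:
    """Parse an 'X.Y' FCV string into a comparable tuple."""
    major, minor = version.split(".")
    return int(major), int(minor)


def is_enabled(fcv: str, min_fcv: Optional[str], flag_enabled: Optional[bool]) -> bool:
    """A parameter is disabled if the FCV is below min_fcv or its feature flag is off."""
    if min_fcv is not None and parse_fcv(fcv) < parse_fcv(min_fcv):
        return False
    if flag_enabled is False:
        return False
    return True


def delete_on_downgrade(target_fcv: str, min_fcv: Optional[str],
                        flag_enabled_on_target: Optional[bool]) -> bool:
    """The on-disk value is deleted when the parameter is disabled on the target FCV."""
    return not is_enabled(target_fcv, min_fcv, flag_enabled_on_target)


def get_cluster_parameter(node_type: str, *, enabled: bool, cached, default):
    """Model of getClusterParameter: mongod fails when disabled, mongos returns the default."""
    if enabled:
        return cached
    if node_type == "mongos":
        return default
    raise RuntimeError("cluster parameter is disabled on this node")
```

Tuple comparison is used so that a version such as `10.0` sorts above `9.0`, which a plain string
comparison would get wrong.
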

See [server_parameter_specialized_test.idl][specialized-cluster-server-param-test-idl] and
[server_parameter_specialized_test.h][specialized-cluster-server-param-test-data] for examples.
@ -582,9 +583,11 @@ classDiagram
[parameters.idl]: ../src/mongo/db/commands/parameters.idl
[set-parameter]: https://docs.mongodb.com/manual/reference/parameters/#synopsis
[get-parameter]: https://docs.mongodb.com/manual/reference/command/getParameter/#getparameter
[quiet-param]:
  https://github.com/mongodb/mongo/search?q=serverGlobalParams+quiet+extension:idl&type=code
[ftdc-file-size-param]: ../src/mongo/db/ftdc/ftdc_server.idl
[cluster-server-param-with-storage-test]: ../src/mongo/idl/server_parameter_with_storage_test.idl
[cluster-server-param-with-storage-test-structs]:
  ../src/mongo/idl/server_parameter_with_storage_test_structs.idl
[specialized-cluster-server-param-test-idl]: ../src/mongo/idl/server_parameter_specialized_test.idl
[specialized-cluster-server-param-test-data]: ../src/mongo/idl/server_parameter_specialized_test.h


@ -1,7 +1,7 @@
# Test Commands

All test commands are denoted with the `.testOnly()` modifier to the `MONGO_REGISTER_COMMAND`
invocation. For example:

```c++ ```c++
MONGO_REGISTER_COMMAND(EchoCommand).testOnly(); MONGO_REGISTER_COMMAND(EchoCommand).testOnly();
@ -9,9 +9,9 @@ MONGO_REGISTER_COMMAND(EchoCommand).testOnly();
## How to enable

To be able to run these commands, the server must be started with the `enableTestCommands=1` server
parameter (e.g. `--setParameter enableTestCommands=1`). Resmoke.py often sets this server parameter
for testing.

## Examples


@ -1,7 +1,7 @@
# Testing

Most tests for MongoDB are run through resmoke, our test runner and orchestration tool. The entry
point for resmoke can be found at `buildscripts/resmoke.py`.

## Concepts
@ -9,9 +9,12 @@ Learn more about related topics using their own targeted documentation:
- [resmoke](../../buildscripts/resmokelib/README.md), the test runner
- [suites](../../buildscripts/resmokeconfig/suites/README.md), how tests are grouped and configured
- [fixtures](../../buildscripts/resmokelib/testing/fixtures/README.md), specify the server topology
  that tests run against
- [hooks](../../buildscripts/resmokelib/testing/hooks/README.md), logic to run before, after and/or
  between individual tests
- [testcases](../../buildscripts/resmokelib/testing/testcases/README.md), Python-based unittest
  interfaces that resmoke can run as different "kinds" of tests.

## Basic Example
@ -35,4 +38,7 @@ Now, **run the test content** from one test file:
buildscripts/resmoke.py run --suites=no_passthrough jstests/noPassthrough/shell/js/string.js buildscripts/resmoke.py run --suites=no_passthrough jstests/noPassthrough/shell/js/string.js
``` ```

The suite defined in
[buildscripts/resmokeconfig/suites/no_passthrough.yml](../../buildscripts/resmokeconfig/suites/no_passthrough.yml)
includes that `string.js` file via glob selections, specifies no fixtures, no hooks, and a minimal
config for the executor.


@ -2,80 +2,69 @@
## Overview

The FSM tests are meant to exercise concurrency within MongoDB. The suite consists of workloads,
which define discrete units of work as states in an FSM, and runners, which define which tests to
run and how they should be run. Each workload defines states, which are JS functions that perform
some meaningful series of tasks and assertions, and transitions, which define how to move between
those states. A single workload begins by executing its setup function, which is called once during
the runner's thread of execution. Next, the runner generates the number of threads specified by the
workload, and each spawned thread executes the start state (typically named "init") defined by the
workload. From this point on, each worker thread executes its own independent copy of the FSM, and
will randomly move between states (after executing the function) based on the probabilities defined
in the workload's transition table. Each worker thread continues doing so until the number of
transitions it makes has reached the number of iterations defined by the workload. Once all the
worker threads have finished, the runner executes the workload's teardown function.

![fsm.png](../images/testing/fsm.png)
The runner provides two modes of execution for workloads: serial and parallel. The runner provides two modes of execution for workloads: serial and parallel. Serial mode runs the
Serial mode runs the provided workloads one after the other, provided workloads one after the other, waiting for all threads of a workload to complete before
waiting for all threads of a workload to complete before moving on to the next moving on to the next workload. Parallel mode runs subsets of the provided workloads in separate
workload. Parallel mode runs subsets of the provided workloads in separate
threads simultaneously. threads simultaneously.
New methods were added to allow for finer-grained assertions under different New methods were added to allow for finer-grained assertions under different situations. For
situations. For example, a test that inserts a document into a collection, and example, a test that inserts a document into a collection, and wants to assert its existence will
wants to assert its existence will fail if another test removes that document. fail if another test removes that document. One option would have been to disable all assertions
One option would have been to disable all assertions when running a mixture of when running a mixture of different workloads together, but doing so would make the system incapable
different workloads together, but doing so would make the system incapable of of detecting anything other than server crashes. Another option would have been to design the
detecting anything other than server crashes. Another option would have been to workloads to be conflict-free (e.g. writing to separate collections, using commutative operators),
design the workloads to be conflict-free (e.g. writing to separate collections, but this would leave large gaps in the achievable test coverage. Neither of those options were found
using commutative operators), but this would leave large gaps in the achievable to be very appealing. Instead, we chose to introduce the concept of an "assertion level" that acts
test coverage. Neither of those options were found to be very appealing. as a precondition for when an assertion is evaluated. This allows us to still make some assertions,
Instead, we chose to introduce the concept of an "assertion level" that acts as even when running a mixture of different workloads together. There are three assertion levels:
a precondition for when an assertion is evaluated. This allows us to still make `ALWAYS`, `OWN_COLL`, and `OWN_DB`. They can be thought of as follows:
some assertions, even when running a mixture of different workloads together.
There are three assertion levels: `ALWAYS`, `OWN_COLL`, and `OWN_DB`. They can
be thought of as follows:
- `ALWAYS`: A statement that remains unequivocally true, regardless of what - `ALWAYS`: A statement that remains unequivocally true, regardless of what another workload might
another workload might be doing to the collection I was given (hint: think be doing to the collection I was given (hint: think defensively). Examples include "1 = 1" or
defensively). Examples include "1 = 1" or inserting a document into a inserting a document into a collection (disregarding any unique indices).
collection (disregarding any unique indices).
- `OWN_COLL`: A statement that is true only if I am the only workload operating - `OWN_COLL`: A statement that is true only if I am the only workload operating on the collection I
on the collection I was given. Examples include counting the number of was given. Examples include counting the number of documents in a collection or updating a
documents in a collection or updating a previously inserted document. previously inserted document.
- `OWN_DB`: A statement that is true only if I am the only workload operating on - `OWN_DB`: A statement that is true only if I am the only workload operating on the database I was
the database I was given. Examples include renaming a collection or verifying given. Examples include renaming a collection or verifying that a collection is capped. The
that a collection is capped. The workload typically relies on the use of workload typically relies on the use of another collection aside from the one given.
another collection aside from the one given.
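The idea of an assertion level acting as a precondition can be sketched as follows. This is a minimal illustration, not the FSM library's implementation: `AssertLevel`, `makeLevelAssert`, and `assertAt` are hypothetical names, and the ordering `ALWAYS < OWN_COLL < OWN_DB` is an assumption (a workload that owns its database is taken to own its collection as well).

```javascript
// Hypothetical sketch: these names are illustrative, not the FSM library's API.
const AssertLevel = Object.freeze({ALWAYS: 0, OWN_COLL: 1, OWN_DB: 2});

// Build an assertion helper bound to the isolation the current run guarantees.
function makeLevelAssert(currentLevel) {
    return function assertAt(requiredLevel, condition, message) {
        // Precondition: skip the check when the run does not guarantee
        // the ownership that the assertion depends on.
        if (requiredLevel > currentLevel) {
            return false; // not evaluated
        }
        if (!condition()) {
            throw new Error("assertion failed: " + message);
        }
        return true; // evaluated and passed
    };
}

// A run where each workload owns its collection but shares the database.
const assertAt = makeLevelAssert(AssertLevel.OWN_COLL);
assertAt(AssertLevel.ALWAYS, () => 1 === 1, "always holds"); // evaluated
assertAt(AssertLevel.OWN_DB, () => false, "needs the whole db"); // skipped
```

Passing the condition as a function means a skipped assertion never runs its check, so a workload can keep strict assertions in place and have them take effect only when the run isolates it enough.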
## Creating your own workload ## Creating your own workload
All workloads are stored in `jstests/concurrency/fsm_workloads` and as specific All workloads are stored in `jstests/concurrency/fsm_workloads` and as specific examples you can
examples you can refer to refer to
1. `jstests/concurrency/fsm_example.js` 1. `jstests/concurrency/fsm_example.js`
1. `jstests/concurrency/fsm_example_inheritance.js` 1. `jstests/concurrency/fsm_example_inheritance.js`
for writing new workloads. Every workload is loaded in as inline JavaScript for writing new workloads. Every workload is loaded in as inline JavaScript using the "load"
using the "load" function, which is a lot more like a `#include` than function, which is a lot more like a `#include` than `require.js`. This means that whatever
`require.js`. This means that whatever variables are declared in the global variables are declared in the global scope of the file will become part of the scope where load is
scope of the file will become part of the scope where load is called. The runner called. The runner will be looking for a variable called `$config` which will store the
will be looking for a variable called `$config` which will store the
configuration of your workload. configuration of your workload.
### The $config object ### The $config object
There should be exactly one `$config` per workload. For style consistency as There should be exactly one `$config` per workload. For style consistency as well as safety, be sure
well as safety, be sure to wrap the value of `$config` in an anonymous function. to wrap the value of `$config` in an anonymous function. This will create a JS closure and a new
This will create a JS closure and a new scope: scope:
```javascript ```javascript
$config = (function() { $config = (function() {
@ -93,19 +82,17 @@ $config = (function() {
)(); )();
``` ```
When finished executing, `$config` must return an object containing the properties When finished executing, `$config` must return an object containing the properties above (some of
above (some of which are optional, see below). which are optional, see below).
### Defining states ### Defining states
It's best to also declare states within its own closure so as not to interfere It's best to also declare states within its own closure so as not to interfere with the scope of
with the scope of $config. Each state takes two arguments, the db object and the $config. Each state takes two arguments, the db object and the collection name. For later, note that
collection name. For later, note that this db and collection are the only ones this db and collection are the only ones that you can be guaranteed to "own" when asserting. Try to
that you can be guaranteed to "own" when asserting. Try to make each state a make each state a discrete unit of work that can stand alone without the other states. Additionally,
discrete unit of work that can stand alone without the other states. try to define each function that makes up a state with a name as opposed to anonymously - this makes
Additionally, try to define each function that makes up a state backtraces easier to read when things go wrong.
with a name as opposed to anonymously - this makes backtraces easier to read
when things go wrong.
```javascript ```javascript
$config = (function () { $config = (function () {
@ -146,14 +133,12 @@ $config = (function () {
### Defining transitions ### Defining transitions
The transitions object defines the probabilities of moving from one state to a The transitions object defines the probabilities of moving from one state to a different state. When
different state. When a state's function is finished executing, the FSM randomly a state's function is finished executing, the FSM randomly chooses the next state using the
chooses the next state using the probabilities provided in the transitions probabilities provided in the transitions object. The probabilities of the transitions object do not
object. The probabilities of the transitions object do not necessarily need to necessarily need to sum to 1.0, since the mechanism for choosing the next state uses normalized
sum to 1.0, since the mechanism for choosing the next state uses normalized random values. Here it is not necessary to use a separate closure. In the example below, we're
random values. Here it is not necessary to use a separate closure. In the denoting an equal probability of moving to either of the scan states from the init state:
example below, we're denoting an equal probability of moving to either of the
scan states from the init state:
```javascript ```javascript
$config = (function () { $config = (function () {
@ -174,15 +159,13 @@ $config = (function () {
### Setup and teardown functions ### Setup and teardown functions
The setup and teardown functions are special in that they'll only be executed in The setup and teardown functions are special in that they'll only be executed in one thread. See the
one thread. See the Runners section for more information about when they're run Runners section for more information about when they're run relative to other workloads in various
relative to other workloads in various modes. The setup and teardown functions modes. The setup and teardown functions take three arguments: db, coll, and cluster. The setup
take three arguments: db, coll, and cluster. The setup function (and function (and corresponding teardown) should perform most of the initialization your workload needs,
corresponding teardown) should perform most of the initialization your workload for example setting parameters on the server, adding seed data, or setting up indexes. Note that
needs, for example setting parameters on the server, adding seed data, or rather than executing adminCommands (and others) against the provided `db` you should use the
setting up indexes. Note that rather than executing adminCommands (and others) provided `cluster.executeOnMongodNodes` and `cluster.executeOnMongosNodes` functionality.
against the provided `db` you should use the provided
`cluster.executeOnMongodNodes` and `cluster.executeOnMongosNodes` functionality.
```javascript ```javascript
$config = (function () { $config = (function () {
@ -224,18 +207,16 @@ $config = (function () {
### The `data` object ### The `data` object
The `data` object preserves information between different states of an FSM within The `data` object preserves information between different states of an FSM within an individual
an individual thread. Within a single state, the data object becomes the 'this' thread. Within a single state, the data object becomes the 'this' context in which the state
context in which the state executes. Additionally, a tid attribute is added to executes. Additionally, a tid attribute is added to data by the runner to allow each thread to
data by the runner to allow each thread to access a unique ID. Data is usually access a unique ID. Data is usually defined above states inside the config, but left below it in the
defined above states inside the config, but left below it in the returned returned object. Data is also available as the 'this' context in setup and teardown functions. Note
object. Data is also available as the 'this' context in setup and teardown that once the FSM begins, the context data that was passed to the setup function is copied into each
functions. Note that once the FSM begins, the context data that was passed to thread - meaning each thread has its own copy of the data and modifications to data will not be
the setup function is copied into each thread - meaning each thread has its own passed back to the teardown function outside of what was changed in setup. Additionally, in
copy of the data and modifications to data will not be passed back to the composition, each workload has its own data, meaning you don't have to worry about properties being
teardown function outside of what was changed in setup. Additionally, in overridden by workloads other than the current one.
composition, each workload has its own data, meaning you don't have to worry
about properties being overridden by workloads other than the current one.
```javascript ```javascript
$config = (function () { $config = (function () {
@ -255,57 +236,50 @@ $config = (function () {
#### `threadCount` #### `threadCount`
threadCount is the number of threads that will be used to run your workload in threadCount is the number of threads that will be used to run your workload in Serial and Parallel
Serial and Parallel modes. In both modes, the number of threads you provide will modes. In both modes, the number of threads you provide will execute the FSM simultaneously, cycling
execute the FSM simultaneously, cycling through different states of the through different states of the workload. Note that in serial mode, no other threads will be running
workload. Note that in serial mode, no other threads will be running outside of outside of those pertaining to this workload, and in parallel mode, other workloads will also be
those pertaining to this workload, and in parallel mode, other workloads will given threads to execute their FSM. In some cases in parallel mode, this number will be scaled down
also be given threads to execute their FSM. In some cases in parallel mode, this to make sure that all workloads can fit within the number of threads available due to system or
number will be scaled down to make sure that all workloads can fit within the performance constraints.
number of threads available due to system or performance constraints.
#### `iterations` #### `iterations`
This is just the number of states the FSM will go through before exiting. NOTE: This is just the number of states the FSM will go through before exiting. NOTE: it is _not_ the
it is _not_ the number of times each state will be executed. number of times each state will be executed.
#### `startState` (optional) #### `startState` (optional)
Default value is 'init'. If your workload does not have an init state then you Default value is 'init'. If your workload does not have an init state then you must specify in which
must specify in which state to begin. state to begin.
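How `transitions`, `iterations`, and `startState` interact within one thread can be sketched as below. This is a simplified model under stated assumptions, not the runner's code: `chooseNextState` and `runFsm` are hypothetical names, states are called without the `db` and collection arguments, and the loop counts state executions rather than transitions.

```javascript
// Pick the next state from a {stateName: weight} map. Weights are normalized
// by their sum, so they do not need to add up to 1.0.
function chooseNextState(weights, random) {
    const names = Object.keys(weights);
    const total = names.reduce((sum, name) => sum + weights[name], 0);
    let threshold = random() * total;
    for (const name of names) {
        threshold -= weights[name];
        if (threshold < 0) {
            return name;
        }
    }
    return names[names.length - 1]; // guard against floating-point round-off
}

// Run one thread's copy of the FSM. `iterations` bounds the total number of
// state executions, not the number of times each individual state runs.
function runFsm(config, random = Math.random) {
    const visited = [];
    let current = config.startState || "init";
    for (let i = 0; i < config.iterations; i++) {
        config.states[current].call(config.data); // data is the `this` context
        visited.push(current);
        current = chooseNextState(config.transitions[current], random);
    }
    return visited;
}

const visited = runFsm({
    iterations: 5,
    data: {count: 0},
    states: {
        init: function init() { this.count = 0; },
        scan: function scan() { this.count++; },
    },
    transitions: {init: {scan: 1}, scan: {scan: 3, init: 1}},
});
// visited.length is 5 and visited[0] is "init".
```

The weights `{scan: 3, init: 1}` behave the same as `{scan: 0.75, init: 0.25}`, which is why the probabilities need not sum to 1.0.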
### Workload helpers ### Workload helpers
`jstests/concurrency/fsm_workload_helpers` contains a few files that you can `jstests/concurrency/fsm_workload_helpers` contains a few files that you can include using 'load' at
include using 'load' at the top of a workload. These provide auxiliary the top of a workload. These provide auxiliary functionality that might be necessary for some
functionality that might be necessary for some workloads. The most important of workloads. The most important of these is probably server_types.js.
these is probably server_types.js.
#### server_types.js #### server_types.js
This helper file contains four functions: isMongos, isMongod, isMMAPv1, and This helper file contains four functions: isMongos, isMongod, isMMAPv1, and isWiredTiger. These can
isWiredTiger. These can be used to restrict operations on different be used to restrict operations on different functionality available in sharded environments, as well
functionality available in sharded environments, as well as based on storage as based on storage engine, and work as you would expect. One thing to note is that before calling
engine, and work as you would expect. One thing to note is that before calling either isMMAPv1 or isWiredTiger, first verify isMongod. When special casing functionality for
either isMMAPv1 or isWiredTiger, first verify isMongod. When special casing sharded environments or storage engines, try to special case a test for the exceptionality while
functionality for sharded environments or storage engines, try to special case a still leaving in place assertions for either case.
test for the exceptionality while still leaving in place assertions for either
case.
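The ordering rule (verify isMongod before asking about the storage engine) can be sketched with stand-in helpers. Everything here is an assumption made for illustration: `isWiredTigerMongod` is a hypothetical wrapper, the `helpers` object only stands in for the functions in server_types.js, and passing a `db` handle to them is assumed rather than taken from the helper file.

```javascript
// Hypothetical wrapper: `helpers` stands in for the functions that
// server_types.js provides; passing `db` to them is an assumption.
function isWiredTigerMongod(db, helpers) {
    // Verify isMongod first; only then ask about the storage engine.
    return helpers.isMongod(db) && helpers.isWiredTiger(db);
}

// Stubbed usage: on a mongos connection the storage-engine check is never reached.
const mongosHelpers = {
    isMongod: () => false,
    isWiredTiger: () => {
        throw new Error("storage engine queried on a non-mongod");
    },
};
const onMongos = isWiredTigerMongod({}, mongosHelpers); // false, no exception
```

Short-circuit evaluation keeps the storage-engine question from ever being asked of a node that is not a mongod.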
#### indexed_noindex.js #### indexed_noindex.js
This helper can be used along with inheritance, to create a workload that is This helper can be used along with inheritance, to create a workload that is exactly the same as an
exactly the same as an existing workload, but with the index created during existing workload, but with the index created during setup removed. To use it, replace the
setup removed. To use it, replace the function you provide to the function you provide to the extendWorkload function with indexedNoindex. Additionally, ensure that
extendWorkload function with indexedNoindex. Additionally, ensure that the the workload you are extending has a function in its data object called "getIndexSpec" that returns
workload you are extending has a function in its data object called the spec for the index to be removed.
"getIndexSpec" that returns the spec for the index to be removed.
```javascript ```javascript
import {extendWorkload} from "jstests/concurrency/fsm_libs/extend_workload.js"; import {extendWorkload} from "jstests/concurrency/fsm_libs/extend_workload.js";
load( load("jstests/concurrency/fsm_workload_modifiers/collection_write_path/indexed_noindex.js"); // for indexedNoindex
"jstests/concurrency/fsm_workload_modifiers/collection_write_path/indexed_noindex.js",
); // for indexedNoindex
import {$config as $baseConfig} from "jstests/concurrency/fsm_workloads/workload_with_index.js"; import {$config as $baseConfig} from "jstests/concurrency/fsm_workloads/workload_with_index.js";
export const $config = extendWorkload($baseConfig, indexedNoindex); export const $config = extendWorkload($baseConfig, indexedNoindex);
@ -313,90 +287,80 @@ export const $config = extendWorkload($baseConfig, indexedNoindex);
#### drop_utils.js #### drop_utils.js
These helpers provide safe methods for dropping collections, databases, roles, These helpers provide safe methods for dropping collections, databases, roles, and users created
and users created during a workload's execution. The methods take a regular during a workload's execution. The methods take a regular expression that the collection, database,
expression that the collection, database, role, or user name must match for it role, or user name must match for it to be dropped. Prefixing the items in any of these categories
to be dropped. Prefixing the items in any of these categories you create with a you create with a prefix defined by your workload name is a good idea since the workload file name
prefix defined by your workload name is a good idea since the workload file name can be assumed unique and will allow you to only affect your workload in these cases.
can be assumed unique and will allow you to only affect your workload in these
cases.
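The prefix convention can be sketched as a regular expression built from the workload name. This does not call the drop_utils.js helpers; `prefixPattern` and `namesToDrop` are hypothetical, and the numeric suffix is an assumed naming scheme chosen for the example.

```javascript
// Hypothetical helper: escape a workload-derived prefix and anchor it so that
// only names created by this workload match. The numeric suffix is an assumption.
function prefixPattern(prefix) {
    const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp("^" + escaped + "\\d+$");
}

// Select the names that a pattern-based drop would affect.
function namesToDrop(allNames, pattern) {
    return allNames.filter((name) => pattern.test(name));
}

const pattern = prefixPattern("my_workload_");
const toDrop = namesToDrop(
    ["my_workload_0", "my_workload_12", "other_workload_3", "my_workload_x"],
    pattern,
);
// toDrop is ["my_workload_0", "my_workload_12"]
```

Anchoring the pattern at both ends keeps it from matching names that merely contain the prefix, which matters most when workloads run in parallel against a shared database.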
## Test runners ## Test runners
By default, all runners below are allowed to open a maximum of By default, all runners below are allowed to open a maximum of `maxAllowedConnections` (= 100 by
`maxAllowedConnections` (= 100 by default) explicit connections. In replicated default) explicit connections. In replicated and sharded environments, implicit connections are
and sharded environments, implicit connections are created to the original created to the original mongod provided to the mongo shell executing the runner (one for each
mongod provided to the mongo shell executing the runner (one for each thread). thread). This behavior cannot be controlled, but it highlights the importance of always using the db
This behavior cannot be controlled, but it highlights the importance of always object provided in the FSM states rather than the global db which will always correspond to the
using the db object provided in the FSM states rather than the global db which mongod the mongo shell initially connected to.
will always correspond to the mongod the mongo shell initially connected to.
### Execution modes ### Execution modes
#### Serial #### Serial
Serial is the simplest of all three modes and basically works as explained Serial is the simplest of all three modes and basically works as explained above. Setup is run
above. Setup is run single threaded, data is copied into multiple threads where single threaded, data is copied into multiple threads where the states are executed, and once all
the states are executed, and once all the threads have finished a teardown the threads have finished a teardown function is run and the runner moves onto the next workload.
function is run and the runner moves onto the next workload.
![fsm_serial_example.png](../images/testing/fsm_serial_example.png) ![fsm_serial_example.png](../images/testing/fsm_serial_example.png)
#### Parallel (Simultaneous) #### Parallel (Simultaneous)
In parallel or simultaneous mode (the naming convention has been slightly In parallel or simultaneous mode (the naming convention has been slightly inconsistent), the
inconsistent), the ordering becomes a little different. All workloads have their ordering becomes a little different. All workloads have their setup functions run, then threads are
setup functions run, then threads are spawned for each workload, and once they spawned for each workload, and once they all complete, all workloads have their teardown functions run.
all complete, all workloads have their teardown functions run.
![fsm_simultaneous_example.png](../images/testing/fsm_simultaneous_example.png) ![fsm_simultaneous_example.png](../images/testing/fsm_simultaneous_example.png)
### Existing runners ### Existing runners
The existing runners all use `jstests/concurrency/fsm_libs/runner.js` to The existing runners all use `jstests/concurrency/fsm_libs/runner.js` to actually execute the
actually execute the workloads. Most information about arguments and available workloads. Most information about arguments and available runWorkloads methods can be found by
runWorkloads methods can be found by inspecting the source. Below you can find inspecting the source. Below you can find the existing runners explained. The first argument to the
the existing runners explained. The first argument to the three runWorkloads three runWorkloads methods (each corresponding to a different run mode), is an array of workload
methods (each corresponding to a different run mode), is an array of workload files to run. clusterOptions, the second argument to the runWorkloads functions, is explained in the
files to run. clusterOptions, the second argument to the runWorkloads functions, other components section below. Execution options for runWorkloads functions, the third argument,
is explained in the other components section below. Execution options for can contain the following options (some depend on the run mode):
runWorkloads functions, the third argument, can contain the following options
(some depend on the run mode):
- `numSubsets` - Not available in serial mode, determines how many subsets of - `numSubsets` - Not available in serial mode, determines how many subsets of workloads to execute
workloads to execute in parallel mode in parallel mode
- `subsetSize` - Not available in serial mode, determines how large each subset of - `subsetSize` - Not available in serial mode, determines how large each subset of workloads
workloads executed is executed is
#### fsm_all.js #### fsm_all.js
Runs all workloads serially. For each workload, `$config.threadCount` threads Runs all workloads serially. For each workload, `$config.threadCount` threads are spawned and each
are spawned and each thread runs for exactly `$config.iterations` steps starting thread runs for exactly `$config.iterations` steps starting at `$config.startState` and
at `$config.startState` and transitioning to other states based on the transitioning to other states based on the transition probabilities defined in $config.transitions.
transition probabilities defined in $config.transitions.
#### fsm_all_simultaneous.js #### fsm_all_simultaneous.js
options: numSubsets, subsetSize options: numSubsets, subsetSize
Runs numSubsets subsets of size subsetSize of all workloads. The workloads in Runs numSubsets subsets of size subsetSize of all workloads. The workloads in each subset are
each subset are started in parallel and each workload is run according to started in parallel and each workload is run according to settings in `$config`.
settings in `$config`.
#### fsm_all_replication.js #### fsm_all_replication.js
Sets up a replica set (with 3 mongods by default) and runs workloads serially or Sets up a replica set (with 3 mongods by default) and runs workloads serially or in parallel. For
in parallel. For example, example,
`runWorkloadsSerially([<workload1>, <workload2>, ...], { replication: true } )` `runWorkloadsSerially([<workload1>, <workload2>, ...], { replication: true } )`
creates a replica set with 3 members and runs some workloads serially on the creates a replica set with 3 members and runs some workloads serially on the primary.
primary.
#### fsm_all_sharded.js #### fsm_all_sharded.js
Sets up a sharded cluster (with 2 shards and 1 mongos by default) and runs Sets up a sharded cluster (with 2 shards and 1 mongos by default) and runs workloads serially or in
workloads serially or in parallel. For example, parallel. For example,
`runWorkloadsInParallel([<workload1>, <workload2>, ...], { sharded: true } )` `runWorkloadsInParallel([<workload1>, <workload2>, ...], { sharded: true } )`
@ -404,36 +368,33 @@ creates a sharded cluster and runs workloads in parallel.
#### fsm_all_sharded_replication.js #### fsm_all_sharded_replication.js
Sets up a sharded cluster (with 2 shards, each having 3 replica set members, and Sets up a sharded cluster (with 2 shards, each having 3 replica set members, and 1 mongos by
1 mongos by default) and runs workloads serially or in parallel. default) and runs workloads serially or in parallel.
### Excluding a workload ### Excluding a workload
If any workloads fail because of known bugs in MongoDB, persistent MCI failures If any workloads fail because of known bugs in MongoDB, persistent MCI failures or timeouts, the
or timeouts, the troublesome workload can be excluded from running by placing it troublesome workload can be excluded from running by placing it in the exclusion array in the
in the exclusion array in the corresponding runner. Please remember to place a corresponding runner. Please remember to place a comment next to the excluded workload name
comment next to the excluded workload name identifying the reason a workload is identifying the reason a workload is being excluded. For example,
being excluded. For example,
`'agg_sort_external.js', // SERVER-16700 Deadlock on WiredTiger LSM` `'agg_sort_external.js', // SERVER-16700 Deadlock on WiredTiger LSM`
Each file should also have two predefined sections - one for known bugs and one Each file should also have two predefined sections - one for known bugs and one for restrictions.
for restrictions. The one above would be considered a known bug. However, The one above would be considered a known bug. However, excluding a compact workload from sharded
excluding a compact workload from sharded runners would be a restriction because runners would be a restriction because compact can only be run against individual mongods.
compact can only be run against individual mongods.
## Other components of the FSM library ## Other components of the FSM library
Most of these components live in jstests/concurrency/fsm_libs and provide the Most of these components live in jstests/concurrency/fsm_libs and provide the functionality used by
functionality used by the runner. the runner.
### ThreadManager ### ThreadManager
Responsible for spawning and joining worker threads. Each spawned thread is Responsible for spawning and joining worker threads. Each spawned thread is wrapped in a try/finally
wrapped in a try/finally block to ensure that the database connection implicitly block to ensure that the database connection implicitly created during the thread's execution is
created during the thread's execution is eventually closed explicitly. The eventually closed explicitly. The ThreadManager sets a random seed (`[0, randInt(1e13))`, which is
ThreadManager sets a random seed (`[0, randInt(1e13))`, which is the range of the range of `new Date().getTime()`) before executing each workload.
`new Date().getTime()`) before executing each workload.
### Worker Thread ### Worker Thread
@ -441,36 +402,30 @@ Thread spawned by ThreadManager and used to run a Finite State Machine.
### Cluster ### Cluster
cluster.js is responsible for providing the cluster object that is passed to cluster.js is responsible for providing the cluster object that is passed to setup and teardown
setup and teardown functions, and the initial connection to a db to be used by functions, and the initial connection to a db to be used by runner to pass to the workloads. For
runner to pass to the workloads. For anything except for standalone, it makes anything except for standalone, it makes use of the shell's built-in cluster test helpers like
use of the shell's built-in cluster test helpers like `ShardingTest` and `ShardingTest` and `ReplSetTest`. clusterOptions are passed to cluster.js for initialization.
`ReplSetTest`. clusterOptions are passed to cluster.js for initialization.
clusterOptions include: clusterOptions include:
- `replication`: boolean, whether or not to use replication in the cluster - `replication`: boolean, whether or not to use replication in the cluster
- `sameCollection`: boolean, whether or not all workloads are passed the same - `sameCollection`: boolean, whether or not all workloads are passed the same collection
collection
- `sameDB`: boolean, whether or not all workloads are passed the same DB - `sameDB`: boolean, whether or not all workloads are passed the same DB
- `setupFunctions`: object, containing at most two functions under the keys - `setupFunctions`: object, containing at most two functions under the keys 'mongod' and 'mongos'.
'mongod' and 'mongos'. This allows you to run a function against all mongod or This allows you to run a function against all mongod or mongos nodes in the cluster as part of the
mongos nodes in the cluster as part of the cluster initialization. Each cluster initialization. Each function takes a single argument, the db object against which
function takes a single argument, the db object against which configuration configuration can be run (will be set for each mongod/mongos)
can be run (will be set for each mongod/mongos)
- `sharded`: boolean, whether or not to use sharding in the cluster - `sharded`: boolean, whether or not to use sharding in the cluster
Note that sameCollection and sameDB can increase contention for a resource, but Note that sameCollection and sameDB can increase contention for a resource, but will also decrease
will also decrease the strength of the assertions by ruling out the use of OwnDB the strength of the assertions by ruling out the use of OwnDB and OwnColl assertions.
and OwnColl assertions.
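As a rough illustration, a `clusterOptions` object combining the keys listed above might look like the following. The keys come from the list; the values and the function names `configureMongod` and `configureMongos` are only examples, and the function bodies are left empty on purpose.

```javascript
// Keys come from the clusterOptions list; the values are illustrative.
const clusterOptions = {
    sharded: true,
    replication: true,
    // Keeping these false leaves OwnDB and OwnColl assertions usable.
    sameDB: false,
    sameCollection: false,
    setupFunctions: {
        // Each function receives the db object for one node of that type.
        mongod: function configureMongod(db) {
            // per-mongod configuration would go here
        },
        mongos: function configureMongos(db) {
            // per-mongos configuration would go here
        },
    },
};
```

An object like this would be passed as the second argument to one of the runWorkloads functions, in the same position as `{ replication: true }` in the examples earlier in this document.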
### Miscellaneous Execution Notes ### Miscellaneous Execution Notes
- A `CountDownLatch` (exposed through the v8-based mongo shell, as of MongoDB 3.0) - A `CountDownLatch` (exposed through the v8-based mongo shell, as of MongoDB 3.0) is used as a
is used as a synchronization primitive by the ThreadManager to wait until all synchronization primitive by the ThreadManager to wait until all spawned threads have finished
spawned threads have finished being spawned before starting workload being spawned before starting workload execution.
execution. - If more than 20% of the threads fail while spawning, we abort the test. If fewer than 20% of the
- If more than 20% of the threads fail while spawning, we abort the test. If threads fail while spawning we allow the non-failed threads to continue with the test. The 20%
fewer than 20% of the threads fail while spawning we allow the non-failed threshold is somewhat arbitrary; the goal is to abort if "mostly all" of the threads failed but to
threads to continue with the test. The 20% threshold is somewhat arbitrary; tolerate "a few" threads failing.
the goal is to abort if "mostly all" of the threads failed but to tolerate "a
few" threads failing.
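
The spawn-failure rule above can be sketched in a few lines. This is only an illustration in Python, not the framework's actual code: the function name `should_abort_after_spawn` and its parameters are hypothetical, and treating exactly 20% as tolerable is an assumption, since the text only covers "more than" and "fewer than" 20%.

```python
def should_abort_after_spawn(num_threads, num_failed, threshold_percent=20):
    """Return True when more than `threshold_percent` of threads failed to spawn.

    Hypothetical helper: integer arithmetic is used so that the boundary case
    (exactly 20%) is not affected by floating-point rounding. Exactly 20% is
    treated as tolerable, which is an assumption.
    """
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    if not 0 <= num_failed <= num_threads:
        raise ValueError("num_failed must be between 0 and num_threads")
    return num_failed * 100 > num_threads * threshold_percent
```

With 10 threads, 3 failures would abort the test under this sketch, while 1 or 2 failures would let the remaining threads continue.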


@ -1,37 +1,34 @@
# Hang Analyzer # Hang Analyzer
The hang analyzer is a tool to collect cores and other information from processes The hang analyzer is a tool to collect cores and other information from processes that are suspected
that are suspected to have hung. Any task which exceeds its timeout in Evergreen to have hung. Any task which exceeds its timeout in Evergreen will automatically be hang-analyzed,
will automatically be hang-analyzed, with information being written compressed with information being written compressed and uploaded to S3.
and uploaded to S3.
The hang analyzer can also be invoked locally at any time. For all non-Jepsen The hang analyzer can also be invoked locally at any time. For all non-Jepsen tasks, the invocation
tasks, the invocation is `buildscripts/resmoke.py hang-analyzer -o file -o stdout -m exact -p python`. You may need to substitute `python` with the name of the python binary is `buildscripts/resmoke.py hang-analyzer -o file -o stdout -m exact -p python`. You may need to
you are using, which may be one of `python`, `python3`, or on Windows: `Python`, substitute `python` with the name of the python binary you are using, which may be one of `python`,
`Python3`. `python3`, or on Windows: `Python`, `Python3`.
For jepsen tasks, the invocation is `buildscripts/resmoke.py hang-analyzer -o file -o stdout -p dbtest,java,mongo,mongod,mongos,python,_test`. For jepsen tasks, the invocation is
`buildscripts/resmoke.py hang-analyzer -o file -o stdout -p dbtest,java,mongo,mongod,mongos,python,_test`.
## Interesting Processes ## Interesting Processes
The hang analyzer detects and runs against processes which are considered The hang analyzer detects and runs against processes which are considered interesting.
interesting.
Tasks whose name contains "jepsen": any process whose name exactly matches one Tasks whose name contains "jepsen": any process whose name exactly matches one of
of `dbtest,java,mongo,mongod,mongos,python,_test`. `dbtest,java,mongo,mongod,mongos,python,_test`.
In all other scenarios, including local use of the hang-analyzer, an interesting In all other scenarios, including local use of the hang-analyzer, an interesting process is any of:
process is any of:
- process that starts with `python` or `live-record` - process that starts with `python` or `live-record`
- one which has been spawned as a child process of resmoke. - one which has been spawned as a child process of resmoke.
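
The selection rules above could be expressed roughly as the following predicate. This is a sketch under stated assumptions, not the hang analyzer's implementation: `is_interesting`, its parameters, and the two constants are hypothetical names, and whether a process is a child of resmoke is passed in as a flag rather than detected.

```python
# Names taken from the Jepsen process list quoted above.
JEPSEN_PROCESS_NAMES = frozenset(
    ["dbtest", "java", "mongo", "mongod", "mongos", "python", "_test"]
)
# Prefixes that make a process interesting outside of Jepsen tasks.
INTERESTING_PREFIXES = ("python", "live-record")


def is_interesting(process_name, is_resmoke_child, task_name=""):
    """Hypothetical predicate mirroring the rules described in the text."""
    if "jepsen" in task_name:
        # Jepsen tasks: the process name must match exactly.
        return process_name in JEPSEN_PROCESS_NAMES
    # All other scenarios: prefix match, or spawned by resmoke.
    return process_name.startswith(INTERESTING_PREFIXES) or is_resmoke_child
```

Note that under these rules `python3` is interesting for a local run (prefix match) but not for a Jepsen task (exact match only).
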
The resmoke subcommand `hang-analyzer` will send SIGUSR1/use SetEvent to signal The resmoke subcommand `hang-analyzer` will send SIGUSR1/use SetEvent to signal resmoke to:
resmoke to:
- Print stack traces for all python threads - Print stack traces for all python threads
- Collect core dumps and other information for any non-python child - Collect core dumps and other information for any non-python child processes, see `Data Collection`
processes, see `Data Collection` below below
- Re-signal any python child processes to do the same - Re-signal any python child processes to do the same
## Data Collection ## Data Collection
@ -41,8 +38,8 @@ Data collection occurs in the following sequence:
- Pause all non-python processes - Pause all non-python processes
- Grab debug symbols on non-Sanitizer builds - Grab debug symbols on non-Sanitizer builds
- Signal python Processes - Signal python Processes
- Dump cores of as many processes as possible, until the disk quota is exceeded. - Dump cores of as many processes as possible, until the disk quota is exceeded. The default quota
The default quota is 90% of total volume space. is 90% of total volume space.
- Collect additional, non-core data. Ideally: - Collect additional, non-core data. Ideally:
- Print C++ Stack traces - Print C++ Stack traces
@ -54,13 +51,12 @@ Data collection occurs in the following sequence:
- Dump java processes (Jepsen tests) with jstack - Dump java processes (Jepsen tests) with jstack
- SIGABRT (Unix)/terminate (Windows) go processes - SIGABRT (Unix)/terminate (Windows) go processes
Note that the list of non-core data collected is only accurate on Linux. Other Note that the list of non-core data collected is only accurate on Linux. Other platforms only
platforms only perform a subset of these operations. perform a subset of these operations.
Additionally, note that the hang analyzer is subject to Evergreen post task Additionally, note that the hang analyzer is subject to Evergreen post task timeouts, and may not
timeouts, and may not have enough time to collect all information before have enough time to collect all information before being terminated by the Evergreen agent. When
being terminated by the Evergreen agent. When running locally there is no running locally there is no timeout, and the hang analyzer may ironically hang indefinitely.
timeout, and the hang analyzer may ironically hang indefinitely.
### Implementations ### Implementations


@ -2,11 +2,23 @@
## Overview ## Overview
[Mongobridge](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L1) is a network fault injection testing tool that allows test authors to intentionally simulate network issues such as connection failures, message delays, or packet loss during communication to any node in a cluster. It acts as a transparent proxy between MongoDB processes and their clients, enabling controlled network fault injection for testing distributed system behavior. [Mongobridge](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L1)
is a network fault injection testing tool that allows test authors to intentionally simulate network
issues such as connection failures, message delays, or packet loss during communication to any node
in a cluster. It acts as a transparent proxy between MongoDB processes and their clients, enabling
controlled network fault injection for testing distributed system behavior.
## How It Works ## How It Works
When `ReplSetTest` or `ShardingTest` are instructed to use `mongobridge`, they will [set up a mongobridge process](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/jstests/libs/replsettest.js#L2962) for each node that [creates a ProxiedConnection](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L323-L324) between the node and any clients (including other nodes in the cluster) attempting to communicate with it. When test authors send a command to a node, mongobridge [intercepts the command and applies any configured actions](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L395-L430) onto the commands before forwarding the command along to the node itself. This allows simple fault injection from the test author's perspective. When `ReplSetTest` or `ShardingTest` are instructed to use `mongobridge`, they will
[set up a mongobridge process](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/jstests/libs/replsettest.js#L2962)
for each node that
[creates a ProxiedConnection](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L323-L324)
between the node and any clients (including other nodes in the cluster) attempting to communicate
with it. When test authors send a command to a node, mongobridge
[intercepts the command and applies any configured actions](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L395-L430)
onto the commands before forwarding the command along to the node itself. This allows simple fault
injection from the test author's perspective.
## Quick Start ## Quick Start
@ -23,7 +35,8 @@ To use mongobridge in your tests:
}); });
``` ```
- **Test commands must be enabled**: Mongobridge's `*From` commands require `enableTestCommands: true` (which is the default in test environments) - **Test commands must be enabled**: Mongobridge's `*From` commands require
`enableTestCommands: true` (which is the default in test environments)
2. **Inject network faults** using bridge commands: 2. **Inject network faults** using bridge commands:
@ -38,11 +51,16 @@ To use mongobridge in your tests:
st.rs0.getPrimary().acceptConnectionsFrom(st.rs0.getSecondary()); st.rs0.getPrimary().acceptConnectionsFrom(st.rs0.getSecondary());
``` ```
3. Operations that depend on communication between the affected nodes will fail or time out as expected. 3. Operations that depend on communication between the affected nodes will fail or time out as
expected.
## What to keep in mind ## What to keep in mind
Be aware that there are consequences to injecting network faults between nodes that can cause downstream impact in (for example) heartbeats, sync source selection, and SDAM, and so after a fault has been injected the test may not be in the state you expect it to be in for future commands. It is best to keep mongobridge tests relatively short and targeted to ensure that flakiness due to these faults doesn't impact the rest of your testing. Be aware that there are consequences to injecting network faults between nodes that can cause
downstream impact in (for example) heartbeats, sync source selection, and SDAM, and so after a fault
has been injected the test may not be in the state you expect it to be in for future commands. It is
best to keep mongobridge tests relatively short and targeted to ensure that flakiness due to these
faults doesn't impact the rest of your testing.
## Command Reference ## Command Reference
@ -71,7 +89,8 @@ node.acceptConnectionsFrom([node1, node2, node3]); // Multiple nodes
node.rejectConnectionsFrom(otherNode); node.rejectConnectionsFrom(otherNode);
``` ```
**Effect**: New connections are rejected, existing connections are closed when a new request is sent over them **Effect**: New connections are rejected, existing connections are closed when a new request is sent
over them
**Use case**: Simulating complete network partitions **Use case**: Simulating complete network partitions
@ -183,7 +202,8 @@ primary.discardMessagesFrom(secondary, 0.3);
### Limitations ### Limitations
- **OP_QUERY exhaust**: Not supported for legacy exhaust queries (OP_MSG exhaust cursors are supported) - **OP_QUERY exhaust**: Not supported for legacy exhaust queries (OP_MSG exhaust cursors are
supported)
- **Direct connections**: Only works when connections go through the bridge proxy - **Direct connections**: Only works when connections go through the bridge proxy
- **TLS support**: Mongobridge is not supported if the cluster is using TLS. - **TLS support**: Mongobridge is not supported if the cluster is using TLS.


@ -11,26 +11,32 @@ Using OTel we capture the following things
3. Duration of hooks before and after test/suite 3. Duration of hooks before and after test/suite
4. Resmoke archiver (when there is a failure we archive core dumps) 4. Resmoke archiver (when there is a failure we archive core dumps)
To see this visually navigate to the [resmoke dataset](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/resmoke/home) and view a recent trace. To see this visually navigate to the
[resmoke dataset](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/resmoke/home)
and view a recent trace.
## A look at source code ## A look at source code
### Configuration ### Configuration
The bulk of configuration is done in the The bulk of configuration is done in the `_set_up_tracing(...)` method in
`_set_up_tracing(...)` method in [configure_resmoke.py#L164](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/configure_resmoke.py#L164). This method includes documentation on how it works. [configure_resmoke.py#L164](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/configure_resmoke.py#L164).
This method includes documentation on how it works.
## BatchedBaggageSpanProcessor ## BatchedBaggageSpanProcessor
See documentation [batched_baggage_span_processor.py#L8](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/batched_baggage_span_processor.py#L8) See documentation
[batched_baggage_span_processor.py#L8](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/batched_baggage_span_processor.py#L8)
## FileSpanExporter ## FileSpanExporter
See documentation [file_span_exporter.py#L16](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/file_span_exporter.py#L16) See documentation
[file_span_exporter.py#L16](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/file_span_exporter.py#L16)
## Capturing Data ## Capturing Data
We mostly capture data by using a decorator on methods. Example taken from [job.py#L200](https://github.com/mongodb/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L200) We mostly capture data by using a decorator on methods. Example taken from
[job.py#L200](https://github.com/mongodb/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L200)
``` ```
TRACER = trace.get_tracer("resmoke") TRACER = trace.get_tracer("resmoke")
@ -41,7 +47,11 @@ def func_name(...):
span.set_attribute("attr1", True) span.set_attribute("attr1", True)
``` ```
This system is nice because the decorator captures exceptions and other failures and a user can never forget to close a span. On occasion we will also start a span using the `with` clause in python. However, the decorator method is preferred since the method below makes more of a readability impact on the code. This example is taken from [job.py#L215](https://github.com/mongodb/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L215) This system is nice because the decorator captures exceptions and other failures and a user can
never forget to close a span. On occasion we will also start a span using the `with` clause in
python. However, the decorator method is preferred since the method below makes more of a
readability impact on the code. This example is taken from
[job.py#L215](https://github.com/mongodb/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L215)
``` ```
with TRACER.start_as_current_span("func_name", attributes={}): with TRACER.start_as_current_span("func_name", attributes={}):
@ -51,4 +61,9 @@ with TRACER.start_as_current_span("func_name", attributes={}):
## Insights We Have Made (so far) ## Insights We Have Made (so far)
Using [this dashboard](https://ui.honeycomb.io/mongodb-4b/environments/production/board/3bATQLb38bh/Server-CI) and [this query](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/resmoke/result/GFa2YJ6d4vU/a/7EYuMJtH8KX/Slowest-Resmoke-Tests) we can see the most expensive single js tests. We plan to make tickets for teams to fix these long running tests for cloud savings as well as developer time savings. Using
[this dashboard](https://ui.honeycomb.io/mongodb-4b/environments/production/board/3bATQLb38bh/Server-CI)
and
[this query](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/resmoke/result/GFa2YJ6d4vU/a/7EYuMJtH8KX/Slowest-Resmoke-Tests)
we can see the most expensive single js tests. We plan to make tickets for teams to fix these long
running tests for cloud savings as well as developer time savings.


@ -1,10 +1,14 @@
# Resmoke Module Configuration # Resmoke Module Configuration
This configuration allows additional modules to be added to Resmoke, providing more context about their associated directories. Modules can specify directories for fixtures, hooks, suites, and JavaScript tests, which Resmoke incorporates during its testing process. This configuration allows additional modules to be added to Resmoke, providing more context about
their associated directories. Modules can specify directories for fixtures, hooks, suites, and
JavaScript tests, which Resmoke incorporates during its testing process.
## Adding a New Module ## Adding a New Module
To add a new module to Resmoke, define the module name and specify its `fixture_dirs`, `hook_dirs`, `suite_dirs`, and `jstest_dirs` in the YAML configuration. Each field should be a list of directory paths. To add a new module to Resmoke, define the module name and specify its `fixture_dirs`, `hook_dirs`,
`suite_dirs`, and `jstest_dirs` in the YAML configuration. Each field should be a list of directory
paths.
### Example YAML Configuration ### Example YAML Configuration
@ -25,9 +29,12 @@ my_new_module:
- **`fixture_dirs`**: Directories containing fixtures associated with the module. - **`fixture_dirs`**: Directories containing fixtures associated with the module.
- **`hook_dirs`**: Directories containing hooks associated with the module. - **`hook_dirs`**: Directories containing hooks associated with the module.
- **`suite_dirs`**: Directories containing suites with test configurations. - **`suite_dirs`**: Directories containing suites with test configurations.
- **`jstest_dirs`**: Directories containing JavaScript tests specific to the module. This ensures module-specific tests are excluded from other suite configurations when the module is disabled. - **`jstest_dirs`**: Directories containing JavaScript tests specific to the module. This ensures
module-specific tests are excluded from other suite configurations when the module is disabled.
## Notes ## Notes
- Any suite can use jstests from any directory. When the module is enabled, the configured jstest dirs do nothing. Only when the module is disabled do they filter out the tests that might be configured in a suite from a different module. - Any suite can use jstests from any directory. When the module is enabled, the configured jstest
dirs do nothing. Only when the module is disabled do they filter out the tests that might be
configured in a suite from a different module.
- Fields can be omitted or empty lists - Fields can be omitted or empty lists
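
The filtering behavior described in these notes can be illustrated with a small Python sketch. It is not Resmoke's actual code: the function name, the shape of the `modules` mapping, and the example paths are all assumptions made for illustration.

```python
from pathlib import PurePosixPath


def filter_disabled_module_tests(tests, modules, enabled_modules):
    """Drop tests that live under the jstest_dirs of disabled modules.

    Hypothetical helper. `modules` maps a module name to its config dict,
    `enabled_modules` is a set of module names. Enabled modules do not
    filter anything, matching the note above.
    """
    excluded_dirs = [
        PurePosixPath(directory)
        for name, config in modules.items()
        if name not in enabled_modules
        # Fields may be omitted or empty, so fall back to an empty list.
        for directory in (config.get("jstest_dirs") or [])
    ]

    def is_excluded(test_path):
        path = PurePosixPath(test_path)
        return any(directory in path.parents for directory in excluded_dirs)

    return [test for test in tests if not is_excluded(test)]
```

For example, with a module whose `jstest_dirs` is `["src/my_module/jstests"]` (a made-up path), a suite listing `src/my_module/jstests/b.js` keeps that test while the module is enabled and loses it once the module is disabled.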


@ -1,55 +1,48 @@
# Thread Pools # Thread Pools
A thread pool ([Wikipedia][thread_pools_wikipedia]) accepts and executes A thread pool ([Wikipedia][thread_pools_wikipedia]) accepts and executes lightweight work items
lightweight work items called "tasks", using a carefully managed group called "tasks", using a carefully managed group of dedicated long-running worker threads. The worker
of dedicated long-running worker threads. The worker threads perform threads perform the work items in parallel without forcing each work item to assume the burden of
the work items in parallel without forcing each work item to assume the starting and destroying a dedicated thread.
burden of starting and destroying a dedicated thread.
## Classes ## Classes
### `ThreadPoolInterface` ### `ThreadPoolInterface`
The [`ThreadPoolInterface`][thread_pool_interface.h] abstract interface is The [`ThreadPoolInterface`][thread_pool_interface.h] abstract interface is an extension of the
an extension of the `OutOfLineExecutor` (see [the executors architecture `OutOfLineExecutor` (see [the executors architecture guide][executors]) abstract interface, adding
guide][executors]) abstract interface, adding `startup`, `shutdown`, and `startup`, `shutdown`, and `join` virtual member functions. It is the base class for our thread pool
`join` virtual member functions. It is the base class for our thread classes.
pool classes.
### `ThreadPool` ### `ThreadPool`
[`ThreadPool`][thread_pool.h] is the most basic concrete thread pool. The [`ThreadPool`][thread_pool.h] is the most basic concrete thread pool. The number of worker threads
number of worker threads is adaptive, but configurable with a min/max is adaptive, but configurable with a min/max range. Idle worker threads are reaped (down to the
range. Idle worker threads are reaped (down to the configured min), while configured min), while new worker threads can be created when needed (up to the configured max).
new worker threads can be created when needed (up to the configured max).
### `ThreadPoolTaskExecutor` ### `ThreadPoolTaskExecutor`
[`ThreadPoolTaskExecutor`][thread_pool_task_executor.h] is not a thread [`ThreadPoolTaskExecutor`][thread_pool_task_executor.h] is not a thread pool, but rather a
pool, but rather a `TaskExecutor` that uses a `ThreadPoolInterface` and `TaskExecutor` that uses a `ThreadPoolInterface` and a `NetworkInterface` to execute scheduled
a `NetworkInterface` to execute scheduled tasks. It's configured with a tasks. It's configured with a `ThreadPoolInterface` over which it _takes_ ownership, and a
`ThreadPoolInterface` over which it _takes_ ownership, and a `NetworkInterface`, of which it _shares_ ownership. With these resources it implements the elaborate
`NetworkInterface`, of which it _shares_ ownership. With these resources `TaskExecutor` interface (see [executors]).
it implements the elaborate `TaskExecutor` interface (see [executors]).
### `NetworkInterfaceThreadPool` ### `NetworkInterfaceThreadPool`
[`NetworkInterfaceThreadPool`][network_interface_thread_pool.h] is a [`NetworkInterfaceThreadPool`][network_interface_thread_pool.h] is a thread pool implementation that
thread pool implementation that doesn't actually own any worker threads. doesn't actually own any worker threads. It runs its tasks on the background thread of a
It runs its tasks on the background thread of a
[`NetworkInterface`][network_interface.h]. [`NetworkInterface`][network_interface.h].
Incoming tasks that are scheduled from the `NetworkInterface`'s thread Incoming tasks that are scheduled from the `NetworkInterface`'s thread are run immediately.
are run immediately. Otherwise they are queued to be run by the Otherwise they are queued to be run by the `NetworkInterface` thread when it is available.
`NetworkInterface` thread when it is available.
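
The "run immediately or queue" rule can be pictured with a toy model. The sketch below is written in Python purely for illustration and does not reflect the C++ implementation: the class `InlineOrQueuePool`, its methods, and the explicit `drain` call standing in for "when the thread is available" are all assumptions.

```python
import threading
from collections import deque


class InlineOrQueuePool:
    """Toy pool that owns no worker threads (hypothetical, for illustration)."""

    def __init__(self, owner_thread_id):
        # Identifier of the thread that plays the role of the network thread.
        self._owner_thread_id = owner_thread_id
        self._lock = threading.Lock()
        self._pending = deque()

    def schedule(self, task):
        # Tasks scheduled from the owner thread run immediately.
        if threading.get_ident() == self._owner_thread_id:
            task()
            return
        # Tasks from any other thread wait for the owner thread.
        with self._lock:
            self._pending.append(task)

    def drain(self):
        # Called by the owner thread when it is available to run queued work.
        while True:
            with self._lock:
                if not self._pending:
                    return
                task = self._pending.popleft()
            task()
```

In this model a task scheduled from another thread has no effect until the owner thread calls `drain()`, which is the trade-off of not owning any worker threads.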
### `ThreadPoolMock` ### `ThreadPoolMock`
[`ThreadPoolMock`][thread_pool_mock.h] is a `ThreadPoolInterface`. It is not [`ThreadPoolMock`][thread_pool_mock.h] is a `ThreadPoolInterface`. It is not a mock of a
a mock of a `ThreadPool`. It has no configurable stored responses. It has `ThreadPool`. It has no configurable stored responses. It has one worker thread and a pointer to a
one worker thread and a pointer to a `NetworkInterfaceMock`, and with these `NetworkInterfaceMock`, and with these resources it simulates a thread pool well enough to be used
resources it simulates a thread pool well enough to be used by a by a `ThreadPoolTaskExecutor` in unit tests.
`ThreadPoolTaskExecutor` in unit tests.
[thread_pools_wikipedia]: https://en.wikipedia.org/wiki/Thread_pool [thread_pools_wikipedia]: https://en.wikipedia.org/wiki/Thread_pool
[executors]: ../src/mongo/executor/README.md [executors]: ../src/mongo/executor/README.md


@ -1,13 +1,14 @@
Note: this doc is being continuously updated while changes are being made to the unit test framework. Note: this doc is being continuously updated while changes are being made to the unit test
framework.
# Overview # Overview
# Features # Features
The MongoDB unit test framework is a thin layer built atop GoogleTest, so most GoogleTest features The MongoDB unit test framework is a thin layer built atop GoogleTest, so most GoogleTest features
(see [Google Test documentation][google_test_docs]) are available for use aside from anything (see [Google Test documentation][google_test_docs]) are available for use aside from anything listed
listed out in [Banned Features](#banned-features). The unit testing framework also includes out in [Banned Features](#banned-features). The unit testing framework also includes enhanced
enhanced reporting of test output (see reporting of test output (see
[Enhanced Reporting of Test Output](#enhanced-reporting-of-test-output)). [Enhanced Reporting of Test Output](#enhanced-reporting-of-test-output)).
The core unittest features can be accessed by including the `mongo/unittest/unittest.h` header and The core unittest features can be accessed by including the `mongo/unittest/unittest.h` header and
@ -18,8 +19,8 @@ using the `mongo_cc_unit_test` bazel rule.
### Parameterized tests ### Parameterized tests
Parameterized tests are a GoogleTest feature that allows the same test logic to be run with Parameterized tests are a GoogleTest feature that allows the same test logic to be run with
different values or types (see GoogleTest docs on different values or types (see GoogleTest docs on [Value-Parameterized
[Value-Parameterized Tests][value_parameterized_tests] and [Typed Tests][typed_tests]). Tests][value_parameterized_tests] and [Typed Tests][typed_tests]).
```cpp ```cpp
class TestFixture : class TestFixture :
@ -41,8 +42,8 @@ TEST_P(TestFixture, MongoTest) {
### GoogleMock ### GoogleMock
GoogleMock can be used by including the `mongo/unittest/unittest.h` header. You should never GoogleMock can be used by including the `mongo/unittest/unittest.h` header. You should never
directly include `<gmock/gmock.h>`. There are matchers for common mongo types such as `BSONObj` directly include `<gmock/gmock.h>`. There are matchers for common mongo types such as `BSONObj` in
in `mongo/unittest/matcher.h`. `mongo/unittest/matcher.h`.
## Banned Features ## Banned Features
@ -63,9 +64,9 @@ GoogleTest fatal assertions, such as no fatal assertions allowed in non-void hel
## Enhanced Reporting of Test Output ## Enhanced Reporting of Test Output
The Enhanced Reporter improves test reporting by colorizing and formatting output, maintaining The Enhanced Reporter improves test reporting by colorizing and formatting output, maintaining a
a progress indicator, printing enhanced failure information, and suppressing log output on progress indicator, printing enhanced failure information, and suppressing log output on passing
passing tests. tests.
These command line flags may be used to configure the Enhanced Reporter: These command line flags may be used to configure the Enhanced Reporter:
@ -74,9 +75,9 @@ These command line flags may be used to configure the Enhanced Reporter:
## Death Tests ## Death Tests
The MongoDB unit testing framework uses `DEATH_TEST` (with `DEATH_TEST_F`, `DEATH_TEST_REGEX`, The MongoDB unit testing framework uses `DEATH_TEST` (with `DEATH_TEST_F`, `DEATH_TEST_REGEX`, and
and `DEATH_TEST_REGEX_F` variants) to test code that is expected to cause the process to `DEATH_TEST_REGEX_F` variants) to test code that is expected to cause the process to terminate. This
terminate. This should replace all uses of the `ASSERT_DEATH` macro from GoogleTest (see should replace all uses of the `ASSERT_DEATH` macro from GoogleTest (see
[unittest/death_test.h][death_test_h] for more details). [unittest/death_test.h][death_test_h] for more details).
Similar to GoogleTest, `DEATH_TEST` test suite names should be suffixed with `DeathTest`. For Similar to GoogleTest, `DEATH_TEST` test suite names should be suffixed with `DeathTest`. For
@ -98,8 +99,10 @@ DEATH_TEST_F(FixtureNameDeathTest, TestName) {
} }
``` ```
[death_test_naming]: https://github.com/google/googletest/blob/main/docs/advanced.md#death-test-naming [death_test_naming]:
https://github.com/google/googletest/blob/main/docs/advanced.md#death-test-naming
[death_test_h]: ../src/mongo/unittest/death_test.h [death_test_h]: ../src/mongo/unittest/death_test.h
[google_test_docs]: https://github.com/google/googletest/blob/main/docs/primer.md [google_test_docs]: https://github.com/google/googletest/blob/main/docs/primer.md
[value_parameterized_tests]: https://github.com/google/googletest/blob/main/docs/advanced.md#value-parameterized-tests [value_parameterized_tests]:
https://github.com/google/googletest/blob/main/docs/advanced.md#value-parameterized-tests
[typed_tests]: https://github.com/google/googletest/blob/main/docs/advanced.md#typed-tests [typed_tests]: https://github.com/google/googletest/blob/main/docs/advanced.md#typed-tests


@ -56,9 +56,10 @@ Contact for more Information: https://www.mongodb.com/contact
### Note to 1194.22 ### Note to 1194.22
The Board interprets paragraphs (a) through (k) of this section as consistent with the following The Board interprets paragraphs (a) through (k) of this section as consistent with the following
priority 1 Checkpoints of the Web Content Accessibility Guidelines 1.0 (WCAG 1.0) (May 5 1999) published by the Web priority 1 Checkpoints of the Web Content Accessibility Guidelines 1.0 (WCAG 1.0) (May 5 1999)
Accessibility Initiative of the World Wide Web Consortium: Paragraph (a) - 1.1, (b) - 1.4, (c\) - 2.1, (d) - 6.1, published by the Web Accessibility Initiative of the World Wide Web Consortium: Paragraph (a) - 1.1,
(e) - 1.2, (f) - 9.1, (g) - 5.1, (h) - 5.2, (i) - 12.1, (j) - 7.1, (k) - 11.4. (b) - 1.4, (c\) - 2.1, (d) - 6.1, (e) - 1.2, (f) - 9.1, (g) - 5.1, (h) - 5.2, (i) - 12.1, (j) - 7.1,
(k) - 11.4.
## Section 1194.23 Telecommunications Products Detail ## Section 1194.23 Telecommunications Products Detail


@ -1,84 +1,160 @@
# Javascript Test Guide # Javascript Test Guide
At MongoDB we write integration tests in JavaScript. These are tests written to exercise some behavior of a running MongoDB server, replica set, or sharded cluster. This guide aims to provide some general guidelines and best practices on how to write good tests. At MongoDB we write integration tests in JavaScript. These are tests written to exercise some
behavior of a running MongoDB server, replica set, or sharded cluster. This guide aims to provide
some general guidelines and best practices on how to write good tests.
## Principles ## Principles
### Minimize the test case as much as possible while still exercising and testing the desired behavior. ### Minimize the test case as much as possible while still exercising and testing the desired behavior.
- For example, if you are testing that document deletion works correctly, it may be entirely sufficient to insert just a single document and then delete that document. Inserting multiple documents would be unnecessary. A guiding principle on this is to ask yourself how easy it would be for a new person coming to this test to quickly understand it. If there are multiple documents being inserted into a collection, in a test that only tests document deletion, a newcomer might ask the question: “is it important that the test uses multiple documents, or incidental?”. It is best if you can remove these kinds of questions from a person's mind, by keeping only the absolute essential parts of a test. - For example, if you are testing that document deletion works correctly, it may be entirely
- We should always strive for unit testing when possible, so if the functionality you want to test can be covered by a unit test, we should write a unit test instead. sufficient to insert just a single document and then delete that document. Inserting multiple
documents would be unnecessary. A guiding principle on this is to ask yourself how easy it would
be for a new person coming to this test to quickly understand it. If there are multiple documents
being inserted into a collection, in a test that only tests document deletion, a newcomer might
ask the question: “is it important that the test uses multiple documents, or incidental?”. It is
best if you can remove these kinds of questions from a persons mind, by keeping only the absolute
essential parts of a test.
- We should always strive for unittesting when possible, so if the functionality you want to test
can be covered by a unit test, we should write a unit test instead.
### Add a block comment at the top of the JavaScript test file giving a clear and concise overview of what a test is trying to verify.

- For tests that are more complicated, a brief description of the test steps might be useful as
  well.
### Keep debuggability in mind.

- Assertion error messages should contain all information relevant to debugging the test. This means
  the server's response from the failed command should almost always be included in the assertion
  error message. It can also be helpful to include parameters that vary during the test, so that the
  investigator does not need the logs/backtrace to determine what the test was attempting to do.
- Think about how easy it would be to debug your test if something failed and a newcomer only had
  the logs of the test to look at. This can help guide your decision on what log messages to include
  and to what level of detail. The jsTestLog function is useful for this, as it is good at visually
  demarcating different phases of a test. As a tip, run your test a few times and just study the log
  messages, imagining you are an engineer debugging the test with only these logs to look at. Think
  about how understandable the logs would be to a newcomer. It is easy to add log messages to a test
  but then forget to see how they would actually appear.
- Never insert identical documents unless necessary. It is very useful in debugging to be able to
  figure out where a given piece of data came from.
- If a test does the same thing multiple times, consider factoring it out into a library.
  Shorter-running tests are easier to debug and code duplication is always bad.
### Do not hardcode collection or database names, especially if they are used multiple times throughout a test.

It is best to use variable names that attempt to describe what a value is used for. For example,
`collectionToDrop` is a much better name than `collName` for a variable that stores the name of a
collection the test will drop.
### Make every effort to make your test as deterministic as possible.

- Non-deterministic tests add noise to our build system and, in general, make it harder for yourself
  and other engineers to determine if the system really is working correctly or not. Flaky
  integration tests should be considered bugs, and we should not allow them to be committed to the
  server codebase. One way to make jstests more deterministic is to use failpoints to force events
  to happen in the expected order. However, if we have to use failpoints to make a test
  deterministic, we should consider writing a unit test instead.
- Note that our fuzzer and concurrency test suites are often an exception to this rule. In those
  cases we sometimes give up some level of determinism in order to trigger a wider class of rare
  edge cases. For targeted JavaScript integration tests, however, highly deterministic tests should
  be the goal.
### Think hard about all the assumptions that the test relies on.

- For example, if a certain phase of the test ran much slower or much faster, would it cause your
  test to fail for the wrong reason?
- If your test includes hard-coded timeouts, make sure they are set appropriately. If a test is
  waiting for a certain condition to be true, and the test should not proceed until that condition
  is met, it is often correct to just wait “indefinitely”, instead of adding some arbitrary timeout
  value, like 30 seconds. In practice this usually means setting some reasonable upper limit, for
  example, 10 minutes.
- Also, for replication tests, make sure data exists on the right nodes at the right time. For
  example, if you do a write and don't explicitly wait for it to replicate, it might not reach a
  secondary node before you try to do the next step of the test.
- Does your test require data to be stored persistently? Remember that we have test variants that
  run on in-memory/ephemeral storage engines.
- Test suites have timeouts, and we aim for all tests in the same suite to finish before the
  timeout. This means each test should run quickly and be kept short in duration.
### Make tests fail as early as possible.

- If something goes wrong early in the test, it's much harder to diagnose when that error becomes
  visible much later.
- Wrap every command in assert.commandWorked, or assert.commandFailedWithCode. There is also
  assert.commandFailed that won't check the return error code, but we should always try to use
  assert.commandFailedWithCode to make sure the test won't pass on an unexpected error.
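
To see why checking the error code matters, here is a minimal sketch in plain JavaScript. The
helpers `checkFailed` and `checkFailedWithCode` are hypothetical stand-ins written for this
illustration, not the shell's actual assertion implementation, and the response object is made up
(11000 is the server's DuplicateKey code and 13 is Unauthorized).

```js
// Hypothetical stand-in: passes for ANY failed command response.
function checkFailed(res) {
    if (res.ok !== 0) {
        throw new Error(`expected command to fail: ${JSON.stringify(res)}`);
    }
    return res;
}

// Hypothetical stand-in: passes only if the command failed with the expected code.
function checkFailedWithCode(res, expectedCode) {
    checkFailed(res);
    if (res.code !== expectedCode) {
        // Include the whole response so the failure can be debugged from the message alone.
        throw new Error(`expected code ${expectedCode}, got: ${JSON.stringify(res)}`);
    }
    return res;
}

// The test expects a duplicate key error, but the command failed for an unrelated reason.
const unexpectedFailure = {ok: 0, code: 13, errmsg: "not authorized"};

// The loose check silently accepts the wrong failure.
checkFailed(unexpectedFailure);

// The strict check rejects it immediately.
let caught = null;
try {
    checkFailedWithCode(unexpectedFailure, 11000);
} catch (e) {
    caught = e;
}
console.log(caught.message);
```

The loose check lets the test continue past an error it never intended to exercise, while the
strict check stops at the first unexpected response.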
### Be aware of all the configurations and variants that your test might run under.

- Make sure that your test still works correctly if it is run in a different configuration or on a
  different platform than the one you might have tested on.
- Varying storage engines and suites can often affect a test's behavior. For example, maybe your
  test fails unexpectedly if it runs with authentication turned on with an in-memory storage engine.
  You don't have to run a new test on every possible platform before committing it, but you should
  be confident that your test doesn't break in an unexpected configuration.
### Avoid assertions that verify properties indirectly.

All assertions in a test should attempt to verify the most specific property possible. For example,
if you are trying to test that a certain collection exists, it is better to assert that the
collection's exact name exists in the list of collections, as opposed to verifying that the
collection count is equal to 1. The desired collection's existence is sufficient for the collection
count to be 1, but not necessary (a different collection could exist in its place). Be wary of
adding these kinds of indirect assertions in a test.
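
The difference can be sketched in plain JavaScript. The array below is a hypothetical stand-in for
the list of collection names a test would read from the server, and the names themselves are made
up for illustration.

```js
// Hypothetical state: the expected collection "orders" is missing, and an unrelated collection
// left behind by something else exists in its place.
const collectionNames = ["leftover_from_other_test"];
const expectedName = "orders";

// Indirect check: only looks at the count, so it passes for the wrong reason.
const indirectCheckPasses = collectionNames.length === 1;

// Direct check: looks for the exact name, so it correctly reports that the collection is missing.
const directCheckPasses = collectionNames.includes(expectedName);

console.log({indirectCheckPasses, directCheckPasses});
```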
### Test Isolation

Your JS test will likely be running with many other files before and after it. It's important to
start from a known state, and to restore that state (to a reasonable extent) at the end of your test
content.

- **Before**: If there are critical assumptions about the environment that your test needs, assert
  them explicitly before proceeding to the real test content (instead of debugging the side effects
  when they do not hold).
  - If you have a precondition on the _environment_, use [`@tags`](./tags.md) instead of just an
    early-return. This will avoid the test being scheduled in the first place if the environment is
    not supported.
- **After**: If you are modifying the fixture, do everything possible to safely restore those
  changes at the end of your test content, even after a test failure. Resmoke's
  `--continueOnFailure` flag is used in CI, so the fixture is shared across many test files, and is
  only torn down at the end.
  - Note, a fixture _can_ immediately "abort" after a test failure, only if
    [archiving](../../../../buildscripts/resmokeconfig/suites/README.md#executorarchive) is
    configured, but that shouldn't be assumed because that is a per-suite configuration (and your
    test can run in many passthrough suite combinations).
  - One easy approach to restoring your state is to use the
    [Mocha-style](#use-mocha-style-constructs) `after` hooks in your test content.
## Modern JS: Modules in Practice

We have fully migrated to modularized JavaScript, so any new test should use modules and adopt the
new style.
### Only import/export what you need.

It's always important to keep the test context clean so we should only import/export what we need.

- Unused imports violate the [no-unused-vars](https://eslint.org/docs/latest/rules/no-unused-vars)
  rule in ESLint, though we haven't enforced it.
- We don't have a linter to check exports, since it's hard to tell whether one is necessary, but we
  should only export what is imported by other tests or will be needed in the future.
### Declare variables in proper scope.

In the past, we have seen tests referring to "undeclared" or "redeclared" variables, which were
actually introduced through `load()`. Now with modules, the scope is clearer. We can use global
variables properly to set up the test and don't need to worry about polluting other tests.
### Name variables properly when exporting.

To avoid naming conflicts, we should not give exported variables names so general that they could
easily conflict with another variable in a test that imports your module. For example, in the
following case, the module exports a variable named `alphabet`, which leads to a re-declaration
error.
``` ```
import {alphabet} from "/matts/module.js"; import {alphabet} from "/matts/module.js";
@ -87,7 +163,9 @@ const alphabet = "xyz"; // ERROR
### Prefer let/const over var

`let/const` should be preferred over `var` since they help detect double declarations in the first
place. For example, in the naming conflict example, if the second line used `var`, it could easily
cause problems without throwing an error.
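
As a small illustration of redeclaration within a single scope (not the import case above), the
sketch below compiles two snippets with `new Function`. The `tryCompile` helper is hypothetical and
exists only so the syntax error can be caught and inspected.

```js
// Compile a function body without running it; `new Function` throws a SyntaxError if the body
// does not parse.
function tryCompile(body) {
    try {
        return {fn: new Function(body), error: null};
    } catch (e) {
        return {fn: null, error: e};
    }
}

// `var` allows a second declaration of the same name; the later value silently wins.
const withVar = tryCompile('var alphabet = "abc"; var alphabet = "xyz"; return alphabet;');

// `const` rejects the second declaration before any code runs.
const withConst = tryCompile('const alphabet = "abc"; const alphabet = "xyz"; return alphabet;');

console.log(withVar.fn());
console.log(withConst.error.name);
```

The `var` version compiles and returns the overwritten value, while the `const` version never gets
as far as running.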
### Export in ES6 style
@ -116,7 +194,8 @@ This can help the language server to discover the methods and provide code navig
### Use Mocha-style Constructs

The [mochalite.js](../jstests/libs/mochalite.js) library ports over a subset of
[MochaJS](https://mochajs.org/) functionality for the shell, including:

- `it` test construction
- `describe` suite structures
@ -125,19 +204,13 @@ The [mochalite.js](../jstests/libs/mochalite.js) library ports over a subset of
- `before` and `after` hooks, to run _once_ around _all_ `it` tests
- `beforeEach` and `afterEach` hooks, to run around _each_ `it` test
- The above (excluding `describe` variants) also support `async` functions
- Resmoke test filtering using the `--mochagrep` flag, which mirrors the
  [`grep`](https://mochajs.org/#-grep-regexp-g-regexp) flag from MochaJS

Example using several APIs:
```js ```js
import {after, afterEach, before, beforeEach, describe, it} from "jstests/libs/mochalite.js";

describe("simple inserts and finds", () => {
    before(() => {
@ -157,9 +230,7 @@ describe("simple inserts and finds", () => {
        assert.eq(this.fixtureDB.find({name: "test"}).count(), 1);
    });
    it("should error on invalid data", () => {
        const e = assert.throws(() => this.fixtureDB.insert({notafield: undefined}));
        assert.eq(e.message, "Field 'notafield' not found");
    });
});
@ -182,7 +253,9 @@ buildscripts/resmoke.py run --suites=no_passthrough --mochagrep "do something" j
## Test Tags

JS Test files can leverage "tags" that suites can key off of to include and/or exclude as necessary.
Not scheduling a test to run is much faster than the test doing an early-return when preconditions
are not met.

The simplest use case is having something like the following at the top of your js test file:

View File

@ -4,19 +4,31 @@ For a short introduction to property-based testing or fast-check, see [Appendix]
## Core PBT Design

The 'Core PBTs' are a subset of our property-based tests that use a shared schema and models. Their
purpose is to provide basic coverage of our query language that may not be tested by the rest of our
jstests. This means only simple stages such as $project, $match, $sort, etc. are covered. More
complicated stages such as $lookup or $facet are not tested. PBTs outside of the core set may test
these more complex features.

These tests have been highly effective at finding bugs. As of writing they have caught 24 bugs in 8
months. See [SERVER-89308](https://jira.mongodb.org/browse/SERVER-89308) for a full list of issues.

The Core PBT design is built off of a few key principles about randomized testing:
### Properties Dictate the Models

In our fuzzer, we have a grammar for most of MQL. While this provides more coverage, it means the
property we assert is weaker. We can add as much as we'd like to the model, because the property
comes second to the model. We're willing to add exceptions to the property to make it work.

However, the "model dictates the property" design also backfired, because in addition to exceptions
in the property, we need to post-process the generated queries. Adding $sort to several places
throughout an aggregation pipeline means we are no longer testing MQL, but rather an artificial
subset of MQL that a user would never write.

For this reason, the properties come first in our Core PBTs, and have few exceptions. They dictate
what model we use so no postprocessing is needed. The PBT models are significantly smaller than the
fuzzer models.
### Small Schema
@ -24,19 +36,32 @@ For this reason, the properties come first in our Core PBTs, and have few except
A small number of fields in our schema allows us to find interesting interactions more easily.

An example of an interaction could be query optimizations. Let's say an optimization on
`[{$match: {*field*: 5}}, {$sort: {*field*: 1}}]` only kicks in when the two fields are the same. In
a PBT where there are one thousand possible fields (`a`, `b`, `c`, but also `a.b.c`, `a.a.a` and all
combinations), the probability of finding this optimization is `1/1000`. With six fields, it's
increased to `1/6`.
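
The arithmetic behind these numbers can be checked with a short sketch. It assumes, for
illustration only, that the `$match` field and the `$sort` field are each picked uniformly and
independently from the schema; the function name is hypothetical.

```js
// Exact probability that two fields, each picked uniformly and independently from a schema of
// `numFields` fields, turn out to be the same field. Counts matching pairs out of all pairs.
function sameFieldProbability(numFields) {
    let matchingPairs = 0;
    for (let first = 0; first < numFields; first++) {
        for (let second = 0; second < numFields; second++) {
            if (first === second) {
                matchingPairs++;
            }
        }
    }
    return matchingPairs / (numFields * numFields);
}

console.log(sameFieldProbability(1000));
console.log(sameFieldProbability(6));
```

Of the `n * n` equally likely pairs, exactly `n` use the same field twice, which gives `1/n`.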
Another interaction is between queries and indexes. Queries and indexes generated from a small
schema make the indexes more likely to be used.

Bugs tend to come from interactions and special cases. A query that has no optimizations applied and
does not use an index requires much less complicated logic, which correlates with fewer bugs.
#### Simple Values to Avoid MQL Inconsistencies

Related to [Properties Dictate the Models](#properties-dictate-the-models), a simpler document model
also allows for stronger properties.

There are inconsistencies in our query language that are accepted behavior, but cause issues in
property-based testing. We can work around them by being careful about the values we allow in
documents.

[SERVER-12869](https://jira.mongodb.org/browse/SERVER-12869) is an issue that stems from null and
missing being encoded the same way in our index format. This means a covering plan (a plan with no
`FETCH` node) cannot distinguish between null and missing. This inconsistency is the cause of lots
of noise from our fuzzer, since one differing value in a query result can propagate. In our Core
PBTs, we do not allow missing fields. This means:
- Documents must have all fields in the schema
- We can only index fields in the schema
@ -44,7 +69,9 @@ There are inconsistencies in our query language that are accepted behavior, but
`null` is allowed.

Floating point values are another area the PBTs avoid. Results can differ depending on the order of
floating point operations. These differences can propagate. For this reason the only number values
allowed are integers.
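
A minimal illustration of the order dependence in plain JavaScript, whose numbers are IEEE 754
doubles. The variable names are just for this sketch.

```js
// The same three values summed in two different orders.
const leftToRight = (0.1 + 0.2) + 0.3;
const rightToLeft = 0.1 + (0.2 + 0.3);

// The two results differ in the last bit, so an exact comparison fails.
const floatsAgree = leftToRight === rightToLeft;

// Small integers are represented exactly, so the order of addition does not matter.
const integersAgree = (1 + 2) + 3 === 1 + (2 + 3);

console.log({leftToRight, rightToLeft, floatsAgree, integersAgree});
```

Two plans that evaluate the same expression in different orders can therefore return different
floating point results without either being wrong, which would make an equality-based property
fail spuriously.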
## Modeling Workloads
@ -62,8 +89,9 @@ A workload consists of a collection model and an aggregation model, in the follo
} }
``` ```
Using one workload model instead of separate (and independent) collection models and agg models
allows them to be interrelated. For example, if we want to model a PBT to test partial indexes where
every query should satisfy the partial index filter, we can write:
``` ```
fc.record({ fc.record({
@ -78,7 +106,8 @@ fc.record({
}); });
``` ```
and this is a valid workload model. If the collection and aggregation models were passed
separately, they would be independent and unable to coordinate through shared arbitraries (like
`partialFilter`).
### Schema
@ -95,11 +124,13 @@ The Core PBT schema is:
} }
``` ```
For now, this is also a valid model for a document in a time-series collection (where `t` is the
time field and `m` is the meta field), but the models may diverge.
### Query Generation

These models, located in `jstests/libs/property_test_helpers/models`, cover a limited number of
aggregation stages. The supported stages are:

- $project
- $addFields
@ -112,7 +143,8 @@ These models cover a limited number of aggregation stages, located in `jstests/l
#### Query Families

Rather than generating single, standalone queries, our query model generates a "family" of queries.
At its leaves, a query family contains multiple values that the leaf could take on. For example,
instead of generating a single query with a concrete value `1` at the leaf:
``` ```
[{$match: {a: 1}}, {$project: {b: 0}}] [{$match: {a: 1}}, {$project: {b: 0}}]
@ -133,7 +165,8 @@ Then we extract several queries that have the same shape.
``` ```
This allows us to write properties that use the plan cache more often rather than relying on chance.
Properties can use the `getQuery` interface to ask for queries with different shapes, or the same
shape with different leaf values plugged in.
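
The idea of a family can be sketched in plain JavaScript. Everything here is hypothetical and
chosen for illustration: the `LeafValues` class, the `extractQuery` function, and the way leaves
are represented are assumptions, not the actual models in the helper library or the real
`getQuery` interface.

```js
// Hypothetical marker for a leaf that can take on several concrete values.
class LeafValues {
    constructor(values) {
        this.values = values;
    }
}

// Walk the family and plug in the leaf value at `leafIndex` everywhere a LeafValues appears.
// The structure (the query shape) is copied unchanged; only the leaves vary.
function extractQuery(node, leafIndex) {
    if (node instanceof LeafValues) {
        return node.values[leafIndex % node.values.length];
    }
    if (Array.isArray(node)) {
        return node.map((child) => extractQuery(child, leafIndex));
    }
    if (node !== null && typeof node === "object") {
        return Object.fromEntries(
            Object.entries(node).map(([key, child]) => [key, extractQuery(child, leafIndex)]));
    }
    return node;
}

// One family: the $match leaf can be 1, 2, or 3; the rest of the pipeline is fixed.
const family = [{$match: {a: new LeafValues([1, 2, 3])}}, {$project: {b: 0}}];

// Three queries with the same shape and different leaf values.
const queries = [0, 1, 2].map((i) => extractQuery(family, i));
console.log(JSON.stringify(queries));
```

Because every extracted query has the same shape, running them back to back gives a cached plan
for that shape a chance to be reused.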
## Core PBTs
@ -143,15 +176,15 @@ Details are provided at the top of each file.
## Debugging a PBT Failure

Currently, all PBTs have a fixed seed. This means that as long as a bug a PBT finds is deterministic
on the server's side, the PBT will consistently run into the issue. If the bug is not deterministic,
the PBT may or may not fail.
### Shrinking (Minimizing) ### Shrinking (Minimizing)
Once a counterexample (a failing case) to the property is found, fast-check tests will automatically attempt to shrink the issue. Once a counterexample (a failing case) to the property is found, fast-check tests will automatically
Shrinking often does not reach the global minimum counterexample, since fast-check cannot make certain jumps. attempt to shrink the issue. Shrinking often does not reach the global minimum counterexample, since
For example it has no way of knowing that fast-check cannot make certain jumps. For example it has no way of knowing that
`{$and: [{a: {$eq: 1}}]}` `{$and: [{a: {$eq: 1}}]}`
@ -163,30 +196,39 @@ or even
`{a: 1}` `{a: 1}`
This could be solved if fast-check had domain-specific knowledge about MQL or if it fuzzed counterexamples during shrinking. This could be solved if fast-check had domain-specific knowledge about MQL or if it fuzzed
However the counterexamples are usually small enough where there isn't much left to shrink. counterexamples during shrinking. However the counterexamples are usually small enough where there
isn't much left to shrink.
For non-deterministic issues, fast-check's shrinking is not as effective because it receives mixed signals from the property on whether the shrunk counterexamples fail or not. For non-deterministic issues, fast-check's shrinking is not as effective because it receives mixed
signals from the property on whether the shrunk counterexamples fail or not.
### Failure Output ### Failure Output
After a failure is minimized, the counterexample is printed out. After a failure is minimized, the counterexample is printed out. This includes debug data such as
This includes debug data such as the counterexample that fast-check found and the error it ran into. the counterexample that fast-check found and the error it ran into. The counterexample will be a
The counterexample will be a workload (see [Modeling Workloads](#modeling-workloads)), containing all information about the collection and queries run against it. workload (see [Modeling Workloads](#modeling-workloads)), containing all information about the
collection and queries run against it.
To reproduce the issue, the workload can be copied and pasted into the failing property-based test, specifically by passing it in as the `examples` argument to `testProperty`. To reproduce the issue, the workload can be copied and pasted into the failing property-based test,
fast-check will take these hand-written examples and run them before trying randomized examples. specifically by passing it in as the `examples` argument to `testProperty`. fast-check will take
See `partial_index_pbt.js` (which references `pbt_resolved_bugs.js`) for an example of this. these hand-written examples and run them before trying randomized examples. See
`partial_index_pbt.js` uses the `examples` argument to ensure workloads that previously would fail are run. `partial_index_pbt.js` (which references `pbt_resolved_bugs.js`) for an example of this.
It can be used in the same way to repro existing bugs from BFs. `partial_index_pbt.js` uses the `examples` argument to ensure workloads that previously would fail
are run. It can be used in the same way to repro existing bugs from BFs.
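
The "examples first, then random inputs" ordering can be illustrated with a toy runner. This is only a sketch: `runProperty` and its options are hypothetical stand-ins written for this illustration, not the signature of `testProperty` or of fast-check, and the integer input stands in for a full workload.

```javascript
// Hypothetical stand-in for a property runner: replay the hand-written examples first,
// then fall back to randomly generated inputs.
function runProperty(property, generate, {examples = [], numRuns = 100} = {}) {
    let runs = 0;
    for (const example of examples) {
        if (!property(example)) {
            return {failed: true, counterexample: example, runsBeforeFailure: runs};
        }
        runs++;
    }
    for (let i = 0; i < numRuns; i++) {
        const input = generate();
        if (!property(input)) {
            return {failed: true, counterexample: input, runsBeforeFailure: runs};
        }
        runs++;
    }
    return {failed: false, counterexample: null, runsBeforeFailure: runs};
}

// Toy property that fails only for one specific input, which random generation would
// rarely hit. Passing the known counterexample as an example reproduces it immediately.
const property = (n) => n !== 7;
const generate = () => Math.floor(Math.random() * 1000000);
const result = runProperty(property, generate, {examples: [7], numRuns: 10});
```

The same ordering is what makes a pasted counterexample a reliable repro: it is checked before any randomized input.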
# Appendix # Appendix
## Property-Based Testing (PBT) ## Property-Based Testing (PBT)
Property-based testing is a testing method that asserts properties hold over many example inputs. In our use of PBT, it involves two components, a "model" and a "property function". The model is a description of the object we are testing. It is used to generate examples of what the object looks like. These examples are routed into the property function, which asserts that the object has the characteristics we expect them to have. Property-based testing is a testing method that asserts properties hold over many example inputs. In
our use of PBT, it involves two components, a "model" and a "property function". The model is a
description of the object we are testing. It is used to generate examples of what the object looks
like. These examples are routed into the property function, which asserts that the object has the
characteristics we expect them to have.
Let's say we wrote a new integer addition function `add` that we'd like to test. We could calculate the correct answer to different addition problems, and assert that `add` behaves correctly. Let's say we wrote a new integer addition function `add` that we'd like to test. We could calculate
the correct answer to different addition problems, and assert that `add` behaves correctly.
``` ```
assert.eq(add(1, 2), 3); assert.eq(add(1, 2), 3);
@ -194,7 +236,9 @@ assert.eq(add(-1, 1), 0);
... ...
``` ```
In addition to tests written with concrete values, we could also write a PBT to test for characteristics we expect `add` to have. Addition is commutative for example, meaning `add(a, b)` should always equal `add(b, a)`. We can write a function for this: In addition to tests written with concrete values, we could also write a PBT to test for
characteristics we expect `add` to have. Addition is commutative for example, meaning `add(a, b)`
should always equal `add(b, a)`. We can write a function for this:
``` ```
function testAdd(a, b){ function testAdd(a, b){
@ -202,12 +246,20 @@ function testAdd(a, b){
} }
``` ```
The input to `testAdd` could use the builtin Javascript `Random` package, or a PBT library such as fast-check. The input to `testAdd` could use the builtin Javascript `Random` package, or a PBT library such as
fast-check.
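
As a minimal sketch of driving such a property without any library, the inputs can come from a hand-rolled generator. The names `addIsCommutative`, `randomInt`, and `checkProperty` are hypothetical helpers written for this illustration (they are not part of fast-check or the jstests libraries), and `Math.random` stands in for whichever random source is actually used.

```javascript
// Function under test (stand-in implementation for illustration).
function add(a, b) {
    return a + b;
}

// Property: addition is commutative.
function addIsCommutative(a, b) {
    return add(a, b) === add(b, a);
}

// Hypothetical helper: uniform integer in [min, max].
function randomInt(min, max) {
    return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Hypothetical helper: run the property on `numRuns` random input pairs and return the
// first counterexample found, or null if every run passed.
function checkProperty(property, numRuns) {
    for (let i = 0; i < numRuns; i++) {
        const a = randomInt(-1000, 1000);
        const b = randomInt(-1000, 1000);
        if (!property(a, b)) {
            return {a, b};
        }
    }
    return null;
}

const counterexample = checkProperty(addIsCommutative, 100);
```

A PBT library adds what this sketch lacks, most notably seeding for reproducibility and shrinking of counterexamples.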
The way the query team uses PBT tends to be more complex, and almost always involves modeling a subset of our query language, documents, and indexes. Our fuzzer is a form of property-based testing, since we generate random queries and assert correctness against different controls (an older mongo version, a collection without indexes, etc) The way the query team uses PBT tends to be more complex, and almost always involves modeling a
subset of our query language, documents, and indexes. Our fuzzer is a form of property-based
testing, since we generate random queries and assert correctness against different controls (an
older mongo version, a collection without indexes, etc)
## fast-check ## fast-check
fast-check (located in jstests/third_party/fast_check/fc-3.1.0.js) is a property-based testing framework for javascript/typescript. It provides building-block components to use for larger models, and has functionality to test properties against these models. It also has built-in logic for shrinking (minimizing) counterexamples to properties. fast-check (located in jstests/third_party/fast_check/fc-3.1.0.js) is a property-based testing
framework for javascript/typescript. It provides building-block components to use for larger models,
and has functionality to test properties against these models. It also has built-in logic for
shrinking (minimizing) counterexamples to properties.
For an example of how to use fast-check to write a property-based test, see [project_coalescing.js](../../aggregation/sources/project/project_coalescing.js) For an example of how to use fast-check to write a property-based test, see
[project_coalescing.js](../../aggregation/sources/project/project_coalescing.js)


@ -4,5 +4,7 @@ These tests test upgrade/downgrade behavior expected between different versions
Those that begin failing upon branching should be assessed by the owner teams: Those that begin failing upon branching should be assessed by the owner teams:
- Is the test only applicable to specific versions during specific development cycles? If so, delete it from irrelevant branches and master. - Is the test only applicable to specific versions during specific development cycles? If so, delete
- Does the test add value for "last" (dynamic) version features? If so, modify the test to be more robust. These should always pass regardless of MongoDB version. it from irrelevant branches and master.
- Does the test add value for "last" (dynamic) version features? If so, modify the test to be more
robust. These should always pass regardless of MongoDB version.


@ -1,3 +1,4 @@
# FCV / setFCV core infrastructure # FCV / setFCV core infrastructure
This folder contains tests the core FCV and setFCV upgrade/downgrade infrastructure. It does not contain tests linked to any other particular feature. This folder contains tests the core FCV and setFCV upgrade/downgrade infrastructure. It does not
contain tests linked to any other particular feature.


@ -1,6 +1,8 @@
# Introduction # Introduction
The plan_stability tests record the current winning plan for a set of ~ 1K queries produced by SPM-3816. If those plans ever change, the test is expected to fail at which point a human would decide if the changed plans are for the better or for the worse. The plan_stability tests record the current winning plan for a set of ~ 1K queries produced by
SPM-3816. If those plans ever change, the test is expected to fail at which point a human would
decide if the changed plans are for the better or for the worse.
# Running # Running
@ -13,7 +15,8 @@ $ buildscripts/resmoke.py run \
jstests/query_golden/plan_stability.js jstests/query_golden/plan_stability.js
``` ```
There are several resmoke suites predefined for different plan ranking modes, for which it is not needed to add mongod parameters: There are several resmoke suites predefined for different plan ranking modes, for which it is not
needed to add mongod parameters:
```bash ```bash
query_golden_cbr_automatic query_golden_cbr_automatic
@ -42,7 +45,9 @@ To obtain a diff that contains an individual diff fragment for each changed plan
2. Edit the `~/.golden_test_config.yml` to use a customized diff command: 2. Edit the `~/.golden_test_config.yml` to use a customized diff command:
```yml ```yml
diffCmd: 'git -c diff.plan_stability.xfuncname=">>>pipeline" diff --unified=0 --function-context --no-index "{{expected}}" "{{actual}}"' diffCmd:
'git -c diff.plan_stability.xfuncname=">>>pipeline" diff --unified=0 --function-context --no-index
"{{expected}}" "{{actual}}"'
``` ```
3. You can now run `buildscripts/golden_test.py diff` as usual and the output will look like this: 3. You can now run `buildscripts/golden_test.py diff` as usual and the output will look like this:
@ -68,15 +73,20 @@ This provides the plan that changed, the pipeline it belonged to, and the execut
## Using the summarization scripts ## Using the summarization scripts
The `feature-extractor` internal repository contains a summarization script that can be used to obtain a summary of the failed test as well as information on the individual regressions that should be looked into. Please see `scripts/cbr/README.md` in that repository for more information. The `feature-extractor` internal repository contains a summarization script that can be used to
obtain a summary of the failed test as well as information on the individual regressions that should
be looked into. Please see `scripts/cbr/README.md` in that repository for more information.
# Debugging failures # Debugging failures
## Which pipeline is the problematic one? ## Which pipeline is the problematic one?
In Evergreen, the diff will most likely show a pipeline **below** the counters. This is however the following pipeline in the test, not the one you are looking for. The problematic pipeline is the one that comes **before** it in the `expected_output` file. In Evergreen, the diff will most likely show a pipeline **below** the counters. This is however the
following pipeline in the test, not the one you are looking for. The problematic pipeline is the one
that comes **before** it in the `expected_output` file.
In local execution, if your environment is configured as described above, the diff will show the actual pipeline of interest, **above** the counters. In local execution, if your environment is configured as described above, the diff will show the
actual pipeline of interest, **above** the counters.
## Running the offending pipelines manually ## Running the offending pipelines manually
@ -98,7 +108,8 @@ and wait until the script has advanced to the following log line:
[js_test:plan_stability] [jsTest] ---- [js_test:plan_stability] [jsTest] ----
``` ```
2. Connect to `mongodb://127.0.0.1:20000` and run the offending pipeline against the `db.plan_stability` collection. 2. Connect to `mongodb://127.0.0.1:20000` and run the offending pipeline against the
`db.plan_stability` collection.
```bash ```bash
mongosh mongodb://127.0.0.1:20000 mongosh mongodb://127.0.0.1:20000
@ -113,7 +124,10 @@ db.plan_stability.aggregate(pipeline).explain().queryPlanner.rejectedPlans.sort(
## Converting the pipeline to JavaScript ## Converting the pipeline to JavaScript
The pipelines in the diff are **EJSON**-ish, while the mongosh shell expects **JavaScript**. EJSON-ish and JavaScript are identical when it comes to basic types, such as strings and integers, but if the pipeline contains timestamps and decimals, the JSON needs to be converted to JavaScript using `EJSON.parse()`: The pipelines in the diff are **EJSON**-ish, while the mongosh shell expects **JavaScript**.
EJSON-ish and JavaScript are identical when it comes to basic types, such as strings and integers,
but if the pipeline contains timestamps and decimals, the JSON needs to be converted to JavaScript
using `EJSON.parse()`:
```js ```js
> pipelineStr = '[{"$match":{"field20_Timestamp_idx":{"$gt":{"$timestamp":{"t":1760551205,"i":0}}}},"field12_Decimal128_idx":{"$lte":{"$numberDecimal":"35.1"}}}]'; > pipelineStr = '[{"$match":{"field20_Timestamp_idx":{"$gt":{"$timestamp":{"t":1760551205,"i":0}}}},"field12_Decimal128_idx":{"$lte":{"$numberDecimal":"35.1"}}}]';
@ -130,23 +144,26 @@ The pipelines in the diff are **EJSON**-ish, while the mongosh shell expects **J
db.plan_stability2.aggregate(pipeline); db.plan_stability2.aggregate(pipeline);
``` ```
Note that **ISO Timestamps** need to be handled separately. JSON will store those as strings, resulting in loss of typing information that `EJSON.parse()` can not recover. This will result in a semantic change in the query unless manually converted to an `ISODate` object: Note that **ISO Timestamps** need to be handled separately. JSON will store those as strings,
resulting in loss of typing information that `EJSON.parse()` can not recover. This will result in a
semantic change in the query unless manually converted to an `ISODate` object:
```js ```js
// Manually convert // Manually convert
// [{"$match":{"field19_datetime_idx":{"$gte":"2024-01-27T00:00:00.000Z"}}}] // [{"$match":{"field19_datetime_idx":{"$gte":"2024-01-27T00:00:00.000Z"}}}]
// to the correct JavaScript // to the correct JavaScript
pipeline = [ pipeline = [{$match: {field19_datetime_idx: {$gte: ISODate("2024-01-27T00:00:00.000Z")}}}];
{$match: {field19_datetime_idx: {$gte: ISODate("2024-01-27T00:00:00.000Z")}}},
];
``` ```
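
When several date fields are involved, the manual step can be scripted. The following is a hedged sketch, assuming you already know which field names hold dates; `reviveDates` is an illustrative helper, not an existing library function. It uses the plain JavaScript `Date` constructor so that it also runs outside mongosh, where `ISODate(...)` would be the usual spelling.

```javascript
// Hypothetical helper: recursively walk a parsed pipeline and turn string values found
// under any of the given date field names into Date objects. Only arrays and plain
// objects are traversed, so already-typed values (Date, Decimal128, ...) pass through.
function reviveDates(node, dateFields, insideDateField = false) {
    if (typeof node === "string") {
        return insideDateField ? new Date(node) : node;
    }
    if (Array.isArray(node)) {
        return node.map((item) => reviveDates(item, dateFields, insideDateField));
    }
    if (node !== null && typeof node === "object" &&
        Object.getPrototypeOf(node) === Object.prototype) {
        const out = {};
        for (const [key, value] of Object.entries(node)) {
            out[key] =
                reviveDates(value, dateFields, insideDateField || dateFields.includes(key));
        }
        return out;
    }
    return node;
}

const pipelineStr = '[{"$match":{"field19_datetime_idx":{"$gte":"2024-01-27T00:00:00.000Z"}}}]';
const pipeline = reviveDates(JSON.parse(pipelineStr), ["field19_datetime_idx"]);
```

Note that every string nested under a listed field is converted, so the field list should contain only fields that are known to hold dates.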
## Is the new plan better or worse? ## Is the new plan better or worse?
For the majority of the plans, it will be obvious if the new plan is better or worse because all the execution counters would have moved in the same direction without any ambiguity. For the majority of the plans, it will be obvious if the new plan is better or worse because all the
execution counters would have moved in the same direction without any ambiguity.
Some plans, such as those involving `$sort` or `$limit` will sometimes change in a way that makes some counters better while others become worse. For those queries, consider running them manually multiple times to compare their wallclock execution times: Some plans, such as those involving `$sort` or `$limit` will sometimes change in a way that makes
some counters better while others become worse. For those queries, consider running them manually
multiple times to compare their wallclock execution times:
```javascript ```javascript
pipeline = [...]; pipeline = [...];
@ -162,11 +179,15 @@ You can also modify `collSize` in `plan_stability.js` to temporarily use a large
If you want to run a comparison between estimation methods `X` and `Y`: If you want to run a comparison between estimation methods `X` and `Y`:
1. If method `X` is not multi-planning, place the `jstests/query_golden/expected_files/X` for estimation method `X` in the root of `expected_files`, so that they are used as the base for the comparison; 1. If method `X` is not multi-planning, place the `jstests/query_golden/expected_files/X` for
estimation method `X` in the root of `expected_files`, so that they are used as the base for the
comparison;
2. Temporary remove the expected files for method `Y` from `expected_files/query_golden/expected_files/Y` so that they are not considered; 2. Temporary remove the expected files for method `Y` from
`expected_files/query_golden/expected_files/Y` so that they are not considered;
3. Run the test as described above, specifying `featureFlagCostBasedRanker`/`internalQueryCBRCEMethod`; 3. Run the test as described above, specifying
`featureFlagCostBasedRanker`/`internalQueryCBRCEMethod`;
4. Use the summarization script as described above to produce a report. 4. Use the summarization script as described above to produce a report.
@ -179,5 +200,5 @@ To accept the new plans, use `buildscripts/golden_test.py accept`, as with any o
## Removing individual pipelines ## Removing individual pipelines
If a given pipeline proves flaky, that is, is flipping between one plan and another for no reason, If a given pipeline proves flaky, that is, is flipping between one plan and another for no reason,
you can comment it out from the test with a note. Re-run the test and then run `buildscripts/golden_test.py accept` you can comment it out from the test with a note. Re-run the test and then run
to persist the change. `buildscripts/golden_test.py accept` to persist the change.


@ -1,21 +1,26 @@
# Introduction # Introduction
The plan stability tests for join optimization are golden tests that execute a number of joins against the TPC-H dataset. The plan stability tests for join optimization are golden tests that execute a number of joins
against the TPC-H dataset.
For each pipeline we persist the following in the golden test output: For each pipeline we persist the following in the golden test output:
- the MQL command, including the base table and the pipeline - the MQL command, including the base table and the pipeline
- a concise representation of the winning plan for the query - a concise representation of the winning plan for the query
- execution counters that quantify the effort it took to execute the query in terms of docs and keys examined - execution counters that quantify the effort it took to execute the query in terms of docs and keys
examined
- data about the resultset, such as the number of rows returned - data about the resultset, such as the number of rows returned
## Prerequisites ## Prerequisites
This test requires the following: This test requires the following:
- The `mongorestore` tool, accessible on the $PATH. This tool is part of the [MongoDB Database Tools](https://www.mongodb.com/try/download/database-tools) package. - The `mongorestore` tool, accessible on the $PATH. This tool is part of the
[MongoDB Database Tools](https://www.mongodb.com/try/download/database-tools) package.
- The TPC-H dataset, located in a directory named `tpc-h` that is on the same level as the mongodb repository. The dataset is available from the `query-benchmark-data` S3 bucket. You can retrieve it as follows: - The TPC-H dataset, located in a directory named `tpc-h` that is on the same level as the mongodb
repository. The dataset is available from the `query-benchmark-data` S3 bucket. You can retrieve
it as follows:
```bash ```bash
mkdir ~/tpc-h mkdir ~/tpc-h
@ -26,7 +31,8 @@ aws sso login
aws s3 cp s3://query-benchmark-data/tpc-h/tpch-0.1-normalized.archive.gz tpc-h/tpch-0.1-normalized.archive.gz --region us-east-1 aws s3 cp s3://query-benchmark-data/tpc-h/tpch-0.1-normalized.archive.gz tpc-h/tpch-0.1-normalized.archive.gz --region us-east-1
``` ```
In evergreen, tasks such as `query_golden_join_optimization_plan_stability` make sure the prerequisites are already in place. In evergreen, tasks such as `query_golden_join_optimization_plan_stability` make sure the
prerequisites are already in place.
- The golden test framework configured with a custom diff rule - The golden test framework configured with a custom diff rule
@ -77,13 +83,16 @@ The report contains the following information:
- the most-improved queries, useful for obtaining examples for presentation purposes; - the most-improved queries, useful for obtaining examples for presentation purposes;
- all individual failures, categorized and pretty-printed. - all individual failures, categorized and pretty-printed.
The report has one section per jstest -- if you are running multiple tests, each one will be processed and reported separately. The report has one section per jstest -- if you are running multiple tests, each one will be
processed and reported separately.
## Debugging ## Debugging
> [!WARNING] > **_WARNING:_** The queries dumped by this test, the diff tooling or the summary report may contain EJSON constructs, such as $numberDecimal > [!WARNING] > **_WARNING:_** The queries dumped by this test, the diff tooling or the summary
> that are not properly processed by `coll.aggregate()` unless converted using `EJSON.parse()`. Typing information around ISO dates may have also been lost, so manually recreate those as `ISODate(...)`. > report may contain EJSON constructs, such as $numberDecimal that are not properly processed by
> See the "A note on the queries" section below for more information. > `coll.aggregate()` unless converted using `EJSON.parse()`. Typing information around ISO dates may
> have also been lost, so manually recreate those as `ISODate(...)`. See the "A note on the queries"
> section below for more information.
### Determining the offending query ### Determining the offending query
@ -91,7 +100,9 @@ Each query has an `idx` key that can be used to track it across files and report
### Starting a populated MongoDB instance ### Starting a populated MongoDB instance
To obtain a running, populated MongoDB instance, run `resmoke.py run` with the `--pauseAfterPopulate` option. This will start mongod, load the data and then pause resmoke at the following line: To obtain a running, populated MongoDB instance, run `resmoke.py run` with the
`--pauseAfterPopulate` option. This will start mongod, load the data and then pause resmoke at the
following line:
``` ```
[js_test:plan_stability_join_opt_tpch] [jsTest] TestData.pauseAfterPopulate is set. Pausing indefinitely ... [js_test:plan_stability_join_opt_tpch] [jsTest] TestData.pauseAfterPopulate is set. Pausing indefinitely ...
@ -124,15 +135,18 @@ The collections will be restored to the `tpch` database.
## A note on the queries ## A note on the queries
The queries you see in files, diffs, bug reports may be in various formats, depending on whether they were dumped using JavaScript, python, or some other method. The queries you see in files, diffs, bug reports may be in various formats, depending on whether
they were dumped using JavaScript, python, or some other method.
Therefore, it is important to obtain the query plan of the query and make sure that what you are seeing locally matches the plan from the bug report. Therefore, it is important to obtain the query plan of the query and make sure that what you are
seeing locally matches the plan from the bug report.
The following caveats are currently known: The following caveats are currently known:
### Typing information for timestamps ### Typing information for timestamps
Typing information for timestamps is frequently lost, so a query may contain ISO timestamps as strings: Typing information for timestamps is frequently lost, so a query may contain ISO timestamps as
strings:
```json ```json
{"l_commitdate": {"$lt": "1993-03-17T00:00:00"}} {"l_commitdate": {"$lt": "1993-03-17T00:00:00"}}
@ -146,7 +160,8 @@ You will need to manually convert this into a timestamp:
{'l_commitdate': {'$lt': new ISODate('1993-03-17T00:00:00')}} {'l_commitdate': {'$lt': new ISODate('1993-03-17T00:00:00')}}
``` ```
Since the typing information has been lost somewhere along the pipeline, no existing library is available to restore it for you. Since the typing information has been lost somewhere along the pipeline, no existing library is
available to restore it for you.
### EJSON output ### EJSON output
@ -158,6 +173,8 @@ Sometimes the query will be provided in EJSON, so you will see:
in the output. in the output.
In mongosh, `aggregate()` does not support EJSON directly, so passing EJSON to it will succeed but will not produce the expected results. In mongosh, `aggregate()` does not support EJSON directly, so passing EJSON to it will succeed but
will not produce the expected results.
Either pass this output as `EJSON.parse()` (if your input is a string), `EJSON.deserialize()` (if your input is parsed already) or manually convert it to standard MQL. Either pass this output as `EJSON.parse()` (if your input is a string), `EJSON.deserialize()` (if
your input is parsed already) or manually convert it to standard MQL.


@ -2,15 +2,18 @@
Bazel test targets for resmoke suites. Bazel test targets for resmoke suites.
For documentation of the `resmoke_suite_test` rule, see [bazel/resmoke/README.md](bazel/resmoke/README.md). For documentation of the `resmoke_suite_test` rule, see
[bazel/resmoke/README.md](bazel/resmoke/README.md).
## Configuring ## Configuring
In addition to attributes for `resmoke_suite_test`, the following are options for configuring test targets. In addition to attributes for `resmoke_suite_test`, the following are options for configuring test
targets.
### tags ### tags
Arbitrary tags may also be added to group test targets for batch execution. For example, a custom tag lets you run all matching suites at once: Arbitrary tags may also be added to group test targets for batch execution. For example, a custom
tag lets you run all matching suites at once:
``` ```
bazel test //jstests/suites/... --test_tag_filters=my_tag bazel test //jstests/suites/... --test_tag_filters=my_tag
@ -26,7 +29,8 @@ The following tags have special meaning:
### target_compatible_with ### target_compatible_with
Configure platforms/build options that the test is compatible with. Use this to exclude the test suite from platforms in CI. Configure platforms/build options that the test is compatible with. Use this to exclude the test
suite from platforms in CI.
Example — exclude the test on PPC/S390x, MacOS, and TSAN builds: Example — exclude the test on PPC/S390x, MacOS, and TSAN builds:


@ -1,6 +1,8 @@
# JS Test Tags # JS Test Tags
JS Test files can leverage "tags" that suites can key off of to include and/or exclude as necessary. Not scheduling a test to run is much faster than the test doing an early-return when preconditions are not met. JS Test files can leverage "tags" that suites can key off of to include and/or exclude as necessary.
Not scheduling a test to run is much faster than the test doing an early-return when preconditions
are not met.
The simplest use case is having something like the following at the top of your js test file: The simplest use case is having something like the following at the top of your js test file:
@ -38,7 +40,10 @@ and can also include (meta) comments:
*/ */
``` ```
The tags are meant to be used in suite configurations, to [`include_with_any_tags`](../buildscripts/resmokeconfig/suites/README.md#selectorinclude_with_any_tags) and/or [`exclude_with_any_tags`](../buildscripts/resmokeconfig/suites/README.md#selectorexclude_with_any_tags): The tags are meant to be used in suite configurations, to
[`include_with_any_tags`](../buildscripts/resmokeconfig/suites/README.md#selectorinclude_with_any_tags)
and/or
[`exclude_with_any_tags`](../buildscripts/resmokeconfig/suites/README.md#selectorexclude_with_any_tags):
```bash ```bash
test_kind: js_test test_kind: js_test
@ -50,7 +55,8 @@ selector:
- disabled_for_fcv_6_1_upgrade - disabled_for_fcv_6_1_upgrade
``` ```
Build variants can also use tags via the `test_flags` expansion, which facilitates tag-exclusions _across suites_ that run with the variant: Build variants can also use tags via the `test_flags` expansion, which facilitates tag-exclusions
_across suites_ that run with the variant:
``` ```
expansions: expansions:
@ -60,6 +66,9 @@ Build variants can also use tags via the `test_flags` expansion, which facilitat
## Available Tags ## Available Tags
There is no current exhaustive list, since tags are arbitrary labels and do not need to be "registered". However, tags are always "global", and many are reused. Names should have communicate clear intent; and be reused/consolidated when appropriate. There is no current exhaustive list, since tags are arbitrary labels and do not need to be
"registered". However, tags are always "global", and many are reused. Names should have communicate
clear intent; and be reused/consolidated when appropriate.
> Use `buildscripts/resmoke.py list-tags` to find which tags are actively referenced by suite configs, although there may be more in JS files and Build Variant expansions. > Use `buildscripts/resmoke.py list-tags` to find which tags are actively referenced by suite
> configs, although there may be more in JS files and Build Variant expansions.

Some files were not shown because too many files have changed in this diff