SERVER-124136 Format markdown via prettier: wrap lines and use width of 100 (#52231)

GitOrigin-RevId: 3305c1e2ee3a6a2c3a5b2b7883b0f491a59ed646
Steve McClure 2026-04-21 14:30:35 -04:00 committed by MongoDB Bot
parent e66373f938
commit 32e8f260de
205 changed files with 12720 additions and 8000 deletions

View File

@ -2,18 +2,24 @@
This folder is for custom pull request templates. Templates are Markdown (\*.md) files.
These custom templates can be used, for example, by individual teams to have a custom pull request template with team-specific testing or documentation instructions.
These custom templates can be used, for example, by individual teams to have a custom pull request
template with team-specific testing or documentation instructions.
Read more in [GitHub's docs](https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository)
Read more in
[GitHub's docs](https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository)
If you update the default PR template, you also need to update the commit metadata in GitHub branch rulesets.
If you update the default PR template, you also need to update the commit metadata in GitHub branch
rulesets.
# How To Use This Folder
To create a custom template, create a new markdown file in this folder.
Then create a link of the form `https://github.com/mongodb/mongo/compare/main...my-branch?quick_pull=1&template=your_new_template.md`
Then create a link of the form
`https://github.com/mongodb/mongo/compare/main...my-branch?quick_pull=1&template=your_new_template.md`
Share that link in your team docs to use for creating PRs. Selecting an unused value for `my-branch` should show a branch selector when following the link.
Share that link in your team docs to use for creating PRs. Selecting an unused value for
`my-branch` should show a branch selector when following the link.
Read more in [GitHub's docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/using-query-parameters-to-create-a-pull-request)
Read more in
[GitHub's docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/using-query-parameters-to-create-a-pull-request)
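The link form described in this README can also be assembled programmatically. The following is a minimal sketch, assuming the `mongodb/mongo` repository path and the `main` base branch from the example URL; `quick_pull_url` is a hypothetical helper name and is not part of the repository.

```python
from urllib.parse import urlencode


def quick_pull_url(branch: str, template: str, base: str = "main") -> str:
    """Build a compare link that opens the PR form with a custom template.

    Hypothetical helper: the repository path and default base branch are
    taken from the example URL in the README.
    """
    # quick_pull=1 and template=<file> are the query parameters from the README.
    query = urlencode({"quick_pull": 1, "template": template})
    return f"https://github.com/mongodb/mongo/compare/{base}...{branch}?{query}"


if __name__ == "__main__":
    print(quick_pull_url("my-branch", "your_new_template.md"))
```

Passing a branch name that does not exist yet, as the README suggests, is what should make the branch selector appear when the link is followed.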

View File

@ -1 +1,2 @@
Anything in this description will be included in the commit message. Replace or delete this text before merging. Add links to testing in the comments of the PR.
Anything in this description will be included in the commit message. Replace or delete this text
before merging. Add links to testing in the comments of the PR.

View File

@ -15,6 +15,13 @@
"parser": "yaml",
"tabWidth": 4
}
},
{
"files": "*.md",
"options": {
"proseWrap": "always",
"printWidth": 100
}
}
]
}

View File

@ -49,8 +49,7 @@ You can install compass using the `install_compass` script packaged with MongoDB
$ ./install_compass
```
This will download the appropriate MongoDB Compass package for your platform
and install it.
This will download the appropriate MongoDB Compass package for your platform and install it.
## Drivers
@ -88,9 +87,9 @@ https://www.mongodb.com/cloud/atlas
## LICENSE
MongoDB is free and the source is available. Versions released prior to
October 16, 2018 are published under the AGPL. All versions released after
October 16, 2018, including patch fixes for prior versions, are published
under the [Server Side Public License (SSPL) v1](LICENSE-Community.txt).
See individual files for details which will specify the license applicable
to each file. Files subject to the SSPL will be noted in their headers.
MongoDB is free and the source is available. Versions released prior to October 16, 2018 are
published under the AGPL. All versions released after October 16, 2018, including patch fixes for
prior versions, are published under the
[Server Side Public License (SSPL) v1](LICENSE-Community.txt). See individual files for details
which will specify the license applicable to each file. Files subject to the SSPL will be noted in
their headers.

View File

@ -1,10 +1,13 @@
# Building Bazel from Source to target the PPC64LE Architecture
Bazel doesn't release to the PPC64LE architecture. To address this, MongoDB maintains our own Bazel build that we perform on our PPC64LE development systems.
Bazel doesn't release to the PPC64LE architecture. To address this, MongoDB maintains our own Bazel
build that we perform on our PPC64LE development systems.
# JDK?
Bazel usually comes with a built-in JDK. However, the tooling used to build the built-in JDK doesn't support PPC64LE. To get around this, an external JDK must be present on both the system compiling the Bazel executable itself as well as the host running Bazel as a build system.
Bazel usually comes with a built-in JDK. However, the tooling used to build the built-in JDK doesn't
support PPC64LE. To get around this, an external JDK must be present on both the system compiling
the Bazel executable itself as well as the host running Bazel as a build system.
On the MongoDB PPC64LE Evergreen static hosts and dev hosts, the OpenJDK 21 installation exists at:

View File

@ -1,10 +1,13 @@
# Building Bazel from Source to target the S390X Architecture
Bazel doesn't release to the S390X architecture. To address this, MongoDB maintains our own Bazel build that we perform on our S390X development systems.
Bazel doesn't release to the S390X architecture. To address this, MongoDB maintains our own Bazel
build that we perform on our S390X development systems.
# JDK?
Bazel usually comes with a built-in JDK. However, the tooling used to build the built-in JDK doesn't support S390X. To get around this, an external JDK must be present on both the system compiling the Bazel executable itself as well as the host running Bazel as a build system.
Bazel usually comes with a built-in JDK. However, the tooling used to build the built-in JDK doesn't
support S390X. To get around this, an external JDK must be present on both the system compiling the
Bazel executable itself as well as the host running Bazel as a build system.
On the MongoDB S390X Evergreen static hosts and dev hosts, the OpenJDK 21 installation exists at:

View File

@ -1,3 +1,4 @@
# MongoDB Bazel Best Practices
Please refer to https://bazel.build/configure/best-practices as a baseline. This doc will be updated with MongoDB-specific best practices as they're defined.
Please refer to https://bazel.build/configure/best-practices as a baseline. This doc will be updated
with MongoDB-specific best practices as they're defined.

View File

@ -4,7 +4,8 @@ This document describes the Server Developer workflow for modifying Bazel build
# Creating a new BUILD.bazel file
A build target is defined in the directory where its source code exists. To create a target that compiles **src/mongo/hello_world.cpp**, you would create **src/mongo/BUILD.bazel**.
A build target is defined in the directory where its source code exists. To create a target that
compiles **src/mongo/hello_world.cpp**, you would create **src/mongo/BUILD.bazel**.
src/mongo/BUILD.bazel would contain:
@ -15,7 +16,8 @@ src/mongo/BUILD.bazel would contain:
],
}
Once you've obtained bazel by running **python buildscripts/install_bazel.py**, you can then build this target via "bazel build":
Once you've obtained bazel by running **python buildscripts/install_bazel.py**, you can then build
this target via "bazel build":
bazel build //src/mongo:hello_world
@ -23,13 +25,17 @@ Or run this target via "bazel run":
bazel run //src/mongo:hello_world
The full target name is a combination of the directory of the BUILD.bazel file and the target name:
The full target name is a combination of the directory of the BUILD.bazel file and the target
name:
//{BUILD.bazel dir}:{targetname}
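As an illustration of that naming rule, here is a minimal sketch that composes a full target name from a directory and a target. `bazel_label` is a hypothetical helper written for this example; it is not part of the build scripts.

```python
def bazel_label(build_dir: str, target: str) -> str:
    """Compose //{BUILD.bazel dir}:{targetname} from its two parts."""
    # Strip stray slashes so "src/mongo/" and "/src/mongo" give the same label.
    return f"//{build_dir.strip('/')}:{target}"


if __name__ == "__main__":
    # The hello_world example: BUILD.bazel lives in src/mongo.
    print(bazel_label("src/mongo", "hello_world"))
```

The result for the `hello_world` example is the same string passed to `bazel build` and `bazel run` above.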
# Adding a New Header / Source File
Bazel makes use of static analysis wherever possible to improve execution and querying speed. As part of this, source and header files must not be declared dynamically (e.g. glob, wildcard, etc.). Instead, you'll need to manually add a reference to each header or source file you add into your build target.
Bazel makes use of static analysis wherever possible to improve execution and querying speed. As
part of this, source and header files must not be declared dynamically (e.g. glob, wildcard, etc.).
Instead, you'll need to manually add a reference to each header or source file you add into your
build target.
mongo_cc_binary(
name = "hello_world",
@ -44,13 +50,15 @@ Bazel makes use of static analysis wherever possible to improve execution and qu
## Adding a New Library
The DevProd Build Team created MongoDB-specific macros for the different types of build targets you may want to specify. These include:
The DevProd Build Team created MongoDB-specific macros for the different types of build targets you
may want to specify. These include:
- mongo_cc_binary
- mongo_cc_library
- idl_generator
Creating a new library is similar to the steps above for creating a new binary. A new **mongo_cc_library** definition would be created in the BUILD.bazel file.
Creating a new library is similar to the steps above for creating a new binary. A new
**mongo_cc_library** definition would be created in the BUILD.bazel file.
mongo_cc_library(
name = "new_library",
@ -61,7 +69,9 @@ Creating a new library is similar to the steps above for creating a new binary.
## Declaring Dependencies
If a library or binary depends on another library, this must be declared in the **deps** section of the target. The syntax for referring to the library is the same syntax used in the bazel build/run command.
If a library or binary depends on another library, this must be declared in the **deps** section of
the target. The syntax for referring to the library is the same syntax used in the bazel build/run
command.
mongo_cc_library(
name = "new_library",
@ -82,16 +92,20 @@ If a library or binary depends on another library, this must be declared in the
## Running clang-tidy via Bazel
Note: This feature is still in development; see https://jira.mongodb.org/browse/SERVER-80396 for details.
Note: This feature is still in development; see https://jira.mongodb.org/browse/SERVER-80396 for
details.
To run clang-tidy via Bazel, do the following:
1. To analyze all code, run `bazel build --config=clang-tidy src/...`
2. To analyze a single target (e.g. `environment_buffer`), run the following command (note the `_with_debug` suffix on the target): `bazel build --config=clang-tidy src/mongo/db/commands:environment_buffer_with_debug`
2. To analyze a single target (e.g. `environment_buffer`), run the following command (note the
`_with_debug` suffix on the target):
`bazel build --config=clang-tidy src/mongo/db/commands:environment_buffer_with_debug`
Testing notes:
- If you want to test whether clang-tidy is in fact finding bugs, you can inject the following code into a `cpp` file to generate a `bugprone-incorrect-roundings` warning:
- If you want to test whether clang-tidy is in fact finding bugs, you can inject the following code
into a `cpp` file to generate a `bugprone-incorrect-roundings` warning:
```
const double f = 1.0;
@ -105,12 +119,24 @@ const int foo = (int)(f + 0.5);
Follow this loop to figure out where the header needs to be added
1. Build directly with bazel to speed up the loop: `bazel build //src/...`
2. This will fail on the first missing header dependency. Search the bazel build files for the library the header is defined on. Currently there are cases where headers are incorrectly located, so you'll need to use your best judgement. If the header exists on some library, add that library as a dep. For example, `scoped_timer.h` is part of the `scoped_timer` library, so add `//src/mongo/db/exec:scoped_timer` to the deps field (this will take care of `scoped_timer.h` transitive dependencies). If not, add the header directly to the hdrs field of the library that's failing to compile.
2. This will fail on the first missing header dependency. Search the bazel build files for the
library the header is defined on. Currently there are cases where headers are incorrectly located,
so you'll need to use your best judgement. If the header exists on some library, add that library
as a dep. For example, `scoped_timer.h` is part of the `scoped_timer` library, so add
`//src/mongo/db/exec:scoped_timer` to the deps field (this will take care of `scoped_timer.h`
transitive dependencies). If not, add the header directly to the hdrs field of the library that's
failing to compile.
3. Build directly with bazel `bazel build //src/...`
4. If there is a cycle, remove the dependency from Step #2, add the header as a direct dependency to the hdrs field, and then start back at Step #1
4. If there is a cycle, remove the dependency from Step #2, add the header as a direct dependency
to the hdrs field, and then start back at Step #1
### The header I want to add is referenced in dozens or more locations, and adding it to the proper location requires a large refactor that is blocking critical work, what should I do?
If you've put in a significant amount of work to try to get a header added and have found that getting it added to the right place (usually alongside the associated .cpp file, having all dependents add that library as a dep) will take a significant refactor, create a SERVER ticket explaining the problem, solution, and complexity required to resolve it. Then, open up src/mongo/BUILD.bazel and add the header to the "core_headers" file group, referencing your ticket in a TODO comment.
If you've put in a significant amount of work to try to get a header added and have found that
getting it added to the right place (usually alongside the associated .cpp file, having all
dependents add that library as a dep) will take a significant refactor, create a SERVER ticket
explaining the problem, solution, and complexity required to resolve it. Then, open up
src/mongo/BUILD.bazel and add the header to the "core_headers" file group, referencing your ticket
in a TODO comment.
This is very much a last resort and should only be done if the refactor will take a very significant amount of time and is blocking other work.
This is very much a last resort and should only be done if the refactor will take a very significant
amount of time and is blocking other work.

View File

@ -1,7 +1,9 @@
# EngFlow Certification Installation
MongoDB uses EngFlow to enable remote execution with Bazel. This dramatically speeds up the build process, but is only available to internal MongoDB employees.
MongoDB uses EngFlow to enable remote execution with Bazel. This dramatically speeds up the build
process, but is only available to internal MongoDB employees.
Bazel uses a wrapper script to check the credentials on each invocation. If for some reason that's not working, you can also perform this process manually with this command:
Bazel uses a wrapper script to check the credentials on each invocation. If for some reason that's
not working, you can also perform this process manually with this command:
python buildscripts/engflow_auth.py

View File

@ -1,8 +1,12 @@
# Header Relocation and Cycle Resolution
1. Locate all the targets that reference the header file in BUILD.bazel files.
2. Find an ideal target to declare the header under. This is usually under the target that features the .cpp file of the same name. Otherwise, the header can be placed in its own library.
3. Ensure that all the targets that need this header can depend on the target the header was moved to.
4. Run `bazel build //src/...` to check for build failures (look for failures related to dependency cycles).
5. If the build fails because of a dependency cycle, you may need to split up the dependent library or relocate the header.
2. Find an ideal target to declare the header under. This is usually under the target that features
the .cpp file of the same name. Otherwise, the header can be placed in its own library.
3. Ensure that all the targets that need this header can depend on the target the header was moved
to.
4. Run `bazel build //src/...` to check for build failures (look for failures related to dependency
cycles).
5. If the build fails because of a dependency cycle, you may need to split up the dependent library
or relocate the header.
6. Once the build succeeds, please create a PR and include `devprod-build` for review.

View File

@ -1,8 +1,7 @@
# Remote execution images
The Dockerfiles for remote execution images are autogenerated to pin all
versions and allow for updates at the same time. To repin the image hashes and
package versions:
The Dockerfiles for remote execution images are autogenerated to pin all versions and allow for
updates at the same time. To repin the image hashes and package versions:
```bash
# With Bazel

View File

@ -1,16 +1,22 @@
# About
This documents some useful tools, concepts, and debugging strategies for bazel toolchains.
This information was gathered while developing the WASI SDK toolchain.
This documents some useful tools, concepts, and debugging strategies for bazel toolchains. This
information was gathered while developing the WASI SDK toolchain.
# Concepts
[Toolchain](https://bazel.build/extending/toolchains#debugging-toolchains) and [Platform](https://bazel.build/extending/platforms) are the core relevant concepts.
Toolchains define the tools used to compile, and the platform defines either the execution platform (for the compilation/compiler tools) or the target platform (for the binary).
Bazel tries to search for a toolchain based on these constraints.
[Toolchain](https://bazel.build/extending/toolchains#debugging-toolchains) and
[Platform](https://bazel.build/extending/platforms) are the core relevant concepts. Toolchains
define the tools used to compile, and the platform defines either the execution platform (for the
compilation/compiler tools) or the target platform (for the binary). Bazel tries to search for a
toolchain based on these constraints.
We also made use of [transitions](https://bazel.build/rules/lib/builtins/transition) which allow bazel to reconfigure itself before building a target to avoid passing irrelevant or incorrect compiler flags (e.g. WASI SDK doesn't support shared objects).
Similarly, we used [actions](https://bazel.build/docs/cc-toolchain-config-reference#using-action-config) instead of the tool paths attribute because of a [possibly historical lack of support for remote resources in tool paths](https://stackoverflow.com/questions/73504780/bazel-reference-binaries-from-packages-in-custom-toolchain-definition/73505313#73505313).
We also made use of [transitions](https://bazel.build/rules/lib/builtins/transition) which allow
bazel to reconfigure itself before building a target to avoid passing irrelevant or incorrect
compiler flags (e.g. WASI SDK doesn't support shared objects). Similarly, we used
[actions](https://bazel.build/docs/cc-toolchain-config-reference#using-action-config) instead of the
tool paths attribute because of a
[possibly historical lack of support for remote resources in tool paths](https://stackoverflow.com/questions/73504780/bazel-reference-binaries-from-packages-in-custom-toolchain-definition/73505313#73505313).
# Debugging tools
@ -20,13 +26,15 @@ Similarly, we used [actions](https://bazel.build/docs/cc-toolchain-config-refere
bazel ... --toolchain_resolution_debug=.* ...
```
The above flag can be used to debug toolchain resolution as bazel tries to automatically satisfy constraints.
The above flag can be used to debug toolchain resolution as bazel tries to automatically satisfy
constraints.
## Debugging Remote Resources
Toolchains may be remotely fetched, but the directory structure of the build environment after these remote resources are fetched may not be clear.
Use `bazel info output_base` to find the bazel output directory and inspect it.
Note: this may be different depending on your configuration and level of sandboxing.
Toolchains may be remotely fetched, but the directory structure of the build environment after these
remote resources are fetched may not be clear. Use `bazel info output_base` to find the bazel
output directory and inspect it. Note: this may be different depending on your configuration and
level of sandboxing.
This is particularly useful when used in combination with the `find` command as shown below.
@ -42,10 +50,11 @@ Note: this command is directory dependent because output_base is per bazel insta
bazel ... -s ...
```
This will show verbose output such as cd actions and compiler/linker invocations.
Note: bazel may recast paths relative to the exec directory.
This will show verbose output such as cd actions and compiler/linker invocations. Note: bazel may
recast paths relative to the exec directory.
## Debugging on Engflow
Engflow has a lot of helpful views showing remote execution stats and the remote file structure.
We don't intend to duplicate their documentation, but be careful, as some of their data (particularly remotely executed actions) may not be accurate immediately after execution.
Engflow has a lot of helpful views showing remote execution stats and the remote file structure. We
don't intend to duplicate their documentation, but be careful, as some of their data (particularly
remotely executed actions) may not be accurate immediately after execution.

View File

@ -38,18 +38,21 @@ resmoke_suite_test(
### Test Sharding
Test sharding allows you to split a large test suite across multiple parallel test executions, significantly reducing total test time. When `shard_count` is specified, Bazel will:
Test sharding allows you to split a large test suite across multiple parallel test executions,
significantly reducing total test time. When `shard_count` is specified, Bazel will:
1. Run the test target multiple times in parallel (up to the specified shard count)
2. Each shard receives a unique shard index (0 to N-1)
3. The resmoke runner uses these values to determine which subset of tests to run in each shard
4. Each shard produces its own test output and logs
Note: sharding is an alternative to the resmoke `--jobs` flag, which should not be used with `resmoke_suite_test`.
Note: sharding is an alternative to the resmoke `--jobs` flag, which should not be used with
`resmoke_suite_test`.
### Test Logs and Output Directory
Bazel creates a dedicated output directory for each test run under the `bazel-testlogs` symlink in your workspace root.
Bazel creates a dedicated output directory for each test run under the `bazel-testlogs` symlink in
your workspace root.
For a test target `//jstests/suites/query-execution:core`, the outputs are like:
@ -78,7 +81,8 @@ bazel test //jstests/suites/query-execution:core --test_sharding_strategy=disabl
#### Run with additional resmoke flags:
Any `--test_arg` in the bazel command will be propagated as a flag to resmoke.py. To modify the resmoke invocation with any of resmoke's flags, add them as `--test_arg`s.
Any `--test_arg` in the bazel command will be propagated as a flag to resmoke.py. To modify the
resmoke invocation with any of resmoke's flags, add them as `--test_arg`s.
```
# Runs all tests from the core suite with timeseries in their name, twice, with all feature flags enabled.

View File

@ -11,7 +11,8 @@ To use the WASI SDK apply the `wasi_compatible` with a select statement:
})
```
If your target is defined in terms of a traditional bazel C/C++ target you can use the WASI transition in order to ensure the bazel options are WASI compatible.
If your target is defined in terms of a traditional bazel C/C++ target you can use the WASI
transition in order to ensure the bazel options are WASI compatible.
```python
load("//bazel/toolchains/cc/wasm/toolchain:with_wasi_config.bzl", "with_wasi_config")

View File

@ -17,8 +17,8 @@ For background on Antithesis, the base images, and the broader CI pipeline, see
Scripts must be executable and live directly under the template directory (not in subdirectories).
The prefix of the filename determines scheduling behavior. Any file that doesn't match a known
prefix — including files in subdirectories or files prefixed with `helper_` — is ignored by
Test Composer and can be used for shared logic.
prefix — including files in subdirectories or files prefixed with `helper_` — is ignored by Test
Composer and can be used for shared logic.
### Driver commands
@ -27,18 +27,18 @@ Run during fault injection periods. At least one driver or `anytime_*` command i
- **`parallel_driver_<name>`** — runs concurrently with other parallel drivers, including itself.
Use for continuous client operations, parallel workloads, and availability checks under faults.
- **`singleton_driver_<name>`** — runs as the only active driver in a history branch.
Use for porting existing integration tests or workloads that shouldn't overlap with other drivers.
- **`singleton_driver_<name>`** — runs as the only active driver in a history branch. Use for
porting existing integration tests or workloads that shouldn't overlap with other drivers.
- **`serial_driver_<name>`** — runs only when no other driver commands are active.
Use for validation steps and operations that require quiescence.
- **`serial_driver_<name>`** — runs only when no other driver commands are active. Use for
validation steps and operations that require quiescence.
### Quiescent commands
Run in the absence of faults.
- **`first_<name>`** — optional one-time setup that runs once before any driver commands start.
Use for data initialization, schema setup, and bootstrapping.
- **`first_<name>`** — optional one-time setup that runs once before any driver commands start. Use
for data initialization, schema setup, and bootstrapping.
- **`eventually_<name>`** — runs after driver commands start; halts all drivers and stops faults,
creating a new history branch. Use for testing eventual consistency and post-recovery state.
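The prefix rules can be summarized in a short sketch. It covers only the prefixes named in this excerpt (the full README may list more), it does not check the executable bit, and `classify` is a hypothetical helper, not part of Test Composer.

```python
import posixpath

# Prefixes named in this excerpt; assumed to be incomplete.
KNOWN_PREFIXES = (
    "parallel_driver_",
    "singleton_driver_",
    "serial_driver_",
    "first_",
    "eventually_",
    "anytime_",
)


def classify(relative_path):
    """Return the matching command prefix, or None if the file is ignored.

    relative_path is relative to the template directory.
    """
    # Scripts must live directly under the template directory.
    if posixpath.dirname(relative_path):
        return None
    name = posixpath.basename(relative_path)
    for prefix in KNOWN_PREFIXES:
        # Require a non-empty <name> part after the prefix.
        if name.startswith(prefix) and len(name) > len(prefix):
            return prefix
    # helper_* and any other unknown prefix fall through and are ignored.
    return None


if __name__ == "__main__":
    for path in ("parallel_driver_mongod_aggregate.sh", "helper_common.sh", "lib/first_setup.sh"):
        print(path, "->", classify(path))
```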
@ -57,8 +57,8 @@ Run in the absence of faults.
### `basic_js_commands`
Parallel JavaScript workload against a single `mongod`. All commands share retry logic defined in
[`js/commands.js`](basic_js_commands/js/commands.js) that handles transient network errors,
server selection failures, and retryable write errors.
[`js/commands.js`](basic_js_commands/js/commands.js) that handles transient network errors, server
selection failures, and retryable write errors.
| Script | Function | Notes |
| ------------------------------------------------ | ----------------------------- | --------------------------------------------------------------------------- |
@ -86,13 +86,13 @@ infrastructure for Test Composer. Both scripts use
## Best practices
- **Retry logic** — always handle transient network errors and server selection failures.
See [`commands.js`](basic_js_commands/js/commands.js) for a reusable retry wrapper.
- **Retry logic** — always handle transient network errors and server selection failures. See
[`commands.js`](basic_js_commands/js/commands.js) for a reusable retry wrapper.
- **Randomize** — the more variation you introduce, the more state space Antithesis can explore.
Antithesis controls and can reproduce the random seed, so interesting paths can be re-explored.
- **Idempotency** — design scripts to tolerate being killed and restarted at any point.
- **Start simple** — begin with a `singleton_driver_*` to port an existing test, then evolve
toward parallel drivers as confidence grows.
- **Start simple** — begin with a `singleton_driver_*` to port an existing test, then evolve toward
parallel drivers as confidence grows.
## Running locally
@ -126,8 +126,8 @@ docker compose -f docker_compose/<suite_name>/docker-compose.yml \
/opt/antithesis/test/v1/basic_js_commands/parallel_driver_mongod_aggregate.sh
```
The `/scripts/print_connection_string.sh` helper used by each script is generated automatically
from the resmoke fixture's connection string and placed in the config image during the build step.
The `/scripts/print_connection_string.sh` helper used by each script is generated automatically from
the resmoke fixture's connection string and placed in the config image during the build step.
## Adding a new template

View File

@ -4,13 +4,19 @@ This directory is a bazel rule we use to ship common code between bazel repos
# Using in your repo
1. Look at the latest version in [this](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/pyproject.toml) file
1. Look at the latest version in
[this](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/pyproject.toml)
file
2. Get the sha of the latest release at https://mdb-build-public.s3.amazonaws.com/bazel_rules_mongo/{version}/bazel_rules_mongo.tar.gz.sha256
2. Get the sha of the latest release at
https://mdb-build-public.s3.amazonaws.com/bazel_rules_mongo/{version}/bazel_rules_mongo.tar.gz.sha256
3. Get the link to the latest version at https://mdb-build-public.s3.amazonaws.com/bazel_rules_mongo/{version}/bazel_rules_mongo.tar.gz
3. Get the link to the latest version at
https://mdb-build-public.s3.amazonaws.com/bazel_rules_mongo/{version}/bazel_rules_mongo.tar.gz
4. Add this as a http archive to your repo and implement the dependencies listed in the [WORKSPACE](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/WORKSPACE.bazel) file. It will look something like this
4. Add this as a http archive to your repo and implement the dependencies listed in the
[WORKSPACE](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/WORKSPACE.bazel)
file. It will look something like this
```
# Poetry rules for managing Python dependencies
@ -50,7 +56,8 @@ poetry(
)
```
5. Use the rule however you see fit! For example to add `bazel run codeowners` to your repo you can add the following to your root `BUILD.bazel` file
5. Use the rule however you see fit! For example to add `bazel run codeowners` to your repo you can
add the following to your root `BUILD.bazel` file
```
alias(
@ -61,5 +68,7 @@ alias(
# Deploying
When you are ready for a new version to be released, bump the version in the [pyproject.toml](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/pyproject.toml) file.
This will be deployed the next time the `package_bazel_rules_mongo` task runs (nightly). You can schedule this earlier in the waterfall when your PR is merged if you want it sooner.
When you are ready for a new version to be released, bump the version in the
[pyproject.toml](https://github.com/mongodb/mongo/blob/master/buildscripts/bazel_rules_mongo/pyproject.toml)
file. This will be deployed the next time the `package_bazel_rules_mongo` task runs (nightly). You
can schedule this earlier in the waterfall when your PR is merged if you want it sooner.

View File

@ -3,4 +3,5 @@ This is cltcache.py.txt taken from
CLTCACHE_URL = "https://raw.githubusercontent.com/freedick/cltcache/1.2.2/src/cltcache/cltcache.py"
CLTCACHE_SHA256 = "30d9bf6d3615eab1826d5e24aea54873de034014c1e77506c9ff983e1e858b3c"
A small simple clang tidy cacher used with vscode which does not use bazel to run clang tidy. The extension is used to avoid linting and changing the file from its source.
A small simple clang tidy cacher used with vscode which does not use bazel to run clang tidy. The
extension is used to avoid linting and changing the file from its source.

View File

@ -18,7 +18,8 @@ source python3-venv/bin/activate
(python3-venv) bazel build --config=opt install-devcore
```
3. Run mongod instance (only for CBR calibration, because join_start.py manages mongod's lifecycle itself):
3. Run mongod instance (only for CBR calibration, because join_start.py manages mongod's lifecycle
itself):
```sh
(python3-venv) bazel-bin/install-mongod/bin/mongod --setParameter internalMeasureQueryExecutionTimeInNanoseconds=true
@ -74,16 +75,21 @@ source cm/bin/activate
```sh
(cm) python join_start.py
```
To skip the constant calibration (warm scan, CPU, sequential I/O, random I/O) and only run the join algorithm comparison:
To skip the constant calibration (warm scan, CPU, sequential I/O, random I/O) and only run the
join algorithm comparison:
```sh
(cm) python join_start.py --join-only
```
To iterate quickly on cost model changes, reuse pre-recorded execution times from a previous full run. This skips actual query execution, only running `queryPlanner` explains to collect fresh cost estimates:
To iterate quickly on cost model changes, reuse pre-recorded execution times from a previous full
run. This skips actual query execution, only running `queryPlanner` explains to collect fresh cost
estimates:
```sh
(cm) python join_start.py --execution-times join_output/join_times_in-cache.csv join_output/join_times_exceeds-cache.csv
```
**Note:** For CBR calibration, the first time it will take a while since it has to generate the data. Afterwards, as long as you aren't modifying the collections, you can comment out `await generator.populate_collections()` in `start.py` - this will make it a lot faster.
**Note:** For CBR calibration, the first time it will take a while since it has to generate the
data. Afterwards, as long as you aren't modifying the collections, you can comment out
`await generator.populate_collections()` in `start.py` - this will make it a lot faster.
8. When done, deactivate the environment:

@ -1 +1,2 @@
> Content moved to [buildscripts/resmokeconfig/suites/README.md](../../buildscripts/resmokeconfig/suites/README.md).
> Content moved to
> [buildscripts/resmokeconfig/suites/README.md](../../buildscripts/resmokeconfig/suites/README.md).

@ -1,13 +1,14 @@
# mongo gpg builds
This directory contains a script to produce **portable `gpg` binaries** for all our supported linux platforms:
This directory contains a script to produce **portable `gpg` binaries** for all our supported linux
platforms:
- **Linux** (`manylinux2014` glibc 2.17 baseline): `x86_64`, `aarch64`, `s390x`, `ppc64le`
In particular, it builds gnupg-2.5.16 from source.
This script is used to generate the binaries that we bring into bazel as a dependency to sign test extensions.
All artifacts are placed in the `dist/` directory.
This script is used to generate the binaries that we bring into bazel as a dependency to sign test
extensions. All artifacts are placed in the `dist/` directory.
---
@ -61,8 +62,8 @@ ARCH=ppc64le PLATFORM=linux/ppc64le ./build_gpg_manylinux.sh
## 📜 License & Attribution
These scripts build **gpg** and its required dependencies from sources originally obtained from:
👉 <https://www.gnupg.org/ftp/gcrypt/gnupg/> and <https://gnupg.org/download/index.html>
These scripts build **gpg** and its required dependencies from sources originally obtained from: 👉
<https://www.gnupg.org/ftp/gcrypt/gnupg/> and <https://gnupg.org/download/index.html>
The exact sources can be obtained at the following URLs:

@ -1,12 +1,14 @@
# mongo rapidyaml wheel builds
This directory contains scripts to produce versioned `rapidyaml` wheels that can be uploaded to S3 and consumed directly instead of building from the git dependency in `pyproject.toml`.
This directory contains scripts to produce versioned `rapidyaml` wheels that can be uploaded to S3
and consumed directly instead of building from the git dependency in `pyproject.toml`.
The scripts default to the `rapidyaml` commit currently pinned in `pyproject.toml`:
- `a5d485fd44719e1c03e059177fc1f695fc462b66`
They also require `RAPIDYAML_VERSION` to be set explicitly. The MongoDB fork does not currently publish git tags, so `setuptools-scm` cannot infer a stable release version on its own.
They also require `RAPIDYAML_VERSION` to be set explicitly. The MongoDB fork does not currently
publish git tags, so `setuptools-scm` cannot infer a stable release version on its own.
All artifacts are written to `dist/`.
@ -47,11 +49,14 @@ RAPIDYAML_VERSION=0.9.0.post0 ARCH=ppc64le PLATFORM=linux/ppc64le ./build_rapidy
### macOS
Run the script on each target macOS architecture you want to publish. The script intentionally builds for the host arch only, which keeps wheel tags and interpreter usage straightforward.
Run the script on each target macOS architecture you want to publish. The script intentionally
builds for the host arch only, which keeps wheel tags and interpreter usage straightforward.
The script creates and uses a temporary virtualenv, so it works with Homebrew-managed Python installations that reject direct `pip install` into the system environment.
The script creates and uses a temporary virtualenv, so it works with Homebrew-managed Python
installations that reject direct `pip install` into the system environment.
It also leaves `Python.framework` external during delocation, so the wheel should be built with the same Python distribution family you expect consumers to use.
It also leaves `Python.framework` external during delocation, so the wheel should be built with the
same Python distribution family you expect consumers to use.
```bash
RAPIDYAML_VERSION=0.9.0.post0 PYTHON_BIN=python3.13 ./build_rapidyaml_macos.sh
@ -67,15 +72,19 @@ $env:PYTHON_BIN = "C:\Python313\python.exe"
.\build_rapidyaml_windows_x64.ps1
```
Note: `pyproject.toml` currently excludes `rapidyaml` on Windows, so a Windows wheel is only needed if that marker changes later.
Note: `pyproject.toml` currently excludes `rapidyaml` on Windows, so a Windows wheel is only needed
if that marker changes later.
## Build Behavior
- The Linux script builds inside the appropriate `manylinux2014` image and runs `auditwheel repair`.
- The macOS script creates a temporary virtualenv, installs its build tooling there, and runs `delocate-wheel` while excluding `Python.framework` from bundling.
- The macOS script creates a temporary virtualenv, installs its build tooling there, and runs
`delocate-wheel` while excluding `Python.framework` from bundling.
- The Windows script runs `delvewheel repair` after building.
- Every script clones the `mongodb-forks/rapidyaml` repo, checks out the requested ref, initializes submodules, builds a wheel, and performs a simple `import ryml` smoke test.
- Linux defaults to `cp313-cp313`, which matches the repo's current Python version. Override that when you need a wheel for a different interpreter.
- Every script clones the `mongodb-forks/rapidyaml` repo, checks out the requested ref, initializes
submodules, builds a wheel, and performs a simple `import ryml` smoke test.
- Linux defaults to `cp313-cp313`, which matches the repo's current Python version. Override that
when you need a wheel for a different interpreter.
## Environment Variables
@ -94,7 +103,8 @@ Note: `pyproject.toml` currently excludes `rapidyaml` on Windows, so a Windows w
## Consuming the Wheels
Once the wheels are uploaded, you can replace the current git dependency in `pyproject.toml` with URL-based entries scoped by platform markers.
Once the wheels are uploaded, you can replace the current git dependency in `pyproject.toml` with
URL-based entries scoped by platform markers.
For example:

@ -1,12 +1,14 @@
# mongo ripgrep builds
This directory contains scripts to produce **portable, high-performance `ripgrep` binaries** for all major platforms:
This directory contains scripts to produce **portable, high-performance `ripgrep` binaries** for all
major platforms:
- **Linux** (`manylinux2014` glibc 2.17 baseline): `x86_64`, `aarch64`, `s390x`, `ppc64le`
- **macOS** universal2 (`x86_64` + `arm64`)
- **Windows** x86_64 (MSVC)
Each build uses **bundled static PCRE2**, **LTO**, and conservative CPU baselines to maximize portability.
Each build uses **bundled static PCRE2**, **LTO**, and conservative CPU baselines to maximize
portability.
All artifacts are placed in the `dist/` directory.
---

@ -1,54 +1,79 @@
# Block-on-Red
> **TL;DR:** During times of high BF volume, code approvals and merging in 10gen/mongo master will be restricted to only allow changes that help reduce BFs, Bugs, Performance Regressions, and paying down technical debt.
> **TL;DR:** During times of high BF volume, code approvals and merging in 10gen/mongo master will
> be restricted to only allow changes that help reduce BFs, Bugs, Performance Regressions, and
> paying down technical debt.
### Motivation
The master branch should remain stable to develop the Server efficiently, and to be within 30 days of releasing at all times. If it becomes too unstable, or "too red," we want to aggressively focus on getting it back into the green. As a side benefit to releasability, a "greener" build should make patch build failures more meaningful. This will also reduce release time stress by having the release time period look and feel more like normal business.
The master branch should remain stable to develop the Server efficiently, and to be within 30 days
of releasing at all times. If it becomes too unstable, or "too red," we want to aggressively focus
on getting it back into the green. As a side benefit to releasability, a "greener" build should make
patch build failures more meaningful. This will also reduce release time stress by having the
release time period look and feel more like normal business.
### Strategy
Each team carries a quota (see below for details). When a team exceeds their quota - they enter a "code lockdown".
Each team carries a quota (see below for details). When a team exceeds their quota - they enter a
"code lockdown".
- **Team Level**: The intention here is to stop work with a small blast radius in the first instance, and address the releasability risk from that team and their owned code.
- **VP Level**: We roll the quotas up to a VP's entire organization as the next step of "code lockdown". The expectation is that redirecting resources within a VP's organization to help address BFs is likely more effective and less disruptive than a global freeze.
- **Global Level**: Finally, if the global quota is exceeded, the entire server organization enters a "code lockdown" until we meet the threshold for unfreezing.
- **Team Level**: The intention here is to stop work with a small blast radius in the first
instance, and address the releasability risk from that team and their owned code.
- **VP Level**: We roll the quotas up to a VP's entire organization as the next step of "code
lockdown". The expectation is that redirecting resources within a VP's organization to help
address BFs is likely more effective and less disruptive than a global freeze.
- **Global Level**: Finally, if the global quota is exceeded, the entire server organization enters
a "code lockdown" until we meet the threshold for unfreezing.
## Impact of a "Code Lockdown"
### Allowed Code Changes
During a "code lockdown," Code Owners are expected to only approve **work that closes BFs or helps us reduce/avoid the _next_ Blocking state**. i.e. aimed at fixing a BF, a class of BFs, bugs, performance regression, etc.
During a "code lockdown," Code Owners are expected to only approve **work that closes BFs or helps
us reduce/avoid the _next_ Blocking state**. i.e. aimed at fixing a BF, a class of BFs, bugs,
performance regression, etc.
If your PR does not meet these criteria, it may be pending for some time until the system becomes unblocked. There are of course reasonable exceptions, below.
If your PR does not meet these criteria, it may be pending for some time until the system becomes
unblocked. There are of course reasonable exceptions, below.
### Feature Work
**All feature work stops** during a "code lockdown."
In exceptional circumstances VPs can approve exceptions.
**All feature work stops** during a "code lockdown." In exceptional circumstances VPs can approve
exceptions.
### Non-feature Work
We understand that in many cases addressing the larger BF problem requires refactoring, modularity improvements, changes to our test and paying down other kinds of **technical debt**. During a "code lockdown" this work is **expressly permitted and mergeable** - with the guidance that teams index heavily on risk when deciding what to work on. If a piece of work feels like it makes the BF problem worse before it gets better, talk to your director about how to proceed.
We understand that in many cases addressing the larger BF problem requires refactoring, modularity
improvements, changes to our test and paying down other kinds of **technical debt**. During a "code
lockdown" this work is **expressly permitted and mergeable** - with the guidance that teams index
heavily on risk when deciding what to work on. If a piece of work feels like it makes the BF problem
worse before it gets better, talk to your director about how to proceed.
Allowable Examples (not exclusive):
- Refactoring components to make them more unit testable
- Increasing code coverage through high quality tests that block PRs
- Making the development loop faster (decreasing build times, fixing slow tests, etc)
- Improving guardrails that improve code quality (fixing clang-tidy warnings, compiler warnings, etc)
- Improving guardrails that improve code quality (fixing clang-tidy warnings, compiler warnings,
etc)
If a team is in a lockdown, but the rest of the org is not - their focus should likely skew towards work that expedites their lockdown exit.
If a team is in a lockdown, but the rest of the org is not - their focus should likely skew towards
work that expedites their lockdown exit.
If the org is in a lockdown, but a team doesn't have BFs to work on - they should balance helping other teams with the work they've identified as addressing the underlying BF problem.
If the org is in a lockdown, but a team doesn't have BFs to work on - they should balance helping
other teams with the work they've identified as addressing the underlying BF problem.
The higher the risk of the work, the more involvement the Staff+ engineers and the Director/VP should have in the decision about what is ok to merge and what isn't.
The higher the risk of the work, the more involvement the Staff+ engineers and the Director/VP
should have in the decision about what is ok to merge and what isn't.
### Code Owner Responsibilities
Code Owners should join the `#10gen-mongo-code-lockdown` Slack channel to receive daily updates on the status of the build. It produces daily metrics with instructions if there is a state change.
Code Owners should join the `#10gen-mongo-code-lockdown` Slack channel to receive daily updates on
the status of the build. It produces daily metrics with instructions if there is a state change.
If we change to a blocking state, code owners should use their discretion to only approve changes that are allowed (see above). If we exit the blocking state, code owners should approve PRs as usual.
If we change to a blocking state, code owners should use their discretion to only approve changes
that are allowed (see above). If we exit the blocking state, code owners should approve PRs as
usual.
## Quotas and State-Changes
@ -74,21 +99,31 @@ This shows relevant JIRA queries for a more live and interactive view of the sta
### BFs remaining open only on older branches
Some teams may fix a BF in master, but are "waiting for fix" on older branches, which keeps the BF counted against the thresholds. Guidance here is currently evolving.
Some teams may fix a BF in master, but are "waiting for fix" on older branches, which keeps the BF
counted against the thresholds. Guidance here is currently evolving.
If the build failure is not frequently occurring, it can be marked as P5-Trivial, and it won't count towards your team's build failures for the block merge.
If the build failure is not frequently occurring, it can be marked as P5-Trivial, and it won't count
towards your team's build failures for the block merge.
As we iterate on our processes for this, the `exclude-from-master-quota` label can be used to exclude BFs that should not be included in these quotas. The expectation is that this is an interim solution as we improve our processes especially around BFs that remain open pending backports.
As we iterate on our processes for this, the `exclude-from-master-quota` label can be used to
exclude BFs that should not be included in these quotas. The expectation is that this is an interim
solution as we improve our processes especially around BFs that remain open pending backports.
Specifically:
- If a BF is only waiting for a backport on a branch older than master, apply the `exclude-from-master-quota` label to the ticket.
- If a BF is failing on master, not a serious bug (or a test-only issue that can't affect the real clients), not noisy, and we are choosing not to fix it, set the Priority to `P5 - Trivial` and apply the `keep-trivial` label.
- If a BF is failing on an older branch and we are choosing not to backport a fix, set the Priority to `P5 - Trivial` and apply the `keep-trivial-X.Y` label appropriately.
- If a BF is only waiting for a backport on a branch older than master, apply the
`exclude-from-master-quota` label to the ticket.
- If a BF is failing on master, not a serious bug (or a test-only issue that can't affect the real
clients), not noisy, and we are choosing not to fix it, set the Priority to `P5 - Trivial` and
apply the `keep-trivial` label.
- If a BF is failing on an older branch and we are choosing not to backport a fix, set the
Priority to `P5 - Trivial` and apply the `keep-trivial-X.Y` label appropriately.
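The three labeling rules above can be read as a small decision function. The following is only a sketch of that reading: the `BuildFailure` fields are hypothetical and do not correspond to real Jira fields, and the `8.0` branch is an illustrative stand-in for `X.Y`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BuildFailure:
    """Hypothetical summary of a BF ticket; these are not real Jira fields."""

    failing_on_master: bool
    older_branch: Optional[str] = None  # e.g. "8.0" if the BF is open on an older branch
    fix_planned: bool = True  # False when the team chooses not to fix or backport
    serious: bool = False
    noisy: bool = False


def triage(bf: BuildFailure) -> tuple[Optional[str], Optional[str]]:
    """Return the (priority, label) to set, or (None, None) when no rule applies."""
    if not bf.failing_on_master and bf.older_branch:
        if bf.fix_planned:
            # Fixed on master and only waiting for a backport.
            return None, "exclude-from-master-quota"
        # Choosing not to backport a fix to the older branch.
        return "P5 - Trivial", f"keep-trivial-{bf.older_branch}"
    if bf.failing_on_master and not (bf.fix_planned or bf.serious or bf.noisy):
        # Failing on master, not serious, not noisy, and not going to be fixed.
        return "P5 - Trivial", "keep-trivial"
    return None, None


print(triage(BuildFailure(failing_on_master=False, older_branch="8.0")))
```

Anything that falls through to `(None, None)` keeps counting against the quota under this reading.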
## Contributing
For any new proposals, changes to thresholds, or concerns regarding their application, please escalate to your Director/VP. **We want advocacy from all levels to make this a successful change to our engineering culture.**
For any new proposals, changes to thresholds, or concerns regarding their application, please
escalate to your Director/VP. **We want advocacy from all levels to make this a successful change to
our engineering culture.**
### CLI
@ -100,7 +135,9 @@ python buildscripts/monitor_build_status/cli.py --help
### Testing locally
For Jira API authentication, use the `JIRA_AUTH_PAT` env variable. More about Jira Personal Access Tokens (PATs) can be found [here](https://wiki.corp.mongodb.com/pages/viewpage.action?pageId=218995581).
For Jira API authentication, use the `JIRA_AUTH_PAT` env variable. More about Jira Personal Access
Tokens (PATs) can be found
[here](https://wiki.corp.mongodb.com/pages/viewpage.action?pageId=218995581).
Use your PAT to run the following and output its results:
@ -112,4 +149,6 @@ The above will _not_ send notifications to the Slack channel.
### Slack Notifications
Slack notifications use a webhook from the Devprod Correctness Slack app (rather than user credentials) for security. The webhook URL is read from the `mongo-code-lockdown-webhook` Evergreen expansion, which points to the `#10gen-mongo-code-lockdown` Slack channel.
Slack notifications use a webhook from the Devprod Correctness Slack app (rather than user
credentials) for security. The webhook URL is read from the `mongo-code-lockdown-webhook` Evergreen
expansion, which points to the `#10gen-mongo-code-lockdown` Slack channel.

@ -3,27 +3,24 @@
## Summary
Matrix Suites are defined as a combination of explicit
[suite files](../../../buildscripts/resmokeconfig/suites/README.md)
and a set of "overrides" for specific keys. The intention is
to avoid duplication of suite definitions as much as
possible with the eventual goal of having most suites be
fully composed of reusable sections.
[suite files](../../../buildscripts/resmokeconfig/suites/README.md) and a set of "overrides" for
specific keys. The intention is to avoid duplication of suite definitions as much as possible with
the eventual goal of having most suites be fully composed of reusable sections.
## Usage
Matrix suites behave like regular suites for all functionality in resmoke.py,
including `list-suites`, `find-suites` and `run --suites=[SUITE]`.
Matrix suites behave like regular suites for all functionality in resmoke.py, including
`list-suites`, `find-suites` and `run --suites=[SUITE]`.
## Writing a matrix suite mapping file.
Matrix suites consist of a mapping, and a set of overrides in
their eponymous directories. When you are done writing the mapping file, you must
Matrix suites consist of a mapping, and a set of overrides in their eponymous directories. When you
are done writing the mapping file, you must
[generate the matrix suite file.](#generating-matrix-suites)
The "mappings" directory contains YAML files that each contain a suite definition.
Each suite definition includes `base_suite`, and a list of
modifiers. There is also an optional `description` field that will get output
with the local resmoke invocation.
The "mappings" directory contains YAML files that each contain a suite definition. Each suite
definition includes `base_suite`, and a list of modifiers. There is also an optional `description`
field that will get output with the local resmoke invocation.
The fields of modifiers are the following:
@ -33,30 +30,29 @@ The fields of modifiers are the following:
4. extends
Each modifier field is a dot-delimited-notation representing the file and field of the modification.
All modifier fields must be in a yaml file in the `overrides` directory.
For example `encryption.mongodfixture_ese` would reference the `mongodfixture_ese` field
inside of the `encryption.yml` file inside of the `overrides` directory.
All modifier fields must be in a yaml file in the `overrides` directory. For example,
`encryption.mongodfixture_ese` would reference the `mongodfixture_ese` field inside of the
`encryption.yml` file inside of the `overrides` directory.
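As an illustration of the dot-delimited notation, the sketch below splits a modifier reference into the override file and the field inside it. `resolve_modifier` is a hypothetical helper written for this example, not a resmoke function, and it assumes the file name is everything before the first dot.

```python
from pathlib import Path


def resolve_modifier(reference: str, overrides_dir: str = "overrides") -> tuple[Path, str]:
    """Split '<file>.<field>' into the override file path and the field name."""
    file_stem, sep, field = reference.partition(".")
    if not sep or not file_stem or not field:
        raise ValueError(f"expected '<file>.<field>', got {reference!r}")
    return Path(overrides_dir) / f"{file_stem}.yml", field


path, field = resolve_modifier("encryption.mongodfixture_ese")
print(path.as_posix(), field)  # overrides/encryption.yml mongodfixture_ese
```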
### overrides
All fields referenced in the `overrides` section of the mappings file will overwrite the specified
fields in the `base_suite`.
The `overrides` modifier takes precedence over the `excludes` and `eval` modifiers.
The `overrides` list will be processed in order so order can matter if multiple override modifiers
try to overwrite the same field in the base_suite.
fields in the `base_suite`. The `overrides` modifier takes precedence over the `excludes` and `eval`
modifiers. The `overrides` list will be processed in order so order can matter if multiple override
modifiers try to overwrite the same field in the base_suite.
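To see why ordering matters, here is a minimal sketch that applies a list of overrides to a base suite one after another, so the last override to touch a field wins. It assumes nested mappings are merged key by key, which this README does not state, and the suite keys and values are made up for illustration. It is not resmoke's implementation.

```python
import copy


def _merge(target: dict, source: dict) -> None:
    """Overwrite values in target with those from source, descending into nested dicts."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


def apply_overrides(base_suite: dict, overrides: list[dict]) -> dict:
    """Return a copy of base_suite with each override applied in list order."""
    suite = copy.deepcopy(base_suite)
    for override in overrides:
        _merge(suite, override)
    return suite


base = {"executor": {"fixture": {"class": "ExampleFixture", "option": "base"}}}
first = {"executor": {"fixture": {"option": "first"}}}
second = {"executor": {"fixture": {"option": "second"}}}

print(apply_overrides(base, [first, second])["executor"]["fixture"]["option"])  # second
print(apply_overrides(base, [second, first])["executor"]["fixture"]["option"])  # first
```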
### excludes
All fields referenced in the `excludes` section of the mappings file will append to the specified
`exclude` fields in the base suite.
The only two valid options in the referenced modifier field are `exclude_with_any_tags` and
`exclude_files`. They are appended in the order they are specified in the mappings file.
`exclude` fields in the base suite. The only two valid options in the referenced modifier field are
`exclude_with_any_tags` and `exclude_files`. They are appended in the order they are specified in
the mappings file.
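The append behavior can be sketched as follows. This is an illustration under assumptions, not resmoke's code: the selector is modeled as a plain dictionary, `apply_excludes` is a hypothetical helper, and the test paths and tags are invented.

```python
ALLOWED_KEYS = ("exclude_with_any_tags", "exclude_files")


def apply_excludes(selector: dict, modifiers: list[dict]) -> dict:
    """Append each modifier's exclude lists to a copy of the selector, in order."""
    result = {
        key: list(value) if isinstance(value, list) else value
        for key, value in selector.items()
    }
    for modifier in modifiers:
        for key, values in modifier.items():
            if key not in ALLOWED_KEYS:
                raise ValueError(f"unsupported exclude option: {key}")
            result.setdefault(key, []).extend(values)
    return result


selector = {"roots": ["jstests/example/*.js"], "exclude_files": ["jstests/example/a.js"]}
modifiers = [
    {"exclude_with_any_tags": ["tag_one"]},
    {"exclude_files": ["jstests/example/b.js"], "exclude_with_any_tags": ["tag_two"]},
]
print(apply_excludes(selector, modifiers))
```

Existing entries in the base suite stay first, and each modifier's entries follow in mapping-file order.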
### eval
All fields referenced in the `eval` section of the mappings file will append to the specified
`config.shell_options.eval` field in the base suite.
They are appended in the order they are specified in the mappings file.
`config.shell_options.eval` field in the base suite. They are appended in the order they are
specified in the mappings file.
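A sketch of the same idea for `eval`, assuming the field is a single JavaScript string and that appended snippets are joined with `"; "`. The separator and the helper name `apply_eval` are assumptions made for this example.

```python
def apply_eval(shell_options: dict, snippets: list[str], separator: str = "; ") -> dict:
    """Return a copy of shell_options with the snippets appended to its eval string."""
    existing = shell_options.get("eval")
    parts = ([existing] if existing else []) + list(snippets)
    return {**shell_options, "eval": separator.join(parts)}


options = {"eval": "var a = 1"}
print(apply_eval(options, ["var b = 2", "var c = 3"])["eval"])
# var a = 1; var b = 2; var c = 3
```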
### extends
@ -69,9 +65,8 @@ modifiers), the key being extended must already exist and also be a list.
The generated matrix suites live in the `buildscripts/resmokeconfig/matrix_suites/generated_suites`
directory. These files may be edited for local testing but must remain consistent with the mapping
files. There is a task in the commit queue that enforces this. To generate a new version of these
matrix suites, you may run
`buildscripts/resmoke.py generate-matrix-suites`. This command
will overwrite the current generated matrix suites on disk so make sure you do not have any unsaved
matrix suites, you may run `buildscripts/resmoke.py generate-matrix-suites`. This command will
overwrite the current generated matrix suites on disk so make sure you do not have any unsaved
changes to these files.
## Validating matrix suites
@ -82,5 +77,4 @@ ensures that the files are validated.
## FAQ
For questions about the user or authorship experience,
please reach out in #server-testing.
For questions about the user or authorship experience, please reach out in #server-testing.

@ -2,7 +2,8 @@
Test "suites" are configuration files that group which tests to run, and how.
Yaml files enumerate the test files that the suite encompasses, as well as any test fixtures and their configurations to leverage, options for the shell, hooks, and more.
Yaml files enumerate the test files that the suite encompasses, as well as any test fixtures and
their configurations to leverage, options for the shell, hooks, and more.
## Minimal Example
@ -64,7 +65,8 @@ Example:
test_kind: js_test
```
See all supported kinds in [`buildscripts/resmokelib/testing/testcases`](../../../buildscripts/resmokelib/testing/testcases/README.md).
See all supported kinds in
[`buildscripts/resmokelib/testing/testcases`](../../../buildscripts/resmokelib/testing/testcases/README.md).
## `selector`
@ -89,25 +91,34 @@ File path(s) of test files to include. If a path without a glob is provided, it
### `selector.root`
A file containing glob patterns, one per line, typically used by test_kind cpp_unit_test (usually build/unittests.txt). Specifies which tests to consider for including into the suite. If no other options are specified, these are the tests that will be run. Glob patterns are supported (and common) here.
A file containing glob patterns, one per line, typically used by test_kind cpp_unit_test (usually
build/unittests.txt). Specifies which tests to consider for including into the suite. If no other
options are specified, these are the tests that will be run. Glob patterns are supported (and
common) here.
### `selector.include_files`
A list of strings representing glob patterns. Includes only this subset of tests in the suite. These files will be included even if they would otherwise be excluded by tags. Will error if a test specified here was not included in the roots.
A list of strings representing glob patterns. Includes only this subset of tests in the suite. These
files will be included even if they would otherwise be excluded by tags. Will error if a test
specified here was not included in the roots.
### `selector.exclude_files`
A list of strings representing glob patterns. Excludes this list of tests from the suite. These files will be excluded even if they would otherwise be included by tags. Will error if a test specified here was not included in the roots.
A list of strings representing glob patterns. Excludes this list of tests from the suite. These
files will be excluded even if they would otherwise be included by tags. Will error if a test
specified here was not included in the roots.
### `selector.include_with_any_tags`
A list of strings. Only jstests which define a list of tags which includes any of these tags will be included in the suite, unless otherwise excluded by filename.
A list of strings. Only jstests which define a list of tags which includes any of these tags will be
included in the suite, unless otherwise excluded by filename.
To see all tags referenced across suites, run `./buildscripts/resmoke.py list-tags`.
### `selector.exclude_with_any_tags`
A list of strings. Any jstest which defines a list of tags which includes any of these tags will be excluded from the suite, unless otherwise included by filename.
A list of strings. Any jstest which defines a list of tags which includes any of these tags will be
excluded from the suite, unless otherwise included by filename.
To see all tags referenced across suites, run `./buildscripts/resmoke.py list-tags`.
@ -118,9 +129,8 @@ Defines how the tests will be executed.
### `executor.config`
This section contains additional configuration for each test. The structure of this can vary
significantly based on the `test_kind`. For specific information, you can look at the
implementation of the `test_kind` of concern in the `buildscripts/resmokelib/testing/testcases`
directory.
significantly based on the `test_kind`. For specific information, you can look at the implementation
of the `test_kind` of concern in the `buildscripts/resmokelib/testing/testcases` directory.
Example:
@ -147,7 +157,9 @@ Any parameters (besides `global_vars`) will directly be passed to the mongo shel
##### `executor.config.shell_options.global_vars`
Will use this as the base for the string passed to `--eval`. Anything specified in `shell_options.eval` will be appended after these. Formats any objects so that they will evaluate properly as a string.
Will use this as the base for the string passed to `--eval`. Anything specified in
`shell_options.eval` will be appended after these. Formats any objects so that they will evaluate
properly as a string.
`global_vars` allows for setting global variables. A `TestData` object is a special global variable
that is used to hold testing data. Parts of `TestData` can be updated via `resmoke` command-line
@ -156,8 +168,8 @@ intelligently and made available to the `js_test` running. Behavior can vary on
in general this is the order of precedence: (1) resmoke command-line (2) [suite].yml (3)
runtime/default.
The mongo shell can also be invoked with flags &
named arguments. Flags must have the `''` value, such as in the case for `nodb` above.
The mongo shell can also be invoked with flags & named arguments. Flags must have the `''` value,
such as in the case for `nodb` above.
`eval` can also be used to run generic javascript code in the shell. You can directly include
javascript code, or you can put it in a separate script & `load` it.
@ -166,11 +178,12 @@ javascript code, or you can put it in a separate script & `load` it.
Specify hooks to run before, after, and between individual tests to execute specified logic.
> Read more about hooks in [buildscripts/resmokelib/testing/hooks/README.md](../../../buildscripts/resmokelib/testing/hooks/README.md)
> Read more about hooks in
> [buildscripts/resmokelib/testing/hooks/README.md](../../../buildscripts/resmokelib/testing/hooks/README.md)
The hook name in the `.yml` must match the Python class name of the hook. Parameters can also be included in the `.yml`
and will be passed to the hook's constructor (the `hook_logger` & `fixture` parameters are
automatically included, so those should not be included in the `.yml`).
The hook name in the `.yml` must match the Python class name of the hook. Parameters can also be
included in the `.yml` and will be passed to the hook's constructor (the `hook_logger` & `fixture`
parameters are automatically included, so those should not be included in the `.yml`).
Example:
@ -190,9 +203,11 @@ hooks:
Specify a test fixture to run around the tests.
> Read more about fixtures in [buildscripts/resmokelib/testing/fixtures/README.md](../../../buildscripts/resmokelib/testing/fixtures/README.md).
> Read more about fixtures in
> [buildscripts/resmokelib/testing/fixtures/README.md](../../../buildscripts/resmokelib/testing/fixtures/README.md).
The `class` sub-field corresponds to the Python class name of a fixture. All other sub-fields are passed into the constructor of the fixture. These sub-fields will vary based on the fixture used.
The `class` sub-field corresponds to the Python class name of a fixture. All other sub-fields are
passed into the constructor of the fixture. These sub-fields will vary based on the fixture used.
Example:
@ -238,4 +253,5 @@ Read more about [hooks](../../../buildscripts/resmokelib/testing/hooks/README.md
#### `executor.archive.tests`
Specify a list of test files to archive on failure. Wildcard selection is valid. Set to `true` to archive _all_ tests.
Specify a list of test files to archive on failure. Wildcard selection is valid. Set to `true` to
archive _all_ tests.

@ -2,11 +2,13 @@
Resmoke is MongoDB's integration test runner.
The JS Tests it can run live in the `jstests/` directory - reference its [README](../../jstests/README.md) to learn about their content.
The JS Tests it can run live in the `jstests/` directory - reference its
[README](../../jstests/README.md) to learn about their content.
## Build
Though the source is built with bazel, resmoke is not yet integrated. This means that the source has to be built prior to using resmoke, eg:
Though the source is built with bazel, resmoke is not yet integrated. This means that the source has
to be built prior to using resmoke, eg:
```
bazel build install-dist-test
@ -41,11 +43,13 @@ bazel build install-dist-test
Generate a mongod.conf and mongos.conf using config fuzzer.
```
Note: `bisect`, `setup-multiversion`, and `symbolize` commands have been moved to [`db-contrib-tool`](https://github.com/10gen/db-contrib-tool#readme).
Note: `bisect`, `setup-multiversion`, and `symbolize` commands have been moved to
[`db-contrib-tool`](https://github.com/10gen/db-contrib-tool#readme).
## Suites
Many of the above commands use the concept of a "suite". Loosely, suites group which tests run, and how.
Many of the above commands use the concept of a "suite". Loosely, suites group which tests run, and
how.
Read more about suites [here](../../buildscripts/resmokeconfig/suites/README.md).
@ -59,43 +63,47 @@ The most typical approach is to run a particular JS test file given a suite, eg:
buildscripts/resmoke.py run --suites=no_passthrough jstests/noPassthrough/shell/js/string.js
```
That executes the content of that file, using the suite configuration as a fixture setup. The suite "no_passthrough" is associated with the file [buildscripts/resmokeconfig/suites/no_passthrough.yml](../../buildscripts/resmokeconfig/suites/no_passthrough.yml).
That executes the content of that file, using the suite configuration as a fixture setup. The suite
"no_passthrough" is associated with the file
[buildscripts/resmokeconfig/suites/no_passthrough.yml](../../buildscripts/resmokeconfig/suites/no_passthrough.yml).
Run has **100+ flags**! Use `resmoke run --help` to inspect them. To avoid risk of multiple sources of truth that can drift and become stale, **we do not attempt to document them all here** - they should each be self-descriptive and documented within the CLI help.
Run has **100+ flags**! Use `resmoke run --help` to inspect them. To avoid risk of multiple sources
of truth that can drift and become stale, **we do not attempt to document them all here** - they
should each be self-descriptive and documented within the CLI help.
Below are very high-level descriptions for high-usage flags.
### Suites (`--suites`)
The run subcommand can run suites (list of tests and the MongoDB topology and
configuration to run them against), and explicitly named test files.
The run subcommand can run suites (list of tests and the MongoDB topology and configuration to run
them against), and explicitly named test files.
A single suite can be specified using the `--suite` flag, and multiple suites
can be specified by providing a comma separated list to the `--suites` flag.
A single suite can be specified using the `--suite` flag, and multiple suites can be specified by
providing a comma separated list to the `--suites` flag.
Additional documentation on our suite configuration can be found in
[buildscripts/resmokeconfig/suites/README.md](../../buildscripts/resmokeconfig/suites/README.md).
### Testable Installations (`--installDir`)
resmoke can run tests against any testable installation of MongoDB (such
as ASAN, Debug, Release). When possible, resmoke will automatically locate and
run with a locally built copy of MongoDB Server, so long as that build was
installed to a subdirectory of the root of the git repository, and there is
exactly one build. In other situations, the `--installDir` flag, passed to run
subcommand, can be used to indicate the location of the mongod/mongos binaries.
resmoke can run tests against any testable installation of MongoDB (such as ASAN, Debug, Release).
When possible, resmoke will automatically locate and run with a locally built copy of MongoDB
Server, so long as that build was installed to a subdirectory of the root of the git repository, and
there is exactly one build. In other situations, the `--installDir` flag, passed to run subcommand,
can be used to indicate the location of the mongod/mongos binaries.
As an alternative, you may instead prefer to use the resmoke.py wrapper script
located in the same directory as the mongod binary, which will automatically
set `installDir` for you.
As an alternative, you may instead prefer to use the resmoke.py wrapper script located in the same
directory as the mongod binary, which will automatically set `installDir` for you.
Note that this wrapper is unavailable in packaged installations of MongoDB
Server, such as those provided by Homebrew, and other package managers. If you
would like to run tests against a packaged installation, you must explicitly
pass `--installDir` to resmoke.py.
Note that this wrapper is unavailable in packaged installations of MongoDB Server, such as those
provided by Homebrew, and other package managers. If you would like to run tests against a packaged
installation, you must explicitly pass `--installDir` to resmoke.py.
### Resmoke test telemetry
We capture telemetry from resmoke using open telemetry.
Using open telemetry (OTel) we capture more specific information about the internals of resmoke. This data is used for improvements specifically when running in evergreen. This data is captured on every resmoke invocation but only sent to honeycomb when running in evergreen. More info about how we use OTel in resmoke can be found [here](otel_resmoke.md).
Using open telemetry (OTel) we capture more specific information about the internals of resmoke.
This data is used for improvements specifically when running in evergreen. This data is captured on
every resmoke invocation but only sent to honeycomb when running in evergreen. More info about how
we use OTel in resmoke can be found [here](otel_resmoke.md).

@ -1,10 +1,12 @@
# Extensions
This module provides utilities for setting up and configuring MongoDB extensions in resmoke test suites.
This module provides utilities for setting up and configuring MongoDB extensions in resmoke test
suites.
## Overview
Extensions are dynamically loaded shared objects (`.so` files) that provide additional functionality to MongoDB. The utilities in this folder can handle:
Extensions are dynamically loaded shared objects (`.so` files) that provide additional functionality
to MongoDB. The utilities in this folder can handle:
1. Discovering extension `.so` files in build directories
2. Generating `.conf` configuration files for extensions
@ -12,7 +14,8 @@ Extensions are dynamically loaded shared objects (`.so` files) that provide addi
## Configuration File Generation in Tests
Extension `.conf` files are YAML configuration files that tell the server how to load an extension. They contain:
Extension `.conf` files are YAML configuration files that tell the server how to load an extension.
They contain:
- `sharedLibraryPath`: Path to the `.so` file
- `extensionOptions`: Optional configuration parameters for the extension
@ -30,9 +33,11 @@ extensionOptions:
The `generate_extension_configs.py` module creates `.conf` files:
1. Receives a list of `.so` file paths (either from automatic discovery via `find_and_generate_extension_configs.py`, or manually via `--so-files` command-line argument)
1. Receives a list of `.so` file paths (either from automatic discovery via
`find_and_generate_extension_configs.py`, or manually via `--so-files` command-line argument)
2. For each `.so`, creates a `.conf` file in the temp directory (`/tmp/mongo/extensions/`)
3. Looks up corresponding extension options from `src/mongo/db/extension/test_examples/configurations.yml`, if any are specified
3. Looks up corresponding extension options from
`src/mongo/db/extension/test_examples/configurations.yml`, if any are specified
4. Writes the config file with `sharedLibraryPath` and any `extensionOptions`
### Automatic Discovery and Generation

@ -2,7 +2,10 @@
This is a testing feature of the mongod and mongos, built into resmoke.py!
The config fuzzer is a resmoke feature that randomizes various server parameters of both mongod and mongos on startup. These fuzzed parameters should not affect the correctness of any tests. Therefore, the config fuzzer can be enabled for any test or suite run with resmoke to ensure the database is resilient to abnormal server configurations.
The config fuzzer is a resmoke feature that randomizes various server parameters of both mongod and
mongos on startup. These fuzzed parameters should not affect the correctness of any tests.
Therefore, the config fuzzer can be enabled for any test or suite run with resmoke to ensure the
database is resilient to abnormal server configurations.
More information can be displayed in the resmoke --help output:
@ -25,15 +28,22 @@ The bulk of the fuzzing logic is in [mongo_fuzzer_configs.py](./mongo_fuzzer_con
## How does it work?
The config fuzzer assigns random values to various tunable parameters. Server parameters and their ranges are specified manually by developers and are not discovered automatically in any way.
The config fuzzer assigns random values to various tunable parameters. Server parameters and their
ranges are specified manually by developers and are not discovered automatically in any way.
When the above resmoke flags are used, the [plugin](./plugin.py) implicitly enables the [FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py) hook for testing.
When the above resmoke flags are used, the [plugin](./plugin.py) implicitly enables the
[FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py)
hook for testing.
## Where and When does it run on evergreen?
The config fuzzer is represented as a handful of evergreen tasks with "_config_fuzzer_" in the name. Search "config_fuzzer" in the [etc/](../../../etc) directory to find all the evergreen tasks.
The config fuzzer is represented as a handful of evergreen tasks with "_config_fuzzer_" in the name.
Search "config_fuzzer" in the [etc/](../../../etc) directory to find all the evergreen tasks.
Arguably the simplest evergreen task, `config_fuzzer_jsCore`, runs the "core" (i.e. `jstests/core`) resmoke suite with the config fuzzer parameters to resmoke set, and excludes some incompatible tests ([src link](https://github.com/mongodb/mongo/blob/a2e7e83a135c3096de7f360b88de1b3cdc1caaf2/etc/evergreen_yml_components/tasks/resmoke/server_divisions/durable_transactions_and_availability/tasks.yml#L1956-L1975)). Here is a sampling of some of the task names:
Arguably the simplest evergreen task, `config_fuzzer_jsCore`, runs the "core" (i.e. `jstests/core`)
resmoke suite with the config fuzzer parameters to resmoke set, and excludes some incompatible tests
([src link](https://github.com/mongodb/mongo/blob/a2e7e83a135c3096de7f360b88de1b3cdc1caaf2/etc/evergreen_yml_components/tasks/resmoke/server_divisions/durable_transactions_and_availability/tasks.yml#L1956-L1975)).
Here is a sampling of some of the task names:
- `config_fuzzer_concurrency_replication`
- `config_fuzzer_concurrency_sharded_replication`
@ -41,7 +51,10 @@ Arguably the simplest evergreen task, `config_fuzzer_jsCore`, runs the "core" (i
## Reproducing a config fuzzer failure
In the Evergreen task view, click on the Logs tab, then Task Logs, and open in Parsely. Search for "Fuzzed" ([source link](https://github.com/mongodb/mongo/blob/ca1c935aca43ca2e028507e2a878d4e12f50355b/buildscripts/resmokelib/run/__init__.py#L352-L366)). The output will look similar to this:
In the Evergreen task view, click on the Logs tab, then Task Logs, and open in Parsely. Search for
"Fuzzed"
([source link](https://github.com/mongodb/mongo/blob/ca1c935aca43ca2e028507e2a878d4e12f50355b/buildscripts/resmokelib/run/__init__.py#L352-L366)).
The output will look similar to this:
<details>
<summary>Logs</summary>
@ -112,13 +125,22 @@ In the Evergreen task view, click on the Logs tab, then Task Logs, and open in P
</details>
The log line starting with "resmoke.py invocation for local usage" and the one with "configFuzzSeed" provide an option `--configFuzzSeed=5583430894313922699` that can be used to generate the same fuzzed server parameters locally in resmoke.
The log line starting with "resmoke.py invocation for local usage" and the one with "configFuzzSeed"
provide an option `--configFuzzSeed=5583430894313922699` that can be used to generate the same
fuzzed server parameters locally in resmoke.
## Running the config fuzzer locally
Before running the Resmoke config fuzzer command, you need to obtain the necessary binaries. You can download them from the "Files" section of the `archive_dist_test` task in Evergreen (e.g., binaries from the `amazon2-arm64-compile` variant). Alternatively, if you don't require those specific binaries, you can use `db-contrib-tool` to download the binaries (e.g., by running `bazel run db-contrib-tool -- setup-repro-env master`).
Before running the Resmoke config fuzzer command, you need to obtain the necessary binaries. You can
download them from the "Files" section of the `archive_dist_test` task in Evergreen (e.g., binaries
from the `amazon2-arm64-compile` variant). Alternatively, if you don't require those specific
binaries, you can use `db-contrib-tool` to download the binaries (e.g., by running
`bazel run db-contrib-tool -- setup-repro-env master`).
To re-run a command locally that failed through the config fuzzer, you can navigate to the specific test that failed, and under files you can find an entry titled "Resmoke.py Invocation for Local Usage". If you are replicating an older config fuzzer invocation, remove the command line argument "`--installDir=dist-test/bin`". A simple example command is shown below:
To re-run a command locally that failed through the config fuzzer, you can navigate to the specific
test that failed, and under files you can find an entry titled "Resmoke.py Invocation for Local
Usage". If you are replicating an older config fuzzer invocation, remove the command line argument
"`--installDir=dist-test/bin`". A simple example command is shown below:
```
buildscripts/resmoke.py run jstests/noPassthrough/bulk_write_w0.js \
@ -127,7 +149,12 @@ buildscripts/resmoke.py run jstests/noPassthrough/bulk_write_w0.js \
--configFuzzSeed=7956511060361033919
```
It is easiest to pipe the output to another text file and then to analyze the output through there. The format of the file is slightly different, as you will not be able to explicitly look up Fuzzed, but you can look up one of the fuzzed config parameters to find the list of fuzzed config parameter settings. A subset of a log from running the above command on [this version](https://github.com/mongodb/mongo/commit/856e4ecd8612b19c8ba281cf23450d74b5838650) of master is the following:
It is easiest to pipe the output to another text file and then to analyze the output through there.
The format of the file is slightly different, as you will not be able to explicitly look up Fuzzed,
but you can look up one of the fuzzed config parameters to find the list of fuzzed config parameter
settings. A subset of a log from running the above command on
[this version](https://github.com/mongodb/mongo/commit/856e4ecd8612b19c8ba281cf23450d74b5838650) of
master is the following:
```
js_test:bulk_write_w0] Skip waiting to connect to node with pid=2522712, port=20040
@ -140,7 +167,8 @@ js_test:bulk_write_w0] Skip waiting to connect to node with pid=2522712, port=20
## Adding a new parameter to be fuzzed to the config fuzzer
There are two broad categories of parameters in the config fuzzer, that each have two sub-categories of parameters:
There are two broad categories of parameters in the config fuzzer, that each have two sub-categories
of parameters:
1. mongo parameters
- mongod parameters
@ -151,25 +179,43 @@ There are two broad categories of parameters in the config fuzzer, that each hav
### Adding new mongo parameters
Mongo parameters and their properties (e.g. min, max, default) are stored in [config_fuzzer_limits.py](./config_fuzzer_limits.py).
Mongo parameters and their properties (e.g. min, max, default) are stored in
[config_fuzzer_limits.py](./config_fuzzer_limits.py).
Below is a list of ways to fuzz configs which are supported without having to also change [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py).
Please ensure that you add it correctly to the `mongod` or `mongos` subdictionary.
Below is a list of ways to fuzz configs which are supported without having to also change
[mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). Please ensure that you add it correctly to the
`mongod` or `mongos` subdictionary.
You need to specify if your parameter should be fuzzed at runtime, startup, or both by declaring the `fuzz_at` key for the parameter. The `fuzz_at` key should be a list that can contain the values `startup`, `runtime`, or both. The eligible values are specified in the `set_at` keys of the corresponding `.idl` files.
You need to specify if your parameter should be fuzzed at runtime, startup, or both by declaring the
`fuzz_at` key for the parameter. The `fuzz_at` key should be a list that can contain the values
`startup`, `runtime`, or both. The eligible values are specified in the `set_at` keys of the
corresponding `.idl` files.
For a parameter that is only fuzzed at startup, the fuzzer will generate a fuzzed value for the parameter and set it when starting up the server.
For a parameter that is only fuzzed at startup, the fuzzer will generate a fuzzed value for the
parameter and set it when starting up the server.
For a parameter fuzzed at runtime, the fuzzer will generate a fuzzed value for the parameter while running the server based on a `period` key that is required for fuzzed runtime parameters.
The `period` key describes how often the parameter should be changed, in seconds. Every `period` seconds, the fuzzer will select a new random value for the parameter and use the setParameter command to update the value of the
parameter on every node in the cluster while the suite is running. This is performed by the [FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py) hook.
For a parameter fuzzed at runtime, the fuzzer will generate a fuzzed value for the parameter while
running the server based on a `period` key that is required for fuzzed runtime parameters. The
`period` key describes how often the parameter should be changed, in seconds. Every `period`
seconds, the fuzzer will select a new random value for the parameter and use the setParameter
command to update the value of the parameter on every node in the cluster while the suite is
running. This is performed by the
[FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py)
hook.
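As a rough illustration of the `fuzz_at` and `period` keys described above, the sketch below shows
what an entry might look like and how a value could be drawn for it. The parameter name
`exampleParam` and the helpers `pick_value` and `runtime_params` are hypothetical; only the key
names (`min`, `max`, `fuzz_at`, `period`) come from this document. The real logic lives in
[mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py) and may differ.

```python
import random

# Hypothetical entry; only the key names are taken from this README.
EXAMPLE_LIMITS = {
    "mongod": {
        "exampleParam": {
            "min": 1,
            "max": 100,
            "fuzz_at": ["startup", "runtime"],
            "period": 5,  # seconds between runtime re-fuzzes
        },
    },
}


def pick_value(spec, rng):
    """Draw an integer uniformly from the inclusive [min, max] range."""
    return rng.randint(spec["min"], spec["max"])


def runtime_params(limits):
    """Return the names of parameters that opt in to runtime fuzzing."""
    return [name for name, spec in limits.items() if "runtime" in spec.get("fuzz_at", [])]


rng = random.Random(1234)  # a fixed seed makes the draws reproducible
spec = EXAMPLE_LIMITS["mongod"]["exampleParam"]
value = pick_value(spec, rng)
```

Seeding the generator is what makes a failing run repeatable, which is the same idea behind passing
`--configFuzzSeed` to resmoke.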
For parameters with complex fuzzing logic or interdependencies with other parameters, you can set `"custom_fuzz_value_assignment": True` to bypass the standard fuzzing logic. Parameters with this flag must be handled explicitly in the special handling functions (`generate_special_mongod_startup_parameters()` for startup parameters or `generate_special_runtime_parameters()` for runtime parameters). Note that parameter dependency logic is currently only supported for startup fuzzing - runtime fuzzing operates on individual parameters. See the section below on parameters requiring special handling for more details.
For parameters with complex fuzzing logic or interdependencies with other parameters, you can set
`"custom_fuzz_value_assignment": True` to bypass the standard fuzzing logic. Parameters with this
flag must be handled explicitly in the special handling functions
(`generate_special_mongod_startup_parameters()` for startup parameters or
`generate_special_runtime_parameters()` for runtime parameters). Note that parameter dependency
logic is currently only supported for startup fuzzing - runtime fuzzing operates on individual
parameters. See the section below on parameters requiring special handling for more details.
Let `choices = [choice1, choice2, ..., choiceN]` be an array of choices that the parameter can have as a value.
The parameters are added in order of priority chosen in the if-elif-else statement in `generate_normal_mongo_parameters()`
in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py).
So, if you added the fields `default`, `min`, and `max` for a `param`, case 4 would get evaluated over case 5.
Let `choices = [choice1, choice2, ..., choiceN]` be an array of choices that the parameter can have
as a value. The parameters are added in order of priority chosen in the if-elif-else statement in
`generate_normal_mongo_parameters()` in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py). So, if
you added the fields `default`, `min`, and `max` for a `param`, case 4 would get evaluated over
case 5.
1. `param = rng.uniform(min, max)`
@ -218,41 +264,59 @@ So, if you added the fields `default`, `min`, and `max` for a `param`, case 4 wo
"param": {"default": default}
```
> Note: For the default case, please add the value `"fuzz_at": ["startup"]` (the default value gets set at "startup").
> Note: For the default case, please add the value `"fuzz_at": ["startup"]` (the default value
> gets set at "startup").
If you have a parameter that depends on another parameter being generated (see `throughputProbingInitialConcurrency` needing to be initialized before
`throughputProbingMinConcurrency` and `throughputProbingMaxConcurrency` as an example in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py)) or behavior that
differs from the above cases, please do the following steps:
If you have a parameter that depends on another parameter being generated (see
`throughputProbingInitialConcurrency` needing to be initialized before
`throughputProbingMinConcurrency` and `throughputProbingMaxConcurrency` as an example in
[mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py)) or behavior that differs from the above cases,
please do the following steps:
1. Add the parameter and the needed information to [config_fuzzer_limits.py](./config_fuzzer_limits.py) (ensure to correctly add to the `mongod` or `mongos` sub-dictionary), including `"custom_fuzz_value_assignment": True` to indicate it requires special handling
1. Add the parameter and the needed information to
[config_fuzzer_limits.py](./config_fuzzer_limits.py) (ensure to correctly add to the `mongod` or
`mongos` sub-dictionary), including `"custom_fuzz_value_assignment": True` to indicate it
requires special handling
In [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py):
2. Add the parameter's special handling in `generate_special_mongod_startup_parameters()` or `generate_special_mongos_startup_parameters()` for startup parameters, or `generate_special_runtime_parameters()` for runtime parameters
2. Add the parameter's special handling in `generate_special_mongod_startup_parameters()` or
`generate_special_mongos_startup_parameters()` for startup parameters, or
`generate_special_runtime_parameters()` for runtime parameters
> Note: Parameter dependencies (where one parameter's value constrains another) are currently only supported for startup fuzzing. Runtime fuzzing handles parameters individually.
> Note: Parameter dependencies (where one parameter's value constrains another) are currently only
> supported for startup fuzzing. Runtime fuzzing handles parameters individually.
If you add a flow control parameter, please add the parameter's name to `flow_control_params` in `generate_mongod_parameters`.
If you add a flow control parameter, please add the parameter's name to `flow_control_params` in
`generate_mongod_parameters`.
> Note: The main distinction between min/max vs. lower-bound/upper_bound is there is some transformation involving the lower and upper bounds,
> while the min/max should be the true min/max of the parameters. You should also include the true min/max of the parameter so this can be logged.
> If the min/max is not inclusive, this is added as a note above the parameter.
> Note: The main distinction between min/max vs. lower-bound/upper_bound is there is some
> transformation involving the lower and upper bounds, while the min/max should be the true min/max
> of the parameters. You should also include the true min/max of the parameter so this can be
> logged. If the min/max is not inclusive, this is added as a note above the parameter.
### Adding new WiredTiger parameters
WiredTiger parameters and their properties (e.g. min, max, default) are stored in [config_fuzzer_wt_limits.py](./config_fuzzer_wt_limits.py).
WiredTiger parameters and their properties (e.g. min, max, default) are stored in
[config_fuzzer_wt_limits.py](./config_fuzzer_wt_limits.py).
> These _can not_ be fuzzed with the [FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py) hook because they are only set on startup (these parameters are used in the wt configuration string).
> These _can not_ be fuzzed with the
> [FuzzRuntimeParameters](../../../buildscripts/resmokelib/testing/hooks/fuzz_runtime_parameters.py)
> hook because they are only set on startup (these parameters are used in the wt configuration
> string).
Below is a list of ways to fuzz configs which are supported without having to also change [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py).
Please ensure that you add it correctly to the `wt` (eviction parameters) or `wt_table` subdictionary.
Let `choices = [choice1, choice2, ..., choiceN]` be an array of choices that the parameter can have as a value.
The parameters are added in order of priority chosen in the if-elif-else statement in `generate_normal_wt_parameters()` in
Below is a list of ways to fuzz configs which are supported without having to also change
[mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py).
Please ensure that you add it correctly to the `wt` (eviction parameters) or `wt_table`
subdictionary.
Let `choices = [choice1, choice2, ..., choiceN]` be an array of choices that the parameter can have
as a value.
The parameters are added in order of priority chosen in the if-elif-else statement in
`generate_normal_wt_parameters()` in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py).
1. `param = rng.choices(choices)`, where choices is an array
Add:
@ -281,25 +345,32 @@ The parameters are added in order of priority chosen in the if-elif-else stateme
"param": {"min": min, "max": max}
```
If you have a parameter that depends on another parameter being generated (see `eviction_target` needing to be initialized before
`eviction_trigger` as an example in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py)) or behavior that differs from the above cases,
If you have a parameter that depends on another parameter being generated (see `eviction_target`
needing to be initialized before `eviction_trigger` as an example in
[mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py)) or behavior that differs from the above cases,
please do the following steps:
1. Add the parameter and the needed information to [config_fuzzer_wt_limits.py](./config_fuzzer_wt_limits.py) (ensure to correctly add to the `wt` or `wt_table` sub-dictionary)
1. Add the parameter and the needed information to
[config_fuzzer_wt_limits.py](./config_fuzzer_wt_limits.py) (ensure to correctly add to the `wt`
or `wt_table` sub-dictionary)
In [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py):
2. Add the parameter to `excluded_normal_params` in `generate_eviction_configs()` or `generate_table_configs()`
3. Add the parameter's special handling in `generate_special_eviction_configs()` or `generate_special_table_configs()`
2. Add the parameter to `excluded_normal_params` in `generate_eviction_configs()` or
`generate_table_configs()`
3. Add the parameter's special handling in `generate_special_eviction_configs()` or
`generate_special_table_configs()`
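The ordering requirement above (one parameter generated before another that depends on it) can be
sketched as follows. This is only an illustration under stated assumptions: the limits shown are
made up, the helper `generate_dependent_pair` is hypothetical, and the constraint that
`eviction_trigger` must exceed `eviction_target` is assumed here for the example. The actual
special handling lives in [mongo_fuzzer_configs.py](./mongo_fuzzer_configs.py).

```python
import random

# Hypothetical limits; the real ones live in config_fuzzer_wt_limits.py.
EXAMPLE_WT_LIMITS = {
    "eviction_target": {"min": 10, "max": 80},
    "eviction_trigger": {"min": 20, "max": 99},
}


def generate_dependent_pair(limits, rng):
    """Pick eviction_target first, then an eviction_trigger strictly above it."""
    target_spec = limits["eviction_target"]
    trigger_spec = limits["eviction_trigger"]
    target = rng.randint(target_spec["min"], target_spec["max"])
    # The dependent parameter's lower bound is tightened by the value chosen above.
    trigger_min = max(trigger_spec["min"], target + 1)
    trigger = rng.randint(trigger_min, trigger_spec["max"])
    return {"eviction_target": target, "eviction_trigger": trigger}


configs = generate_dependent_pair(EXAMPLE_WT_LIMITS, random.Random(42))
```

Because the second draw depends on the first, such parameters cannot go through the generic
per-parameter path, which is why they are listed in `excluded_normal_params`.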
> The main distinction between min/max vs. lower-bound/upper_bound is there is some transformation involving the lower and upper bounds,
> while the min/max should be the true min/max of the parameters. You should also include the true min/max of the parameter so this can be logged.
> If the min/max is not inclusive, this is added as a note above the parameter.
> The main distinction between min/max vs. lower-bound/upper_bound is there is some transformation
> involving the lower and upper bounds, while the min/max should be the true min/max of the
> parameters. You should also include the true min/max of the parameter so this can be logged. If
> the min/max is not inclusive, this is added as a note above the parameter.
## Exclusions
- `jstests/libs/override_methods/config_fuzzer_incompatible_commands.js`
- These commands are too impactful to run with the config fuzzer
- The `does_not_support_config_fuzzer` jstest tag
- Tests with this tag may manually specify server parameters modified by the fuzzer or read global state that is modified in some way by the fuzzer.
- Tests with this tag may manually specify server parameters modified by the fuzzer or read global
state that is modified in some way by the fuzzer.
- Just because a test is failing does not mean it is incompatible with the config fuzzer.

@ -3,7 +3,9 @@
There are two main ways of running the core analyzer.
1. Running the core analyzer with local core dumps and binaries.
2. Running the core analyzer with core dumps and binaries from an evergreen task. Note that some analysis might fail if you are not on the same AMI (Amazon Machine Image) that the task was run on.
2. Running the core analyzer with core dumps and binaries from an evergreen task. Note that some
analysis might fail if you are not on the same AMI (Amazon Machine Image) that the task was run
on.
To run the core analyzer with local core dumps and binaries:
@ -11,7 +13,9 @@ To run the core analyzer with local core dumps and binaries:
python3 buildscripts/resmoke.py core-analyzer
```
This will look for binaries in the build/install directory, and it will look for core dumps in the current directory. If your local environment is different you can include `--install-dir` and `--core-dir` in your invocation to specify other locations.
This will look for binaries in the build/install directory, and it will look for core dumps in the
current directory. If your local environment is different you can include `--install-dir` and
`--core-dir` in your invocation to specify other locations.
To run the core analyzer with core dumps and binaries from an evergreen task:
@ -19,11 +23,15 @@ To run the core analyzer with core dumps and binaries from an evergreen task:
python3 buildscripts/resmoke.py core-analyzer --task-id={task_id}
```
This will download all of the core dumps and binaries from the task and put them into the configured `--working-dir`, which defaults to the `core-analyzer` directory.
This will download all of the core dumps and binaries from the task and put them into the configured
`--working-dir`, which defaults to the `core-analyzer` directory.
All of the task analysis will be added to the `analysis` directory inside the configured `--working-dir`.
All of the task analysis will be added to the `analysis` directory inside the configured
`--working-dir`.
Note: Currently the core analyzer only runs on Linux. Windows uses the legacy hang analyzer but will be switched over when we run into issues or have time to do the transition. We have not tackled the problem of getting core dumps on macOS so we have no core dump analysis on that operating system.
Note: Currently the core analyzer only runs on Linux. Windows uses the legacy hang analyzer but will
be switched over when we run into issues or have time to do the transition. We have not tackled the
problem of getting core dumps on macOS so we have no core dump analysis on that operating system.
### Getting core dumps
@ -37,28 +45,33 @@ sequenceDiagram
Hang Analyzer ->> Core Dumps: Attach to pid and generate core dumps
```
When a task times out, it hits the [timeout](https://github.com/mongodb/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2694) section in the defined evergreen config.
In this timeout section, we run [this](https://github.com/mongodb/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2302) task which runs the hang-analyzer with the following invocation:
When a task times out, it hits the
[timeout](https://github.com/mongodb/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2694)
section in the defined evergreen config. In this timeout section, we run
[this](https://github.com/mongodb/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2302)
task which runs the hang-analyzer with the following invocation:
```
python3 buildscripts/resmoke.py hang-analyzer -o file -o stdout -m exact -p python
```
This tells the hang-analyzer to look for all of the python processes (we are specifically looking for resmoke) on the machine and to signal them.
When resmoke is [signaled](https://github.com/mongodb/mongo/blob/08a99b15eea7ae0952b2098710d565dd7f709ff6/buildscripts/resmokelib/sighandler.py#L25), it again invokes the hang analyzer with the specific pids of its child processes.
It will look similar to this most of the time:
This tells the hang-analyzer to look for all of the python processes (we are specifically looking
for resmoke) on the machine and to signal them. When resmoke is
[signaled](https://github.com/mongodb/mongo/blob/08a99b15eea7ae0952b2098710d565dd7f709ff6/buildscripts/resmokelib/sighandler.py#L25),
it again invokes the hang analyzer with the specific pids of its child processes. It will look
similar to this most of the time:
```
python3 buildscripts/resmoke.py hang-analyzer -o file -o stdout -k -c -d pid1,pid2,pid3
```
The things to note here are the `-k` which kills the process and `-c` which takes core dumps.
The resulting core dumps are put into the current running directory.
The things to note here are the `-k` which kills the process and `-c` which takes core dumps. The
resulting core dumps are put into the current running directory.
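As a rough illustration of how the second invocation is assembled, the sketch below builds the argument list from a set of child pids. The helper name `build_hang_analyzer_cmd` and its keyword arguments are hypothetical; only the flags themselves (`-o`, `-k`, `-c`, `-d`) come from the command shown above.

```python
def build_hang_analyzer_cmd(pids, kill=True, dump_core=True):
    """Build an argv list that mirrors the hang-analyzer invocation shown above.

    The function name and parameters are illustrative and are not part of resmoke.
    """
    cmd = [
        "python3", "buildscripts/resmoke.py", "hang-analyzer",
        "-o", "file", "-o", "stdout",
    ]
    if kill:
        cmd.append("-k")  # kill the processes after analysis
    if dump_core:
        cmd.append("-c")  # take core dumps
    # -d takes a comma-separated list of pids
    cmd += ["-d", ",".join(str(pid) for pid in pids)]
    return cmd


print(" ".join(build_hang_analyzer_cmd([101, 102, 103])))
```

Passing the list to `subprocess.run` rather than joining it into a shell string avoids quoting problems.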
#### When a test times out
An optional test timeout (`--testTimeout=N` seconds) can be used when running resmoke that will run the hang-analyzer on all processes related to that test.
When a test times out, it will analyze:
An optional test timeout (`--testTimeout=N` seconds) can be used when running resmoke that will run
the hang-analyzer on all processes related to that test. When a test times out, it will analyze:
- The process the testcase created.
- Any child of the testcase process.
@ -75,23 +88,31 @@ When a test times out, it will analyze:
| |-mongo (ENV_MARKER=2, pgid 9)
```
Caution: Should a process be created in a new process group as `bar` is in the above example, it may be missed on macOS. If `foo` crashes/exits, `bar` is orphaned and reparented to the `init` process. It is no longer a "child" and it is not generally possible to read environment variables of arbitrary processes on macOS with System Integrity Protection (SIP) enabled.
Caution: Should a process be created in a new process group as `bar` is in the above example, it may
be missed on macOS. If `foo` crashes/exits, `bar` is orphaned and reparented to the `init` process.
It is no longer a "child" and it is not generally possible to read environment variables of
arbitrary processes on macOS with System Integrity Protection (SIP) enabled.
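The selection logic can be pictured with a small, self-contained sketch that works on a plain list of process records instead of the live process table. Everything here is an assumption made for illustration: the record layout, the function name `select_processes`, and the use of `ENV_MARKER` as the environment variable name (borrowed from the diagram above). It is not resmoke's implementation.

```python
from collections import defaultdict


def select_processes(procs, root_pid, marker_key="ENV_MARKER", marker_value=None):
    """Return the sorted pids of root_pid, its descendants, and any marked process.

    `procs` is a list of dicts with "pid", "ppid" and an optional "env" dict.
    """
    children = defaultdict(list)
    for proc in procs:
        children[proc["ppid"]].append(proc["pid"])

    # Walk the parent/child tree starting at the testcase process.
    selected = set()
    stack = [root_pid]
    while stack:
        pid = stack.pop()
        if pid in selected:
            continue
        selected.add(pid)
        stack.extend(children[pid])

    # Orphaned processes are no longer descendants, so fall back to the
    # environment marker when one is given and the environment is readable.
    if marker_value is not None:
        for proc in procs:
            if proc.get("env", {}).get(marker_key) == marker_value:
                selected.add(proc["pid"])
    return sorted(selected)
```

In this model, an orphan that was reparented to pid 1 is still found through its marker, which is the case the caution above says can fail when the environment cannot be read.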
#### When a task fails normally
When a task fails normally, core dumps may also be generated by the linux kernel and put into the working directory.
When a task fails normally, core dumps may also be generated by the linux kernel and put into the
working directory.
#### Note on archival/upload in Evergreen
We use a non-standard way of uploading core dumps to evergreen due to [timeout issues](https://jira.mongodb.org/browse/SERVER-73171) we were facing when archiving and uploading them normally through evergreen commands.
After investigation of the above issue, we found that compressing and uploading core dumps was slow for three reasons:
We use a non-standard way of uploading core dumps to evergreen due to
[timeout issues](https://jira.mongodb.org/browse/SERVER-73171) we were facing when archiving and
uploading them normally through evergreen commands. After investigation of the above issue, we found
that compressing and uploading core dumps was slow for three reasons:
1. Tarring all of the core dumps into one file takes up a lot of disk IO and disk IO was the bottleneck.
1. Tarring all of the core dumps into one file takes up a lot of disk IO and disk IO was the
bottleneck.
2. Gzip is single threaded.
3. Uploading a big file synchronously is not fast.
We made a [script](https://github.com/mongodb/mongo/blob/master/buildscripts/fast_archive.py) that gzips all of the core dumps in parallel and uploads them to S3 individually asynchronously.
This solved all of the problems listed above.
We made a [script](https://github.com/mongodb/mongo/blob/master/buildscripts/fast_archive.py) that
gzips all of the core dumps in parallel and uploads them to S3 individually asynchronously. This
solved all of the problems listed above.
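The general idea (compress each core dump separately and in parallel, then hand each archive off as soon as it is ready) can be sketched as follows. This is a minimal illustration and not the contents of `fast_archive.py`: the `*.core` glob, the function names, and the `upload` callable are all assumptions, and the actual S3 upload is left to the caller.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gzip_file(path):
    """Compress a single file to <name>.gz next to the original."""
    out = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out


def archive_core_dumps(core_dir, upload, max_workers=4):
    """Gzip every *.core file in core_dir in parallel and pass each archive to `upload`.

    `upload` is a hypothetical callable supplied by the caller, for example a
    function that pushes one file to S3.
    """
    def compress_and_upload(path):
        archive = gzip_file(path)
        upload(archive)  # each archive is uploaded on its own, without waiting for the rest
        return archive

    core_files = sorted(Path(core_dir).glob("*.core"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compress_and_upload, core_files))
```

Because no single tarball is built, one slow or very large core dump does not hold up compression or upload of the others.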
### Generating the core analyzer task
@ -104,18 +125,26 @@ sequenceDiagram
Generated Task ->> Core Analyzer Output: Overwrite output with<br/> core dump analysis
```
In the [post task](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2665) section, we [define](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2184) the evergreen function used to generate the core analyzer task.
This [script](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/buildscripts/resmokelib/hang_analyzer/gen_hang_analyzer_tasks.py) runs on every task (passing or failing) and is independent of anything else that happened prior in the task and does all of the checks to ensure it should run.
These checks include:
In the
[post task](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2665)
section, we
[define](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2184)
the evergreen function used to generate the core analyzer task. This
[script](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/buildscripts/resmokelib/hang_analyzer/gen_hang_analyzer_tasks.py)
runs on every task (passing or failing) and is independent of anything else that happened prior in
the task and does all of the checks to ensure it should run. These checks include:
1. The task is being run on an operating system supported by the core analyzer.
2. The task has any core dumps uploaded and attached to it.
3. At least one of the binaries uploaded is from a binary we know how to process.
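The three checks above amount to a simple gate. The sketch below expresses them over a small, hypothetical `TaskInfo` record; the class, the function name, and the contents of `KNOWN_BINARIES` are assumptions for illustration, while the Linux-only set reflects the note earlier in this document.

```python
from dataclasses import dataclass, field

SUPPORTED_OSES = {"linux"}  # the core analyzer currently only runs on Linux
KNOWN_BINARIES = {"mongod", "mongos"}  # hypothetical list of binaries we can process


@dataclass
class TaskInfo:
    """Hypothetical summary of what a finished task uploaded."""
    os_name: str
    core_dumps: list = field(default_factory=list)
    binaries: list = field(default_factory=list)


def should_generate_core_analyzer_task(task):
    """Return True only when all three checks pass."""
    if task.os_name not in SUPPORTED_OSES:
        return False
    if not task.core_dumps:
        return False
    return any(binary in KNOWN_BINARIES for binary in task.binaries)
```

Since the script runs on every task, the common case of a passing task with no core dumps exits at the second check.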
The output from this script is a json file in the format evergreen expects.
We then pass this json file into the `generate.tasks` evergreen command to generate the task.
The output from this script is a json file in the format evergreen expects. We then pass this json
file into the `generate.tasks` evergreen command to generate the task.
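For orientation, a `generate.tasks` payload is a JSON document that declares new tasks and attaches them to build variants. The sketch below builds a minimal payload of that shape. The task naming scheme, the `run core analysis` function name, and the example variant and task names are hypothetical, and the real script emits more fields than this.

```python
import json


def build_generate_tasks_config(build_variant, parent_task, func_name="run core analysis"):
    """Return a minimal generate.tasks payload for one generated task.

    All names here are illustrative. `func_name` must refer to a function
    defined in the project's evergreen config.
    """
    task_name = f"core_analysis_{parent_task}"
    return {
        "buildvariants": [{"name": build_variant, "tasks": [{"name": task_name}]}],
        "tasks": [{"name": task_name, "commands": [{"func": func_name}]}],
    }


config = build_generate_tasks_config("example-variant", "example_task")
print(json.dumps(config, indent=2))
```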
After the task is generated, we have [another script](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2213) that finds the task that was just generated and attaches it to the current task being run.
After the task is generated, we have
[another script](https://github.com/mongodb/mongo/blob/709e3f4efc04b42e5d29a8ad2417a01d3610fc3f/etc/evergreen_yml_components/definitions.yml#L2213)
that finds the task that was just generated and attaches it to the current task being run.
The reason we upload a temporary file to the original task is to attach that s3 file link to the task.
Evergreen does not currently have a way to attach files to a task after it has run, so we need to upload something while the original task is in progress.
The reason we upload a temporary file to the original task is to attach that s3 file link to the
task. Evergreen does not currently have a way to attach files to a task after it has run, so we
need to upload something while the original task is in progress.

View File

@ -1,17 +1,15 @@
# Powercycle README
Power cycling is the process of turning hardware off and then turning it on again.
The powercycle test is designed to work across two machines: a "server" that controls
and monitors the workflow, and a "client" that runs the Mongo server and is regularly
crashed remotely by the "server".
Power cycling is the process of turning hardware off and then turning it on again. The powercycle
test is designed to work across two machines: a "server" that controls and monitors the workflow,
and a "client" that runs the Mongo server and is regularly crashed remotely by the "server".
In evergreen the localhost that runs the task acts as a "server" and the remote
host which is created by `host.create` evergreen command acts as a "client".
In evergreen the localhost that runs the task acts as a "server" and the remote host which is
created by `host.create` evergreen command acts as a "client".
The powercycle test is part of resmoke. Python 3.13+ with python venv is required to
run resmoke (python3 from [mongodbtoolchain](http://mongodbtoolchain.build.10gen.cc/)
is highly recommended). Python venv can be set up by running the following in the root
mongo repo directory:
The powercycle test is part of resmoke. Python 3.13+ with python venv is required to run resmoke
(python3 from [mongodbtoolchain](http://mongodbtoolchain.build.10gen.cc/) is highly recommended).
Python venv can be set up by running the following in the root mongo repo directory:
```
python3 -m venv python3-venv
@ -48,20 +46,18 @@ buildscripts/resmokelib/powercycle/__init__.py
### Set up EC2 instance
1. `Evergreen host.create command` - in Evergreen the remote host is created with
the same distro as the localhost runs and some initial connections are made to ensure
it's up before further steps
2. `Resmoke powercycle setup-host command` - prepares remote host via ssh to run
the powercycle test:
1. `Evergreen host.create command` - in Evergreen the remote host is created with the same distro as
the localhost runs and some initial connections are made to ensure it's up before further steps
2. `Resmoke powercycle setup-host command` - prepares remote host via ssh to run the powercycle
test:
```
python buildscripts/resmoke.py powercycle setup-host
```
Powercycle setup-host operations are located in
`buildscripts/resmokelib/powercycle/setup/__init__.py`.
The `expansions.yml` file, which is created by the `expansions.write` command in
Evergreen, is used to load the configuration to run operations.
`buildscripts/resmokelib/powercycle/setup/__init__.py`. The `expansions.yml` file, which is created
by the `expansions.write` command in Evergreen, is used to load the configuration to run operations.
It runs several operations via ssh:
@ -69,12 +65,12 @@ It runs several operations via ssh:
- copy `buildscripts` and `mongoDB executables` from localhost to the remote host
- set up python venv on the remote host
- set up curator to collect system & process stats on the remote host
- install [NotMyFault](https://docs.microsoft.com/en-us/sysinternals/downloads/notmyfault)
to crash Windows (only on Windows)
- install [NotMyFault](https://docs.microsoft.com/en-us/sysinternals/downloads/notmyfault) to crash
Windows (only on Windows)
Remote operation via ssh implementation is located in
`buildscripts/resmokelib/powercycle/lib/remote_operations.py`.
The following operations are supported:
`buildscripts/resmokelib/powercycle/lib/remote_operations.py`. The following operations are
supported:
- `copy_to` - copy files from the localhost to the remote host
- `copy_from` - copy files from the remote host to the localhost
@ -82,9 +78,8 @@ The following operations are supported:
### Run powercycle test
`Resmoke powercycle run command` - runs the powercycle test on the localhost
which runs remote operations on the remote host via ssh and local validation
checks:
`Resmoke powercycle run command` - runs the powercycle test on the localhost which runs remote
operations on the remote host via ssh and local validation checks:
```
python buildscripts/resmoke.py powercycle run \
@ -95,26 +90,26 @@ python buildscripts/resmoke.py powercycle run \
###### Resmoke powercycle run arguments
The arguments for resmoke powercycle run command are defined in `add_subcommand()`
function in `buildscripts/resmokelib/powercycle/__init__.py`. When powercycle test
runs remote operations on the remote host it calls the copied version of this script
on the remote host. Thus, some resmoke powercycle run command arguments are needed
for the remote call and shouldn't be used when calling the script on the localhost.
The arguments for resmoke powercycle run command are defined in `add_subcommand()` function in
`buildscripts/resmokelib/powercycle/__init__.py`. When powercycle test runs remote operations on the
remote host it calls the copied version of this script on the remote host. Thus, some resmoke
powercycle run command arguments are needed for the remote call and shouldn't be used when calling
the script on the localhost.
The `--taskName` argument is used to get powercycle task configurations that are stored
in `buildscripts/resmokeconfig/powercycle/powercycle_tasks.yml`.
The `--taskName` argument is used to get powercycle task configurations that are stored in
`buildscripts/resmokeconfig/powercycle/powercycle_tasks.yml`.
There is a known issue with `--setParameter` mongod options being incorrectly processed
from `mongod_options`, which is described in [SERVER-47621](https://jira.mongodb.org/browse/SERVER-47621).
There is a known issue with `--setParameter` mongod options being incorrectly processed from
`mongod_options`, which is described in
[SERVER-47621](https://jira.mongodb.org/browse/SERVER-47621).
###### Powercycle test implementation
The powercycle test main implementation is located in `main()` function in
`buildscripts/resmokelib/powercycle/powercycle.py`.
The value of `--remoteOperation` argument is used to distinguish if we are running the script
on the localhost or on the remote host.
`remote_handler()` function performs the following remote operations:
The value of `--remoteOperation` argument is used to distinguish if we are running the script on the
localhost or on the remote host. `remote_handler()` function performs the following remote
operations:
- `noop` - do nothing
- `crash_server` - internally crash the server
@ -157,17 +152,17 @@ When running on localhost the powercycle test loops do the following steps:
### Save diagnostics
`Resmoke powercycle save-diagnostics command` - copies powercycle diagnostics
files from the remote host to the localhost (mainly used by Evergreen):
`Resmoke powercycle save-diagnostics command` - copies powercycle diagnostics files from the remote
host to the localhost (mainly used by Evergreen):
```
python buildscripts/resmoke.py powercycle save-diagnostics
```
Powercycle save-diagnostics operations are located in
`buildscripts/resmokelib/powercycle/save_diagnostics/__init__.py`.
The `expansions.yml` file, which is created by the `expansions.write` command in
Evergreen, is used to load the configuration to run operations.
`buildscripts/resmokelib/powercycle/save_diagnostics/__init__.py`. The `expansions.yml` file, which
is created by the `expansions.write` command in Evergreen, is used to load the configuration to run
operations.
It runs several operations via ssh:
@ -188,15 +183,14 @@ It runs several operations via ssh:
### Remote hang analyzer (optional)
`Resmoke powercycle remote-hang-analyzer command` - runs hang analyzer on the
remote host (mainly used by Evergreen):
`Resmoke powercycle remote-hang-analyzer command` - runs hang analyzer on the remote host (mainly
used by Evergreen):
```
python buildscripts/resmoke.py powercycle remote-hang-analyzer
```
The powercycle remote-hang-analyzer command calls the resmoke hang analyzer on the
remote host and is located in
`buildscripts/resmokelib/powercycle/remote_hang_analyzer/__init__.py`.
The `expansions.yml` file, which is created by the `expansions.write` command in
Evergreen, is used to load the configuration to run this command.
The powercycle remote-hang-analyzer command calls the resmoke hang analyzer on the remote host and
is located in `buildscripts/resmokelib/powercycle/remote_hang_analyzer/__init__.py`. The
`expansions.yml` file, which is created by the `expansions.write` command in Evergreen, is used to
load the configuration to run this command.

View File

@ -4,24 +4,39 @@ Fixtures define a specific topology that tests run against.
## Supported Fixtures
Specify any of the following as the `fixture` in your [Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config:
Specify any of the following as the `fixture` in your
[Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config:
- [`BulkWriteFixture`](./bulk_write.py) - Fixture which provides JSTests with a set of clusters to run tests against.
- [`ExternalFixture`](./external.py) - Fixture which provides JSTests capability to connect to external (non-resmoke) cluster.
- [`ExternalShardedClusterFixture`](./shardedcluster.py) - Fixture to interact with external sharded cluster fixture.
- [`MongoDFixture`](./standalone.py) - Fixture which provides JSTests with a standalone mongod to run against.
- [`MongoTFixture`](./mongot.py) - Fixture which provides JSTests with a mongot to run alongside a mongod.
- [`MultiReplicaSetFixture`](./multi_replica_set.py) - Fixture which provides JSTests with a set of replica sets to run against.
- [`MultiShardedClusterFixture`](./multi_sharded_cluster.py) - Fixture which provides JSTests with a set of sharded clusters to run against.
- [`ReplicaSetFixture`](./replicaset.py) - Fixture which provides JSTests with a replica set to run against.
- [`ShardedClusterFixture`](./shardedcluster.py) - Fixture which provides JSTests with a sharded cluster to run against.
- Used when the MongoDB deployment is started by the JavaScript test itself with `MongoRunner`, `ReplSetTest`, or `ShardingTest`.
- [`YesFixture`](./yesfixture.py) - Fixture which spawns several `yes` executables to generate lots of log messages.
- [`BulkWriteFixture`](./bulk_write.py) - Fixture which provides JSTests with a set of clusters to
run tests against.
- [`ExternalFixture`](./external.py) - Fixture which provides JSTests capability to connect to
external (non-resmoke) cluster.
- [`ExternalShardedClusterFixture`](./shardedcluster.py) - Fixture to interact with external sharded
cluster fixture.
- [`MongoDFixture`](./standalone.py) - Fixture which provides JSTests with a standalone mongod to
run against.
- [`MongoTFixture`](./mongot.py) - Fixture which provides JSTests with a mongot to run alongside a
mongod.
- [`MultiReplicaSetFixture`](./multi_replica_set.py) - Fixture which provides JSTests with a set of
replica sets to run against.
- [`MultiShardedClusterFixture`](./multi_sharded_cluster.py) - Fixture which provides JSTests with a
set of sharded clusters to run against.
- [`ReplicaSetFixture`](./replicaset.py) - Fixture which provides JSTests with a replica set to run
against.
- [`ShardedClusterFixture`](./shardedcluster.py) - Fixture which provides JSTests with a sharded
cluster to run against.
- Used when the MongoDB deployment is started by the JavaScript test itself with `MongoRunner`,
`ReplSetTest`, or `ShardingTest`.
- [`YesFixture`](./yesfixture.py) - Fixture which spawns several `yes` executables to generate lots
of log messages.
## Interfaces
- [`Fixture`](./interface.py) - Base class for all fixtures.
- [`MultiClusterFixture`](./interface.py) - Base class for fixtures that may consist of multiple independent participant clusters.
- The participant clusters can function independently without coordination, but are bound together only for some duration as they participate in some process such as a migration. The participant clusters are fixtures themselves.
- [`MultiClusterFixture`](./interface.py) - Base class for fixtures that may consist of multiple
independent participant clusters.
- The participant clusters can function independently without coordination, but are bound together
only for some duration as they participate in some process such as a migration. The participant
clusters are fixtures themselves.
- [`NoOpFixture`](./interface.py) - A Fixture implementation that does not start any servers.
- [`ReplFixture`](./interface.py) - Base class for all fixtures that support replication.

View File

@ -4,84 +4,145 @@ Hooks are a mechanism to run routines _around_ the tests, at the test content bo
## Supported hooks
Specify any of the following as the `hooks` in your [Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config:
Specify any of the following as the `hooks` in your
[Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config:
- [`AnalyzeShardKeysInBackground`](./analyze_shard_key.py) - A hook for running `analyzeShardKey` commands while a test is running.
- [`AntithesisLogging`](./antithesis_logging.py) - Prints antithesis commands before & after test run.
- [`AnalyzeShardKeysInBackground`](./analyze_shard_key.py) - A hook for running `analyzeShardKey`
commands while a test is running.
- [`AntithesisLogging`](./antithesis_logging.py) - Prints antithesis commands before & after test
run.
- [`BackgroundInitialSync`](./initialsync.py) - Background Initial Sync
- After every test, this hook checks if a background node has finished initial sync and if so validates it, tears it down, and restarts it.
- This hook accepts a parameter `n` that specifies a number of tests after which it will wait for replication to finish before validating and restarting the initial sync node.
- This requires the ReplicaSetFixture to be started with `start_initial_sync_node=True`. If used at the same time as `CleanEveryN`, the `n` value passed to this hook should be equal to the `n` value for `CleanEveryN`.
- [`CheckClusterIndexConsistency`](./cluster_index_consistency.py) - Checks that indexes are the same across chunks for the same collections.
- [`CheckMetadataConsistencyInBackground`](./metadata_consistency) - Check the metadata consistency of a sharded cluster.
- [`CheckOrphansDeleted`](./orphans.py) - Check if the range deleter failed to delete any orphan documents.
- [`CheckReplDBHashInBackground`](./dbhash_background.py) - A hook for comparing the dbhashes of all replica set members while a test is running.
- After every test, this hook checks if a background node has finished initial sync and if so
validates it, tears it down, and restarts it.
- This hook accepts a parameter `n` that specifies a number of tests after which it will wait for
replication to finish before validating and restarting the initial sync node.
- This requires the ReplicaSetFixture to be started with `start_initial_sync_node=True`. If used
at the same time as `CleanEveryN`, the `n` value passed to this hook should be equal to the `n`
value for `CleanEveryN`.
- [`CheckClusterIndexConsistency`](./cluster_index_consistency.py) - Checks that indexes are the
same across chunks for the same collections.
- [`CheckMetadataConsistencyInBackground`](./metadata_consistency) - Check the metadata consistency
of a sharded cluster.
- [`CheckOrphansDeleted`](./orphans.py) - Check if the range deleter failed to delete any orphan
documents.
- [`CheckReplDBHashInBackground`](./dbhash_background.py) - A hook for comparing the dbhashes of all
replica set members while a test is running.
- [`CheckReplDBHash`](./dbhash.py) - Check if the dbhashes match.
- [`CheckReplOplogs`](./oplog.py) - Check that `local.oplog.rs` matches on the primary and secondaries.
- [`CheckReplPreImagesConsistency`](./preimages_consistency.py) - Check that `config.system.preimages` is consistent between the primary and secondaries.
- [`CheckRoutingTableConsistency`](./routing_table_consistency.py) - Verifies the absence of corrupted entries in config.chunks and config.collections.
- [`CheckShardFilteringMetadata`](./shard_filtering_metadata.py) - Inspect filtering metadata on shards
- [`CheckReplOplogs`](./oplog.py) - Check that `local.oplog.rs` matches on the primary and
secondaries.
- [`CheckReplPreImagesConsistency`](./preimages_consistency.py) - Check that
`config.system.preimages` is consistent between the primary and secondaries.
- [`CheckRoutingTableConsistency`](./routing_table_consistency.py) - Verifies the absence of
corrupted entries in config.chunks and config.collections.
- [`CheckShardFilteringMetadata`](./shard_filtering_metadata.py) - Inspect filtering metadata on
shards
- [`CleanEveryN`](./cleanup.py) - Restart the fixture after it has run `n` tests.
- [`CleanupConcurrencyWorkloads`](./cleanup_concurrency_workloads.py) - Drop all databases, except those that have been excluded.
- For concurrency tests that run on different DBs, drop all databases except ones in `exclude_dbs`. For tests that run on the same DB, drop all databases except ones in `exclude_dbs` and the DB used by the test/workloads. For tests that run on the same collection, drop all collections in all databases except for `exclude_dbs` and the collection used by the test/workloads.
- [`CleanupConcurrencyWorkloads`](./cleanup_concurrency_workloads.py) - Drop all databases, except
those that have been excluded.
- For concurrency tests that run on different DBs, drop all databases except ones in
`exclude_dbs`. For tests that run on the same DB, drop all databases except ones in
`exclude_dbs` and the DB used by the test/workloads. For tests that run on the same collection,
drop all collections in all databases except for `exclude_dbs` and the collection used by the
test/workloads.
- On mongod-related fixtures, this will clear the dbpath
- [`ClusterParameter`](./cluster_parameter.py) - Sets the specified cluster server parameter.
- [`ContinuousAddRemoveShard`](./add_remove_shards.py) - Continuously adds and removes shards at regular intervals. If running with `configsvr` transitions, will transition in/out of config shard mode.
- [`ContinuousInitialSync`](./continuous_initial_sync.py) - Periodically initial sync nodes then step them up.
- [`ContinuousStepdown`](./stepdown.py) - regularly connect to replica sets and send a `replSetStepDown` command.
- [`ContinuousTransition`](./replicaset_transition_to_and_from_csrs.py) - connects to replica sets and transitions them from replica set to CSRS node in the background.
- [`DoReconfigInBackground`](./reconfig_background.py) - A hook for running a safe reconfig against a replica set while a test is running.
- [`DropConfigCacheCollections`](./drop_config_cache_collections.py) - A hook for dropping random entries of config.cache.collections in shards.
- [`DropSessionsCollection`](./drop_sessions_collection.py) - A hook for dropping and recreating config.system.sessions while tests are running.
- [`ContinuousAddRemoveShard`](./add_remove_shards.py) - Continuously adds and removes shards at
regular intervals. If running with `configsvr` transitions, will transition in/out of config shard
mode.
- [`ContinuousInitialSync`](./continuous_initial_sync.py) - Periodically initial sync nodes then
step them up.
- [`ContinuousStepdown`](./stepdown.py) - regularly connect to replica sets and send a
`replSetStepDown` command.
- [`ContinuousTransition`](./replicaset_transition_to_and_from_csrs.py) - connects to replica sets
and transitions them from replica set to CSRS node in the background.
- [`DoReconfigInBackground`](./reconfig_background.py) - A hook for running a safe reconfig against
a replica set while a test is running.
- [`DropConfigCacheCollections`](./drop_config_cache_collections.py) - A hook for dropping random
entries of config.cache.collections in shards.
- [`DropSessionsCollection`](./drop_sessions_collection.py) - A hook for dropping and recreating
config.system.sessions while tests are running.
- [`DropUserCollections`](./drop_user_collections.py) - Drops all user collections.
- [`EnableSpuriousWriteConflicts`](./enable_spurious_write_conflicts.py) - Toggles write conflicts.
- [`FCVUpgradeDowngradeInBackground`](./fcv_upgrade_downgrade.py) - A hook to run background FCV upgrade and downgrade against test servers while a test is running.
- [`FuzzRuntimeParameters`](./fuzz_runtime_parameters.py) - Regularly connects to nodes and sends them a `setParameter` command; uses the [Config Fuzzer](../../../../buildscripts/resmokelib/generate_fuzz_config/README.md).
- [`FuzzRuntimeStress`](./fuzz_runtime_stress.py) - Test hook that periodically changes the amount of stress the system is experiencing.
- [`FCVUpgradeDowngradeInBackground`](./fcv_upgrade_downgrade.py) - A hook to run background FCV
upgrade and downgrade against test servers while a test is running.
- [`FuzzRuntimeParameters`](./fuzz_runtime_parameters.py) - Regularly connects to nodes and sends
them a `setParameter` command; uses the
[Config Fuzzer](../../../../buildscripts/resmokelib/generate_fuzz_config/README.md).
- [`FuzzRuntimeStress`](./fuzz_runtime_stress.py) - Test hook that periodically changes the amount
of stress the system is experiencing.
- [`FuzzerRestoreSettings`](./fuzzer_restore_settings.py) - Cleans up unwanted changes from fuzzer.
- [`GenerateAndCheckPerfResults`](./generate_and_check_perf_results.py) - Combine JSON results from individual benchmarks and check their reported values against any thresholds set for them.
- Combines test results from individual benchmark files to a single file. This is useful for generating the json file to feed into the Evergreen performance visualization plugin.
- [`GenerateAndCheckPerfResults`](./generate_and_check_perf_results.py) - Combine JSON results from
individual benchmarks and check their reported values against any thresholds set for them.
- Combines test results from individual benchmark files to a single file. This is useful for
generating the json file to feed into the Evergreen performance visualization plugin.
- [`HelloDelays`](./hello_failures.py) - Sets Hello fault injections.
- [`IntermediateInitialSync`](./initialsync.py) - Intermediate Initial Sync
- This hook accepts a parameter `n` that specifies a number of tests after which it will start up a node to initial sync, wait for replication to finish, and then validate the data.
- This hook accepts a parameter `n` that specifies a number of tests after which it will start up
a node to initial sync, wait for replication to finish, and then validate the data.
- This requires the ReplicaSetFixture to be started with 'start_initial_sync_node=True'.
- [`LagOplogApplicationInBackground`](./secondary_lag.py) - Toggles secondary oplog application lag.
- [`LibfuzzerHook`](./cpp_libfuzzer.py) - Merges inputs after a fuzzer run.
- [`MagicRestoreEveryN`](./magic_restore.py) - Open a backup cursor and run magic restore process after `n` tests have run.
- [`MagicRestoreEveryN`](./magic_restore.py) - Open a backup cursor and run magic restore process
after `n` tests have run.
- Requires the use of `MagicRestoreFixture`.
- [`PeriodicKillSecondaries`](./periodic_kill_secondaries.py) - Periodically kills the secondaries in a replica set.
- Also verifies that the secondaries can reach the SECONDARY state without having connectivity to the primary after an unclean shutdown.
- [`PeriodicStackTrace`](./periodic_stack_trace.py) - Test hook that sends the stacktracing signal to mongo processes at randomized intervals.
- [`QueryableServerHook`](./queryable_server_hook.py) - Starts the queryable server before each test for queryable restores. Restarts the queryable server between tests.
- [`RotateExecutionControlParams`](./rotate_execution_control_params.py) - Periodically rotates 'executionControlConcurrencyAdjustmentAlgorithm' and deprioritization server parameters to random valid values.
- [`RunChangeStreamsInBackground`](./change_streams.py) - Run in the background full cluster change streams while a test is running. Open and close the change stream every `1..10` tests (random using `config.RANDOM_SEED`).
- [`RunDBCheckInBackground`](./dbcheck_background.py) - A hook for running `dbCheck` on a replica set while a test is running.
- This includes dbhashes for all non-local databases and non-replicated system collections that match on the primary and secondaries.
- It also will check the performance results against any thresholds that are set for each benchmark. If no thresholds are set for a test, this hook should always pass.
- [`RunQueryStats`](./run_query_stats.py) - Runs `$queryStats` after every test, and clears the query stats store before every test.
- [`PeriodicKillSecondaries`](./periodic_kill_secondaries.py) - Periodically kills the secondaries
in a replica set.
- Also verifies that the secondaries can reach the SECONDARY state without having connectivity to
the primary after an unclean shutdown.
- [`PeriodicStackTrace`](./periodic_stack_trace.py) - Test hook that sends the stacktracing signal
to mongo processes at randomized intervals.
- [`QueryableServerHook`](./queryable_server_hook.py) - Starts the queryable server before each test
for queryable restores. Restarts the queryable server between tests.
- [`RotateExecutionControlParams`](./rotate_execution_control_params.py) - Periodically rotates
'executionControlConcurrencyAdjustmentAlgorithm' and deprioritization server parameters to random
valid values.
- [`RunChangeStreamsInBackground`](./change_streams.py) - Run in the background full cluster change
streams while a test is running. Open and close the change stream every `1..10` tests (random
using `config.RANDOM_SEED`).
- [`RunDBCheckInBackground`](./dbcheck_background.py) - A hook for running `dbCheck` on a replica
set while a test is running.
- This includes dbhashes for all non-local databases and non-replicated system collections that
match on the primary and secondaries.
- It also will check the performance results against any thresholds that are set for each
benchmark. If no thresholds are set for a test, this hook should always pass.
- [`RunQueryStats`](./run_query_stats.py) - Runs `$queryStats` after every test, and clears the
query stats store before every test.
- [`SimulateCrash`](./simulate_crash.py) - A hook to simulate crashes.
- [`ValidateCollections`](./validate.py) - Run full validation.
- [`ValidateCollectionsInBackground`](./validate_background.py) - A hook to run background collection validation against test servers while a test is running.
- This will run on all collections in all databases on every stand-alone node, primary replica-set node, or primary shard node.
- [`ValidateDirectSecondaryReads`](./validate_direct_secondary_reads.py) - Only supported in suites that use `ReplicaSetFixture`.
- To be used with `set_read_preference_secondary.js` and `implicit_enable_profiler.js` in suites that read directly from secondaries in a replica set. Check the profiler collections of all databases at the end of the suite to verify that each secondary only ran the read commands it got directly from the shell.
- [`ValidateCollectionsInBackground`](./validate_background.py) - A hook to run background
collection validation against test servers while a test is running.
- This will run on all collections in all databases on every stand-alone node, primary replica-set
node, or primary shard node.
- [`ValidateDirectSecondaryReads`](./validate_direct_secondary_reads.py) - Only supported in suites
that use `ReplicaSetFixture`.
- To be used with `set_read_preference_secondary.js` and `implicit_enable_profiler.js` in suites
that read directly from secondaries in a replica set. Check the profiler collections of all
databases at the end of the suite to verify that each secondary only ran the read commands it
got directly from the shell.
- [`WaitForReplication`](./wait_for_replication.py) - Wait for replication to complete.
## Interfaces
All hooks inherit from the [`buildscripts.resmokelib.testing.hooks.interface.Hook`](./interface.py) parent class and can override any subset of the following empty base methods:
All hooks inherit from the [`buildscripts.resmokelib.testing.hooks.interface.Hook`](./interface.py)
parent class and can override any subset of the following empty base methods:
- `before_suite`
- `before_test`
- `after_test`
- `after_suite`
At least 1 base method must be overridden, otherwise the hook will not do anything at all. During test suite execution, each hook runs its custom logic in the respective scenarios. Some customizable tasks that hooks can perform include: _validating data, deleting data, performing cleanup_, etc.
At least 1 base method must be overridden, otherwise the hook will not do anything at all. During
test suite execution, each hook runs its custom logic in the respective scenarios. Some customizable
tasks that hooks can perform include: _validating data, deleting data, performing cleanup_, etc.
- [`BGHook`](./bghook.py) - A hook that repeatedly calls `run_action()` in a background thread for the duration of the test suite.
- [`DataConsistencyHook`](./jsfile.py) - A hook for running a static JavaScript file that checks data consistency of the server.
- If the mongo shell process running the JavaScript file exits with a non-zero return code, then an `errors.ServerFailure` exception is raised to cause resmoke.py's test execution to stop.
- [`BGHook`](./bghook.py) - A hook that repeatedly calls `run_action()` in a background thread for
the duration of the test suite.
- [`DataConsistencyHook`](./jsfile.py) - A hook for running a static JavaScript file that checks
data consistency of the server.
- If the mongo shell process running the JavaScript file exits with a non-zero return code, then
an `errors.ServerFailure` exception is raised to cause resmoke.py's test execution to stop.
- [`Hook`](./interface.py) - Common interface all Hooks will inherit from.
- [`JSHook`](./jsfile.py) - A hook interface with a static JavaScript file to execute.
- [`PerClusterDataConsistencyHook`](./jsfile.py) - A hook that runs on each independent cluster of the fixture.
- [`PerClusterDataConsistencyHook`](./jsfile.py) - A hook that runs on each independent cluster of
the fixture.
- The independent cluster itself may be another fixture.
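
As a rough illustration of the override pattern described in this section, the sketch below uses a stand-in `Hook` base class with the four empty methods. It is not the real `buildscripts.resmokelib.testing.hooks.interface.Hook`: the method signatures, the `CountTests` hook, and the `run_suite` driver are all assumptions made for the example.

```python
class Hook:
    """Stand-in for the real Hook interface (assumed, simplified signatures)."""

    def before_suite(self, test_report):
        pass

    def before_test(self, test, test_report):
        pass

    def after_test(self, test, test_report):
        pass

    def after_suite(self, test_report):
        pass


class CountTests(Hook):
    """Hypothetical hook that overrides only the two per-test methods."""

    def __init__(self):
        self.started = 0
        self.finished = 0

    def before_test(self, test, test_report):
        self.started += 1

    def after_test(self, test, test_report):
        self.finished += 1


def run_suite(hooks, tests):
    """Toy driver showing when each hook method fires; not resmoke's executor."""
    report = []
    for hook in hooks:
        hook.before_suite(report)
    for test in tests:
        for hook in hooks:
            hook.before_test(test, report)
        report.append(test)  # pretend the test ran and was recorded
        for hook in hooks:
            hook.after_test(test, report)
    for hook in hooks:
        hook.after_suite(report)
    return report
```

Methods that are not overridden fall through to the empty base implementations, which is why a hook that overrides none of them does nothing.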

@ -1,33 +1,52 @@
# TestCases
TestCases extend Python-based `unittest.TestCase` objects that resmoke can run as different "kinds" of tests.
TestCases extend Python-based `unittest.TestCase` objects that resmoke can run as different "kinds"
of tests.
## Supported TestCases
Specify any of the following as the `test_kind` in your [Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config:
Specify any of the following as the `test_kind` in your
[Suite](../../../../buildscripts/resmokeconfig/suites/README.md) config:
- `all_versions_js_test`: [`AllVersionsJSTestCase`](./jstest.py) - Alias for JSTestCase for multiversion passthrough suites.
- It runs with all combinations of versions of replica sets and sharded clusters. The distinct name is picked up by task generation.
- `all_versions_js_test`: [`AllVersionsJSTestCase`](./jstest.py) - Alias for JSTestCase for
multiversion passthrough suites.
- It runs with all combinations of versions of replica sets and sharded clusters. The distinct
name is picked up by task generation.
- `benchmark_test`: [`BenchmarkTestCase`](./benchmark_test.py) - A Benchmark test to execute.
- `bulk_write_cluster_js_test`: [`BulkWriteClusterTestCase`](./bulk_write_cluster_js_test.py) - A test to execute with connection data for multiple clusters passed through TestData.
- `cpp_integration_test`: [`CPPIntegrationTestCase`](./cpp_integration_test.py) - A C++ integration test to execute.
- `cpp_libfuzzer_test`: [`CPPLibfuzzerTestCase`](./cpp_libfuzzer_test.py) - A C++ libfuzzer test to execute.
- `bulk_write_cluster_js_test`: [`BulkWriteClusterTestCase`](./bulk_write_cluster_js_test.py) - A
test to execute with connection data for multiple clusters passed through TestData.
- `cpp_integration_test`: [`CPPIntegrationTestCase`](./cpp_integration_test.py) - A C++ integration
test to execute.
- `cpp_libfuzzer_test`: [`CPPLibfuzzerTestCase`](./cpp_libfuzzer_test.py) - A C++ libfuzzer test to
execute.
- `cpp_unit_test`: [`CPPUnitTestCase`](./cpp_unittest.py) - A C++ unit test to execute.
- `db_test`: [`DBTestCase`](./dbtest.py) - A dbtest to execute.
- `fsm_workload_test`: [`FSMWorkloadTestCase`](./fsm_workload_test.py) - A wrapper for several copies of a `_SingleFSMWorkloadTestCase` to execute.
- `js_test`: [`JSTestCase`](./jstest.py) - A wrapper for several copies of a `_SingleJSTestCase` to execute
- Around **75% of all suites use the `js_test` kind**. See [jstests/README.md](../../../../jstests/README.md) for specific guidance.
- `fsm_workload_test`: [`FSMWorkloadTestCase`](./fsm_workload_test.py) - A wrapper for several
copies of a `_SingleFSMWorkloadTestCase` to execute.
- `js_test`: [`JSTestCase`](./jstest.py) - A wrapper for several copies of a `_SingleJSTestCase` to
execute
- Around **75% of all suites use the `js_test` kind**. See
[jstests/README.md](../../../../jstests/README.md) for specific guidance.
- `json_schema_test`: [`JSONSchemaTestCase`](./json_schema_test.py) - A JSON Schema test to execute.
- `magic_restore_js_test`: [`MagicRestoreTestCase`](./magic_restore_js_test.py) - A test to execute for running tests in a try/catch block.
- `mongos_test`: [`MongosTestCase`](./mongos_test.py) - A TestCase which runs a mongos binary with the given parameters.
- `multi_stmt_txn_passthrough`: [`MultiStmtTxnTestCase`](./multi_stmt_txn_test.py) - Test case for multi statement transactions.
- `parallel_fsm_workload_test`: [`ParallelFSMWorkloadTestCase`](./fsm_workload_test.py) - An FSM workload to execute.
- `pretty_printer_test`: [`PrettyPrinterTestCase`](./pretty_printer_testcase.py) - A pretty printer test to execute.
- `magic_restore_js_test`: [`MagicRestoreTestCase`](./magic_restore_js_test.py) - A test to execute
for running tests in a try/catch block.
- `mongos_test`: [`MongosTestCase`](./mongos_test.py) - A TestCase which runs a mongos binary with
the given parameters.
- `multi_stmt_txn_passthrough`: [`MultiStmtTxnTestCase`](./multi_stmt_txn_test.py) - Test case for
multi statement transactions.
- `parallel_fsm_workload_test`: [`ParallelFSMWorkloadTestCase`](./fsm_workload_test.py) - An FSM
workload to execute.
- `pretty_printer_test`: [`PrettyPrinterTestCase`](./pretty_printer_testcase.py) - A pretty printer
test to execute.
- `py_test`: [`PyTestCase`](./pytest.py) - A python test to execute.
- `query_tester_self_test`: [`QueryTesterSelfTestCase`](./query_tester_self_test.py) - A QueryTester self-test to execute.
- `query_tester_server_test`: [`QueryTesterServerTestCase`](./query_tester_server_test.py) - A QueryTester server test to execute.
- `sdam_json_test`: [`SDAMJsonTestCase`](./sdam_json_test.py) - Server Discovery and Monitoring JSON test case.
- `server_selection_json_test`: [`ServerSelectionJsonTestCase`](./server_selection_json_test.py) - Server Selection JSON test case.
- `query_tester_self_test`: [`QueryTesterSelfTestCase`](./query_tester_self_test.py) - A QueryTester
self-test to execute.
- `query_tester_server_test`: [`QueryTesterServerTestCase`](./query_tester_server_test.py) - A
QueryTester server test to execute.
- `sdam_json_test`: [`SDAMJsonTestCase`](./sdam_json_test.py) - Server Discovery and Monitoring JSON
test case.
- `server_selection_json_test`: [`ServerSelectionJsonTestCase`](./server_selection_json_test.py) -
Server Selection JSON test case.
- `sleep_test`: [`SleepTestCase`](./sleeptest.py) - SleepTestCase class.
- `tla_plus_test`: [`TLAPlusTestCase`](./tla_plus_test.py) - A TLA+ specification to model-check.
@ -36,26 +55,36 @@ Specify any of the following as the `test_kind` in your [Suite](../../../../buil
Top level interfaces:
- [`TestCase`](./interface.py) - A test case to execute. The `run_test` method must be implemented.
- [`ProcessTestCase`](./interface.py) - Base class for TestCases that execute an external process. The `_make_process` method must be implemented.
- [`ProcessTestCase`](./interface.py) - Base class for TestCases that execute an external process.
The `_make_process` method must be implemented.
Subclasses:
- [`JSRunnerFileTestCase`](./jsrunnerfile.py) - A test case with a static JavaScript runner file to execute.
- [`MultiClientsTestCase`](./jstest.py) - A wrapper for several copies of a SingleTestCase to execute.
- [`JSRunnerFileTestCase`](./jsrunnerfile.py) - A test case with a static JavaScript runner file to
execute.
- [`MultiClientsTestCase`](./jstest.py) - A wrapper for several copies of a SingleTestCase to
execute.
- [`TestCaseFactory`](./interface.py) - Convenience interface to initialize and build test cases
## Fixture TestCases
These are testcases that are used to coordinate fixture lifecycles via resmoke's internal `FixtureTestCaseManager`.
These are testcases that are used to coordinate fixture lifecycles via resmoke's internal
`FixtureTestCaseManager`.
> NOTE This design does lead to seeing "extra" tests in a run, where a fixture sets up, your `N` tests are run, and the fixture tears down, so you see `N+2` "tests" passing via resmoke.
> NOTE This design does lead to seeing "extra" tests in a run, where a fixture sets up, your `N`
> tests are run, and the fixture tears down, so you see `N+2` "tests" passing via resmoke.
- [`FixtureTestCase`](./fixture.py) - Base class for the fixture test cases.
- [`FixtureSetupTestCase`](./fixture.py) - TestCase for setting up a fixture.
- [`FixtureTeardownTestCase`](./fixture.py) - TestCase for tearing down a fixture.
- [`FixtureAbortTestCase`](./fixture.py) - TestCase for killing/aborting a fixture. Intended for use before archiving a failed test.
- When resmoke detects that a test has failed (and [archiving](../../../../buildscripts/resmokeconfig/suites/README.md#executorarchive) is configured), it dynamically generates a new `FixtureAbortTestCase` for immediate execution. This test case sends a `SIGABRT` to each running mongod process.
- [`FixtureAbortTestCase`](./fixture.py) - TestCase for killing/aborting a fixture. Intended for use
before archiving a failed test.
- When resmoke detects that a test has failed (and
[archiving](../../../../buildscripts/resmokeconfig/suites/README.md#executorarchive) is
configured), it dynamically generates a new `FixtureAbortTestCase` for immediate execution.
This test case sends a `SIGABRT` to each running mongod process.
## Testing TestCases
Self-tests for the testcases themselves can be found in [buildscripts/tests/resmokelib/testing/testcases/](../../../../buildscripts/tests/resmokelib/testing/testcases/)
Self-tests for the testcases themselves can be found in
[buildscripts/tests/resmokelib/testing/testcases/](../../../../buildscripts/tests/resmokelib/testing/testcases/)

@ -1,33 +1,55 @@
# S3 Binary
This is a small utility to help safely manage tool binaries that are stored in MongoDB's S3 bucket for the purpose of using in this repository's build, test, or release processes.
This is a small utility to help safely manage tool binaries that are stored in MongoDB's S3 bucket
for the purpose of using in this repository's build, test, or release processes.
### Security
Any time a binary is pulled down from the internet and executed, there is risk that the binary has been modified unintentionally. This tool creates a hash of the binary that the developer uploads and stores a record of it in a programmatically accessible Python script (see `buildscripts/s3_binary/hashes.py`). When a tool uses the S3 binary, this interface forces a checksum of the binary before the binary is run, verifying the result against the value stored in `hashes.py` and stopping execution if it doesn't match.
Any time a binary is pulled down from the internet and executed, there is risk that the binary has
been modified unintentionally. This tool creates a hash of the binary that the developer uploads
and stores a record of it in a programmatically accessible Python script (see
`buildscripts/s3_binary/hashes.py`). When a tool uses the S3 binary, this interface forces a
checksum of the binary before the binary is run, verifying the result against the value stored in
`hashes.py` and stopping execution if it doesn't match.
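
The check described above can be sketched as follows. This is a minimal illustration and not the code in `buildscripts/s3_binary`: the `EXPECTED_SHA256` mapping, the example bucket path, and the `sha256_of_file` and `verify_binary` helpers are hypothetical names, and the real `hashes.py` layout may differ.

```python
import hashlib

# Hypothetical stand-in for the S3-path-to-hash mapping kept in hashes.py.
EXPECTED_SHA256 = {
    "s3://example-bucket/tool/v1/tool_linux": hashlib.sha256(b"tool bytes").hexdigest(),
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_binary(s3_path, local_path, expected=EXPECTED_SHA256):
    """Return local_path if its hash matches the recorded one, otherwise raise."""
    if s3_path not in expected:
        raise KeyError(f"no recorded hash for {s3_path}")
    actual = sha256_of_file(local_path)
    if actual != expected[s3_path]:
        raise ValueError(f"checksum mismatch for {s3_path}: got {actual}")
    return local_path
```

Raising before the binary is ever executed is the point of the design: a mismatch stops the caller instead of producing a warning that could be ignored.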
### Hermetic Guarantee
The other risk of relying on a binary stored in S3 is that, if the binary is changed, it will change the results of previously run tests or builds in continuous integration. This is not ideal since there are often cases where an old commit needs to be re-run to reproduce user issues. Storing the hash in the repository and preventing modifications prevents accidental compatibility breaks of previous commits.
The other risk of relying on a binary stored in S3 is that, if the binary is changed, it will
change the results of previously run tests or builds in continuous integration. This is not ideal
since there are often cases where an old commit needs to be re-run to reproduce user issues. Storing
the hash in the repository and preventing modifications prevents accidental compatibility breaks of
previous commits.
### Example Usage
Scenario: You have a developer tool called db-contrib-tool that you want to build into a binary, and then use that binary as part of a test process in 10gen/mongo. To use the s3_binary tool you would:
Scenario: You have a developer tool called db-contrib-tool that you want to build into a binary, and
then use that binary as part of a test process in 10gen/mongo. To use the s3_binary tool you would:
1. Create your binaries and put them into a single directory on your local system, ex:
/tmp/db-contrib-tool/db-contrib-tool-v1_windows.exe
/tmp/db-contrib-tool/db-contrib-tool-v1_linux
/tmp/db-contrib-tool/db-contrib-tool-v1_windows.exe /tmp/db-contrib-tool/db-contrib-tool-v1_linux
2. Invoke bazel run buildscripts/s3_binary:upload -- /tmp/db-contrib-tool s3://mdb-build-public/db-contrib-tool/v1
2. Invoke bazel run buildscripts/s3_binary:upload -- /tmp/db-contrib-tool
s3://mdb-build-public/db-contrib-tool/v1
3. Follow the prompts, this will then update your local `buildscripts/s3_binary/hashes.py` file mapping the s3 path of each binary to its sha256 hash.
3. Follow the prompts, this will then update your local `buildscripts/s3_binary/hashes.py` file
mapping the s3 path of each binary to its sha256 hash.
4. Update your test code to call: `download_s3_binary(f"s3://mdb-build-public/db-contrib-tool/v1/db-contrib-tool-v1_{os}{ext}")`. This will then automatically verify the download matches the hash at runtime.
4. Update your test code to call:
`download_s3_binary(f"s3://mdb-build-public/db-contrib-tool/v1/db-contrib-tool-v1_{os}{ext}")`.
This will then automatically verify the download matches the hash at runtime.
5. Create a commit with your new code that adds in the `download_s3_binary` call and the `buildscripts/s3_binary/hashes.py` modifications.
5. Create a commit with your new code that adds in the `download_s3_binary` call and the
`buildscripts/s3_binary/hashes.py` modifications.
The case above covers usage in Python. If using another language like starlark for Bazel dependencies, you would follow the same flow but copy the hashes into the starlark code instead of relying on hashes.py. Please retain the modifications to hashes.py regardless to make it easy to use your binaries in python.
The case above covers usage in Python. If using another language like starlark for Bazel
dependencies, you would follow the same flow but copy the hashes into the starlark code instead of
relying on hashes.py. Please retain the modifications to hashes.py regardless to make it easy to use
your binaries in python.
### Future Additions
In general, it's less error prone to have the entire flow of building, uploading, and using a binary all happen in an automated pipeline without developer interaction. In the future, this tool will be updated to be easily invocable from a continuous integration pipeline that performs the build and either returns the hashes to the user to be later committed, or automatically submits a PR to update them.
In general, it's less error prone to have the entire flow of building, uploading, and using a binary
all happen in an automated pipeline without developer interaction. In the future, this tool will be
updated to be easily invocable from a continuous integration pipeline that performs the build and
either returns the hashes to the user to be later committed, or automatically submits a PR to update
them.

@ -55,8 +55,8 @@ bazel test --test_output=summary --test_tag_filters=-intermediate_debug,server-p
## Storage Execution
The smoke test suites for storage execution are divided up into components. The smoke test suite
for all of the components that storage execution owns can be run with the following:
The smoke test suites for storage execution are divided up into components. The smoke test suite for
all of the components that storage execution owns can be run with the following:
```
bazel test --test_output=summary --test_tag_filters=-intermediate_debug,server-bsoncolumn,server-collection-write-path,server-external-sorter,server-index-builds,server-key-string,server-storage-engine-integration,server-timeseries-bucket-catalog,server-tracking-allocators,server-ttl //...
@ -76,7 +76,8 @@ There are currently no smoke test integration tests for this component.
### Server-Collection-Write-Path
The unit and integration tests for the server-collection-write-path component can be run with the following:
The unit and integration tests for the server-collection-write-path component can be run with the
following:
```
bazel test --test_output=summary --test_tag_filters=-intermediate_debug,server-collection-write-path //...
@ -112,7 +113,8 @@ There are currently no smoke test integration tests for this component.
### Server-Storage-Engine-Integration
The unit and integration tests for the server-storage-engine-integration component can be run with the following:
The unit and integration tests for the server-storage-engine-integration component can be run with
the following:
```
bazel test --test_output=summary --test_tag_filters=-intermediate_debug,server-storage-engine-integration //...

@ -10,7 +10,8 @@ mongodb_repo_root$ source python3-venv/bin/activate
(python3-venv) mongodb_repo_root$ python buildscripts/resmoke.py run --suites resmoke_end2end_tests
```
- Finer-grained control over which tests run is also possible by invoking Python's unittest main by hand. E.g.:
- Finer-grained control over which tests run is also possible by invoking Python's unittest main by
hand. E.g.:
```
(python3-venv) mongodb_repo_root$ python -m unittest -v buildscripts.tests.resmoke_end2end.test_resmoke.TestTestSelection.test_at_sign_as_replay_file

@ -4,24 +4,26 @@
Antithesis is a third party vendor with an environment that can perform network fuzzing. We can
upload images containing `docker-compose.yml` files, which represent various MongoDB topologies, to
the Antithesis Docker registry. Antithesis runs `docker-compose up` from these images to spin up
the corresponding multi-container application in their environment and run a test suite. Network
fuzzing is performed on the topology while the test suite runs & a report is generated by
Antithesis identifying bugs. Check out
https://github.com/mongodb/mongo/wiki/Testing-MongoDB-with-Antithesis to see an example of how we
use Antithesis today.
the Antithesis Docker registry. Antithesis runs `docker-compose up` from these images to spin up the
corresponding multi-container application in their environment and run a test suite. Network fuzzing
is performed on the topology while the test suite runs & a report is generated by Antithesis
identifying bugs. Check out https://github.com/mongodb/mongo/wiki/Testing-MongoDB-with-Antithesis to
see an example of how we use Antithesis today.
## Base Images
The `base_images` directory consists of the building blocks for creating a MongoDB test topology.
These images are uploaded to the Antithesis Docker registry [nightly](https://github.com/mongodb/mongo/blob/6cf8b162a61173eb372b54213def6dd61e1fd684/etc/evergreen_yml_components/variants/ubuntu/test_dev_master_and_lts_branches_only.yml#L28) during the
[`antithesis image build and push`](https://github.com/mongodb/mongo/blob/020632e3ae328f276b2c251417b5a39389af6141/etc/evergreen_yml_components/definitions.yml#L2823) function.
These images are uploaded to the Antithesis Docker registry
[nightly](https://github.com/mongodb/mongo/blob/6cf8b162a61173eb372b54213def6dd61e1fd684/etc/evergreen_yml_components/variants/ubuntu/test_dev_master_and_lts_branches_only.yml#L28)
during the
[`antithesis image build and push`](https://github.com/mongodb/mongo/blob/020632e3ae328f276b2c251417b5a39389af6141/etc/evergreen_yml_components/definitions.yml#L2823)
function.
### mongo_binaries
This image contains the latest `mongo`, `mongos` and `mongod` binaries. It can be used to
start a `mongod` instance, `mongos` instance or execute `mongo` commands. This is the main building
block for creating the System Under Test topology.
This image contains the latest `mongo`, `mongos` and `mongod` binaries. It can be used to start a
`mongod` instance, `mongos` instance or execute `mongo` commands. This is the main building block
for creating the System Under Test topology.
### workload
@ -36,16 +38,16 @@ buildscript/resmoke.py run --suite antithesis_concurrency_sharded_with_stepdowns
**Every topology must have 1 workload container.**
Note: During `workload` image build, `evergreen/antithesis_image_build_and_push.sh` runs, which generates
"antithesis compatible" test suites and prepends them with `antithesis_`. These are the test suites
that can run in antithesis and are available from within the `workload` container.
Note: During `workload` image build, `evergreen/antithesis_image_build_and_push.sh` runs, which
generates "antithesis compatible" test suites and prepends them with `antithesis_`. These are the
test suites that can run in antithesis and are available from within the `workload` container.
### Dockerfile
This assembles an image with the necessary files for spinning up the corresponding topology. It
consists of a `docker-compose.yml`, a `logs` directory, a `scripts` directory and a `data`
directory. If this is structured properly, you should be able to copy the files & directories
from this image and run `docker-compose up` to set up the desired topology.
directory. If this is structured properly, you should be able to copy the files & directories from
this image and run `docker-compose up` to set up the desired topology.
Example from what `buildscripts/resmokelib/testing/docker_cluster_image_builder.py` generates:
@ -67,8 +69,8 @@ therefore use `FROM scratch`.
### docker-compose.yml
This describes how to construct the corresponding topology using the
`mongo-binaries` and `workload` images.
This describes how to construct the corresponding topology using the `mongo-binaries` and `workload`
images.
Example from `buildscripts/antithesis/topologies/sharded_cluster/docker-compose.yml`:
@ -162,15 +164,15 @@ networks:
Each container must have a `command` in `docker-compose.yml` that runs an init script. The init
script belongs in the `scripts` directory, which is included as a volume. The `command` should be
set like so: `/bin/bash /scripts/[script_name].sh` or `python3 /scripts/[script_name].py`. This is
a requirement for the topology to start up properly in Antithesis.
set like so: `/bin/bash /scripts/[script_name].sh` or `python3 /scripts/[script_name].py`. This is a
requirement for the topology to start up properly in Antithesis.
When creating `mongod` or `mongos` instances, route the logs like so:
`--logpath /var/log/mongodb/mongodb.log` and utilize `volumes` -- as in `database1`.
This enables us to easily retrieve logs if a bug is detected by Antithesis.
`--logpath /var/log/mongodb/mongodb.log` and utilize `volumes` -- as in `database1`. This enables us
to easily retrieve logs if a bug is detected by Antithesis.
The `ipv4_address` should be set to `10.20.20.130` or higher if you do not want that container to
be affected by network fuzzing. For instance, you would likely not want the `workload` container
The `ipv4_address` should be set to `10.20.20.130` or higher if you do not want that container to be
affected by network fuzzing. For instance, you would likely not want the `workload` container
to be affected by network fuzzing -- as shown in the example above.
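
A topology author could sanity-check addresses against this rule with a few lines of Python. The sketch below assumes that "or higher" means a plain numeric comparison of IPv4 addresses; `FUZZ_EXEMPT_START` and `is_exempt_from_fuzzing` are hypothetical names, not part of the repository.

```python
import ipaddress

# Assumed threshold taken from the text above: addresses at or above this one
# are treated as exempt from network fuzzing.
FUZZ_EXEMPT_START = ipaddress.IPv4Address("10.20.20.130")


def is_exempt_from_fuzzing(address):
    """Return True if the given IPv4 address is at or above the exempt threshold."""
    return ipaddress.IPv4Address(address) >= FUZZ_EXEMPT_START
```

Comparing `IPv4Address` objects rather than strings avoids lexical mistakes such as `"10.20.20.3"` sorting after `"10.20.20.130"`.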
Use the `evergreen-latest-master` tag for all images. This is updated automatically in
@ -182,20 +184,26 @@ Take a look at `buildscripts/antithesis/topologies/sharded_cluster/scripts/mongo
how to use util methods from `buildscripts/antithesis/topologies/sharded_cluster/scripts/utils.py`
to set up the desired topology. You can also use simple shell scripts as in the case of
`buildscripts/antithesis/topologies/sharded_cluster/scripts/database_init.py`. These init scripts
must not end in order to keep the underlying container alive. You can use an infinite while
loop for `python` scripts or you can use `tail -f /dev/null` for shell scripts.
must not end in order to keep the underlying container alive. You can use an infinite while loop for
`python` scripts or you can use `tail -f /dev/null` for shell scripts.
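
One way to write the "never exit" tail of a Python init script is sketched below. The `keep_alive` helper and its parameters are assumptions for illustration; the existing scripts under `buildscripts/antithesis/topologies` may simply use a bare `while True` loop.

```python
import threading


def keep_alive(stop_event=None, poll_seconds=60.0):
    """Block until stop_event is set; with no event supplied, block forever.

    Returns the number of times the loop woke up, which is only useful in tests.
    """
    if stop_event is None:
        stop_event = threading.Event()  # never set, so the loop never ends
    wakeups = 0
    # Event.wait() sleeps without busy-spinning and returns False on timeout.
    while not stop_event.wait(poll_seconds):
        wakeups += 1
    return wakeups
```

An init script would finish its topology setup and then call `keep_alive()` as its last statement, so the container stays up without consuming CPU.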
## How do I create a new topology for Antithesis testing?
This should be done with care to ensure we are using our limited resources efficiently.
Create a new task extending the `antithesis_task_template`, tagged with `antithesis`, passing the specified `suite` to the `antithesis image build and push` task. See other examples to get started.
Create a new task extending the `antithesis_task_template`, tagged with `antithesis`, passing the
specified `suite` to the `antithesis image build and push` task. See other examples to get started.
## How do I test my suite in antithesis?
If you provide the evergreen parameter `schedule_antithesis_tests` to your evergreen patch, once we build the antithesis images in your evergreen patch we send antithesis an api request to run your newly created images for an hour. You will get emailed the report when it finishes running in antithesis.
If you provide the evergreen parameter `schedule_antithesis_tests` to your evergreen patch, once we
build the antithesis images in your evergreen patch we send antithesis an api request to run your
newly created images for an hour. You will get emailed the report when it finishes running in
antithesis.
Important Note: This will happen for every antithesis task you schedule in your patch. Please do not schedule more than 1 or 2 tasks with this parameter at a time or it will use up a lot of our testing time allocated with antithesis.
Important Note: This will happen for every antithesis task you schedule in your patch. Please do not
schedule more than 1 or 2 tasks with this parameter at a time or it will use up a lot of our testing
time allocated with antithesis.
`evergreen patch --param schedule_antithesis_tests=true`
@ -203,10 +211,10 @@ Important Note: This will happen for every antithesis task you schedule in your
### Normal resmoke testing
Antithesis constantly runs your resmoke suite with one random test from the suite at a time.
We support this out-of-the-box with most resmoke suites that use python fixtures.
This is very similar to how tests run in evergreen.
Your antithesis tasks in evergreen will default to this if the `antithesis_test_composer_dir` var is not specified on the task.
Antithesis constantly runs your resmoke suite with one random test from the suite at a time. We
support this out-of-the-box with most resmoke suites that use python fixtures. This is very similar
to how tests run in evergreen. Your antithesis tasks in evergreen will default to this if the
`antithesis_test_composer_dir` var is not specified on the task.
### Test Composer
@ -222,4 +230,5 @@ Evergreen configuration details, see
## Additional Resources
If you are interested in leveraging Antithesis feel free to reach out to #ask-devprod-correctness or #server-testing on Slack.
If you are interested in leveraging Antithesis feel free to reach out to #ask-devprod-correctness or
#server-testing on Slack.


@ -1,11 +1,10 @@
# Server-Internal Baton Pattern
Batons are lightweight job queues in _mongod_ and _mongos_ processes that allow
recording the intent to execute a task (e.g., polling on a network socket) and
deferring its execution to a later time. Batons, often by reusing `Client`
threads and through the _Waitable_ interface, move the execution of scheduled
tasks out of the line, potentially hiding the execution cost from the critical
path. A total of four baton classes are available today:
Batons are lightweight job queues in _mongod_ and _mongos_ processes that allow recording the intent
to execute a task (e.g., polling on a network socket) and deferring its execution to a later time.
Batons, often by reusing `Client` threads and through the _Waitable_ interface, move the execution
of scheduled tasks out of the line, potentially hiding the execution cost from the critical path. A
total of four baton classes are available today:
- [Baton][baton]
- [DefaultBaton][defaultBaton]
@ -14,72 +13,74 @@ path. A total of four baton classes are available today:
## Baton Basics
All baton implementations extend _Baton_. They are tightly associated with an
`OperationContext` and its `Client` thread. An `OperationContext` that belongs
to a `ServiceContext` with a `TransportLayer` uses an `AsioNetworkingBaton`,
else a `DefaultBaton`. The baton is accessed through the `OperationContext` with
a call to `OperationContext::getBaton()`.
All baton implementations extend _Baton_. They are tightly associated with an `OperationContext` and
its `Client` thread. An `OperationContext` that belongs to a `ServiceContext` with a
`TransportLayer` uses an `AsioNetworkingBaton`, else a `DefaultBaton`. The baton is accessed through
the `OperationContext` with a call to `OperationContext::getBaton()`.
Each baton implementation exposes an interface to allow scheduling tasks on the
baton, to demand the awakening of the baton on client socket disconnect, and to
create a _SubBaton_. A _SubBaton_, for any of the baton types, is essentially a
handle to a local object that proxies scheduling requests to its underlying baton
until it is detached (e.g., through destruction of its handle).
Each baton implementation exposes an interface to allow scheduling tasks on the baton, to demand the
awakening of the baton on client socket disconnect, and to create a _SubBaton_. A _SubBaton_, for
any of the baton types, is essentially a handle to a local object that proxies scheduling requests
to its underlying baton until it is detached (e.g., through destruction of its handle).
Additionally, a _NetworkingBaton_ enables consumers of a transport layer to
execute I/O themselves, rather than delegating it to other threads. They are
special batons that are able to poll network sockets, which is not feasible
through other baton types. This is essential for minimizing context switches and
improving the readability of stack traces.
Additionally, a _NetworkingBaton_ enables consumers of a transport layer to execute I/O themselves,
rather than delegating it to other threads. They are special batons that are able to poll network
sockets, which is not feasible through other baton types. This is essential for minimizing context
switches and improving the readability of stack traces.
A baton runs automatically when blocking on its associated `OperationContext`
with a call to `OperationContext::waitForConditionOrInterrupt()`. Many different
apis that take in or use an _Interruptible_ will eventually call into this method
(e.g. `Future::get(...)`, `OperationContext::sleepUntil(...)`, etc.).
A baton runs automatically when blocking on its associated `OperationContext` with a call to
`OperationContext::waitForConditionOrInterrupt()`. Many different apis that take in or use an
_Interruptible_ will eventually call into this method (e.g. `Future::get(...)`,
`OperationContext::sleepUntil(...)`, etc.).
### DefaultBaton
DefaultBaton is the most basic baton implementation. This baton provides the
platform to execute tasks while a client thread awaits an event or a timeout,
essentially paving the way towards utilizing idle cycles of client threads for
useful work. Tasks can be scheduled on this baton through its associated
`OperationContext` and using `OperationContext::getBaton()::schedule(...)`.
DefaultBaton is the most basic baton implementation. This baton provides the platform to execute
tasks while a client thread awaits an event or a timeout, essentially paving the way towards
utilizing idle cycles of client threads for useful work. Tasks can be scheduled on this baton
through its associated `OperationContext` and using `OperationContext::getBaton()::schedule(...)`.
Note that because _Baton_ extends an _OutOfLineExecutor_, it can be used as the
executor to run work on an `ExecutorFuture`.
Note that because _Baton_ extends an _OutOfLineExecutor_, it can be used as the executor to run work
on an `ExecutorFuture`.
### AsioNetworkingBaton
The AsioNetworkingBaton can schedule and run tasks similarly to the _DefaultBaton_,
but it also implements the _NetworkingBaton_ interface to provide a networking
reactor. It can register sessions to monitor and will utilize `poll(2)` and
`eventfd(2)` to wait until I/O can be performed on the socket or until interrupted.
The AsioNetworkingBaton can schedule and run tasks similarly to the _DefaultBaton_, but it also
implements the _NetworkingBaton_ interface to provide a networking reactor. It can register sessions
to monitor and will utilize `poll(2)` and `eventfd(2)` to wait until I/O can be performed on the
socket or until interrupted.
This baton is primarily used for egress networking where it gets scheduled to send
off a command after a connection is made (see the relevant code [here][asioNetworkingBatonScheduling]).
This means that the AsioNetworkingBaton will normally perform socket I/O without
needing to poll. It only registers a session for polling if another read or
write is needed on the socket (e.g. [registering a session during socket read][asioNetworkingBatonPollingSetup]).
This baton is primarily used for egress networking where it gets scheduled to send off a command
after a connection is made (see the relevant code [here][asioNetworkingBatonScheduling]). This means
that the AsioNetworkingBaton will normally perform socket I/O without needing to poll. It only
registers a session for polling if another read or write is needed on the socket (e.g. [registering
a session during socket read][asioNetworkingBatonPollingSetup]).
In order for an egress session to use the baton, it must be specified as an
argument to `TaskExecutor::scheduleRemoteCommand(...)`.
In order for an egress session to use the baton, it must be specified as an argument to
`TaskExecutor::scheduleRemoteCommand(...)`.
Note that this baton is only available for Linux.
## Example
For an example of scheduling a task on the `OperationContext` baton, see
[here][example].
For an example of scheduling a task on the `OperationContext` baton, see [here][example].
## Considerations
Since any task scheduled on a baton is intended for out-of-line execution, it
must be non-blocking and preferably short-lived to ensure forward progress.
Since any task scheduled on a baton is intended for out-of-line execution, it must be non-blocking
and preferably short-lived to ensure forward progress.
[baton]: https://github.com/mongodb/mongo/blob/5906d967c3144d09fab6a4cc1daddb295df19ffb/src/mongo/db/baton.h#L61-L178
[defaultBaton]: https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/db/default_baton.h#L46-L75
[networkingBaton]: https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/transport/baton.h#L61-L96
[asioNetworkingBaton]: https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/transport/baton_asio_linux.h#L60-L529
[asioNetworkingBatonScheduling]: https://github.com/mongodb/mongo/blob/46b8c49b4e13cc4c8389b2822f9e30dd73b81d6e/src/mongo/executor/network_interface_tl.cpp#L910
[asioNetworkingBatonPollingSetup]: https://github.com/mongodb/mongo/blob/eab4ec41cc2b28bf0a38eb813f9690e1bfa6c9a6/src/mongo/transport/asio/asio_session_impl.cpp#L666-L696
[example]: https://github.com/mongodb/mongo/blob/262e5a961fa7221bfba5722aeea2db719f2149f5/src/mongo/s/multi_statement_transaction_requests_sender.cpp#L91-L99
[baton]:
https://github.com/mongodb/mongo/blob/5906d967c3144d09fab6a4cc1daddb295df19ffb/src/mongo/db/baton.h#L61-L178
[defaultBaton]:
https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/db/default_baton.h#L46-L75
[networkingBaton]:
https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/transport/baton.h#L61-L96
[asioNetworkingBaton]:
https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/transport/baton_asio_linux.h#L60-L529
[asioNetworkingBatonScheduling]:
https://github.com/mongodb/mongo/blob/46b8c49b4e13cc4c8389b2822f9e30dd73b81d6e/src/mongo/executor/network_interface_tl.cpp#L910
[asioNetworkingBatonPollingSetup]:
https://github.com/mongodb/mongo/blob/eab4ec41cc2b28bf0a38eb813f9690e1bfa6c9a6/src/mongo/transport/asio/asio_session_impl.cpp#L666-L696
[example]:
https://github.com/mongodb/mongo/blob/262e5a961fa7221bfba5722aeea2db719f2149f5/src/mongo/s/multi_statement_transaction_requests_sender.cpp#L91-L99


@ -1,6 +1,7 @@
# Branching
This document describes the branching task: the file updates in the `10gen/mongo` repository that should be done on a new branch immediately after a branch cut.
This document describes the branching task: the file updates in the `10gen/mongo` repository that
should be done on a new branch immediately after a branch cut.
## Table of contents
@ -14,11 +15,14 @@ This document describes branching task regarding file updates in `10gen/mongo` r
### GitHub App credentials
Add GitHub app credentials (app id and key) in the new project settings, e.g. https://spruce.corp.mongodb.com/project/mongodb-mongo-v8.3/settings/github-app-settings (additional MANA permissions may be required, else coordinate with Release team contacts).
Add GitHub app credentials (app id and key) in the new project settings, e.g.
https://spruce.corp.mongodb.com/project/mongodb-mongo-v8.3/settings/github-app-settings (additional
MANA permissions may be required, else coordinate with Release team contacts).
## 2. Create working branch
To save time during the branch cut, these branching changes can be prepared beforehand (but not too early, to avoid extra file conflicts) and then rebased onto the new `vX.Y` branch.
To save time during the branch cut, these branching changes can be prepared beforehand (but not too
early, to avoid extra file conflicts) and then rebased onto the new `vX.Y` branch.
Create a working branch from `master` or from a new `vX.Y` branch if it already exists:
@ -30,13 +34,16 @@ git checkout -b vX.Y-branching-task
## 2. Update files
**IMPORTANT!** Each of these changes should be a separate commit, but they should be pushed together in the same commit-queue task.
**IMPORTANT!** Each of these changes should be a separate commit, but they should be pushed
together in the same commit-queue task.
The reason they should be pushed as separate commits is in the case of needing to revert one aspect of this entire task.
The reason they should be pushed as separate commits is in the case of needing to revert one aspect
of this entire task.
> See [8.2 branching PR](https://github.com/mongodb/mongo/pull/38920/commits) for reference.
Some steps have automation you can run, but please double-check the resulting edits. Initialize the version here, used throughout:
Some steps have automation you can run, but please double-check the resulting edits. Initialize the
version here, used throughout:
```sh
VERSION=8.3
@ -51,7 +58,9 @@ sed -i "s/master/v$VERSION/g" copy.bara.sky
sed -i 's/branch = "master"/branch = "v'"$VERSION"'"/' buildscripts/sync_repo_with_copybara.py
```
For each file [`copy.bara.sky`](../../copy.bara.sky) and [`sync_repo_with_copybara.py`](../../buildscripts/sync_repo_with_copybara.py), the "master" branch references should be replaced with the new branch name.
For each file [`copy.bara.sky`](../../copy.bara.sky) and
[`sync_repo_with_copybara.py`](../../buildscripts/sync_repo_with_copybara.py), the "master" branch
references should be replaced with the new branch name.
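The quoting in the second `sed` command is easy to misread: the single-quoted script is closed, `"$VERSION"` is spliced in under double quotes so the shell expands it, and the single quote is reopened. A minimal sketch that rehearses the substitution on a scratch file follows; the scratch content is a hypothetical stand-in, not the real contents of `sync_repo_with_copybara.py`.

```sh
# Rehearse the branch substitution on a scratch file instead of the real script.
VERSION=8.3
scratch=$(mktemp)

# Hypothetical one-line stand-in for the line the real command targets.
printf 'branch = "master"\n' > "$scratch"

# Same quoting pattern as the documented command, but writing to stdout
# (no -i), so nothing is modified in place.
result=$(sed 's/branch = "master"/branch = "v'"$VERSION"'"/' "$scratch")

echo "$result"
rm -f "$scratch"
```

Note that `sed -i` with no suffix argument is GNU sed syntax; BSD/macOS sed expects a suffix argument (for example `-i ''`), so the documented commands may need adjusting outside Linux.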
### Evergreen YAML configurations
@ -63,16 +72,23 @@ Run the following automation and verify results:
sed -i "s/suffix\"] = \"latest\"/suffix\"] = \"v$VERSION-latest\"/g" buildscripts/generate_version_expansions.py
```
In the file [`buildscripts/generate_version_expansions.py`](../../buildscripts/generate_version_expansions.py), the "latest" suffixes should be replaced with the new branch name.
In the file
[`buildscripts/generate_version_expansions.py`](../../buildscripts/generate_version_expansions.py),
the "latest" suffixes should be replaced with the new branch name.
#### 2. Nightly YAML
[`etc/evergreen_nightly.yml`](../../etc/evergreen_nightly.yml) will be used as YAML configuration in the new `mongodb-mongo-vX.Y` evergreen project.
[`etc/evergreen_nightly.yml`](../../etc/evergreen_nightly.yml) will be used as YAML configuration in
the new `mongodb-mongo-vX.Y` evergreen project.
This will move some build variants from `etc/evergreen.yml` to continue running on a new branch project. More information about build variants after branching is [here](../evergreen-testing/yaml_configuration/buildvariants.md#build-variants-after-branching).
This will move some build variants from `etc/evergreen.yml` to continue running on a new branch
project. More information about build variants after branching is
[here](../evergreen-testing/yaml_configuration/buildvariants.md#build-variants-after-branching).
- Copy over commit-queue aliases and patch aliases from [`etc/evergreen.yml`](../../etc/evergreen.yml)
- Update "include" section: comment out or uncomment file includes as instructions in the comments suggest.
- Copy over commit-queue aliases and patch aliases from
[`etc/evergreen.yml`](../../etc/evergreen.yml)
- Update "include" section: comment out or uncomment file includes as instructions in the comments
suggest.
#### 3. Burn-in tasks
@ -82,7 +98,12 @@ Run the following automation and verify results:
sed -i '/burn_in_tag_include_build_variants/{N;N;N;d;}' etc/evergreen_yml_components/variants/misc/misc.yml
```
In the file [`etc/evergreen_yml_components/variants/misc/misc.yml`](../../etc/evergreen_yml_components/variants/misc/misc.yml), build variant names in the ["burn_in_tag_include_build_variants" expansion](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/etc/evergreen_yml_components/variants/misc/misc.yml#L21) that are _not_ included in [`etc/evergreen_nightly.yml`](../../etc/evergreen_nightly.yml) are _removed_.
In the file
[`etc/evergreen_yml_components/variants/misc/misc.yml`](../../etc/evergreen_yml_components/variants/misc/misc.yml),
build variant names in the
["burn_in_tag_include_build_variants" expansion](https://github.com/mongodb/mongo/blob/0a68308f0d39a928ed551f285ba72ca560c38576/etc/evergreen_yml_components/variants/misc/misc.yml#L21)
that are _not_ included in [`etc/evergreen_nightly.yml`](../../etc/evergreen_nightly.yml) are
_removed_.
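For readers unfamiliar with the `sed` script used here: on the line matching `burn_in_tag_include_build_variants`, each `N` appends the next input line to the pattern space, and `d` then deletes the whole accumulated block, so the matching line plus the three lines after it are dropped. A small sketch on a scratch file, assuming a hypothetical YAML layout in which the expansion occupies exactly four lines:

```sh
# Demonstrate '/pattern/{N;N;N;d;}' on hypothetical YAML, not the real misc.yml.
scratch=$(mktemp)
printf '%s\n' \
  'expansions:' \
  '  burn_in_tag_include_build_variants: >-' \
  '    variant-a' \
  '    variant-b' \
  '    variant-c' \
  '  other_key: value' > "$scratch"

# Without -i, sed prints the edited text and leaves the scratch file untouched.
result=$(sed '/burn_in_tag_include_build_variants/{N;N;N;d;}' "$scratch")

echo "$result"
rm -f "$scratch"
```

Because the number of `N` commands is fixed, the script silently removes too little or too much if the expansion spans a different number of lines, which is one reason to inspect the resulting diff.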
#### 4. Suggested to Required
@ -94,7 +115,9 @@ sed -i 's@display_name: "\* Amazon Linux 2023 arm64 Enterprise"@display_name: "!
sed -i 's/tags: \["suggested", "forbid_tasks_tagged_with_experimental"\]/tags: ["required", "forbid_tasks_tagged_with_experimental"]/g' etc/evergreen_yml_components/variants/amazon/test_dev.yml
```
For the variant `enterprise-amazon-linux2023-arm64` in [`etc/evergreen_yml_components/variants/amazon/test_dev.yml`](../../etc/evergreen_yml_components/variants/amazon/test_dev.yml), replace:
For the variant `enterprise-amazon-linux2023-arm64` in
[`etc/evergreen_yml_components/variants/amazon/test_dev.yml`](../../etc/evergreen_yml_components/variants/amazon/test_dev.yml),
replace:
- "\*" with "!" in their display names
- "suggested" variant tag with "required"
@ -116,10 +139,12 @@ sed -i 's/!.incompatible_all_feature_flags/!.requires_all_feature_flags/g' $FILE
For the build variant names:
- in [`etc/evergreen_yml_components/variants/windows/test_dev.yml`](../../etc/evergreen_yml_components/variants/windows/test_dev.yml):
- in
[`etc/evergreen_yml_components/variants/windows/test_dev.yml`](../../etc/evergreen_yml_components/variants/windows/test_dev.yml):
- `enterprise-windows-all-feature-flags-required`
- `enterprise-windows-all-feature-flags-non-essential`
- in [`etc/evergreen_yml_components/variants/sanitizer/test_dev.yml`](../../etc/evergreen_yml_components/variants/sanitizer/test_dev.yml):
- in
[`etc/evergreen_yml_components/variants/sanitizer/test_dev.yml`](../../etc/evergreen_yml_components/variants/sanitizer/test_dev.yml):
- `linux-debug-aubsan-lite-all-feature-flags-required`
@ -130,9 +155,12 @@ For the build variant names:
#### 6. Sys-perf YAML
[`etc/system_perf.yml`](../../etc/system_perf.yml) will be used as YAML configuration for a new `sys-perf-X.Y` evergreen project
[`etc/system_perf.yml`](../../etc/system_perf.yml) will be used as YAML configuration for a new
`sys-perf-X.Y` evergreen project
> Ensure that [DSI](https://github.com/10gen/dsi/blob/master/evergreen/system_perf/README.md#branching) has been updated with new branches
> Ensure that
> [DSI](https://github.com/10gen/dsi/blob/master/evergreen/system_perf/README.md#branching) has been
> updated with new branches
Run the following automation and verify results:
@ -146,8 +174,13 @@ sed -i "s@evergreen/system_perf/master/variants.yml@evergreen/system_perf/$VERSI
In the file [`etc/system_perf.yml`](../../etc/system_perf.yml), the following should be reflected:
- Remove `evergreen/system_perf/master/master_variants.yml` from "include" section
- With the exception of `base.yml`, update all other entries that contain `master` in the path to contain `X.Y` in the path instead. (e.g. `evergreen/system_perf/master/variants.yml` should become `evergreen/system_perf/X.Y/variants.yml`).
- Update the [evergreen project variable](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-and-Distro-Settings#variables) `compile_project` in the new sys-perf-X.Y evergreen project to point to the new mongodb-mongo-vX.Y branch
- With the exception of `base.yml`, update all other entries that contain `master` in the path to
contain `X.Y` in the path instead. (e.g. `evergreen/system_perf/master/variants.yml` should become
`evergreen/system_perf/X.Y/variants.yml`).
- Update the
[evergreen project variable](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-and-Distro-Settings#variables)
`compile_project` in the new sys-perf-X.Y evergreen project to point to the new mongodb-mongo-vX.Y
branch
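One way to check the "all entries except `base.yml`" rule is to count the `master` paths that remain after the substitution. The sketch below assumes a hypothetical two-entry include list; the real `etc/system_perf.yml` has more entries and may be laid out differently.

```sh
# Rewrite one include path on scratch data and count what still says "master".
VERSION=8.3
scratch=$(mktemp)
printf '%s\n' \
  '  - filename: evergreen/system_perf/master/variants.yml' \
  '  - filename: evergreen/system_perf/master/base.yml' > "$scratch"

# '@' is used as the sed delimiter so the slashes in the paths need no escaping.
sed "s@evergreen/system_perf/master/variants.yml@evergreen/system_perf/$VERSION/variants.yml@g" \
  "$scratch" > "$scratch.out"

# Entries still pointing at master; only the base.yml line should be left.
remaining=$(grep -c 'system_perf/master/' "$scratch.out")
# Entries now pointing at the versioned path.
updated=$(grep -c "system_perf/$VERSION/variants.yml" "$scratch.out")

echo "remaining master paths: $remaining, updated paths: $updated"
rm -f "$scratch" "$scratch.out"
```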
#### 7. Evergreen project validation
@ -157,7 +190,10 @@ Run the following automation and verify results:
sed -i 's/RELEASE_BRANCH = False/RELEASE_BRANCH = True/g' buildscripts/validate_evg_project_config.py
```
In file [`buildscripts/validate_evg_project_config.py`](../../buildscripts/validate_evg_project_config.py), the `RELEASE_BRANCH` variable should be set to `True` to leverage a specialized shortcut conditional to `evaluate` the project, not `validate`.
In file
[`buildscripts/validate_evg_project_config.py`](../../buildscripts/validate_evg_project_config.py),
the `RELEASE_BRANCH` variable should be set to `True` to leverage a specialized shortcut conditional
to `evaluate` the project, not `validate`.
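`sed` exits successfully even when its pattern matches nothing, so "verify results" needs an explicit check. A minimal sketch of such a check, run against a scratch file that stands in for `validate_evg_project_config.py` (the single-line content is an assumption made for illustration):

```sh
# Flip the flag on a scratch copy, then confirm the new value is present.
scratch=$(mktemp)
printf 'RELEASE_BRANCH = False\n' > "$scratch"

sed 's/RELEASE_BRANCH = False/RELEASE_BRANCH = True/g' "$scratch" > "$scratch.out"

# grep -q prints nothing and reports through its exit status only.
if grep -q '^RELEASE_BRANCH = True$' "$scratch.out"; then
  flag_status=ok
else
  flag_status=missing
fi

echo "RELEASE_BRANCH check: $flag_status"
rm -f "$scratch" "$scratch.out"
```

The same `grep -q` pattern can be pointed at the real file after running the documented command.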
#### 8. Coverity
@ -167,7 +203,8 @@ Run the following automation and verify results:
sed -i "s/stream: mongo.master/stream: mongo.v$VERSION/g" etc/coverity.yml
```
In the file [`etc/coverity.yml`](../../etc/coverity.yml), the "stream" should be updated to the new branch.
In the file [`etc/coverity.yml`](../../etc/coverity.yml), the "stream" should be updated to the new
branch.
#### Finally: format and lint
@ -179,7 +216,8 @@ Run linters and formatters and fix anything that couldn't be autofixed.
## 3. Test changes
If the working branch was created from the `master` branch, rebase it onto the new `vX.Y` branch and fix any file conflicts.
If the working branch was created from the `master` branch, rebase it onto the new `vX.Y` branch
and fix any file conflicts.
Schedule required patch on a new `mongodb-mongo-vX.Y` project:
@ -187,7 +225,8 @@ Schedule required patch on a new `mongodb-mongo-vX.Y` project:
evergreen patch -p mongodb-mongo-vX.Y -a required
```
If patch results reveal that some steps are missing or outdated in this file, make sure to update the branching documentation on a "master" branch accordingly.
If patch results reveal that some steps are missing or outdated in this file, make sure to update
the branching documentation on a "master" branch accordingly.
## 4. Merge changes


@ -1,8 +1,7 @@
# Building MongoDB
Please note that prebuilt binaries are available on
[mongodb.org](http://www.mongodb.org/downloads) and may be the easiest
way to get started, rather than building from source.
Please note that prebuilt binaries are available on [mongodb.org](http://www.mongodb.org/downloads)
and may be the easiest way to get started, rather than building from source.
To build MongoDB, you will need:
@ -20,13 +19,13 @@ To build MongoDB, you will need:
- On Ubuntu, the lzma library is required. Install `liblzma-dev`
- On Amazon Linux, the xz-devel library is required. `yum install xz-devel`
- Python 3.13
- About 13 GB of free disk space for the core binaries (`mongod`,
`mongos`, and `mongo`).
- About 13 GB of free disk space for the core binaries (`mongod`, `mongos`, and `mongo`).
If using a newer version of a C++ compiler than listed above, it may work. However the versions listed above have been verified to work.
If using a newer version of a C++ compiler than listed above, it may work. However the versions
listed above have been verified to work.
MongoDB supports the following architectures: arm64, ppc64le, s390x,
and x86-64. More detailed platform instructions can be found below.
MongoDB supports the following architectures: arm64, ppc64le, s390x, and x86-64. More detailed
platform instructions can be found below.
## Quick (re)Start
@ -45,23 +44,21 @@ If you only want to build the database server `mongod`:
$ bazel build install-mongod
**_Note_**: For C++ compilers that are newer than the supported
version, the compiler may issue new warnings that cause MongoDB to
fail to build since the build system treats compiler warnings as
errors. To ignore the warnings, pass the switch
`--disable_warnings_as_errors=True` to the bazel command.
**_Note_**: For C++ compilers that are newer than the supported version, the compiler may issue new
warnings that cause MongoDB to fail to build since the build system treats compiler warnings as
errors. To ignore the warnings, pass the switch `--disable_warnings_as_errors=True` to the bazel
command.
$ bazel build install-mongod --disable_warnings_as_errors=True
If you want to build absolutely everything (`mongod`, `mongo`, unit
tests, etc):
If you want to build absolutely everything (`mongod`, `mongo`, unit tests, etc):
$ bazel build --build_tag_filters=mongo_binary //src/mongo/...
## Bazel Targets
The following targets can be named on the bazel command line to build and
install a subset of components:
The following targets can be named on the bazel command line to build and install a subset of
components:
- `install-mongod`
- `install-mongos`
@ -69,16 +66,15 @@ install a subset of components:
- `install-dist` (includes all server components)
- `install-devcore` (includes `mongod`, `mongos`, and `jstestshell` (formerly `mongo` shell))
**_NOTE_**: The `install-core` and `install-dist` targets are _not_
guaranteed to be identical. The `install-core` target will only ever include a
minimal set of "core" server components, while `install-dist` is intended
for a functional end-user installation. If you are testing, you should use the
`install-devcore` or `install-dist` targets instead.
**_NOTE_**: The `install-core` and `install-dist` targets are _not_ guaranteed to be identical. The
`install-core` target will only ever include a minimal set of "core" server components, while
`install-dist` is intended for a functional end-user installation. If you are testing, you should
use the `install-devcore` or `install-dist` targets instead.
## Where to find Binaries
The build system will produce an installation tree into `bazel-bin/install`, as well as
individual install target trees like `bazel-bin/<install-target>`.
The build system will produce an installation tree into `bazel-bin/install`, as well as individual
install target trees like `bazel-bin/<install-target>`.
## Windows
@ -97,8 +93,6 @@ To install dependencies on Debian or Ubuntu systems:
## OS X
Install Xcode 16.4 or newer. Make sure macOS 15.5 platform
is installed.
Install Xcode 16.4 or newer. Make sure macOS 15.5 platform is installed.
Install llvm and lld, version 19 from brew:
brew install llvm@19 lld@19
Install llvm and lld, version 19 from brew: brew install llvm@19 lld@19


@ -5,25 +5,23 @@ current version of master, if not explicitly stated otherwise. Implementation de
versions may vary slightly.
Change streams are a convenient way for an application to monitor changes made to the data in a
deployment.
The events produced by change streams are called "change events". The event data is produced from
the oplog(s) of the deployment.
The events that are emitted by change streams include
deployment. The events produced by change streams are called "change events". The event data is
produced from the oplog(s) of the deployment. The events that are emitted by change streams include
- DML events: emitted for operations that insert, update, replace, or delete individual documents.
- DDL events: emitted for operations that create, drop, or modify collections, databases, or views.
- Data placement events: emitted for operations that define or modify the placement of data inside
a sharded cluster.
- Data placement events: emitted for operations that define or modify the placement of data inside a
sharded cluster.
- Cluster topology events: emitted for operations that add or remove shards in a sharded cluster.
Which exact event types are emitted by a change stream depends on the change stream configuration
and the deployment type.
Change streams are mainly used by customer applications and tools to keep track of changes to the
data in a deployment, in order to relay these updates to external systems.
Some of MongoDB's own tools and components are also based on change streams, e.g. _mongosync_ (C2C),
Atlas Search, Atlas Stream Processing, and the resharding process.
The component that opens a change stream and pulls events from it is called the "consumer".
data in a deployment, in order to relay these updates to external systems. Some of MongoDB's own
tools and components are also based on change streams, e.g. _mongosync_ (C2C), Atlas Search, Atlas
Stream Processing, and the resharding process. The component that opens a change stream and pulls
events from it is called the "consumer".
## Change Stream Guarantees
@ -31,17 +29,16 @@ Change Streams provide various guarantees:
- Ordering: change streams deliver events in the order they originally occurred within the target
namespace (e.g., collection, database, or entire cluster). The order is based on the sequence in
which the operations were applied to the oplog.
In a sharded cluster, the events from multiple oplogs will be merged deterministically into a
single, ordered stream of change events.
which the operations were applied to the oplog. In a sharded cluster, the events from multiple
oplogs will be merged deterministically into a single, ordered stream of change events.
- Durability and reproducibility: change streams are based on the internal oplog, which is part of
the deployment's replication mechanism. Change streams only deliver events after they have been
committed to a majority of nodes and durably persisted, ensuring they will not be rolled back.
- Exactly-once delivery: every event in a change stream is emitted exactly once, and no event that
matches the change stream filter is skipped.
- Resumability: change stream consumption can be interrupted due to transient errors (e.g. network
issues, node failures, application errors), but it can be resumed from the exact point where
the consumption stopped. This is made possible by the resume token (`_id` field) that accompanies
issues, node failures, application errors), but it can be resumed from the exact point where the
consumption stopped. This is made possible by the resume token (`_id` field) that accompanies
every change event, which acts as a bookmark. This allows the consumer to continue processing
changes from the last known position without missing events.
@ -71,9 +68,8 @@ opened against standalone _mongod_ instances, as there is no oplog to generate t
standalone mode.
In replica set deployments, the change stream can be opened directly on any replica set member of
the deployment.
In sharded cluster deployments, the change stream must be opened against any of the deployment's
_mongos_ processes.
the deployment. In sharded cluster deployments, the change stream must be opened against any of the
deployment's _mongos_ processes.
A change stream is opened by executing an `aggregate` command with a pipeline that contains at least
the `$changeStream` pipeline stage.
@ -115,9 +111,8 @@ db.getSiblingDB("testDB").runCommand({
```
The `aggregate` parameter must be set to `1` for database-level change streams, and the command must
be executed inside the desired database.
The internal namespace that is used by database-level change streams is `<dbName>.$cmd.aggregate`
(where `<dbName>` is the actual name of the database).
be executed inside the desired database. The internal namespace that is used by database-level
change streams is `<dbName>.$cmd.aggregate` (where `<dbName>` is the actual name of the database).
### Opening an All-Cluster Change Stream
@ -161,9 +156,8 @@ into smaller fragments, in order to avoid running into `BSONObjectTooLarge` erro
### Change Stream Start Time
When opening a change stream without specifying an explicit point in time, the change stream will be
opened using the current time, and will report only change events that happened after that point
in time.
The current time here is
opened using the current time, and will report only change events that happened after that point in
time. The current time here is
- the time of the latest majority-committed operation for replica set change streams, or
- the value of the cluster's vector clock for sharded cluster change streams.
@ -174,9 +168,8 @@ parameter is specified as a logical timestamp.
### Resuming Change Streams
Change streams allow the consumer to resume the change stream after an error occurred.
To support resumability, change streams report a "resume token" inside the `_id` field of every
emitted event.
Change streams allow the consumer to resume the change stream after an error occurred. To support
resumability, change streams report a "resume token" inside the `_id` field of every emitted event.
To resume a change stream after an error occurred, the resume token of a previously consumed event
can be passed in one of the parameters `resumeAfter` or `startAfter` when opening a change stream.
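As a minimal sketch of the above, the helper below builds an `aggregate` command that resumes a
collection-level change stream. `buildResumeCommand` and its argument names are hypothetical and
only illustrate the command shape; the resume token is passed through unchanged.

```js
// Hypothetical helper: builds an `aggregate` command that resumes a collection-level
// change stream from a previously received resume token.
function buildResumeCommand(collName, resumeToken, mode = "resumeAfter") {
  if (mode !== "resumeAfter" && mode !== "startAfter") {
    throw new Error("mode must be 'resumeAfter' or 'startAfter'");
  }
  return {
    aggregate: collName,
    // The resume token is the `_id` value of a previously consumed event. It is treated
    // as an opaque value and never inspected here.
    pipeline: [{ $changeStream: { [mode]: resumeToken } }],
    cursor: {},
  };
}

// Usage, assuming `lastEvent` is the last event the consumer processed:
// db.getSiblingDB("testDB").runCommand(buildResumeCommand("testColl", lastEvent._id));
```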
@ -198,8 +191,7 @@ with a different `$match` expression may lead to different events being returned
the event with the original resume token not being found in the new change stream.
The resume tokens that are emitted by change streams are string values that contain a hexadecimal
encoding of the internal resume token data.
The internal resume token data contains
encoding of the internal resume token data. The internal resume token data contains
- the cluster time of an event.
- the version of the resume token format.
@ -212,11 +204,13 @@ The internal resume token data contains
Resume tokens are versioned. Currently only version 2 is supported.
Future versions may introduce new resume token versions. Client applications should treat resume
tokens as opaque identifiers and should not make any assumptions about the format or internals
of resume tokens, nor should they rely on the internal implementation details of resume tokens.
tokens as opaque identifiers and should not make any assumptions about the format or internals of
resume tokens, nor should they rely on the internal implementation details of resume tokens.
Resume tokens are serialized and deserialized by the [ResumeToken](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/resume_token.h#L148)
class. The resume token internal data is stored in [ResumeTokenData](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/resume_token.h#L51).
Resume tokens are serialized and deserialized by the
[ResumeToken](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/resume_token.h#L148)
class. The resume token internal data is stored in
[ResumeTokenData](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/resume_token.h#L51).
#### Resume Token Types
@ -225,12 +219,12 @@ There are two types of resume tokens:
- event resume tokens
- high watermark resume tokens
The former stem from actual change events.
High watermark tokens are a special kind of change stream resume token that represents a logical
position in the global change stream ordered only by cluster time, not a specific event.
The former stem from actual change events. High watermark tokens are a special kind of change stream
resume token that represents a logical position in the global change stream ordered only by cluster
time, not a specific event.
High watermark tokens sort strictly before any real event token at the same cluster time.
That is, a high watermark token for time T sorts ahead of all events whose cluster time >= T.
High watermark tokens sort strictly before any real event token at the same cluster time. That is, a
high watermark token for time T sorts ahead of all events whose cluster time >= T.
#### Decoding Resume Tokens
@ -267,43 +261,42 @@ by the consumer or the change stream runs into an error. Also, unused cursors ar
garbage-collected after a period of inactivity.
When opening a change stream on a sharded cluster, the targeted `mongos` instance will open the
required cursors on the relevant shards of the cluster and also the config server. Here, the `mongos`
instance will also automatically open additional cursors in case new shards are added to the
cluster. All this is abstracted from the consumer of the change stream. The consumer of the change
stream will only see a single cursor and interact with _mongos_, which handles the complexity of
managing the underlying shard cursors.
required cursors on the relevant shards of the cluster and also the config server. Here, the
`mongos` instance will also automatically open additional cursors in case new shards are added to
the cluster. All this is abstracted from the consumer of the change stream. The consumer of the
change stream will only see a single cursor and interact with _mongos_, which handles the complexity
of managing the underlying shard cursors.
If a change stream cursor can be successfully established, the cursor id is returned to the
consumer. The consumer can then use the cursor id to pull change events from the change stream by
issuing follow-up `getMore` commands to this cursor.
If a change stream cursor cannot be successfully opened, the initial `aggregate` command will
return an error, and the returned cursor id will be `0`. In this case, no events can be consumed
from the change stream, and the consumer needs to resolve the error.
If a change stream cursor cannot be successfully opened, the initial `aggregate` command will return
an error, and the returned cursor id will be `0`. In this case, no events can be consumed from the
change stream, and the consumer needs to resolve the error.
### Change Stream errors
When a change stream is opened at a specific point in time, it is validated that the oplog of all
participating nodes actually contains data for this point in time.
If the oplog does not contain any data for the exact point in time or before, it would be possible
that the requested data has already fallen off the oplog.
In case no oplog entry can be found that is at least as old as the specified timestamp, opening the
change stream will fail with error code `OplogQueryMinTsMissing`.
This validation happens for all change streams, regardless of whether the start timestamp is
specified via the `resumeAfter`, `startAfter` or `startAtOperationTime` parameters or implied from
the current time.
An exception in which opening a change stream at a later point in time than the timestamp of the
first present oplog entry is permitted is for new shard primaries.
A new shard primary can be added to an existing cluster at any point in time. When a new shard primary
is added, its first oplog entry will be a no-op entry with `msg` == `initiating set` (on ASC) or
`msg` == `new primary` (on DSC).
participating nodes actually contains data for this point in time. If the oplog does not contain any
data for the exact point in time or before, it would be possible that the requested data has already
fallen off the oplog. In case no oplog entry can be found that is at least as old as the specified
timestamp, opening the change stream will fail with error code `OplogQueryMinTsMissing`. This
validation happens for all change streams, regardless of whether the start timestamp is specified
via the `resumeAfter`, `startAfter` or `startAtOperationTime` parameters or implied from the current
time. An exception in which opening a change stream at a later point in time than the timestamp of
the first present oplog entry is permitted is for new shard primaries. A new shard primary can be
added to an existing cluster at any point in time. When a new shard primary is added,
its first oplog entry will be a no-op entry with `msg` == `initiating set` (on ASC) or `msg` ==
`new primary` (on DSC).
The code for this can be found [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/classic/collection_scan.cpp#L195-L227).
The code for this can be found
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/classic/collection_scan.cpp#L195-L227).
Another common error is `ChangeStreamHistoryLost`. This error is raised when a change stream is
opened with a resume token that cannot be found (anymore) in any of the participating nodes' oplogs.
This can happen either when the resume event has actually fallen off the oplog or when a
change stream is resumed with the resume token from another change stream with a different `$match`
This can happen either when the resume event has actually fallen off the oplog or when a change
stream is resumed with the resume token from another change stream with a different `$match`
expression. In this case, the new change stream may filter out the resume event due to the different
`$match` expression, so it cannot be found anymore.
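A consumer may want to tell these two errors apart from other failures. The sketch below is one
possible way to do that, assuming the error object exposes the server's `codeName` field;
`classifyChangeStreamError` and its return values are hypothetical.

```js
// Hypothetical helper: classifies an error returned when opening or resuming a change
// stream. Only the two error names discussed above are recognized.
function classifyChangeStreamError(err) {
  switch (err.codeName) {
    case "OplogQueryMinTsMissing":
    case "ChangeStreamHistoryLost":
      // The requested start point cannot be found in the oplog, so repeating the
      // identical request is not expected to help.
      return "start-point-unavailable";
    default:
      // Anything else is left to the caller to handle.
      return "other";
  }
}
```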
@ -342,9 +335,9 @@ request:
- `maxTimeMS`: maximum server-side waiting time for producing events.
The `getMore` command will fill the response with up to `batchSize` results if that many events are
available. A response can also contain fewer events than the specified `batchSize`.
Regardless of the specified batch size, the maximum response size limit of 16MB will be honored, in
order to prevent responses from getting too large.
available. A response can also contain fewer events than the specified `batchSize`. Regardless of the
specified batch size, the maximum response size limit of 16MB will be honored, in order to prevent
responses from getting too large.
A change stream response is returned to the consumer when
@ -353,14 +346,13 @@ A change stream response is returned to the consumer when
would make it exceed the 16MB size limit.
In case the change stream cursor has reached the end of the oplog and there are currently no events
to return, the response will be returned immediately if it already contains at least one event.
If the response is empty, the change stream will wait for at most `maxTimeMS` for new oplog entries
to arrive.
If no new oplog entries arrive within `maxTimeMS`, an empty response will be returned. If new oplog
entries arrive within `maxTimeMS` and at least one of them matches the change stream's filter, the
matching event will be returned immediately. If oplog entries arrive but do not match the change
stream's filter, the change stream will wait for matching oplog entries until `maxTimeMS` is fully
expired.
to return, the response will be returned immediately if it already contains at least one event. If
the response is empty, the change stream will wait for at most `maxTimeMS` for new oplog entries to
arrive. If no new oplog entries arrive within `maxTimeMS`, an empty response will be returned. If
new oplog entries arrive within `maxTimeMS` and at least one of them matches the change stream's
filter, the matching event will be returned immediately. If oplog entries arrive but do not match
the change stream's filter, the change stream will wait for matching oplog entries until `maxTimeMS`
is fully expired.
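The `getMore` parameters described above can be sketched as a small command builder.
`buildGetMore` is a hypothetical helper, and the cursor id is assumed to be taken unchanged from the
response of the initial `aggregate` command.

```js
// Hypothetical helper: builds a `getMore` command for an open change stream cursor.
// `batchSize` and `maxTimeMS` are only included when the caller provides them.
function buildGetMore(cursorId, collName, { batchSize, maxTimeMS } = {}) {
  const cmd = { getMore: cursorId, collection: collName };
  if (batchSize !== undefined) cmd.batchSize = batchSize;
  if (maxTimeMS !== undefined) cmd.maxTimeMS = maxTimeMS;
  return cmd;
}

// Usage, assuming `res` is the response of the `aggregate` command that opened the stream:
// db.getSiblingDB("testDB").runCommand(
//   buildGetMore(res.cursor.id, "testColl", { batchSize: 100, maxTimeMS: 1000 }));
```

With these values, an empty response would be returned after roughly one second if no matching
event arrives in the meantime.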
### Generic Event layout
@ -379,8 +371,8 @@ The following generic fields are added for change streams that were opened with
- `collectionUUID`: UUID of the collection for which the event occurred, if applicable.
- `operationDescription`: populated for DDL events.
Most other fields are event type-specific, so they are only present for specific events.
A few such fields include:
Most other fields are event type-specific, so they are only present for specific events. A few such
fields include:
- `documentKey`: the `_id` value of the affected document, populated for DML events. May contain the
shard key values for sharded collections.
@ -389,9 +381,11 @@ A few such fields include:
value than `default`.
- `updateDescription` / `rawUpdateDescription`: contains details for "update" events.
The majority of change stream event fields are emitted by the `ChangeStreamDefaultEventTransformation`
object [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_event_transform.cpp#L321). This object is called by the `ChangeStreamEventTransform`
stage [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_transform_stage.cpp#L75).
The majority of change stream event fields are emitted by the
`ChangeStreamDefaultEventTransformation` object
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_event_transform.cpp#L321).
This object is called by the `ChangeStreamEventTransform` stage
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_transform_stage.cpp#L75).
A custom `$project` stage in the change stream pipeline can be used to suppress certain fields.
@ -401,8 +395,8 @@ Emitted change events can get large, especially if they contain pre- or post-ima
the events can exceed the maximum BSON object size of 16MB, which can lead to `BSONObjectTooLarge`
errors when trying to process these change stream events.
To split large change stream events into multiple smaller chunks, change stream consumers can add
a `$changeStreamSplitLargeEvent` stage as the last step of their change stream pipeline, e.g.
To split large change stream events into multiple smaller chunks, change stream consumers can add a
`$changeStreamSplitLargeEvent` stage as the last step of their change stream pipeline, e.g.
```js
db.getSiblingDB("testDB").runCommand({
@ -419,8 +413,10 @@ db.getSiblingDB("testDB").runCommand({
});
```
The splitting is performed by the `ChangeStreamSplitLargeEventStage` stage [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_split_large_event_stage.cpp#L72),
using [this helper function](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_split_event_helpers.cpp#L63).
The splitting is performed by the `ChangeStreamSplitLargeEventStage` stage
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_split_large_event_stage.cpp#L72),
using
[this helper function](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_split_event_helpers.cpp#L63).
The change stream consumer is responsible for assembling the split event fragments into a single
event later.
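A minimal sketch of such reassembly on the consumer side is shown below. It assumes that every
fragment carries a `splitEvent: { fragment: <n>, of: <total> }` field, which is not described in
this document, and that all fragments of exactly one event have already been collected.
`assembleSplitEvent` is a hypothetical helper.

```js
// Hypothetical helper: merges the fragments of one split change event into a single object.
function assembleSplitEvent(fragments) {
  if (fragments.length === 0) {
    throw new Error("no fragments given");
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...fragments].sort((a, b) => a.splitEvent.fragment - b.splitEvent.fragment);
  const total = sorted[0].splitEvent.of;
  if (sorted.length !== total) {
    throw new Error(`expected ${total} fragments, got ${sorted.length}`);
  }
  const event = {};
  for (const fragment of sorted) {
    // Later fragments overwrite `_id`, so the merged event keeps the resume token of
    // the last fragment.
    Object.assign(event, fragment);
  }
  delete event.splitEvent;
  return event;
}
```

Keeping the last fragment's `_id` is a design choice of this sketch: resuming from it continues
after the whole event rather than in the middle of it.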
@ -434,10 +430,9 @@ close the change stream cursor in specific situations:
- the target collection is renamed
- the parent database of the target collection is dropped
- in database-level change streams, the change stream is invalidated if the target database is
dropped.
In case a change stream gets invalidated by any of the above situations, it will emit a special
"invalidate" event to inform the consumer that further processing is not possible.
There are no "invalidate" events in all-cluster change streams.
dropped. In case a change stream gets invalidated by any of the above situations, it will emit a
special "invalidate" event to inform the consumer that further processing is not possible. There
are no "invalidate" events in all-cluster change streams.
Issuing of change stream invalidate events is implemented in the `ChangeStreamCheckInvalidateStage`
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_invalidate_stage.cpp#L106-L157).
@ -445,12 +440,13 @@ Issuing of change stream invalidate events is implemented in the `ChangeStreamCh
## Change Stream Parameters
The behavior of change streams can be controlled via various parameters that can be passed with the
initial `aggregate` command used to open the change stream.
The parameters are defined in an [IDL file](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream.idl#L84).
initial `aggregate` command used to open the change stream. The parameters are defined in an
[IDL file](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream.idl#L84).
The parameters that are provided when opening the change stream are automatically validated using
mechanisms provided by the IDL framework. Additional validation of the change stream parameters is
performed [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream.cpp#L391).
performed
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream.cpp#L391).
Invalid change stream parameters are immediately rejected with appropriate errors.
### `fullDocument`
@ -466,17 +462,16 @@ The following values are possible:
may not be the same version of the document that was present when the "update" change event was
originally recorded. If no document can be found by the lookup, the `fullDocument` field will
contain `null`.
- `whenAvailable`: the `fullDocument` field will be populated with the post-image for the event.
The post-image is generated on the fly from a stored pre-image and applying a delta update from
the event on top of it. If no post-image is available, the `fullDocument` field will contain
`null`.
- `whenAvailable`: the `fullDocument` field will be populated with the post-image for the event. The
post-image is generated on the fly from a stored pre-image and applying a delta update from the
event on top of it. If no post-image is available, the `fullDocument` field will contain `null`.
- `required`: populates the `fullDocument` field with the post-image for the event. Post-images are
generated in the same way as in `whenAvailable`. If no post-image can be generated, this will
abort the change stream with a `NoMatchingDocument` error.
The latter two options rely on pre-images to be enabled for the target collection(s).
When pre-images are enabled, they are written synchronously with the regular "update" oplog entry,
and change stream events aren't returned until both have been majority-committed.
The latter two options rely on pre-images to be enabled for the target collection(s). When
pre-images are enabled, they are written synchronously with the regular "update" oplog entry, and
change stream events aren't returned until both have been majority-committed.
Post-images for "update" events are added to change events by the `ChangeStreamAddPostImage` stage
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_post_image_stage.cpp#L84).
@ -506,29 +501,25 @@ parameters are:
#### `showExpandedEvents` (public)
The `showExpandedEvents` flag can be used to make a change stream return both additional event types
and additional fields.
The flag defaults to `false`. In this mode, change streams will only return DML events and no DDL
events.
When setting `showExpandedEvents` to `true`, change streams will also emit events for various DDL
operations.
In addition, setting `showExpandedEvents` will make change streams return the additional fields
`collectionUUID` (for various change stream event types) and `updateDescription.disambiguatedPaths`
(for update events).
and additional fields. The flag defaults to `false`. In this mode, change streams will only return
DML events and no DDL events. When setting `showExpandedEvents` to `true`, change streams will also
emit events for various DDL operations. In addition, setting `showExpandedEvents` will make change
streams return the additional fields `collectionUUID` (for various change stream event types) and
`updateDescription.disambiguatedPaths` (for update events).
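As an illustration, the sketch below builds the `aggregate` command for opening a change stream
with and without `showExpandedEvents`. `buildChangeStreamCommand` is a hypothetical helper; the
collection name is a placeholder.

```js
// Hypothetical helper: builds the `aggregate` command that opens a change stream with
// the given `$changeStream` options. Pass a collection name for a collection-level
// stream, or 1 for a database-level stream.
function buildChangeStreamCommand(target, options = {}) {
  return { aggregate: target, pipeline: [{ $changeStream: options }], cursor: {} };
}

// Collection-level change stream that also reports DDL events and the additional fields.
const expandedCmd = buildChangeStreamCommand("testColl", { showExpandedEvents: true });

// Database-level change stream with the default behavior (DML events only).
const defaultCmd = buildChangeStreamCommand(1);

// Either object can be passed to runCommand, e.g.:
// db.getSiblingDB("testDB").runCommand(expandedCmd);
```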
#### `matchCollectionUUIDForUpdateLookup` (public)
The `matchCollectionUUIDForUpdateLookup` field can be used to ensure that "updateLookup" operations
are performed on the correct collection in case multiple collections with the same name have existed
over time.
This is relevant, because change streams can be opened retroactively on collections that were already
dropped and may have been recreated with the same name but different contents afterwards.
over time. This is relevant, because change streams can be opened retroactively on collections that
were already dropped and may have been recreated with the same name but different contents
afterwards.
The flag defaults to `false`. In this case, "updateLookup" operations will not verify that the
looked-up document is actually from the same collection "generation" as the change event the
document was looked up for.
If set to `true`, "updateLookup" operations will compare the collection UUID of the change event
with the UUID of the collection. If there is a UUID mismatch, the returned `fullDocument` field of
the event will be set to `null`.
document was looked up for. If set to `true`, "updateLookup" operations will compare the collection
UUID of the change event with the UUID of the collection. If there is a UUID mismatch, the returned
`fullDocument` field of the event will be set to `null`.
#### `allChangesForCluster` (public)
@ -539,29 +530,28 @@ automatically when opening an all-cluster change stream.
The `showSystemEvents` flag can be used to make change streams return events for collections inside
the `system` namespace. These are not emitted by default. Setting `showSystemEvents` to `true` will
also include events related to system collections in the change stream.
The flag defaults to `false` and is internal.
also include events related to system collections in the change stream. The flag defaults to `false`
and is internal.
#### `showMigrationEvents` (internal)
The `showMigrationEvents` flag can be used to make change streams return DML events that are
happening during chunk migrations. If set to `true`, insert and delete events related to chunk
migrations will be reported as if they were regular events.
The flag defaults to `false` and is internal.
migrations will be reported as if they were regular events. The flag defaults to `false` and is
internal.
#### `showCommitTimestamp` (internal)
The `showCommitTimestamp` flag can be used to include the transaction commit timestamp inside DML
events that were part of a prepared transaction.
The flag defaults to `true` and is internal. It is used by resharding.
events that were part of a prepared transaction. The flag defaults to `true` and is internal. It is
used by resharding.
#### `showRawUpdateDescription` (internal)
The `showRawUpdateDescription` flag can be used to make change streams emit the raw, internal format
used for "update" oplog entries.
If set to `true`, emitted change stream "update" events will contain a `rawUpdateDescription` field.
The default is `false`. In this case, emitted change stream "update" events will contain the regular
`updateDescription` field.
used for "update" oplog entries. If set to `true`, emitted change stream "update" events will
contain a `rawUpdateDescription` field. The default is `false`. In this case, emitted change stream
"update" events will contain the regular `updateDescription` field.
#### `allowToRunOnConfigDB` (internal)
@ -572,9 +562,9 @@ server to keep track of shard additions and removals in the deployment.
#### `$_passthroughToShard` (internal)
In sharded cluster deployments, all change streams are supposed to be opened on _mongos_. _mongos_
will open the required cursors to the data shards and the config server on the consumer's behalf.
If the consumer only wants to target a specific shard of the cluster, they can use the `$_passthroughToShard`
aggregation parameter to limit the change stream to a single shard.
will open the required cursors to the data shards and the config server on the consumer's behalf. If
the consumer only wants to target a specific shard of the cluster, they can use the
`$_passthroughToShard` aggregation parameter to limit the change stream to a single shard.
For example, to open a collection-level change stream targeting only one of the cluster's shards
(identified by the value in `shardId`), the following example code can be used:
@ -592,8 +582,8 @@ db.getSiblingDB("testDB").runCommand({
});
```
Using `$_passthroughToShard` will bypass the regular cluster shard targeting for change streams
and open a replica set change stream pipeline (only) on the targeted shard. The change events that
Using `$_passthroughToShard` will bypass the regular cluster shard targeting for change streams and
open a replica set change stream pipeline (only) on the targeted shard. The change events that
_mongos_ retrieves from the single shard will be returned as is, without using a merge pipeline on
_mongos_.
@ -609,23 +599,26 @@ stream against a _mongos_ instance. The _mongos_ instance will then use the clus
information to open the cursors on the config server and the data shards on behalf of the consumer.
Because of the ordering guarantee provided by change streams, _mongos_ must wait until all cursors
have either responded with events or run into a timeout and reported that currently no more events
are available for them.
The latter is why change streams in a sharded cluster can have higher latency than change streams
in replica sets.
are available for them. The latter is why change streams in a sharded cluster can have higher
latency than change streams in replica sets.
For sharded cluster change streams, the merging of the multiple streams of change events from the
different cursors is performed by the [`AsyncResultsMerger`](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/s/query/exec/async_results_merger.h#L100).
different cursors is performed by the
[`AsyncResultsMerger`](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/s/query/exec/async_results_merger.h#L100).
## Change Stream Pipeline Building
A change stream pipeline issued by a consumer contains the `$changeStream` meta stage.
This stage is expanded internally into multiple `DocumentSource`s [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_pipeline_helpers.cpp#L171).
A change stream pipeline issued by a consumer contains the `$changeStream` meta stage. This stage is
expanded internally into multiple `DocumentSource`s
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_pipeline_helpers.cpp#L171).
The change stream `DocumentSource`s are located in the `src/mongo/db/pipeline` directory [here](https://github.com/mongodb/mongo/tree/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline), among other `DocumentSource`s that
are not related to change streams.
The `DocumentSource`s are only used for pipeline building and optimization, but they are converted
into execution `Stage`s later when the change stream is executed.
These `Stage`s are located in the `src/mongo/db/exec/agg` directory [here](https://github.com/mongodb/mongo/tree/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg).
The change stream `DocumentSource`s are located in the `src/mongo/db/pipeline` directory
[here](https://github.com/mongodb/mongo/tree/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline),
among other `DocumentSource`s that are not related to change streams. The `DocumentSource`s are only
used for pipeline building and optimization, but they are converted into execution `Stage`s later
when the change stream is executed. These `Stage`s are located in the `src/mongo/db/exec/agg`
directory
[here](https://github.com/mongodb/mongo/tree/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg).
### Replica Set Pipelines
@ -634,13 +627,14 @@ On a replica set, the `$changeStream` stage is expanded into the following inter
- `$_internalChangeStreamOplogMatch`
- `$_internalChangeStreamUnwindTransaction`
- `$_internalChangeStreamTransform`
- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level change
streams)
- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level
change streams)
- `$_internalChangeStreamCheckResumability`
- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to `off`)
- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to
`off`)
- `$_internalChangeStreamAddPostImage` (only present if `fullDocument` is not set to `default`)
- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token is
not a high water mark token)
- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token
is not a high water mark token)
- user-defined `$match` expression (only present if the user's change stream pipeline contains a
`$match` stage)
- user-defined `$project` expression (only present if the user's change stream pipeline contains a
@ -648,8 +642,8 @@ On a replica set, the `$changeStream` stage is expanded into the following inter
- `$_internalChangeStreamSplitLargeEvent` (only present if the change stream is opened with the
`$changeStreamSplitLargeEvent` pipeline step)
The change stream pipeline on replica sets will also contain a `$match` stage to filter out all non-DML
change events in case `showExpandedEvents` is not set.
The change stream pipeline on replica sets will also contain a `$match` stage to filter out all
non-DML change events in case `showExpandedEvents` is not set.
### Sharded Cluster Pipelines
@ -659,10 +653,11 @@ following internal stages:
- `$_internalChangeStreamOplogMatch`
- `$_internalChangeStreamUnwindTransaction`
- `$_internalChangeStreamTransform`
- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level change
streams)
- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level
change streams)
- `$_internalChangeStreamCheckResumability`
- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to `off`)
- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to
`off`)
- `$_internalChangeStreamAddPostImage` (only present if `fullDocument` is not set to `default`)
- user-defined `$match` expression (only present if the user's change stream pipeline contains a
`$match` stage)
@ -674,8 +669,8 @@ following internal stages:
---
- `$_internalChangeStreamHandleTopologyChange`
- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token is
not a high water mark token)
- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token
is not a high water mark token)
Additionally, the change stream pipeline on a sharded cluster will contain a `$match` stage to
filter out all non-DML change events in case `showExpandedEvents` is not set.
@ -685,9 +680,9 @@ After building the initial pipeline stages, _mongos_ will split the pipeline int
- a part that is executed on data shards ("shard pipeline") and
- a part that is executed on _mongos_ ("merge pipeline").
The pipeline split point is above the `$_internalChangeStreamHandleTopologyChange` stage.
_mongos_ will also add a `$mergeCursors` stage that aggregates the responses from different shards
and the config server into a single, sorted stream.
The pipeline split point is above the `$_internalChangeStreamHandleTopologyChange` stage. _mongos_
will also add a `$mergeCursors` stage that aggregates the responses from different shards and the
config server into a single, sorted stream.
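The split can be illustrated on a plain list of stage names. This is only a sketch of the rule
described above, not the server's implementation; `splitPipeline` is a hypothetical function.

```js
// Illustrative only: splits an ordered list of stage names at the split point.
const SPLIT_STAGE = "$_internalChangeStreamHandleTopologyChange";

function splitPipeline(stageNames) {
  const splitAt = stageNames.indexOf(SPLIT_STAGE);
  if (splitAt === -1) {
    throw new Error(`${SPLIT_STAGE} not found in pipeline`);
  }
  return {
    // Everything above the split point is executed on the data shards.
    shardPipeline: stageNames.slice(0, splitAt),
    // The merge part starts with `$mergeCursors`, followed by the remaining stages.
    mergePipeline: ["$mergeCursors", ...stageNames.slice(splitAt)],
  };
}
```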
#### Data Shard Pipeline
@ -696,15 +691,16 @@ The shard pipeline will look like this:
- `$_internalChangeStreamOplogMatch`
- `$_internalChangeStreamUnwindTransaction`
- `$_internalChangeStreamTransform`
- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level change
streams)
- `$_internalChangeStreamCheckInvalidate` (only present for collection-level and database-level
change streams)
- `$_internalChangeStreamCheckResumability`
- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to `off`)
- `$_internalChangeStreamAddPreImage` (only present if `fullDocumentBeforeChange` is not set to
`off`)
- `$_internalChangeStreamAddPostImage` (only present if `fullDocument` is not set to `default`)
- user-defined `$match` expression (only present if the user's change stream pipeline contains a
`$match` stage)
- user-defined `$project` expression (only present if the change stream pipeline contains a `$project`
stage)
- user-defined `$project` expression (only present if the change stream pipeline contains a
`$project` stage)
- `$_internalChangeStreamSplitLargeEvent` (only present if the change stream is opened with the
`$changeStreamSplitLargeEvent` pipeline step)
@ -714,16 +710,18 @@ The merge pipeline on _mongos_ will look like this:
- `$mergeCursors`
- `$_internalChangeStreamHandleTopologyChange`
- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token is
not a high water mark token)
- `$_internalChangeStreamEnsureResumeTokenPresent` (only present if the change stream resume token
is not a high water mark token)
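The two stage lists above can be read as a function of the change stream options. The following is a minimal Python sketch of that reading, not the server's implementation: the `ChangeStreamOptions` class, its field names, and both builder functions are hypothetical and exist only for illustration. Only the stage names and their conditions are taken from the lists above.

```python
from dataclasses import dataclass


@dataclass
class ChangeStreamOptions:
    """Hypothetical option bag; not a real server type."""

    level: str = "collection"  # "collection", "database" or "cluster"
    full_document: str = "default"
    full_document_before_change: str = "off"
    has_user_match: bool = False
    has_user_project: bool = False
    split_large_event: bool = False
    resume_token_is_high_water_mark: bool = True


def build_shard_pipeline(opts):
    """Return the stage names of the shard pipeline, in order."""
    stages = [
        "$_internalChangeStreamOplogMatch",
        "$_internalChangeStreamUnwindTransaction",
        "$_internalChangeStreamTransform",
    ]
    if opts.level in ("collection", "database"):
        stages.append("$_internalChangeStreamCheckInvalidate")
    stages.append("$_internalChangeStreamCheckResumability")
    if opts.full_document_before_change != "off":
        stages.append("$_internalChangeStreamAddPreImage")
    if opts.full_document != "default":
        stages.append("$_internalChangeStreamAddPostImage")
    if opts.has_user_match:
        stages.append("$match")
    if opts.has_user_project:
        stages.append("$project")
    if opts.split_large_event:
        stages.append("$_internalChangeStreamSplitLargeEvent")
    return stages


def build_merge_pipeline(opts):
    """Return the stage names of the merge pipeline on mongos, in order."""
    stages = ["$mergeCursors", "$_internalChangeStreamHandleTopologyChange"]
    if not opts.resume_token_is_high_water_mark:
        stages.append("$_internalChangeStreamEnsureResumeTokenPresent")
    return stages
```

For example, a cluster-wide change stream with default options would yield only the four unconditional shard stages and the two unconditional merge stages.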
### Details of individual Pipeline Stages
#### `$_internalChangeStreamOplogMatch`
This stage is responsible for reading data from the oplog and filtering out irrelevant events.
The `DocumentSourceChangeStreamOplogMatch` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_oplog_match.h#L61).
The oplog filter for the stage is built [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_oplog_match.cpp#L79).
This stage is responsible for reading data from the oplog and filtering out irrelevant events. The
`DocumentSourceChangeStreamOplogMatch` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_oplog_match.h#L61).
The oplog filter for the stage is built
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_oplog_match.cpp#L79).
There is no `Stage` equivalent for `DocumentSourceChangeStreamOplogMatch`, as it will be turned into
a `$cursor` stage for execution.
@ -731,28 +729,35 @@ a `$cursor` stage for execution.
#### `$_internalChangeStreamUnwindTransaction`
This stage is responsible for "unwinding" (expanding) multiple operations that are contained in an
"applyOps" oplog entry into individual events.
The `DocumentSourceChangeStreamUnwindTransaction` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_unwind_transaction.h#L71).
The `ChangeStreamUnwindTransactionStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_unwind_transaction.cpp#L83).
"applyOps" oplog entry into individual events. The `DocumentSourceChangeStreamUnwindTransaction`
code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_unwind_transaction.h#L71).
The `ChangeStreamUnwindTransactionStage` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_unwind_transaction.cpp#L83).
#### `$_internalChangeStreamTransform`
This stage is responsible for converting oplog entries into change events. It will build a change
event document for every oplog entry that enters this stage.
Event fields are added based on the change stream configuration.
The `DocumentSourceChangeStreamTransform` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_transform.h#L60).
The `ChangeStreamTransformStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_transform_stage.cpp#L75).
The actual event transformation happens inside `ChangeStreamDefaultEventTransformation` [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_event_transform.cpp#L321).
event document for every oplog entry that enters this stage. Event fields are added based on the
change stream configuration. The `DocumentSourceChangeStreamTransform` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_transform.h#L60).
The `ChangeStreamTransformStage` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_transform_stage.cpp#L75).
The actual event transformation happens inside `ChangeStreamDefaultEventTransformation`
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_event_transform.cpp#L321).
#### `$_internalChangeStreamCheckInvalidate`
This stage is responsible for creating change stream "invalidate" events and is only added for
collection-level and database-level change streams.
The `DocumentSourceChangeStreamCheckInvalidate` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_check_invalidate.h#L65).
The `ChangeStreamCheckInvalidate` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_invalidate_stage.cpp#L106).
collection-level and database-level change streams. The `DocumentSourceChangeStreamCheckInvalidate`
code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_check_invalidate.h#L65).
The `ChangeStreamCheckInvalidate` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_invalidate_stage.cpp#L106).
When an invalidate event is encountered, the stage will first emit an "invalidate" event, and then
throws a `ChangeStreamInvalidated` exception on the next call. The [`ChangeStreamInvalidatedInfo`](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_invalidation_info.h#L47).
throw a `ChangeStreamInvalidated` exception on the next call. The
[`ChangeStreamInvalidatedInfo`](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/change_stream_invalidation_info.h#L47)
exception type contains the error code `ChangeStreamInvalidated`.
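The "emit first, throw on the next call" behavior can be illustrated with a small state machine. This is a Python sketch under stated assumptions, not the server code: the class name, the `get_next()` method, and the set of invalidating operation types are hypothetical. The sketch also assumes that the triggering event itself is passed through before the "invalidate" event.

```python
class ChangeStreamInvalidated(Exception):
    """Stand-in for the server's ChangeStreamInvalidated error."""


class CheckInvalidateSketch:
    """Hypothetical model of a stage that emits "invalidate", then throws."""

    def __init__(self, source, invalidating_types=("drop", "rename", "dropDatabase")):
        self._source = iter(source)
        # Assumption: which operation types invalidate the stream is configurable here.
        self._invalidating = set(invalidating_types)
        self._queued_invalidate = None
        self._invalidated = False

    def get_next(self):
        # Every call after the "invalidate" event was handed out fails.
        if self._invalidated:
            raise ChangeStreamInvalidated("change stream was invalidated")
        # Hand out the queued "invalidate" event exactly once.
        if self._queued_invalidate is not None:
            event, self._queued_invalidate = self._queued_invalidate, None
            self._invalidated = True
            return event
        event = next(self._source, None)
        if event is None:
            return None  # no more input in this sketch
        if event["operationType"] in self._invalidating:
            self._queued_invalidate = {"operationType": "invalidate"}
        return event
```

With an input of an "insert" followed by a "drop", successive calls return the "insert", the "drop", and the "invalidate" event, and the fourth call raises.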
#### `$_internalChangeStreamCheckResumability`
@ -761,18 +766,22 @@ This stage checks if the oplog has enough history to resume the change stream, a
events up to the given resume point. If no data for the resume point can be found in the oplog
anymore, it will throw a `ChangeStreamHistoryLost` error.
The `DocumentSourceChangeStreamCheckResumability` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_check_resumability.h#L79).
The `ChangeStreamCheckResumabilityStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_resumability_stage.cpp#L68).
The `DocumentSourceChangeStreamCheckResumability` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_check_resumability.h#L79).
The `ChangeStreamCheckResumabilityStage` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_check_resumability_stage.cpp#L68).
#### `$_internalChangeStreamAddPreImage`
This stage is responsible for adding pre-image data to "update", "replace" and "delete" events. It
is only added to change stream pipelines if the `fullDocumentBeforeChange` parameter is not set to
`off`.
If enabled, the stage relies on the pre-images stored in the system's pre-image system collection.
`off`. If enabled, the stage relies on the pre-images stored in the system's pre-image system
collection.
The `DocumentSourceChangeStreamAddPreImage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_add_pre_image.h#L67).
The `ChangeStreamAddPreImageStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_pre_image_stage.cpp#L67).
The `DocumentSourceChangeStreamAddPreImage` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_add_pre_image.h#L67).
The `ChangeStreamAddPreImageStage` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_pre_image_stage.cpp#L67).
#### `$_internalChangeStreamAddPostImage`
@ -780,23 +789,24 @@ This stage is responsible for adding post-image data to "update" events. It is o
stream pipelines if the `fullDocument` parameter is not set to `default`.
If `fullDocument` is set to `updateLookup`, the stage will perform a lookup for the current version
of a document that was updated by an "update" event, and store it in the `fullDocument` field of
the "update" event if present. The lookup is performed using the `_id` value of the document from
the change event. As the lookup is executed at a different point in time than when the change event
was recorded, it is possible that the lookup finds a different version of the document than the one
that was active when the change event was recorded. This can happen if the document was updated
again between the change event and the lookup. The lookup may also find no document at all if the
document was deleted after the "update" event, but before the lookup.
In case the lookup cannot find a document with the requested `_id`, it will populate the
`fullDocument` field with a value of `null`.
of a document that was updated by an "update" event, and store it in the `fullDocument` field of the
"update" event if present. The lookup is performed using the `_id` value of the document from the
change event. As the lookup is executed at a different point in time than when the change event was
recorded, it is possible that the lookup finds a different version of the document than the one that
was active when the change event was recorded. This can happen if the document was updated again
between the change event and the lookup. The lookup may also find no document at all if the document
was deleted after the "update" event, but before the lookup. In case the lookup cannot find a
document with the requested `_id`, it will populate the `fullDocument` field with a value of `null`.
If `fullDocument` is set to `whenAvailable` or `required`, the stage will make use of the stored
pre-image of the document in the system's pre-image system collection. It will fetch the pre-image
and then apply the delta that is stored in the "update" change event on top of it, and store the
result in the `fullDocument` field.
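The difference between the two strategies can be made concrete with a toy model. The following Python sketch is an illustration only: `update_lookup` and `apply_update_description` are hypothetical helpers, the collection is modeled as a plain dictionary keyed by `_id`, and the delta is simplified to top-level `updatedFields` and `removedFields` (dotted paths and array truncation are ignored).

```python
def update_lookup(collection, event):
    """Model of `updateLookup`: read the *current* document by `_id`.

    `collection` is a dict keyed by `_id`; a missing key models a document
    that was deleted before the lookup ran, which yields `None` (null).
    """
    doc_id = event["documentKey"]["_id"]
    enriched = dict(event)
    enriched["fullDocument"] = collection.get(doc_id)
    return enriched


def apply_update_description(pre_image, update_description):
    """Model of pre-image plus delta: rebuild the post-image at event time."""
    post_image = dict(pre_image)
    post_image.update(update_description.get("updatedFields", {}))
    for field in update_description.get("removedFields", []):
        post_image.pop(field, None)
    return post_image


# The event recorded x = 2, but the document has since been updated to x = 3.
event = {
    "operationType": "update",
    "documentKey": {"_id": 1},
    "updateDescription": {"updatedFields": {"x": 2}, "removedFields": []},
}
current_state = {1: {"_id": 1, "x": 3}}
looked_up = update_lookup(current_state, event)["fullDocument"]
rebuilt = apply_update_description({"_id": 1, "x": 1}, event["updateDescription"])
```

In this model the lookup reflects the later state of the document, while the pre-image based reconstruction reflects the state right after the recorded update.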
The `DocumentSourceChangeStreamAddPostImage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_add_post_image.h#L63).
The `ChangeStreamAddPostImageStage` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_post_image_stage.cpp#L84).
The `DocumentSourceChangeStreamAddPostImage` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_add_post_image.h#L63).
The `ChangeStreamAddPostImageStage` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_add_post_image_stage.cpp#L84).
#### `$_internalChangeStreamEnsureResumeTokenPresent`
@ -805,18 +815,22 @@ the change stream parameters is actually in the stream. The stage is only presen
stream resume token is not a high water mark token. If the resume token cannot be found in the
stream, it will throw a `ChangeStreamFatalError`.
The `DocumentSourceChangeStreamEnsureResumeTokenPresent` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_ensure_resume_token_present.h#L51).
The `ChangeStreamEnsureResumeTokenPresent` code is [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_ensure_resume_token_present_stage.cpp#L67).
The `DocumentSourceChangeStreamEnsureResumeTokenPresent` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_ensure_resume_token_present.h#L51).
The `ChangeStreamEnsureResumeTokenPresent` code is
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_ensure_resume_token_present_stage.cpp#L67).
#### `$_internalChangeStreamHandleTopologyChange`
This stage is only present in sharded cluster change streams and is always part of the _mongos_
merge pipeline. The stage is responsible for opening additional cursors to shards that have been
added to the cluster. It will handle "insert" events into the `config.shards` collection that
were observed from the config server.
added to the cluster. It will handle "insert" events into the `config.shards` collection that were
observed from the config server.
The `DocumentSourceChangeStreamHandleTopologyChange` code can be found [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_handle_topology_change.h#L63).
The `ChangeStreamHandleTopologyChangeStage` code can be found [here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_handle_topology_change_stage.cpp#L121).
The `DocumentSourceChangeStreamHandleTopologyChange` code can be found
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/pipeline/document_source_change_stream_handle_topology_change.h#L63).
The `ChangeStreamHandleTopologyChangeStage` code can be found
[here](https://github.com/mongodb/mongo/blob/eb4c6148f6a25c444be39a0e330506834526d935/src/mongo/db/exec/agg/change_stream_handle_topology_change_stage.cpp#L121).
## Missing documentation (to be completed)

View File

@ -1,75 +1,70 @@
# Command Dispatch
Command dispatch refers to the general process by which client requests are
taken from the network, parsed, sanitized, then finally run on databases.
Command dispatch refers to the general process by which client requests are taken from the network,
parsed, sanitized, then finally run on databases.
## Service Entry Points
[Service entry points][service_entry_point_h] fulfill the transition from the
transport layer into command implementations. For each incoming connection
from a client (in the form of a [session][session_h] object), a new dedicated
thread is spawned then detached, and is also assigned a new [session workflow]
[session_workflow_h], responsible for maintaining the workflow of a
single client connection during its lifetime. Central to the entry point is the
`handleRequest()` function, which manages the server-side logic of processing
requests and returns a response message indicating the result of the
corresponding request message. This function is currently implemented by several
subclasses of the parent `ServiceEntryPoint` in order to account for the
differences in processing requests between the shard and router roles -- these
distinctions are reflected in the `ServiceEntryPointRouterRole` and
`ServiceEntryPointShardRole` subclasses (see [here][service_entry_point_router_role_h]
and [here][service_entry_point_shard_role.h]).
[Service entry points][service_entry_point_h] fulfill the transition from the transport layer into
command implementations. For each incoming connection from a client (in the form of a
[session][session_h] object), a new dedicated thread is spawned then detached, and is also assigned
a new [session workflow][session_workflow_h], responsible for maintaining the workflow of a single
client connection during its lifetime. Central to the entry point is the `handleRequest()` function,
which manages the server-side logic of processing requests and returns a response message indicating
the result of the corresponding request message. This function is currently implemented by several
subclasses of the parent `ServiceEntryPoint` in order to account for the differences in processing
requests between the shard and router roles -- these distinctions are reflected in the
`ServiceEntryPointRouterRole` and `ServiceEntryPointShardRole` subclasses (see
[here][service_entry_point_router_role_h] and [here][service_entry_point_shard_role.h]).
## Strategy
One area in which the _mongos_ entry point differs from its _mongod_ counterpart
is in its usage of the [Strategy class][strategy_h]. `Strategy` operates as a
legacy interface for processing client read, write, and command requests; there
is a near 1-to-1 mapping between its constituent functions and request types
(e.g. `writeOp()` for handling write operation requests, `getMore()` for a
getMore request, etc.). These functions comprise the backbone of the _mongos_
entry point's `handleRequest()` -- that is to say, when a valid request is
received, it is sieved and ultimately passed along to the appropriate Strategy
class member function. The significance of using the Strategy class specifically
with the _mongos_ entry point is that it [facilitates query routing to
shards][mongos_router] in _addition_ to running queries against targeted
databases (see [s/transaction_router.h][transaction_router_h] for finer
details).
One area in which the _mongos_ entry point differs from its _mongod_ counterpart is in its usage of
the [Strategy class][strategy_h]. `Strategy` operates as a legacy interface for processing client
read, write, and command requests; there is a near 1-to-1 mapping between its constituent functions
and request types (e.g. `writeOp()` for handling write operation requests, `getMore()` for a getMore
request, etc.). These functions comprise the backbone of the _mongos_ entry point's
`handleRequest()` -- that is to say, when a valid request is received, it is sieved and ultimately
passed along to the appropriate Strategy class member function. The significance of using the
Strategy class specifically with the _mongos_ entry point is that it [facilitates query routing to
shards][mongos_router] in _addition_ to running queries against targeted databases (see
[s/transaction_router.h][transaction_router_h] for finer details).
## Commands
The [Command class][commands_h] serves as a means of cataloging a server command
as well as ascribing various attributes and behaviors to commands via the [type
system][template_method_pattern], that will likely be used during the lifespan
of a particular server. Construction of a Command should only occur during
server startup. When a new Command is constructed, that Command is stored in a
global `CommandRegistry` object for future reference. There are two kinds of
Command subclasses: `BasicCommand` and `TypedCommand`.
The [Command class][commands_h] serves as a means of cataloging a server command as well as
ascribing various attributes and behaviors to commands via the [type
system][template_method_pattern], that will likely be used during the lifespan of a particular
server. Construction of a Command should only occur during server startup. When a new Command is
constructed, that Command is stored in a global `CommandRegistry` object for future reference. There
are two kinds of Command subclasses: `BasicCommand` and `TypedCommand`.
A major distinction between the two is in their implementation of the `parse()`
member function. `parse()` takes in a request and returns a handle to a single
invocation of a particular Command (represented by a `CommandInvocation`), that
can then be used to run the Command. The `BasicCommand::parse()` is a naive
implementation that merely forwards incoming requests to the Invocation and
makes sure that the Command does not support document sequences. The
implementation of `TypedCommand::parse()`, on the other hand, varies depending
on the Request type parameter the Command takes in. Since the `TypedCommand`
accepts requests generated by IDL, the parsing function associated with a usable
Request type must allow it to be parsed as an IDL command. In handling requests,
both the _mongos_ and _mongod_ entry points interact with the Command subclasses
through the `CommandHelpers` struct in order to parse requests and ultimately
run them as Commands.
A major distinction between the two is in their implementation of the `parse()` member function.
`parse()` takes in a request and returns a handle to a single invocation of a particular Command
(represented by a `CommandInvocation`), that can then be used to run the Command. The
`BasicCommand::parse()` is a naive implementation that merely forwards incoming requests to the
Invocation and makes sure that the Command does not support document sequences. The implementation
of `TypedCommand::parse()`, on the other hand, varies depending on the Request type parameter the
Command takes in. Since the `TypedCommand` accepts requests generated by IDL, the parsing function
associated with a usable Request type must allow it to be parsed as an IDL command. In handling
requests, both the _mongos_ and _mongod_ entry points interact with the Command subclasses through
the `CommandHelpers` struct in order to parse requests and ultimately run them as Commands.
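The registry and the two styles of `parse()` can be sketched in a few lines. This is a loose Python analogy under stated assumptions, not the C++ API: every class and function below (`CommandRegistrySketch`, `BasicCommandSketch`, `TypedCommandSketch`, `InsertRequestSketch`, `dispatch`) is hypothetical, requests are modeled as plain dictionaries, and an invocation is modeled as a zero-argument callable.

```python
from dataclasses import dataclass


class CommandRegistrySketch:
    """Hypothetical registry that is filled once at startup."""

    def __init__(self):
        self._commands = {}

    def register(self, command):
        if command.name in self._commands:
            raise ValueError(f"duplicate command: {command.name}")
        self._commands[command.name] = command

    def find(self, name):
        return self._commands.get(name)


class BasicCommandSketch:
    """Forwards the raw request body and rejects document sequences."""

    name = "ping"

    def parse(self, request):
        if request.get("documentSequences"):
            raise ValueError(f"{self.name} does not support document sequences")
        return lambda: self.run(request["body"])

    def run(self, body):
        return {"ok": 1}


@dataclass
class InsertRequestSketch:
    """Stand-in for an IDL-generated request type."""

    collection: str
    documents: list

    @classmethod
    def parse(cls, body):
        return cls(collection=body["insert"], documents=list(body["documents"]))


class TypedCommandSketch:
    """Parses the body into a typed request before building the invocation."""

    name = "insert"
    request_type = InsertRequestSketch

    def parse(self, request):
        typed_request = self.request_type.parse(request["body"])
        return lambda: self.typed_run(typed_request)

    def typed_run(self, typed_request):
        return {"ok": 1, "n": len(typed_request.documents)}


def dispatch(registry, request):
    """Look up the command by the first field of the body, parse, then run."""
    command_name = next(iter(request["body"]))
    command = registry.find(command_name)
    if command is None:
        raise KeyError(f"no such command: {command_name}")
    invocation = command.parse(request)
    return invocation()
```

The point of the split is that `parse()` produces a handle for one invocation without running it, so the caller can perform other steps between parsing and running.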
## Admission control
To ensure stability of our servers, we have implemented different admission control mechanisms to prevent data-nodes from becoming overloaded with operations. When implementing a new command, it's important to decide whether the command will be subject to one of the admission controls in place and understand the resulting outcomes.
To ensure stability of our servers, we have implemented different admission control mechanisms to
prevent data-nodes from becoming overloaded with operations. When implementing a new command, it's
important to decide whether the command will be subject to one of the admission controls in place
and understand the resulting outcomes.
For example, user commands may be subject to Ingress Admission Control, which happens in the [ServiceEntryPoint][IngressControl].
For information on admission control and how to implement admission control into a new command, please see [Admission Control README][ACReadMe]
For example, user commands may be subject to Ingress Admission Control, which happens in the
[ServiceEntryPoint][IngressControl]. For information on admission control and how to implement
admission control into a new command, please see the [Admission Control README][ACReadMe].
## See Also
For details on transport internals, including ingress networking, see [this document][transport_internals].
For details on transport internals, including ingress networking, see [this
document][transport_internals].
[service_entry_point_h]: ../src/mongo/transport/service_entry_point.h
[session_h]: ../src/mongo/transport/session.h
@ -85,4 +80,5 @@ For details on transport internals, including ingress networking, see [this docu
[template_method_pattern]: https://en.wikipedia.org/wiki/Template_method_pattern
[transport_internals]: ../src/mongo/transport/README.md
[ACReadMe]: ../src/mongo/db/admission/README.md
[IngressControl]: https://github.com/mongodb/mongo/blob/a86c7f5de2a5de4d2f49e40e8970754ec6a5ba6c/src/mongo/db/service_entry_point_shard_role.cpp#L1803
[IngressControl]:
https://github.com/mongodb/mongo/blob/a86c7f5de2a5de4d2f49e40e8970754ec6a5ba6c/src/mongo/db/service_entry_point_shard_role.cpp#L1803

View File

@ -14,9 +14,9 @@ dynamically extensible.
A `ServiceContext` represents all of the state of a single Mongo server process, which may be either
a `mongod` or a `mongos`. It creates and manages the previously mentioned `Client`s and
`OperationContext`s, as well as a `TransportLayer` for performing network operations, a
`PeriodicRunner` for running housekeeping tasks periodically, a `StorageEngine` for interacting
with the actual database itself, and a set of time sources. In general, every Mongo server process
has a single `ServiceContext`, known as the _global_ `ServiceContext`. Typical uses of the global
`PeriodicRunner` for running housekeeping tasks periodically, a `StorageEngine` for interacting with
the actual database itself, and a set of time sources. In general, every Mongo server process has a
single `ServiceContext`, known as the _global_ `ServiceContext`. Typical uses of the global
`ServiceContext` outside of server initialization and shutdown include looking up `Client` or
`OperationContext` information for a particular thread or operation, or killing one or more running
operations during, e.g., a primary replica step-down. The global `ServiceContext` is created during
@ -28,16 +28,16 @@ The `ServiceContext` associated with a given `Client` object can be fetched in a
using [`Client::getServiceContext()`][client-get-service-context-url] when possible. As of the time of
writing, every server process only maintains a single `ServiceContext`, but preferring
`Client::getServiceContext()` or `ServiceContext::getCurrentServiceContext()` over
[`ServiceContext::getGlobalServiceContext()`][get-global-service-context-url] will allow us to
more easily maintain multiple `ServiceContext`s per server process if desired in the future.
[`ServiceContext::getGlobalServiceContext()`][get-global-service-context-url] will allow us to more
easily maintain multiple `ServiceContext`s per server process if desired in the future.
## [`Client`][client-url]
Each logical connection to a Mongo service is managed by a `Client` object, where a logical
connection may be a user or an internal process that needs to run a command or query on the database.
Construction of a `Client` object is typically performed with a call to `makeClient` on the global
`ServiceContext`, which can then be attached to any thread of execution, or with a call to
[`Client::initThread`][client-init-thread-url] which constructs a `Client` on the global
connection may be a user or an internal process that needs to run a command or query on the
database. Construction of a `Client` object is typically performed with a call to `makeClient` on
the global `ServiceContext`, which can then be attached to any thread of execution, or with a call
to [`Client::initThread`][client-init-thread-url] which constructs a `Client` on the global
`ServiceContext` and binds it to the current thread. All operations executed by the `Client` will
take place on that `Client`'s associated thread serially over the network connection managed by the
`Session` object that was passed into the `Client`'s constructor. If no `Session` is passed to the
@ -70,13 +70,13 @@ operations. The semantics of the `Client` lock are summarized in the table below
[`Client::cc()`][client-cc-url] may be used to get the `Client` object associated with the currently
executing thread. Prefer passing `Client` objects as parameters over calls to `Client::cc()` when
possible. A [`ThreadClient`][thread-client-url] is an RAII-style class which may be used to construct
and bind a `Client` to the current running thread and automatically unbind it once the `ThreadClient`
goes out of scope. An [`AlternativeClientRegion`][acr-url] is another RAII-style class which may be
used to temporarily bind a `Client` object to the currently running thread (holding any currently
bound `Client` in reserve), rebinding the current threads old `Client` to the current thread upon
falling out of scope. [`ClientStrand`][client-strand-url] functions similarly, but also provides an
`Executor` interface for binding a `Client` to an arbitrary thread.
possible. A [`ThreadClient`][thread-client-url] is an RAII-style class which may be used to
construct and bind a `Client` to the current running thread and automatically unbind it once the
`ThreadClient` goes out of scope. An [`AlternativeClientRegion`][acr-url] is another RAII-style
class which may be used to temporarily bind a `Client` object to the currently running thread
(holding any currently bound `Client` in reserve), rebinding the current thread's old `Client` to
the current thread upon falling out of scope. [`ClientStrand`][client-strand-url] functions
similarly, but also provides an `Executor` interface for binding a `Client` to an arbitrary thread.
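The binding semantics of `ThreadClient` and `AlternativeClientRegion` resemble scoped guards around a thread-local slot. The following Python sketch models them with context managers. It is an analogy only: `current_client`, `thread_client`, and `alternative_client_region` are hypothetical names, a client is modeled as a plain string, and the error raised when a client is already bound is an assumption of this sketch.

```python
import threading
from contextlib import contextmanager

# Thread-local slot that plays the role of the per-thread bound Client.
_current = threading.local()


def current_client():
    """Rough analogue of Client::cc(); returns None if nothing is bound."""
    return getattr(_current, "client", None)


@contextmanager
def thread_client(client):
    """Bind a client to the current thread and unbind it on scope exit."""
    if current_client() is not None:
        raise RuntimeError("a client is already bound to this thread")
    _current.client = client
    try:
        yield client
    finally:
        _current.client = None


@contextmanager
def alternative_client_region(client):
    """Temporarily swap in another client, restoring the old one on exit."""
    previous = current_client()
    _current.client = client
    try:
        yield client
    finally:
        _current.client = previous


with thread_client("connection-client"):
    outer = current_client()
    with alternative_client_region("background-client"):
        inner = current_client()
    restored = current_client()
after = current_client()
```

Because the restore happens in a `finally` block, the previous binding comes back even if the body raises, which is the property the RAII classes provide in C++.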
## [`OperationContext`][operation-context-url]
@ -92,23 +92,37 @@ performed asynchronously.
### Interruptibility
`OperationContext`s implement the [`Interruptible`][interruptible-url] interface, which allows them to
be killed by their associated `Client`s (or, by proxy, their owning `ServiceContext`). See
[this comment block][opctx-interruptible-comment-block-url] for more details on when and how
`OperationContext`s implement the [`Interruptible`][interruptible-url] interface, which allows them
to be killed by their associated `Client`s (or, by proxy, their owning `ServiceContext`). See [this
comment block][opctx-interruptible-comment-block-url] for more details on when and how
`OperationContext`s are interrupted.
[service-context-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/service_context.h#L141
[decorable-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/util/decorable.h
[client-get-service-context-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L117
[get-global-service-context-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/service_context.h#L755
[client-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h
[client-init-thread-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L75
[client-cc-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L372
[thread-client-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L320
[acr-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L347
[client-strand-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client_strand.h
[operation-context-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/operation_context.h
[service-context-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/service_context.h#L141
[decorable-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/util/decorable.h
[client-get-service-context-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L117
[get-global-service-context-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/service_context.h#L755
[client-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h
[client-init-thread-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L75
[client-cc-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L372
[thread-client-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L320
[acr-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client.h#L347
[client-strand-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/client_strand.h
[operation-context-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/operation_context.h
[kill-op-url]: https://docs.mongodb.com/manual/reference/command/killOp/
[baton-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/baton.h
[interruptible-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/util/interruptible.h
[opctx-interruptible-comment-block-url]: https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/operation_context.cpp#L281
[baton-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/baton.h
[interruptible-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/util/interruptible.h
[opctx-interruptible-comment-block-url]:
https://github.com/mongodb/mongo/blob/ecc6179c18ed1e3b38d7ee244319210b18e24bad/src/mongo/db/operation_context.cpp#L281

File diff suppressed because it is too large

View File

@ -4,8 +4,10 @@
**👉 Please visit the new [Dev Container Documentation](./devcontainer/README.md) for:**
- 📖 [**Getting Started Guide**](./devcontainer/getting-started.md) - Step-by-step setup instructions
- 🏗️ [**Architecture & Technical Details**](./devcontainer/architecture.md) - How everything works under the hood
- 📖 [**Getting Started Guide**](./devcontainer/getting-started.md) - Step-by-step setup
instructions
- 🏗️ [**Architecture & Technical Details**](./devcontainer/architecture.md) - How everything works
under the hood
- 🔧 [**Troubleshooting Guide**](./devcontainer/troubleshooting.md) - Solutions to common issues
- 💡 [**Advanced Usage**](./devcontainer/advanced.md) - Customization and power user features
- ❓ [**FAQ**](./devcontainer/faq.md) - Frequently asked questions

View File

@ -1,10 +1,12 @@
# MongoDB Development with Dev Containers
**⚠️ BETA:** The devcontainer setup is currently in Beta stage. Please report issues and feedback to the team.
**⚠️ BETA:** The devcontainer setup is currently in Beta stage. Please report issues and feedback to
the team.
## 📚 Documentation Index
This is the comprehensive guide for developing MongoDB using Dev Containers. Choose the guide that best fits your needs:
This is the comprehensive guide for developing MongoDB using Dev Containers. Choose the guide that
best fits your needs:
### 🚀 [Getting Started](./getting-started.md)
@ -80,7 +82,8 @@ This is the comprehensive guide for developing MongoDB using Dev Containers. Cho
## What are Dev Containers?
Dev Containers provide a consistent, reproducible development environment using Docker containers. This ensures:
Dev Containers provide a consistent, reproducible development environment using Docker containers.
This ensures:
- ✅ **Consistency**: Everyone works with identical tooling and dependencies
- ✅ **Isolation**: Your host system stays clean

View File

@ -1,8 +1,10 @@
# Advanced Dev Container Usage
This guide covers advanced workflows and power user features for managing multiple containers, backups, and complex development scenarios.
This guide covers advanced workflows and power user features for managing multiple containers,
backups, and complex development scenarios.
**Looking to customize your devcontainer?** See the [Customization Guide](./customization.md) for dotfiles, VS Code settings, extensions, and performance tuning.
**Looking to customize your devcontainer?** See the [Customization Guide](./customization.md) for
dotfiles, VS Code settings, extensions, and performance tuning.
## Table of Contents

View File

@ -1,6 +1,7 @@
# Dev Container Architecture
This document provides a deep dive into how the MongoDB devcontainer is structured and how all the pieces work together.
This document provides a deep dive into how the MongoDB devcontainer is structured and how all the
pieces work together.
## Table of Contents
@ -201,7 +202,8 @@ MongoDB requires specific compiler versions. The toolchain installation process
### Toolchain Configuration
The `toolchain_config.env` file contains architecture-specific toolchain definitions for both ARM64 and AMD64:
The `toolchain_config.env` file contains architecture-specific toolchain definitions for both ARM64
and AMD64:
```bash
# Generated by toolchain.py
@ -289,7 +291,8 @@ The MongoDB toolchain includes:
### Toolchain Updates
The toolchain is managed by the MongoDB team. When updates are available, you'll get them automatically when you:
The toolchain is managed by the MongoDB team. When updates are available, you'll get them
automatically when you:
- Pull the latest changes from the repository
- Rebuild your devcontainer

View File

@ -1,10 +1,14 @@
# Customizing Your Dev Container
This guide covers personal customizations you can make to your MongoDB devcontainer **without modifying the repository's devcontainer configuration**. These are user-level settings that only affect your development environment.
This guide covers personal customizations you can make to your MongoDB devcontainer **without
modifying the repository's devcontainer configuration**. These are user-level settings that only
affect your development environment.
**Want to modify the devcontainer setup for everyone?** See [Contributing Customizations](#contributing-customizations) at the bottom.
**Want to modify the devcontainer setup for everyone?** See
[Contributing Customizations](#contributing-customizations) at the bottom.
**For general VS Code settings** (themes, fonts, keybindings), see the [VS Code documentation](https://code.visualstudio.com/docs/getstarted/settings).
**For general VS Code settings** (themes, fonts, keybindings), see the
[VS Code documentation](https://code.visualstudio.com/docs/getstarted/settings).
## Table of Contents
@ -76,7 +80,9 @@ This applies to all devcontainers you work with, not just MongoDB.
## Contributing Customizations
The customizations above are all user-level and don't require changes to the repository. If you want to modify the devcontainer setup itself to benefit all MongoDB developers, you'll need to submit a PR.
The customizations above are all user-level and don't require changes to the repository. If you want
to modify the devcontainer setup itself to benefit all MongoDB developers, you'll need to submit a
PR.
**Examples of repository-level customizations:**
@ -108,4 +114,5 @@ The customizations above are all user-level and don't require changes to the rep
- [Architecture](./architecture.md) - How devcontainers work
- [Advanced Usage](./advanced.md) - Multiple containers, backups, workflows
- [Troubleshooting](./troubleshooting.md) - Fix issues
- [VS Code Dev Containers Documentation](https://code.visualstudio.com/docs/devcontainers/containers) - General VS Code features
- [VS Code Dev Containers Documentation](https://code.visualstudio.com/docs/devcontainers/containers) -
General VS Code features

View File

@ -6,14 +6,16 @@ Frequently asked questions about MongoDB development with dev containers.
### What is a dev container?
A dev container (development container) is a Docker container configured specifically for development. It includes:
A dev container (development container) is a Docker container configured specifically for
development. It includes:
- All build tools and dependencies
- IDE configuration and extensions
- Persistent storage for caches and settings
- Consistent environment across all developers
Think of it as a portable, reproducible development environment that runs on any machine with Docker.
Think of it as a portable, reproducible development environment that runs on any machine with
Docker.
[Learn more about dev containers →](https://containers.dev/)
@ -43,11 +45,14 @@ Report issues to help improve it for everyone!
- Pros: Works without SSH keys, simpler for read-only access
- Cons: May require password/token for push operations
See the [Getting Started guide SSH setup section](./getting-started.md#4-configure-ssh-keys-recommended) for details.
See the
[Getting Started guide SSH setup section](./getting-started.md#4-configure-ssh-keys-recommended) for
details.
### How do SSH keys work with devcontainers?
VS Code automatically forwards your SSH agent to the container, so you don't need to copy keys into the container.
VS Code automatically forwards your SSH agent to the container, so you don't need to copy keys into
the container.
**Requirements:**
@ -65,7 +70,8 @@ ssh-add -l
ssh -T git@github.com
```
**Inside the container**, Git commands will automatically use your host's SSH keys through agent forwarding.
**Inside the container**, Git commands will automatically use your host's SSH keys through agent
forwarding.
[Learn more about SSH agent forwarding →](https://code.visualstudio.com/remote/advancedcontainers/sharing-git-credentials)
@ -126,7 +132,8 @@ First-time setup includes:
- WSL2 installed and configured
- Docker Desktop with WSL2 integration enabled
**Important:** Clone repository in WSL2 filesystem (not `/mnt/c/`), not Windows filesystem, for best performance.
**Important:** Clone the repository into the WSL2 filesystem, not the Windows filesystem
(`/mnt/c/`), for best performance.
### Can I use this on Apple Silicon (M1/M2/M3)?
@ -161,7 +168,8 @@ docker cp <container_id>:/workspaces/mongo/file.txt ~/Downloads/
**Option 3: Use bind mount** (sacrifices performance)
Open your existing local repository in VS Code and use "Dev Containers: Reopen in Container". This uses a bind mount which allows direct host filesystem access but is slower, especially on macOS.
Open your existing local repository in VS Code and use "Dev Containers: Reopen in Container". This
uses a bind mount which allows direct host filesystem access but is slower, especially on macOS.
### Can I use my existing local clone?
@ -369,8 +377,7 @@ gcc --version # Should show the MongoDB toolchain GCC version
ls -la ~/.config/engflow_auth/
```
**Re-authenticate:**
Contact MongoDB team for authentication flow.
**Re-authenticate:** Contact MongoDB team for authentication flow.
**Build locally instead:**
@ -406,13 +413,15 @@ Allocate as much disk space as you can comfortably spare. We recommend at least
**Allocate as much as possible** while leaving enough for your host OS to function (~4-8 GB).
More RAM = faster builds with more parallel jobs. MongoDB builds are resource-intensive and benefit greatly from additional memory.
More RAM = faster builds with more parallel jobs. MongoDB builds are resource-intensive and benefit
greatly from additional memory.
### How many CPU cores should I allocate?
**Allocate as many cores as possible** while leaving a couple for your host OS (1-2 cores).
Bazel parallelizes well; more cores = significantly faster builds. If you have 8+ cores available, MongoDB builds will complete much faster.
Bazel parallelizes well; more cores = significantly faster builds. If you have 8+ cores available,
MongoDB builds will complete much faster.
### Can I reduce resource usage?
@ -437,7 +446,8 @@ bazel clean # Clear build outputs
bazel clean --expunge # Clear everything (reclaim disk space)
```
> **Note:** Reducing resources will make builds slower. If possible, it's better to allocate more resources to Docker instead.
> **Note:** Reducing resources will make builds slower. If possible, it's better to allocate more
> resources to Docker instead.
### How do I monitor resource usage?
@ -492,7 +502,8 @@ But you lose VS Code integration, extensions, and convenience features.
- **Architecture Details**: [architecture.md](./architecture.md)
- **Troubleshooting**: [troubleshooting.md](./troubleshooting.md)
- **Advanced Topics**: [advanced.md](./advanced.md)
- **VS Code Docs**: [code.visualstudio.com/docs/devcontainers](https://code.visualstudio.com/docs/devcontainers/containers)
- **VS Code Docs**:
[code.visualstudio.com/docs/devcontainers](https://code.visualstudio.com/docs/devcontainers/containers)
### Who do I contact for help?

View File

@ -1,16 +1,19 @@
# Getting Started with MongoDB Dev Containers
This guide will walk you through setting up your MongoDB development environment using Dev Containers.
This guide will walk you through setting up your MongoDB development environment using Dev
Containers.
## Prerequisites
### 1. Install Docker
Dev Containers require Docker to be installed and running on your system. Choose one of the following Docker providers:
Dev Containers require Docker to be installed and running on your system. Choose one of the
following Docker providers:
#### Option A: Rancher Desktop (Recommended)
[Rancher Desktop](https://rancherdesktop.io/) is our recommended Docker provider for devcontainer development.
[Rancher Desktop](https://rancherdesktop.io/) is our recommended Docker provider for devcontainer
development.
**Installation:**
@ -20,28 +23,34 @@ Dev Containers require Docker to be installed and running on your system. Choose
- **Container Engine**: Select `dockerd (moby)` ⚠️ **Important!**
- **Configure Path**: Select "Automatic"
**Recommended Settings:**
After installation, increase resources for better build performance:
**Recommended Settings:** After installation, increase resources for better build performance:
1. Open Rancher Desktop → Preferences → Virtual Machine
2. **Memory**: Allocate as much as your system allows (leave ~4-8 GB for your host OS)
3. **CPUs**: Allocate as many cores as possible (leave 1-2 for your host OS)
4. **Disk**: Rancher Desktop doesn't have a UI for disk size. To increase it, see [Troubleshooting - Increase Docker disk allocation](./troubleshooting.md#build-fails-with-no-space-left-on-device) for instructions.
4. **Disk**: Rancher Desktop doesn't have a UI for disk size. To increase it, see
[Troubleshooting - Increase Docker disk allocation](./troubleshooting.md#build-fails-with-no-space-left-on-device)
for instructions.
5. Apply changes and restart Rancher Desktop
> **Tip:** More resources = faster builds. MongoDB builds benefit significantly from additional CPU cores and memory.
> **Tip:** More resources = faster builds. MongoDB builds benefit significantly from additional CPU
> cores and memory.
**IMPORTANT!**: If you already have VSCode open when you install Rancher Desktop, make sure to restart VSCode otherwise it may not find the Docker socket and VSCode will prompt you to install Docker Desktop instead.
**IMPORTANT!**: If you already have VSCode open when you install Rancher Desktop, make sure to
restart VSCode; otherwise it may not find the Docker socket and VSCode will prompt you to install
Docker Desktop instead.
#### Option B: Docker Desktop
[Docker Desktop](https://www.docker.com/products/docker-desktop/) is a popular alternative.
> **Note on Licensing**: Docker Desktop may require a paid license for commercial use. Please review the licensing terms to ensure compliance with your use case.
> **Note on Licensing**: Docker Desktop may require a paid license for commercial use. Please review
> the licensing terms to ensure compliance with your use case.
**Installation:**
1. Download from [docker.com/products/docker-desktop](https://www.docker.com/products/docker-desktop/)
1. Download from
[docker.com/products/docker-desktop](https://www.docker.com/products/docker-desktop/)
2. Install and start Docker Desktop
3. Go to Settings → Resources and allocate generously:
- **Memory**: Allocate as much as possible (leave ~4-8 GB for your host OS)
@ -52,7 +61,8 @@ After installation, increase resources for better build performance:
[OrbStack](https://orbstack.dev/) is a lightweight, fast Docker alternative for macOS.
> **Note on Licensing**: OrbStack may require a paid license for commercial use. Please review the licensing terms to ensure compliance with your use case.
> **Note on Licensing**: OrbStack may require a paid license for commercial use. Please review the
> licensing terms to ensure compliance with your use case.
**Installation:**
@ -64,12 +74,14 @@ After installation, increase resources for better build performance:
For Linux users, you can use Docker Engine directly.
**Installation:**
Follow the official guide: [docs.docker.com/engine/install](https://docs.docker.com/engine/install/)
**Installation:** Follow the official guide:
[docs.docker.com/engine/install](https://docs.docker.com/engine/install/)
### 2. Create SSH Directory (Required)
> **⚠️ Critical:** You **must** have a `~/.ssh` directory on your host machine before building the devcontainer. The devcontainer requires this directory to exist, regardless of whether you use SSH or HTTPS to clone the repository.
> **⚠️ Critical:** You **must** have a `~/.ssh` directory on your host machine before building the
> devcontainer. The devcontainer requires this directory to exist, regardless of whether you use SSH
> or HTTPS to clone the repository.
```bash
# On your HOST machine (not inside the container)
@ -87,13 +99,17 @@ Download and install VS Code from [code.visualstudio.com](https://code.visualstu
1. Open VS Code
2. Go to Extensions (⌘/Ctrl+Shift+X)
3. Search for "Dev Containers"
4. Install the [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension by Microsoft
4. Install the
[Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers)
extension by Microsoft
### 5. Configure SSH Keys (Recommended)
To clone the repository using SSH (recommended for contributors), you'll need SSH keys configured with GitHub.
To clone the repository using SSH (recommended for contributors), you'll need SSH keys configured
with GitHub.
> **⚠️ Important:** Run all commands in this section on your **host machine** (not inside the container). SSH keys need to be set up before cloning the repository into the container.
> **⚠️ Important:** Run all commands in this section on your **host machine** (not inside the
> container). SSH keys need to be set up before cloning the repository into the container.
#### Check if you have SSH keys
@ -183,7 +199,8 @@ Get-Service ssh-agent | Set-Service -StartupType Automatic
Start-Service ssh-agent
```
> **Note:** VS Code automatically forwards your SSH agent to the container, so your keys will be available inside the devcontainer.
> **Note:** VS Code automatically forwards your SSH agent to the container, so your keys will be
> available inside the devcontainer.
[Learn more about using SSH keys with GitHub →](https://docs.github.com/en/authentication/connecting-to-github-with-ssh)
@ -191,7 +208,8 @@ Start-Service ssh-agent
### Step 1: Clone Repository in Named Container Volume
For **optimal performance**, especially on macOS, clone the repository directly into a Docker volume rather than your local filesystem. This is crucial for Bazel performance.
For **optimal performance**, especially on macOS, clone the repository directly into a Docker volume
rather than your local filesystem. This is crucial for Bazel performance.
#### Why Named Volumes?
@ -397,7 +415,8 @@ ssh-add ~/.ssh/id_ed25519
# Command Palette → "Dev Containers: Rebuild Container"
```
**VS Code SSH Agent Forwarding**: The Dev Containers extension automatically forwards your SSH agent, but this requires:
**VS Code SSH Agent Forwarding**: The Dev Containers extension automatically forwards your SSH
agent, but this requires:
- SSH agent running on host with keys loaded
- SSH key files in default location (`~/.ssh/`)

View File

@ -28,7 +28,8 @@ Docker version <version> or later is required
**Solution**
Restart VSCode. If you install Rancher Desktop while you already have VSCode open, it doesn't properly detect the Docker socket and prompts you to install Docker Desktop by mistake.
Restart VSCode. If you install Rancher Desktop while you already have VSCode open, it doesn't
properly detect the Docker socket and prompts you to install Docker Desktop by mistake.
## Container Build Issues
@ -48,7 +49,9 @@ Error response from daemon: invalid mount config for type "bind": bind source pa
**Root Cause:**
The devcontainer configuration mounts your `~/.ssh` directory to enable Git operations over SSH. If this directory doesn't exist on your host machine, the container fails to start. **This directory is required even if you plan to use HTTPS instead of SSH for cloning.**
The devcontainer configuration mounts your `~/.ssh` directory to enable Git operations over SSH. If
this directory doesn't exist on your host machine, the container fails to start. **This directory is
required even if you plan to use HTTPS instead of SSH for cloning.**
**Solutions:**
@ -73,7 +76,8 @@ SSH agent forwarding behavior varies by Docker provider on macOS:
- With dockerd runtime: Automatic agent forwarding
- With containerd runtime: Agent forwarding requires additional setup
To use SSH agent forwarding, ensure your SSH keys are added to your host's SSH agent before starting the container:
To use SSH agent forwarding, ensure your SSH keys are added to your host's SSH agent before starting
the container:
```bash
ssh-add ~/.ssh/id_ed25519 # or your key name
@ -117,7 +121,8 @@ Error: failed to solve: write /var/lib/docker/...: no space left on device
disk: 100GB
```
4. Start Rancher Desktop
5. If Rancher Desktop was previously initialized, you may need to perform a factory reset (Preferences → Troubleshooting → Reset Kubernetes) for the disk size change to take effect.
5. If Rancher Desktop was previously initialized, you may need to perform a factory reset
(Preferences → Troubleshooting → Reset Kubernetes) for the disk size change to take effect.
**On Windows (WSL2):**
@ -125,7 +130,8 @@ Error: failed to solve: write /var/lib/docker/...: no space left on device
1. Stop Rancher Desktop
2. Run: `wsl --shutdown`
3. Follow Microsoft's guide to increase WSL2 disk size: https://learn.microsoft.com/en-us/windows/wsl/disk-space
3. Follow Microsoft's guide to increase WSL2 disk size:
https://learn.microsoft.com/en-us/windows/wsl/disk-space
**Docker Desktop:**
@ -174,7 +180,8 @@ Error: Failed to download toolchain
curl -I "$(grep TOOLCHAIN_URL .devcontainer/toolchain_config.env | cut -d'"' -f2)"
```
3. **If toolchain URL is broken**, report it to the MongoDB team. This is a devcontainer configuration issue that needs to be fixed upstream.
3. **If toolchain URL is broken**, report it to the MongoDB team. This is a devcontainer
configuration issue that needs to be fixed upstream.
### Build Fails with Checksum Mismatch
@ -203,7 +210,8 @@ Got: def456...
# Command Palette → "Dev Containers: Rebuild Container Without Cache"
```
3. **If problem persists**, this is likely a devcontainer configuration issue - report it to the MongoDB team.
3. **If problem persists**, this is likely a devcontainer configuration issue - report it to the
MongoDB team.
### Container Fails to Start
@ -288,11 +296,9 @@ Got: def456...
- File save is delayed
- Terminal autocomplete is slow
**Root Cause:**
Bind mounts on macOS use osxfs which has high latency for filesystem operations.
**Root Cause:** Bind mounts on macOS use osxfs which has high latency for filesystem operations.
**Solution:**
**Use named volumes instead of bind mounts** (see Getting Started guide)
**Solution:** ✅ **Use named volumes instead of bind mounts** (see Getting Started guide)
### High CPU Usage
@ -517,7 +523,8 @@ fatal: Could not read from remote repository.
ssh-add ~/.ssh/id_ed25519 # or id_rsa
```
See [Getting Started - SSH Setup](./getting-started.md#4-configure-ssh-keys-recommended) for detailed instructions.
See [Getting Started - SSH Setup](./getting-started.md#4-configure-ssh-keys-recommended) for
detailed instructions.
### SSH Works on Host But Not in Container
@ -527,8 +534,7 @@ See [Getting Started - SSH Setup](./getting-started.md#4-configure-ssh-keys-reco
- Same operations fail inside devcontainer
- "Permission denied" or asks for password
**Root Cause:**
SSH agent forwarding isn't working properly.
**Root Cause:** SSH agent forwarding isn't working properly.
**Solutions:**
@ -633,8 +639,7 @@ git config --global credential.helper store
# Next time you enter credentials, they'll be saved
```
**Option 3: Fix SSH agent forwarding**:
See "SSH Works on Host But Not in Container" section above.
**Option 3: Fix SSH agent forwarding**: See "SSH Works on Host But Not in Container" section above.
### Multiple SSH Keys (Personal + Work)
@ -868,8 +873,7 @@ ModuleNotFoundError: No module named 'pymongo'
- History cleared
- Python venv empty
**Root Cause:**
Volumes not mounting correctly
**Root Cause:** Volumes not mounting correctly
**Solutions:**
@ -917,8 +921,8 @@ docker cp <container_id>:/workspaces/mongo/file.txt ~/Downloads/
# Right-click file → Download...
```
**To edit with external tools:**
Use bind mounts instead of named volumes (but sacrifices performance).
**To edit with external tools:** Use bind mounts instead of named volumes (at the cost of
performance).
### Volume Fills Up Disk
@ -1070,8 +1074,7 @@ permission denied while trying to connect to Docker daemon
- Slow builds
- Out of memory errors
**Solution:**
Go to Docker Desktop → Settings → Resources and allocate generously:
**Solution:** Go to Docker Desktop → Settings → Resources and allocate generously:
- **CPUs**: Allocate as many as possible (leave 1-2 for host OS)
- **Memory**: Allocate as much as possible (leave ~4-8 GB for host OS)
@ -1087,8 +1090,7 @@ Go to Docker Desktop → Settings → Resources and allocate generously:
- Docker-outside-of-docker doesn't work
- Volume mounts fail
**Solution:**
OrbStack has some limitations with devcontainer features. Try:
**Solution:** OrbStack has some limitations with devcontainer features. Try:
1. Update to latest OrbStack version
2. Check OrbStack documentation for devcontainer compatibility
@ -1177,7 +1179,8 @@ cd mongo
If your issue isn't covered here:
1. **Check VS Code Docs**: [code.visualstudio.com/docs/devcontainers](https://code.visualstudio.com/docs/devcontainers/containers)
1. **Check VS Code Docs**:
[code.visualstudio.com/docs/devcontainers](https://code.visualstudio.com/docs/devcontainers/containers)
2. **Search Issues**: MongoDB GitHub repository issues
3. **Ask the Team**: MongoDB developers Slack/chat
4. **File a Bug**: Include:

View File

@ -1,26 +1,95 @@
# Egress Networking
Egress networking entails outbound communication (i.e. requests) from a client process to a server process (e.g. _mongod_), as well as inbound communication (i.e. responses) from such a server process back to a client process.
Egress networking entails outbound communication (i.e. requests) from a client process to a server
process (e.g. _mongod_), as well as inbound communication (i.e. responses) from such a server
process back to a client process.
## Remote Commands
A remote command represents an exchange of data between a client and a server. A remote command consists of two steps: a request, which the clients sends to the server, and a response, which the client receives from the server. These elements are represented by the [request][remote_command_request_h] and [response][remote_command_response_h] objects; each wraps the BSON that represents the on-wire transacted data and metadata that describes the context of the command, such as the host that the command targets. Each object also contains metadata that corresponds to its half of the command lifecycle. For example, the request object notes the timeout of the command and the operation's unique identifier, among other fields, and the response object notes the final disposition of the command's data exchange as a `Status` object (which takes no position on the success of the command's semantics at the remote) and the time that the command actually took to execute, among other fields. In the case of an exhaust command, there may be multiple responses for a single request.
A remote command represents an exchange of data between a client and a server. A remote command
consists of two steps: a request, which the client sends to the server, and a response, which the
client receives from the server. These elements are represented by the
[request][remote_command_request_h] and [response][remote_command_response_h] objects; each wraps
the BSON that represents the on-wire transacted data and metadata that describes the context of the
command, such as the host that the command targets. Each object also contains metadata that
corresponds to its half of the command lifecycle. For example, the request object notes the timeout
of the command and the operation's unique identifier, among other fields, and the response object
notes the final disposition of the command's data exchange as a `Status` object (which takes no
position on the success of the command's semantics at the remote) and the time that the command
actually took to execute, among other fields. In the case of an exhaust command, there may be
multiple responses for a single request.
## Connection Pooling
The [executor::ConnectionPool][connection_pool_h] class is responsible for pooling connections to any number of hosts. It contains zero or more `ConnectionPool::SpecificPool` objects, each of which pools connections for a unique host, and exactly one `ConnectionPool::ControllerInterface` object, which is responsible for the addition, removal, and updating of `SpecificPool`s to, from, and in its owning `ConnectionPool`. When a caller requests a connection to a host from the `ConnectionPool`, the `ConnectionPool` creates a new `SpecificPool` to pool connections for that host if one does not exist already, and then the `ConnectionPool` forwards the request to the `SpecificPool`. A `SpecificPool` expires when its `hostTimeout` has passed without any connection requests, after which time it becomes unusable; further requests for connections to that host will trigger the creation of a fresh `SpecificPool`.
The [executor::ConnectionPool][connection_pool_h] class is responsible for pooling connections to
any number of hosts. It contains zero or more `ConnectionPool::SpecificPool` objects, each of which
pools connections for a unique host, and exactly one `ConnectionPool::ControllerInterface` object,
which is responsible for the addition, removal, and updating of `SpecificPool`s to, from, and in its
owning `ConnectionPool`. When a caller requests a connection to a host from the `ConnectionPool`,
the `ConnectionPool` creates a new `SpecificPool` to pool connections for that host if one does not
exist already, and then the `ConnectionPool` forwards the request to the `SpecificPool`. A
`SpecificPool` expires when its `hostTimeout` has passed without any connection requests, after
which time it becomes unusable; further requests for connections to that host will trigger the
creation of a fresh `SpecificPool`.
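
The per-host lookup and `hostTimeout` expiry described above can be sketched roughly as follows.
This is an illustrative model only: `PoolRegistrySketch`, `HostPoolSketch`, and `poolFor` are
hypothetical names, not the real `executor::ConnectionPool` API, and the sketch passes the current
time in explicitly instead of using timers or locking.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical names throughout; this is not the real executor::ConnectionPool API.
using SketchClock = std::chrono::steady_clock;

// Stand-in for a SpecificPool: tracks only what the expiry rule needs.
struct HostPoolSketch {
    SketchClock::time_point lastRequest;  // time of the most recent connection request
    int requestsServed = 0;               // requests handled by this pool instance
};

class PoolRegistrySketch {
public:
    explicit PoolRegistrySketch(std::chrono::milliseconds hostTimeout)
        : _hostTimeout(hostTimeout) {}

    // Returns the pool for `host`. A pool that has gone `hostTimeout` without a
    // request is treated as expired and replaced by a fresh one.
    HostPoolSketch& poolFor(const std::string& host, SketchClock::time_point now) {
        auto it = _pools.find(host);
        if (it != _pools.end() && now - it->second.lastRequest >= _hostTimeout) {
            _pools.erase(it);  // the expired pool is unusable, so drop it
            it = _pools.end();
        }
        if (it == _pools.end()) {
            it = _pools.emplace(host, HostPoolSketch{}).first;
        }
        it->second.lastRequest = now;
        ++it->second.requestsServed;
        return it->second;
    }

    std::size_t size() const { return _pools.size(); }

private:
    std::chrono::milliseconds _hostTimeout;
    std::map<std::string, HostPoolSketch> _pools;  // one entry per unique host
};
```

Requesting the same host twice within the timeout reuses the existing entry; a request that arrives
after the timeout has elapsed gets a fresh one.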
The final result of a successful connection request made through `ConnectionPool::getConnection` is a `ConnectionPool::ConnectionInterface`, which represents a connection ready for use. Externally, the `ConnectionInterface` is primarily used by the caller to exchange data with its remote host. Callers return `ConnectionInterface`s to the pool by allowing them to destruct and callers must signal to the pool the final disposition of the connection beforehand through the `indicate*` family of methods. `ConnectionInterface`s also support setting timers to schedule future activities. Internally, the `ConnectionInterface` is used to prepare the connection for data exchange before transferring ownership to the caller and refreshing the health of a connection when the caller returns the connection to the pool. `ConnectionInterface` also maintains a notion of generation, which is implemented as a monotonically-incrementing counter. When a caller returns a `ConnectionInterface` to a `ConnectionPool` from a generation prior to the current generation of the corresponding `SpecificPool`, the connection is dropped. The current generation of a `SpecificPool` is incremented when the pool experiences certain failures (e.g., when to establish a new connection). `ConnectionPool` also drops a connection if the caller called `indicateFailure` on the connection before returning it. `ConnectionPool` uses a global mutex for access to `SpecificPool`s as well as generation counters.
The final result of a successful connection request made through `ConnectionPool::getConnection` is
a `ConnectionPool::ConnectionInterface`, which represents a connection ready for use. Externally,
the `ConnectionInterface` is primarily used by the caller to exchange data with its remote host.
Callers return `ConnectionInterface`s to the pool by allowing them to destruct and callers must
signal to the pool the final disposition of the connection beforehand through the `indicate*` family
of methods. `ConnectionInterface`s also support setting timers to schedule future activities.
Internally, the `ConnectionInterface` is used to prepare the connection for data exchange before
transferring ownership to the caller and refreshing the health of a connection when the caller
returns the connection to the pool. `ConnectionInterface` also maintains a notion of generation,
which is implemented as a monotonically-incrementing counter. When a caller returns a
`ConnectionInterface` to a `ConnectionPool` from a generation prior to the current generation of the
corresponding `SpecificPool`, the connection is dropped. The current generation of a `SpecificPool`
is incremented when the pool experiences certain failures (e.g., when it fails to establish a new
connection). `ConnectionPool` also drops a connection if the caller called `indicateFailure` on the
connection before returning it. `ConnectionPool` uses a global mutex for access to `SpecificPool`s
as well as generation counters.
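The generation rule described above can be illustrated with a small model. This is a minimal Python sketch, not the server's C++ implementation: `SpecificPoolSketch`, `Connection`, `checkout`, `on_failure`, and `return_connection` are hypothetical names, and discarding stale idle connections at checkout is an assumption made only to keep the example self-consistent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Connection:
    host: str
    generation: int        # pool generation at the time the connection was created
    failed: bool = False   # set when the caller indicates failure

    def indicate_failure(self) -> None:
        self.failed = True


@dataclass
class SpecificPoolSketch:
    """Hypothetical stand-in for a per-host pool with a generation counter."""

    host: str
    generation: int = 0
    idle: List[Connection] = field(default_factory=list)

    def checkout(self) -> Connection:
        # Assumption: idle connections from an older generation are discarded.
        while self.idle:
            conn = self.idle.pop()
            if conn.generation == self.generation:
                return conn
        return Connection(self.host, self.generation)

    def on_failure(self) -> None:
        # A pool-level failure bumps the generation, which invalidates
        # every connection created before this point.
        self.generation += 1

    def return_connection(self, conn: Connection) -> bool:
        """Return True if the connection is kept, False if it is dropped."""
        if conn.failed or conn.generation < self.generation:
            return False
        self.idle.append(conn)
        return True
```

In this model, a connection checked out before `on_failure()` is dropped when it comes back, while one checked out afterwards is kept unless the caller flagged it as failed.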
`ConnectionPool` uses its single instance of `EgressConnectionCloserManager` to determine when hosts should be dropped. The manager consists of multiple `EgressConnectionClosers`, which are used to determine whether hosts should be dropped. In the context of the ConnectionPool, the manager's purpose is to drop _connections_ to hosts based on whether they have been marked as keep open or not.
`ConnectionPool` uses its single instance of `EgressConnectionCloserManager` to determine when hosts
should be dropped. The manager consists of multiple `EgressConnectionClosers`, which are used to
determine whether hosts should be dropped. In the context of the ConnectionPool, the manager's
purpose is to drop _connections_ to hosts based on whether they have been marked as keep open or
not.
## Internal Network Clients
Client-side outbound communication in egress networking is primarily handled by the [AsyncDBClient class][async_client_h]. The async client is responsible for initializing a connection to a particular host as well as initializing the [wire protocol][wire_protocol] for client-server communication, after which remote requests can be sent by the client and corresponding remote responses from a database can subsequently be received. In setting up the wire protocol, the async client sends an [isMaster][is_master] request to the server and parses the server's isMaster response to ensure that the status of the connection is OK. An initial isMaster request is constructed in the legacy OP_QUERY protocol, so that clients can still communicate with servers that may not support other protocols. The async client also supports client authentication functionality (i.e. authenticating a user's credentials, client host, remote host, etc.).
Client-side outbound communication in egress networking is primarily handled by the [AsyncDBClient
class][async_client_h]. The async client is responsible for initializing a connection to a
particular host as well as initializing the [wire protocol][wire_protocol] for client-server
communication, after which remote requests can be sent by the client and corresponding remote
responses from a database can subsequently be received. In setting up the wire protocol, the async
client sends an [isMaster][is_master] request to the server and parses the server's isMaster
response to ensure that the status of the connection is OK. An initial isMaster request is
constructed in the legacy OP_QUERY protocol, so that clients can still communicate with servers that
may not support other protocols. The async client also supports client authentication functionality
(i.e. authenticating a user's credentials, client host, remote host, etc.).
The scheduling of requests is managed by the [task executor][task_executor_h], which maintains the notion of **events** and **callbacks**. Callbacks represent work (e.g. remote requests) that is to be executed by the executor, and are scheduled by client threads as well as other callbacks. There are several variations of work scheduling methods, which include: immediate scheduling, scheduling no earlier than a specified time, and scheduling iff a specified event has been signalled. These methods return a handle that can be used while the executor is still in scope for either waiting on or cancelling the scheduled callback in question. If a scheduled callback is cancelled, it remains on the work queue and is technically still run, but is labeled as having been 'cancelled' beforehand. Once a given callback/request is scheduled, the task executor is then able to execute such requests via a [network interface][network_interface_h]. The network interface, connected to a particular host/server, begins the asynchronous execution of commands specified via a request bundled in the aforementioned callback handle. The interface is capable of blocking threads until its associated task executor has work that needs to be performed, and is likewise able to return from an idle state when it receives a signal that the executor has new work to process.
The scheduling of requests is managed by the [task executor][task_executor_h], which maintains the
notion of **events** and **callbacks**. Callbacks represent work (e.g. remote requests) that is to
be executed by the executor, and are scheduled by client threads as well as other callbacks. There
are several variations of work scheduling methods, which include: immediate scheduling, scheduling
no earlier than a specified time, and scheduling iff a specified event has been signalled. These
methods return a handle that can be used while the executor is still in scope for either waiting on
or cancelling the scheduled callback in question. If a scheduled callback is cancelled, it remains
on the work queue and is technically still run, but is labeled as having been 'cancelled'
beforehand. Once a given callback/request is scheduled, the task executor is then able to execute
such requests via a [network interface][network_interface_h]. The network interface, connected to a
particular host/server, begins the asynchronous execution of commands specified via a request
bundled in the aforementioned callback handle. The interface is capable of blocking threads until
its associated task executor has work that needs to be performed, and is likewise able to return
from an idle state when it receives a signal that the executor has new work to process.
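The cancellation behaviour described above (a cancelled callback stays on the work queue and still runs, but is told it was cancelled) can be sketched as follows. This is an illustrative Python model only; `ExecutorSketch`, `CallbackHandle`, `schedule_work`, and `run_all` are hypothetical names and do not mirror the real task executor interface.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass
class CallbackHandle:
    work: Callable[[bool], None]  # called with True if the callback was cancelled
    cancelled: bool = False

    def cancel(self) -> None:
        # Cancelling only flags the callback; it is not removed from the queue.
        self.cancelled = True


class ExecutorSketch:
    """Hypothetical single-threaded model of an executor's work queue."""

    def __init__(self) -> None:
        self._queue: Deque[CallbackHandle] = deque()

    def schedule_work(self, work: Callable[[bool], None]) -> CallbackHandle:
        handle = CallbackHandle(work)
        self._queue.append(handle)
        return handle

    def run_all(self) -> None:
        # Every queued callback runs, cancelled or not; the callback itself
        # decides what to do when it learns it was cancelled.
        while self._queue:
            handle = self._queue.popleft()
            handle.work(handle.cancelled)
```

A callback that was cancelled before `run_all()` is still invoked, with `True` as its argument, so it can skip the real work and report the cancellation.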
Client-side legacy networking draws upon the `DBClientBase` class, of which there are multiple subclasses residing in the `src/mongo/client` folder. The [replica set DBClient][dbclient_rs_h] discerns which one of multiple servers in a replica set is the primary at construction time, and establishes a connection (using the `DBClientConnection` wrapper class, also extended from `DBClientBase`) with the replica set via the primary. In cases where the primary server is unresponsive within a specified time range, the RS DBClient will automatically attempt to establish a secondary server as the new primary (see [automatic failover][automatic_failover]).
Client-side legacy networking draws upon the `DBClientBase` class, of which there are multiple
subclasses residing in the `src/mongo/client` folder. The [replica set DBClient][dbclient_rs_h]
discerns which one of multiple servers in a replica set is the primary at construction time, and
establishes a connection (using the `DBClientConnection` wrapper class, also extended from
`DBClientBase`) with the replica set via the primary. In cases where the primary server is
unresponsive within a specified time range, the RS DBClient will automatically attempt to establish
a secondary server as the new primary (see [automatic failover][automatic_failover]).
## See Also


@ -3,26 +3,26 @@
## What it is
Similar to [burn_in_tests](burn_in_tests.md), `burn_in_tags` also detects the javascript tests
(under the [jstests directory](https://github.com/mongodb/mongo/tree/master/jstests))
that are new or have changed since the last git command and then runs those tests in repeated
mode to validate their stability. But instead of running the tests on their original build
variants, `burn_in_tags` runs them on the burn_in build variants that are generated separately.
(under the [jstests directory](https://github.com/mongodb/mongo/tree/master/jstests)) that are new
or have changed since the last git command and then runs those tests in repeated mode to validate
their stability. But instead of running the tests on their original build variants, `burn_in_tags`
runs them on the burn_in build variants that are generated separately.
## How to use it
You can use `burn_in_tags` on evergreen by selecting the `burn_in_tags_gen` task when creating a patch.
The burn_in build variants, i.e., `enterprise-rhel-8-64-bit-inmem` and `enterprise-rhel-8-64-bit-multiversion`
will be generated, each of which will have a `burn_in_tests` task generated by the
[mongo-task-generator](https://github.com/mongodb/mongo-task-generator). `burn_in_tests` task, a
[generated task](task_generation.md), may have multiple sub-tasks which run the test suites only for the
new or changed javascript tests (note that a javascript test can be included in multiple test suites). Each of
those tests will be run 2 times minimum, and 1000 times maximum or for 10 minutes, whichever is reached first.
You can use `burn_in_tags` on evergreen by selecting the `burn_in_tags_gen` task when creating a
patch. The burn_in build variants, i.e., `enterprise-rhel-8-64-bit-inmem` and
`enterprise-rhel-8-64-bit-multiversion` will be generated, each of which will have a `burn_in_tests`
task generated by the [mongo-task-generator](https://github.com/mongodb/mongo-task-generator).
`burn_in_tests` task, a [generated task](task_generation.md), may have multiple sub-tasks which run
the test suites only for the new or changed javascript tests (note that a javascript test can be
included in multiple test suites). Each of those tests will be run 2 times minimum, and 1000 times
maximum or for 10 minutes, whichever is reached first.
## ! Run All Affected JStests
The `! Run All Affected JStests` variant has a single `burn_in_tags_gen` task. This task will create &
activate [`burn_in_tests`](burn_in_tests.md) tasks for all required and suggested
variants. The end result is that any jstests that have been modified in the patch will
run on all required and suggested variants. This should give users a clear signal on
whether their jstests changes have introduced a failure that could potentially lead
to a revert or follow-up bug fix commit.
The `! Run All Affected JStests` variant has a single `burn_in_tags_gen` task. This task will create
& activate [`burn_in_tests`](burn_in_tests.md) tasks for all required and suggested variants. The
end result is that any jstests that have been modified in the patch will run on all required and
suggested variants. This should give users a clear signal on whether their jstests changes have
introduced a failure that could potentially lead to a revert or follow-up bug fix commit.


@ -3,19 +3,21 @@
## What it is
`burn_in_tests` detects the javascript tests (under the
[jstests directory](https://github.com/mongodb/mongo/tree/master/jstests)) that are new or have changed
since the last git command and then runs those tests in repeated mode to validate their stability.
[jstests directory](https://github.com/mongodb/mongo/tree/master/jstests)) that are new or have
changed since the last git command and then runs those tests in repeated mode to validate their
stability.
## How to use it
You can use `burn_in_tests` on evergreen by selecting the `burn_in_tests_gen` task when creating a patch,
since `burn_in_tests` task is a [generated task](task_generation.md) generated by the
[mongo-task-generator](https://github.com/mongodb/mongo-task-generator).
`burn_in_tests` task will be generated on each of the applicable build variants, and
may have multiple sub-tasks which run the test suites only for the new or changed javascript tests (note
that a javascript test can be included in multiple test suites). Each of those tests will be run 2 times
minimum, and 1000 times maximum or for 10 minutes, whichever is reached first.
You can use `burn_in_tests` on evergreen by selecting the `burn_in_tests_gen` task when creating a
patch, since `burn_in_tests` task is a [generated task](task_generation.md) generated by the
[mongo-task-generator](https://github.com/mongodb/mongo-task-generator). `burn_in_tests` task will
be generated on each of the applicable build variants, and may have multiple sub-tasks which run the
test suites only for the new or changed javascript tests (note that a javascript test can be
included in multiple test suites). Each of those tests will be run 2 times minimum, and 1000 times
maximum or for 10 minutes, whichever is reached first.
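One way to read the repeat policy above is as a stop condition checked after every run. The sketch below is an interpretation, not the actual resmoke or `burn_in_tests.py` logic; `should_run_again` and its parameter names are hypothetical, and it assumes the 2-run minimum takes precedence over the 10-minute limit.

```python
def should_run_again(
    runs_done: int,
    elapsed_secs: float,
    min_runs: int = 2,
    max_runs: int = 1000,
    time_limit_secs: float = 600.0,
) -> bool:
    """Decide whether a test should be repeated once more."""
    if runs_done < min_runs:
        # Assumption: the minimum is honoured even if the time limit has passed.
        return True
    if runs_done >= max_runs:
        return False
    # Between the minimum and the maximum, keep going until time runs out.
    return elapsed_secs < time_limit_secs
```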
You can also use `burn_in_tests` locally from within the [mongo repo](https://github.com/mongodb/mongo)
by running the script `python buildscripts/burn_in_tests.py`. For more information about this usage, you can
run `python buildscripts/burn_in_tests.py --help`.
You can also use `burn_in_tests` locally from within the
[mongo repo](https://github.com/mongodb/mongo) by running the script
`python buildscripts/burn_in_tests.py`. For more information about this usage, you can run
`python buildscripts/burn_in_tests.py --help`.


@ -34,37 +34,37 @@ For some of the versions we are using such generic names as `latest`, `last-lts`
- `latest` - the current version. In Evergreen, the version that was compiled in the current build.
- `last-lts` - the latest LTS (Long Term Support) Major release version. In Evergreen, the version
that was downloaded from the last LTS release branch project. It resolves to an entry
in `longTermSupportReleases` of [releases.yml](../../src/mongo/util/version/releases.yml).
that was downloaded from the last LTS release branch project. It resolves to an entry in
`longTermSupportReleases` of [releases.yml](../../src/mongo/util/version/releases.yml).
- `last-continuous` - the latest Rapid release version. In Evergreen, the version that was
downloaded from the Rapid release branch project. It resolves to the entry in
`featureCompatibilityVersions` of [releases.yml](../../src/mongo/util/version/releases.yml)
that looks older than the output of `git describe`. Will not be tested against if it is listed in
`featureCompatibilityVersions` of [releases.yml](../../src/mongo/util/version/releases.yml) that
looks older than the output of `git describe`. Will not be tested against if it is listed in
`eolVersions` as being end of life.
Note: The latest releases.yml file from master is always used, even fetched remotely when on another branch.
Note: The latest releases.yml file from master is always used, even fetched remotely when on another
branch.
### Old vs new
Many multiversion tasks are running tests against `latest`/`last-lts` or `latest`/`last-continuous`
versions. In such context we refer to `last-lts` and `last-continuous` versions as the `old`
version and to `latest` as a `new` version.
versions. In such context we refer to `last-lts` and `last-continuous` versions as the `old` version
and to `latest` as a `new` version.
A `new` version is compiled in the same way as for non-multiversion tasks. The `old` versions of
compiled binaries are downloaded from the old branch projects with
[`db-contrib-tool`](https://github.com/10gen/db-contrib-tool).
`db-contrib-tool` searches for the latest available compiled binaries on the old branch projects in
Evergreen.
[`db-contrib-tool`](https://github.com/10gen/db-contrib-tool). `db-contrib-tool` searches for the
latest available compiled binaries on the old branch projects in Evergreen.
### Explicit and Implicit multiversion suites
Multiversion suites can be explicit and implicit.
- Explicit - JS tests are aware of the binary versions they are running,
e.g. [multiversion.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/multiversion.yml).
The version of binaries is explicitly set in JS tests,
e.g. [jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js](https://github.com/mongodb/mongo/blob/397c8da541940b3fbe6257243f97a342fe7e0d3b/jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js#L33-L44):
- Explicit - JS tests are aware of the binary versions they are running, e.g.
[multiversion.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/multiversion.yml).
The version of binaries is explicitly set in JS tests, e.g.
[jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js](https://github.com/mongodb/mongo/blob/397c8da541940b3fbe6257243f97a342fe7e0d3b/jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js#L33-L44):
```js
const versions = [
@ -101,8 +101,8 @@ const versions = [
];
```
- Implicit - JS tests know nothing about the binary versions they are running,
e.g. [retryable_writes_downgrade.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/retryable_writes_downgrade.yml).
- Implicit - JS tests know nothing about the binary versions they are running, e.g.
[retryable_writes_downgrade.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/retryable_writes_downgrade.yml).
Most of the implicit multiversion suites are using matrix suites, e.g. `replica_sets_last_lts`:
```bash
@ -134,7 +134,8 @@ test_kind: js_test
In implicit multiversion suites the version of binaries is defined on the resmoke fixture level.
The [example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L5-L8)
The
[example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L5-L8)
of replica set fixture configuration override:
```yaml
@ -144,7 +145,8 @@ fixture:
mixed_bin_versions: new_new_old
```
The [example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L53-L57)
The
[example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L53-L57)
of sharded cluster fixture configuration override:
```yaml
@ -155,7 +157,8 @@ fixture:
mixed_bin_versions: new_old_old_new
```
The [example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L139-L145)
The
[example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/matrix_suites/overrides/multiversion.yml#L139-L145)
of shell fixture configuration override:
```yaml
@ -171,20 +174,25 @@ value:
### Version combinations
In implicit multiversion suites the same set of tests may run in similar suites that are using
various mixed version combinations. Those version combinations depend on the type of resmoke
fixture the suite is running with. These are the recommended version combinations to test against based on the suite fixtures:
various mixed version combinations. Those version combinations depend on the type of resmoke fixture
the suite is running with. These are the recommended version combinations to test against based on
the suite fixtures:
- Replica set fixture combinations:
- `last-lts new-new-old` (i.e. suite runs the replica set fixture that spins up the `latest` and
the `last-lts` versions in a 3-node replica set where the 1st node is the `latest`, 2nd - `latest`,
3rd - `last-lts`, etc.)
the `last-lts` versions in a 3-node replica set where the 1st node is the `latest`, 2nd -
`latest`, 3rd - `last-lts`, etc.)
- `last-lts new-old-new`
- `last-lts old-new-new`
- `last-continuous new-new-old`
- `last-continuous new-old-new`
- `last-continuous old-new-new`
- Ex: [change_streams](https://github.com/mongodb/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml) uses a [`ReplicaSetFixture`](https://github.com/mongodb/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml#L50) so the corresponding multiversion suites are
- Ex:
[change_streams](https://github.com/mongodb/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml)
uses a
[`ReplicaSetFixture`](https://github.com/mongodb/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml#L50)
so the corresponding multiversion suites are
- [`change_streams_last_continuous_new_new_old`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_new_new_old.yml)
- [`change_streams_last_continuous_new_old_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_new_old_new.yml)
- [`change_streams_last_continuous_old_new_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_old_new_new.yml)
@ -199,7 +207,11 @@ fixture the suite is running with. These are the recommended version combination
replica sets per shard where the 1st node of the 1st shard is the `latest`, 2nd node of 1st
shard - `last-lts`, 1st node of 2nd shard - `last-lts`, 2nd node of 2nd shard - `latest`, etc.)
- `last-continuous new-old-old-new`
- Ex: [change_streams_downgrade](https://github.com/mongodb/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml) uses a [`ShardedClusterFixture`](https://github.com/mongodb/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml#L408) so the corresponding multiversion suites are
- Ex:
[change_streams_downgrade](https://github.com/mongodb/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml)
uses a
[`ShardedClusterFixture`](https://github.com/mongodb/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml#L408)
so the corresponding multiversion suites are
- [`change_streams_downgrade_last_continuous_new_old_old_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_downgrade_last_continuous_new_old_old_new.yml)
- [`change_streams_downgrade_last_lts_new_old_old_new`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_downgrade_last_lts_new_old_old_new.yml)
@ -207,18 +219,21 @@ fixture the suite is running with. These are the recommended version combination
- `last-lts` (i.e. suite runs the shell fixture that spins up `last-lts` as the `old` versions,
etc.)
- `last-continuous`
- Ex: [initial_sync_fuzzer](https://github.com/mongodb/mongo/blob/908625ffdec050a71aa2ce47c35788739f629c60/buildscripts/resmokeconfig/suites/initial_sync_fuzzer.yml) uses a Shell Fixture, so the corresponding multiversion suites are
- Ex:
[initial_sync_fuzzer](https://github.com/mongodb/mongo/blob/908625ffdec050a71aa2ce47c35788739f629c60/buildscripts/resmokeconfig/suites/initial_sync_fuzzer.yml)
uses a Shell Fixture, so the corresponding multiversion suites are
- [`initial_sync_fuzzer_last_lts`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/initial_sync_fuzzer_last_lts.yml)
- [`initial_sync_fuzzer_last_continuous`](https://github.com/mongodb/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/initial_sync_fuzzer_last_continuous.yml)
If `last-lts` and `last-continuous` versions happen to be the same, or last-continuous is EOL, we skip `last-continuous`
and run multiversion suites with only `last-lts` combinations in Evergreen.
If `last-lts` and `last-continuous` versions happen to be the same, or last-continuous is EOL, we
skip `last-continuous` and run multiversion suites with only `last-lts` combinations in Evergreen.
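The combination names and the skip rule above can be sketched in a few lines. This is a hedged illustration: `node_versions` and `old_versions_to_test` are hypothetical helpers, not functions from resmoke or the task generator, and the version strings passed to them are made-up inputs.

```python
from typing import List, Set


def node_versions(mixed_bin_versions: str, old_version: str) -> List[str]:
    """Expand e.g. 'new-new-old' into one binary version per node."""
    mapping = {"new": "latest", "old": old_version}
    # Accept both the 'new-new-old' and the 'new_new_old' spelling.
    tokens = mixed_bin_versions.replace("_", "-").split("-")
    return [mapping[token] for token in tokens]


def old_versions_to_test(
    last_lts: str, last_continuous: str, eol_versions: Set[str]
) -> List[str]:
    """Skip last-continuous when it equals last-lts or is end of life."""
    versions = ["last-lts"]
    if last_continuous != last_lts and last_continuous not in eol_versions:
        versions.append("last-continuous")
    return versions
```

For example, `node_versions("new-new-old", "last-lts")` describes a 3-node replica set whose first two nodes run `latest` and whose third node runs `last-lts`.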
## Working with multiversion tasks in Evergreen
### Multiversion task generation
Please refer to mongo-task-generator [documentation](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md#multiversion-testing)
Please refer to mongo-task-generator
[documentation](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md#multiversion-testing)
for generating multiversion tasks in Evergreen.
### Exclude tests from multiversion testing
@ -240,20 +255,21 @@ multiversion where `XX` is the version number, e.g. `requires_fcv_70` stands for
```
Tests with `requires_fcv_XX` tags are excluded from multiversion tasks that may run the versions
below the specified FCV version, e.g. when the `latest` version is `6.2`, `last-continuous` is
`6.1` and `last-lts` is `6.0`, tests tagged with `requires_fcv_61` will NOT run in multiversion
tasks that run `latest` with `last-lts`, but will run in multiversion tasks that run `latest` with
below the specified FCV version, e.g. when the `latest` version is `6.2`, `last-continuous` is `6.1`
and `last-lts` is `6.0`, tests tagged with `requires_fcv_61` will NOT run in multiversion tasks that
run `latest` with `last-lts`, but will run in multiversion tasks that run `latest` with
`last-continuous`.
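The exclusion rule in the example above can be expressed as a small predicate. This is a sketch under assumptions, not how resmoke evaluates tags: `required_fcv` and `runs_with_old_version` are hypothetical names, and the tag parser assumes the last digit of `XX` is the minor version and the preceding digits are the major version.

```python
import re
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int]

# Assumption: 'requires_fcv_61' means FCV 6.1 (last digit = minor version).
FCV_TAG = re.compile(r"requires_fcv_(\d+)(\d)")


def required_fcv(tag: str) -> Optional[Version]:
    """Return the (major, minor) FCV a tag requires, or None for other tags."""
    match = FCV_TAG.fullmatch(tag)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def runs_with_old_version(tags: Iterable[str], old_version: Version) -> bool:
    """A test is excluded if the old binary is below any FCV it requires."""
    for tag in tags:
        fcv = required_fcv(tag)
        if fcv is not None and old_version < fcv:
            return False
    return True
```

With `last-lts` at `(6, 0)` and `last-continuous` at `(6, 1)`, a test tagged `requires_fcv_61` is excluded from the `last-lts` tasks and kept in the `last-continuous` tasks, matching the example in the text.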
In addition to disabling multiversion tests based on FCV, there is no need to run in-development `featureFlagXYZ` tests
(featureFlags that have `default: false`) because these tests will most likely fail on older versions that
have not implemented this feature. For multiversion tasks, we pass the `--runNoFeatureFlagTests` flag to avoid these
failures on `all feature flag` variants.
In addition to disabling multiversion tests based on FCV, there is no need to run in-development
`featureFlagXYZ` tests (featureFlags that have `default: false`) because these tests will most
likely fail on older versions that have not implemented this feature. For multiversion tasks, we
pass the `--runNoFeatureFlagTests` flag to avoid these failures on `all feature flag` variants.
For more info on FCV, take a look at [FCV_AND_FEATURE_FLAG_README.md](https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/FCV_AND_FEATURE_FLAG_README.md).
For more info on FCV, take a look at
[FCV_AND_FEATURE_FLAG_README.md](https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/FCV_AND_FEATURE_FLAG_README.md).
Another common case could be that the changes on master branch are breaking multiversion tests,
but with those changes backported to the older branches the multiversion tests should work.
In order to temporarily disable the test from running in multiversion it can be added to the
Another common case could be that the changes on master branch are breaking multiversion tests, but
with those changes backported to the older branches the multiversion tests should work. In order to
temporarily disable the test from running in multiversion it can be added to the
[etc/backports_required_for_multiversion_tests.yml](https://github.com/mongodb/mongo/blob/fcdfe29cee066278b94ea2749456fc433cc398c6/etc/backports_required_for_multiversion_tests.yml#L1-L19).
Please follow the instructions described in the file.


@ -7,21 +7,22 @@ evergreen command.
Task generation allows us to do things like dynamically split a task into sub-tasks that can be run
in parallel, or generate sub-tasks to run against different mongodb versions.
Task generation is typically done with the [mongo-task-generator](https://github.com/mongodb/mongo-task-generator)
tool. Refer to its [documentation](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md)
Task generation is typically done with the
[mongo-task-generator](https://github.com/mongodb/mongo-task-generator) tool. Refer to its
[documentation](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md)
for details on how it works.
## Configuring a task to be generated
In order to generate a task, we typically create a placeholder task. By convention the name of
these tasks should end in "\_gen". Most of the time, generated tasks should inherit the
In order to generate a task, we typically create a placeholder task. By convention the name of these
tasks should end in "\_gen". Most of the time, generated tasks should inherit the
[gen_task_template](https://github.com/mongodb/mongo/blob/31864e3866ce9cc54c08463019846ded2ad9e6e5/etc/evergreen_yml_components/definitions.yml#L99-L107)
which configures the required dependencies.
The placeholder task needs to have the "generate resmoke tasks" function as one of its `commands`.
This is how the `mongo-task-generator` knows that the task needs to be generated. You can also
add `vars` to the function call to configure how the task will be generated. You can refer to
the [mongo-task-generator](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md#use-cases)
This is how the `mongo-task-generator` knows that the task needs to be generated. You can also add
`vars` to the function call to configure how the task will be generated. You can refer to the
[mongo-task-generator](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md#use-cases)
documentation for details on what options are available.
Once a placeholder task is defined, you can reference it just like a normal task.
@ -40,15 +41,15 @@ Task generation is performed as a 2-step process.
additional tasks in the future, they will exist to be run.
This step will also hide all the placeholder tasks into a display task called `generator_tasks`
in each build variant. Once task generation is completed, the user should perform actions on
the generated tasks instead of the placeholder tasks; we encourage this by hiding the
placeholder tasks from view.
in each build variant. Once task generation is completed, the user should perform actions on the
generated tasks instead of the placeholder tasks; we encourage this by hiding the placeholder
tasks from view.
2. After the tasks have been generated, the placeholder tasks are free to run. The placeholder tasks
simply find the task generated for them and mark it activated. Since generated tasks are
created in the "inactive" state, this will activate any generated tasks whose placeholder task
runs. This enables users to select tasks to run on the initial task selection page even though
the tasks have not yet been generated.
simply find the task generated for them and mark it activated. Since generated tasks are created
in the "inactive" state, this will activate any generated tasks whose placeholder task runs. This
enables users to select tasks to run on the initial task selection page even though the tasks
have not yet been generated.
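The 2-step process above can be modelled in a few lines. This is a toy sketch, not Evergreen or `mongo-task-generator` code: `GeneratedTask`, `generate_tasks`, and `run_placeholder` are hypothetical names, and naming a generated task by stripping the `_gen` suffix is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

GEN_SUFFIX = "_gen"


@dataclass
class GeneratedTask:
    name: str
    activated: bool = False  # generated tasks are created in the inactive state


def generate_tasks(placeholder_names: Iterable[str]) -> Dict[str, GeneratedTask]:
    """Step 1: generate a task for every placeholder, whether selected or not."""
    generated: Dict[str, GeneratedTask] = {}
    for placeholder in placeholder_names:
        if not placeholder.endswith(GEN_SUFFIX):
            raise ValueError(f"placeholder name should end in {GEN_SUFFIX!r}")
        # Assumption: the generated task drops the '_gen' suffix.
        generated[placeholder] = GeneratedTask(placeholder[: -len(GEN_SUFFIX)])
    return generated


def run_placeholder(placeholder: str, generated: Dict[str, GeneratedTask]) -> None:
    """Step 2: a placeholder that runs activates the task generated for it."""
    generated[placeholder].activated = True
```

Only the generated tasks whose placeholder actually runs end up activated; the rest exist but stay inactive until someone schedules them.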
**Note**: While this 2-step process allows a similar user experience to working with normal tasks,
it does create a few UI quirks. For example, evergreen will hide "inactive" tasks in the UI, as a

@ -2,10 +2,15 @@
## Types of timeouts
There are two types of timeouts that [Evergreen supports](https://github.com/evergreen-ci/evergreen/wiki/Project-Commands#timeoutupdate):
There are two types of timeouts that
[Evergreen supports](https://github.com/evergreen-ci/evergreen/wiki/Project-Commands#timeoutupdate):
- **Exec Timeout**: The _exec timeout_ is the overall timeout for a task. Once the total runtime for a test exceeds this value, the timeout logic will be triggered. This value is specified by `exec_timeout_secs` in the Evergreen configuration.
- **Idle Timeout**: The _idle timeout_ is the amount of time Evergreen will wait for output to be generated before considering the task hung and triggering the timeout logic. This value is specified by `timeout_secs` in the Evergreen configuration.
- **Exec Timeout**: The _exec timeout_ is the overall timeout for a task. Once the total runtime for
a test exceeds this value, the timeout logic will be triggered. This value is specified by
`exec_timeout_secs` in the Evergreen configuration.
- **Idle Timeout**: The _idle timeout_ is the amount of time Evergreen will wait for output to be
generated before considering the task hung and triggering the timeout logic. This value is
specified by `timeout_secs` in the Evergreen configuration.
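
To make the difference between the two timeouts concrete, here is a minimal sketch of the check
each one describes. The function and its parameter names are hypothetical; only
`exec_timeout_secs` and `timeout_secs` come from the text above, and Evergreen's real logic may
differ in detail.

```python
from typing import Optional


def timeout_reason(
    elapsed_secs: float,
    secs_since_last_output: float,
    exec_timeout_secs: float,
    timeout_secs: float,
) -> Optional[str]:
    """Return which timeout would fire, or None if neither applies."""
    # Exec timeout: the total runtime exceeded the overall limit.
    if elapsed_secs > exec_timeout_secs:
        return "exec"
    # Idle timeout: no output has been produced for too long.
    if secs_since_last_output > timeout_secs:
        return "idle"
    return None
```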
**Note**: In most cases, the **exec timeout** is the more useful of the two timeouts.
@ -15,15 +20,27 @@ There are several ways to set the timeout for a task running in Evergreen.
### Specifying timeouts in the Evergreen YAML configuration
Timeouts can be specified directly in the `evergreen.yml` (and related) files, both for tasks and build variants. This approach is useful for setting default timeout values but is limited because different build variants often have varying runtime characteristics. This means it is not possible to set timeouts for a specific task running on a specific build variant using only this method.
Timeouts can be specified directly in the `evergreen.yml` (and related) files, both for tasks and
build variants. This approach is useful for setting default timeout values but is limited because
different build variants often have varying runtime characteristics. This means it is not possible
to set timeouts for a specific task running on a specific build variant using only this method.
### Overrides: [etc/evergreen_timeouts.yml](../../etc/evergreen_timeouts.yml)
The `etc/evergreen_timeouts.yml` file allows overriding timeouts for specific tasks on specific build variants. This workaround helps address the limitations of directly specifying timeouts in `evergreen.yml`. To use this method, the task must include the `determine task timeout` and `update task timeout expansions` functions at the beginning of its Evergreen definition. Many Resmoke tasks already incorporate these functions.
The `etc/evergreen_timeouts.yml` file allows overriding timeouts for specific tasks on specific
build variants. This workaround helps address the limitations of directly specifying timeouts in
`evergreen.yml`. To use this method, the task must include the `determine task timeout` and
`update task timeout expansions` functions at the beginning of its Evergreen definition. Many
Resmoke tasks already incorporate these functions.
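
As a rough sketch of what a per-variant, per-task override lookup involves: the dictionary layout,
the variant and task names, and the use of seconds below are assumptions made for illustration,
not the actual schema of `etc/evergreen_timeouts.yml`.

```python
from typing import Optional

# Hypothetical in-memory form of the overrides: build variant -> task -> timeout in seconds.
OVERRIDES = {
    "example-variant": {"example_task": 3 * 60 * 60},
}


def find_override(overrides: dict, build_variant: str, task_name: str) -> Optional[int]:
    """Return the override for this task on this variant, or None if there is none."""
    return overrides.get(build_variant, {}).get(task_name)
```

A task with no entry for the current build variant falls back to whatever timeout is configured
elsewhere.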
### Resmoke tasks: [buildscripts/evergreen_task_timeout.py](../../buildscripts/evergreen_task_timeout.py)
This script reads the `etc/evergreen_timeouts.yml` file to calculate the appropriate timeout settings. Additionally, it checks historical test results for the task being run to determine if enough information is available to calculate timeouts based on past data. The script also supports more advanced methods of determining timeouts, such as applying aggressive timeout measures for tasks executed in the commit queue or on required build variants. In cases of conflict, the commit queue and required build variant limits take precedence over the previous two methods.
This script reads the `etc/evergreen_timeouts.yml` file to calculate the appropriate timeout
settings. Additionally, it checks historical test results for the task being run to determine if
enough information is available to calculate timeouts based on past data. The script also supports
more advanced methods of determining timeouts, such as applying aggressive timeout measures for
tasks executed in the commit queue or on required build variants. In cases of conflict, the commit
queue and required build variant limits take precedence over the previous two methods.
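
The precedence described above could look roughly like the following. This is a sketch under
stated assumptions, not the script's actual code: it assumes an explicit override is preferred
over a history-based value, and that "take precedence" means the commit queue or required-variant
limit acts as an upper bound. All names are hypothetical.

```python
from typing import Optional


def resolve_timeout(
    override_secs: Optional[int],
    historic_secs: Optional[int],
    limit_secs: Optional[int],
    default_secs: int,
) -> int:
    """Pick a timeout from the available sources, then apply the limit if one exists."""
    if override_secs is not None:
        base = override_secs  # assumption: explicit override wins over history
    elif historic_secs is not None:
        base = historic_secs
    else:
        base = default_secs
    # Assumption: the commit queue / required variant limit caps the chosen value.
    if limit_secs is not None:
        return min(base, limit_secs)
    return base
```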
The timeout that was calculated by the script can be retrieved from the logs:
@ -38,4 +55,8 @@ The timeout that was calculated by the script can be retrieved from the logs:
### Compile tasks: [evergreen/generate_override_timeout.py](../../evergreen/generate_override_timeout.py)
This script is used for compile tasks defined in files such as `etc/evergreen_yml_components/tasks/compile_tasks.yml` and `etc/evergreen_yml_components/tasks/compile_tasks_shared.yml`. The script reads the `etc/evergreen_timeouts.yml` file and calculates appropriate timeouts. The Evergreen function `override task timeout` then runs this script to update the timeouts accordingly.
This script is used for compile tasks defined in files such as
`etc/evergreen_yml_components/tasks/compile_tasks.yml` and
`etc/evergreen_yml_components/tasks/compile_tasks_shared.yml`. The script reads the
`etc/evergreen_timeouts.yml` file and calculates appropriate timeouts. The Evergreen function
`override task timeout` then runs this script to update the timeouts accordingly.

@ -1,37 +1,47 @@
# Build Variants
This document describes build variants (a.k.a. variants, or builds, or buildvariants) that are used in `mongodb-mongo-*` projects.
To know more about build variants, please refer to the [Build Variants](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#build-variants) section of the Evergreen wiki.
This document describes build variants (a.k.a. variants, or builds, or buildvariants) that are used
in `mongodb-mongo-*` projects. To know more about build variants, please refer to the
[Build Variants](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#build-variants)
section of the Evergreen wiki.
## YAML files structure
Build variant configuration files are in `etc/evergreen_yml_components/variants` directory.
They are merged into `etc/evergreen.yml` and `etc/evergreen_nightly.yml` with Evergreen's [include](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#include) feature.
Build variant configuration files are in `etc/evergreen_yml_components/variants` directory. They are
merged into `etc/evergreen.yml` and `etc/evergreen_nightly.yml` with Evergreen's
[include](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#include)
feature.
Inside `etc/evergreen_yml_components/variants` directory there are more directories,
which are in most cases platform names (e.g. amazon, rhel etc.) or build variant group names (e.g. sanitizer etc.).
Inside `etc/evergreen_yml_components/variants` directory there are more directories, which are in
most cases platform names (e.g. amazon, rhel etc.) or build variant group names (e.g. sanitizer
etc.).
Be aware that some of these files could be also used or re-used to be merged into `etc/system_perf.yml` which is used for `sys-perf` project.
Be aware that some of these files could be also used or re-used to be merged into
`etc/system_perf.yml` which is used for `sys-perf` project.
## Build Variants in `mongodb-mongo-master` and `mongodb-mongo-master-nightly`
`mongodb-mongo-master` evergreen project uses `etc/evergreen.yml` and contains all build variants for development, including all feature-specific, patch build required, and suggested variants.
`mongodb-mongo-master` evergreen project uses `etc/evergreen.yml` and contains all build variants
for development, including all feature-specific, patch build required, and suggested variants.
`mongodb-mongo-master-nightly` evergreen project uses `etc/evergreen_nightly.yml` and contains build variants for public nightly builds.
`mongodb-mongo-master-nightly` evergreen project uses `etc/evergreen_nightly.yml` and contains build
variants for public nightly builds.
## Required and Suggested Build Variants
"Required" build variants are defined as any build variant with a `!` at the front of its display name in Evergreen.
These build variants also have `required` tag.
"Required" build variants are defined as any build variant with a `!` at the front of its display
name in Evergreen. These build variants also have `required` tag.
[Required Patch Builds Policy](https://wiki.corp.mongodb.com/display/KERNEL/Required+Patch+Builds+Policy)
"Suggested" build variants are defined as any build variant with a `*` at the front of its display name in Evergreen.
These build variants also have `suggested` tag.
"Suggested" build variants are defined as any build variant with a `*` at the front of its display
name in Evergreen. These build variants also have `suggested` tag.
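
The display-name convention above is easy to check mechanically. A minimal sketch, with a
hypothetical helper name and made-up display names:

```python
def variant_kind(display_name: str) -> str:
    """Classify a build variant by the marker at the front of its display name."""
    if display_name.startswith("!"):
        return "required"
    if display_name.startswith("*"):
        return "suggested"
    return "other"
```

For example, `variant_kind("! Example Variant")` returns `"required"`.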
## Build Variants with forbid_tasks_tagged_with_experimental
Build variants with the `forbid_tasks_tagged_with_experimental` tag indicate that they do not allow tasks tagged as `experimental` to run. This tag is used in conjunction with the `forbid-tasks-with-tag-on-variants` evergreen lint rule to enforce this restriction.
Build variants with the `forbid_tasks_tagged_with_experimental` tag indicate that they do not allow
tasks tagged as `experimental` to run. This tag is used in conjunction with the
`forbid-tasks-with-tag-on-variants` evergreen lint rule to enforce this restriction.
## Build Variants after branching
@ -39,34 +49,48 @@ In each of platform or build variant group directory there can be these files:
- `test_dev.yml`
- these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project on master branch
- after branching on all new branches these files are merged into `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project
- these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project
on master branch
- after branching on all new branches these files are merged into `etc/evergreen_nightly.yml`
which is used for a new branch `mongodb-mongo-vX.Y` project
- `test_dev_master_and_lts_branches_only.yml`
- these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project on master branch
- after branching for LTS release (v7.0, v8.0 etc.) on a new branch these files are merged into `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project
- **important**: all tests that are running on these build variants will NOT run on a new Rapid release (v7.1, v7.2, v7.3, v8.1, v8.2, v8.3 etc.) branch projects
- these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project
on master branch
- after branching for LTS release (v7.0, v8.0 etc.) on a new branch these files are merged into
`etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project
- **important**: all tests that are running on these build variants will NOT run on a new Rapid
release (v7.1, v7.2, v7.3, v8.1, v8.2, v8.3 etc.) branch projects
- `test_dev_master_branch_only.yml`
- these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project on master branch
- these files are merged into `etc/evergreen.yml` which is used for `mongodb-mongo-master` project
on master branch
- after branching on all new branches these files are NOT used
- **important**: all tests that are running on these build variants will NOT run on a new branch `mongodb-mongo-vX.Y` project
- **important**: all tests that are running on these build variants will NOT run on a new branch
`mongodb-mongo-vX.Y` project
- `test_release.yml`
- these files are merged into `etc/evergreen_nightly.yml` which is used for `mongodb-mongo-master-nightly` project on master branch
- after branching on all new branches these files are merged into `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project
- these files are merged into `etc/evergreen_nightly.yml` which is used for
`mongodb-mongo-master-nightly` project on master branch
- after branching on all new branches these files are merged into `etc/evergreen_nightly.yml`
which is used for a new branch `mongodb-mongo-vX.Y` project
- `test_release_master_and_lts_branches_only.yml`
- these files are merged into `etc/evergreen_nightly.yml` which is used for `mongodb-mongo-master-nightly` project on master branch
- after branching for LTS release (v7.0, v8.0 etc.) on a new branch these files are merged into `etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project
- **important**: all tests that are running on these build variants will NOT run on a new Rapid release (v7.1, v7.2, v7.3, v8.1, v8.2, v8.3 etc.) branch projects
- these files are merged into `etc/evergreen_nightly.yml` which is used for
`mongodb-mongo-master-nightly` project on master branch
- after branching for LTS release (v7.0, v8.0 etc.) on a new branch these files are merged into
`etc/evergreen_nightly.yml` which is used for a new branch `mongodb-mongo-vX.Y` project
- **important**: all tests that are running on these build variants will NOT run on a new Rapid
release (v7.1, v7.2, v7.3, v8.1, v8.2, v8.3 etc.) branch projects
- `test_release_master_branch_only.yml`
- these files are merged into `etc/evergreen_nightly.yml` which is used for `mongodb-mongo-master-nightly` project on master branch
- these files are merged into `etc/evergreen_nightly.yml` which is used for
`mongodb-mongo-master-nightly` project on master branch
- after branching on all new branches these files are NOT used
- **important**: all tests that are running on these build variants will NOT run on a new branch `mongodb-mongo-vX.Y` project
- **important**: all tests that are running on these build variants will NOT run on a new branch
`mongodb-mongo-vX.Y` project
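
The rules in the list above can be summarized as a small lookup. This sketch only restates the
list; the function name and the `"master"` / `"lts"` / `"rapid"` branch labels are hypothetical.

```python
from typing import Optional

MASTER_ONLY = "_master_branch_only"
MASTER_AND_LTS = "_master_and_lts_branches_only"


def merged_into(file_name: str, branch: str) -> Optional[str]:
    """Return the project file a variant file is merged into, or None if it is not used.

    `branch` is one of "master", "lts" (e.g. v8.0) or "rapid" (e.g. v8.1).
    """
    if branch not in ("master", "lts", "rapid"):
        raise ValueError(f"unknown branch kind: {branch}")
    if not file_name.endswith(".yml"):
        raise ValueError(f"not a YAML file: {file_name}")
    stem = file_name[: -len(".yml")]

    if stem.endswith(MASTER_ONLY):
        base, allowed = stem[: -len(MASTER_ONLY)], {"master"}
    elif stem.endswith(MASTER_AND_LTS):
        base, allowed = stem[: -len(MASTER_AND_LTS)], {"master", "lts"}
    else:
        base, allowed = stem, {"master", "lts", "rapid"}

    if base not in ("test_dev", "test_release"):
        raise ValueError(f"unexpected file name: {file_name}")
    if branch not in allowed:
        return None
    # Only `test_dev*` files on the master branch go into etc/evergreen.yml.
    if base == "test_dev" and branch == "master":
        return "etc/evergreen.yml"
    return "etc/evergreen_nightly.yml"
```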

@ -11,14 +11,14 @@ section of the Evergreen wiki.
### `mongodb-mongo-master`
The main project for testing MongoDB's dev environments with a number of build variants,
each one corresponding to a particular compile or testing environment to support development.
Each build variant runs a set of tasks; each task usually runs one or more tests.
The main project for testing MongoDB's dev environments with a number of build variants, each one
corresponding to a particular compile or testing environment to support development. Each build
variant runs a set of tasks; each task usually runs one or more tests.
### `mongodb-mongo-master-nightly`
Tracks the same branch as `mongodb-mongo-master`; each build variant corresponds to a
(version, OS, architecture) triplet for a supported MongoDB nightly release.
Tracks the same branch as `mongodb-mongo-master`; each build variant corresponds to a (version, OS,
architecture) triplet for a supported MongoDB nightly release.
### `sys_perf`
@ -28,22 +28,23 @@ The system performance project.
The above Evergreen projects are defined in the following files:
- `etc/evergreen_yml_components/**.yml`. YAML files containing definitions for tasks, functions, buildvariants, etc.
They are copied from the existing evergreen.yml file.
- `etc/evergreen_yml_components/**.yml`. YAML files containing definitions for tasks, functions,
buildvariants, etc. They are copied from the existing evergreen.yml file.
- `etc/evergreen.yml`. Imports components from above and serves as the project config for mongodb-mongo-master,
containing all build variants for development, including all feature-specific, patch build required, and suggested
variants.
- `etc/evergreen.yml`. Imports components from above and serves as the project config for
mongodb-mongo-master, containing all build variants for development, including all
feature-specific, patch build required, and suggested variants.
- `etc/evergreen_nightly.yml`. The project configuration for mongodb-mongo-master-nightly, containing only build
variants for public nightly builds, imports similar components as evergreen.yml to ensure consistency.
- `etc/evergreen_nightly.yml`. The project configuration for mongodb-mongo-master-nightly,
containing only build variants for public nightly builds, imports similar components as
evergreen.yml to ensure consistency.
- `etc/sys_perf.yml`. Configuration file for the system performance project.
## Release Branching Process
Only the `mongodb-mongo-master-nightly` project will be branched with required and other
necessary variants (e.g. sanitizers) added back in. Most variants in `mongodb-mongo-master`
would be dropped by default but can be re-introduced to the release branches manually on an
as-needed basis. For Rapid releases, all but the variants relevant to Atlas in
`mongodb-mongo-master-nightly` may be dropped as well.
Only the `mongodb-mongo-master-nightly` project will be branched with required and other necessary
variants (e.g. sanitizers) added back in. Most variants in `mongodb-mongo-master` would be dropped
by default but can be re-introduced to the release branches manually on an as-needed basis. For
Rapid releases, all but the variants relevant to Atlas in `mongodb-mongo-master-nightly` may be
dropped as well.

@ -1,11 +1,15 @@
# Task ownership tags
This document describes task ownership tags that are used in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` projects.
This document describes task ownership tags that are used in `mongodb-mongo-master` and
`mongodb-mongo-master-nightly` projects.
Every task in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` projects should be tagged with exactly one `assigned_to_jira_team_.+` tag.
Team names (the part after `assigned_to_jira_team_`) should match `evergreen_tag_name` from team configurations in [mothra](https://github.com/10gen/mothra/tree/main/mothra/teams).
Every task in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` projects should be tagged
with exactly one `assigned_to_jira_team_.+` tag. Team names (the part after
`assigned_to_jira_team_`) should match `evergreen_tag_name` from team configurations in
[mothra](https://github.com/10gen/mothra/tree/main/mothra/teams).
This is enforced by the linter. The YAML linter configuration can be found [here](../../../etc/evergreen_lint.yml).
This is enforced by the linter. The YAML linter configuration can be found
[here](../../../etc/evergreen_lint.yml).
If the linter configuration is missing your team:
@ -13,4 +17,7 @@ If the linter configuration is missing your team:
2. Make sure that your team configuration in mothra has `evergreen_tag_name`
3. Update the tag list with `assigned_to_jira_team_{evergreen_tag_name}` tag for your team
Dynamically generated tasks for resmoke suites (i.e. the ones named like `//buildscripts/resmokeconfig:core`) will set the ownership tag based on a best effort lookup from the codeowner of the test's definition to a team name from mothra, picking the first encountered in case of multiple possible assignments.
Dynamically generated tasks for resmoke suites (i.e. the ones named like
`//buildscripts/resmokeconfig:core`) will set the ownership tag based on a best effort lookup from
the codeowner of the test's definition to a team name from mothra, picking the first encountered in
case of multiple possible assignments.
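
A minimal sketch of the "exactly one ownership tag" rule, using the `assigned_to_jira_team_.+`
pattern quoted above. The helper names and the example team name are hypothetical, and the real
linter also checks tags against its configured list.

```python
import re

# Pattern taken from the rule above; it must match the whole tag.
OWNERSHIP_TAG = re.compile(r"assigned_to_jira_team_.+")


def ownership_tags(tags: list) -> list:
    """Return the tags that look like ownership tags."""
    return [tag for tag in tags if OWNERSHIP_TAG.fullmatch(tag)]


def has_single_owner(tags: list) -> bool:
    """True when the task carries exactly one ownership tag."""
    return len(ownership_tags(tags)) == 1
```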

@ -1,49 +1,58 @@
# Task selection tags
This document describes task selection tags that are used in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` projects.
To know more about task tags, please refer to the [Task and Variant Tags](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#task-and-variant-tags) section of the Evergreen wiki.
This document describes task selection tags that are used in `mongodb-mongo-master` and
`mongodb-mongo-master-nightly` projects. To know more about task tags, please refer to the
[Task and Variant Tags](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#task-and-variant-tags)
section of the Evergreen wiki.
The majority of variants in `mongodb-mongo-master-nightly` project and the most significant variants in `mongodb-mongo-master` project are using required and optional groups of task selection tags.
In order to add tasks to those variants, please use them as described in the following sections.
The majority of variants in `mongodb-mongo-master-nightly` project and the most significant variants
in `mongodb-mongo-master` project are using required and optional groups of task selection tags. In
order to add tasks to those variants, please use them as described in the following sections.
## Required task selection tags
Every task in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` project must be tagged with exactly one required selection tag.
This is enforced by linter. YAML linter configuration could be found [here](../../../etc/evergreen_lint.yml).
Every task in `mongodb-mongo-master` and `mongodb-mongo-master-nightly` project must be tagged with
exactly one required selection tag. This is enforced by linter. YAML linter configuration could be
found [here](../../../etc/evergreen_lint.yml).
- `development_critical` - these tasks should be green prior to the merge and will block merging if failing, e.g. jsCore.
We run these tasks on all variants and in the commit-queue.
- `development_critical` - these tasks should be green prior to the merge and will block merging if
failing, e.g. jsCore. We run these tasks on all variants and in the commit-queue.
- `development_critical_single_variant` - the same as `development_critical` but these tasks do not need to run on multiple variants, e.g. clang-tidy, formatters, linters etc.
We run these tasks on the required variant and in the commit-queue.
- `development_critical_single_variant` - the same as `development_critical` but these tasks do not
need to run on multiple variants, e.g. clang-tidy, formatters, linters etc. We run these tasks
on the required variant and in the commit-queue.
- `no_commit_queue` - add this to tasks in development_critical that you do not want in the commit-queue
- `no_commit_queue` - add this to tasks in development_critical that you do not want in the
commit-queue
- `release_critical` - these tasks should be green prior to the release.
We run these tasks on all release and development (required and suggested) variants.
It should be uncommon to add tasks to this tag but if your task needs to run on many different OSes and it is extremely broad in coverage then you can add it to this tag.
- `release_critical` - these tasks should be green prior to the release. We run these tasks on all
release and development (required and suggested) variants. It should be uncommon to add tasks to
this tag but if your task needs to run on many different OSes and it is extremely broad in
coverage then you can add it to this tag.
- `default` - these tasks are running as part of a required patch build.
We run these tasks on the most significant development variants (required patches, tsan, aubsan, etc.).
Use this tag if you are not sure which tag to use for your new task.
- `default` - these tasks are running as part of a required patch build. We run these tasks on the
most significant development variants (required patches, tsan, aubsan, etc.). Use this tag if you
are not sure which tag to use for your new task.
- `non_deterministic` - these tasks depend significantly on randomization and we expect to see some unique failures, e.g. fuzzers etc.
We run these tasks on non-required development variants.
- `non_deterministic` - these tasks depend significantly on randomization and we expect to see some
unique failures, e.g. fuzzers etc. We run these tasks on non-required development variants.
- `experimental` - these tasks are not running anywhere regularly.
We do not use this tag for selecting tasks to run on variants.
This tag could be used for tasks that you would like to run on your own custom variants.
- `experimental` - these tasks are not running anywhere regularly. We do not use this tag for
selecting tasks to run on variants. This tag could be used for tasks that you would like to run on
your own custom variants.
- `auxiliary` - these are various setup, helper, etc. tasks and should be mostly owned by infrastructure team.
You should almost never use this tag.
Please reach out to [#ask-devprod-build](https://mongodb.enterprise.slack.com/archives/CR8SNBY0N) before adding tasks with this tag.
- `auxiliary` - these are various setup, helper, etc. tasks and should be mostly owned by
infrastructure team. You should almost never use this tag. Please reach out to
[#ask-devprod-build](https://mongodb.enterprise.slack.com/archives/CR8SNBY0N) before adding tasks
with this tag.
**Important**: Do not change anything in this list without talking to [#ask-devprod-build](https://mongodb.enterprise.slack.com/archives/CR8SNBY0N).
**Important**: Do not change anything in this list without talking to
[#ask-devprod-build](https://mongodb.enterprise.slack.com/archives/CR8SNBY0N).
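
The "exactly one required selection tag" rule can be sketched as below. This is an illustration,
not the linter's implementation. It assumes `no_commit_queue` is a modifier added alongside
`development_critical` rather than a required tag in its own right; the helper name is
hypothetical.

```python
# Required selection tags from the list above.
# Assumption: `no_commit_queue` is a modifier and does not count as a required tag.
REQUIRED_SELECTION_TAGS = frozenset(
    {
        "development_critical",
        "development_critical_single_variant",
        "release_critical",
        "default",
        "non_deterministic",
        "experimental",
        "auxiliary",
    }
)


def has_one_required_tag(tags: list) -> bool:
    """True when exactly one of the task's tags is a required selection tag."""
    return sum(1 for tag in tags if tag in REQUIRED_SELECTION_TAGS) == 1
```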
## Optional task selection tags
In addition to the required task selection tags there is a list of optional selection tags.
Every task could be tagged with any number of the following tags:
In addition to the required task selection tags there is a list of optional selection tags. Every
task could be tagged with any number of the following tags:
- `incompatible_community` - the task should be excluded from the community variants.
- `incompatible_windows` - the task should be excluded from Windows variants.
@ -55,16 +64,20 @@ Every task could be tagged with any number of the following tags:
- `incompatible_aubsan` - the task should be excluded from {A,UB}SAN variants.
- `incompatible_tsan` - the task should be excluded from TSAN variants.
- `incompatible_debug_mode` - the task should be excluded from Debug Mode variants.
- `incompatible_system_allocator` - the task should be excluded from variants that use the system allocator.
- `incompatible_system_allocator` - the task should be excluded from variants that use the system
allocator.
- `incompatible_all_feature_flags` - the task should be excluded from all-feature-flags variants.
- `incompatible_development_variant` - the task should be excluded from the development variants.
- `incompatible_oscrypto` - the task should be excluded from variants unsupported by oscrypto.
- `requires_compile_variant` - the task can (or should) only run on variants that have compile-related expansions.
- `requires_compile_variant` - the task can (or should) only run on variants that have
compile-related expansions.
- `requires_large_host` - the task requires a large host to run.
- `requires_large_host_aubsan` - the task requires a large host to run on {A,UB}SAN variants.
- `requires_large_host_tsan` - the task requires a large host to run on TSAN variants.
- `requires_large_host_debug_mode` - the task requires a large host to run on Debug Mode variants.
- `requires_large_host_commit_queue` - the task requires a large host to run in the commit-queue.
- `requires_all_feature_flags` - the task can only run on variants that have all-feature-flags configuration.
- `requires_execution_on_windows_patch_build` - the task should be run on the required Windows build variant on each patch
build. See [SERVER-79037](https://jira.mongodb.org/browse/SERVER-79037) for how this was calculated.
- `requires_all_feature_flags` - the task can only run on variants that have all-feature-flags
configuration.
- `requires_execution_on_windows_patch_build` - the task should be run on the required Windows build
variant on each patch build. See [SERVER-79037](https://jira.mongodb.org/browse/SERVER-79037) for
how this was calculated.
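
To illustrate how a required tag and the `incompatible_*` tags might combine when choosing tasks
for a variant, here is a small sketch. It does not reproduce Evergreen's tag selector syntax or
behavior; the function and the task names in the usage note are hypothetical.

```python
def select_tasks(tasks: dict, wanted: str, excluded: set) -> list:
    """Pick tasks that carry the wanted tag and none of the excluded tags.

    `tasks` maps a task name to the set of tags on that task.
    """
    return sorted(
        name for name, tags in tasks.items() if wanted in tags and not (tags & excluded)
    )
```

For a Windows variant, `wanted` could be `release_critical` and `excluded` could be
`{"incompatible_windows"}`, so a task tagged with both would be left out.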

@ -5,16 +5,16 @@ MongoDB code uses the following types of assertions that are available for use:
- `uassert` and `iassert`
- Checks for per-operation user errors. Operation-fatal.
- `tassert`
- Like uassert in that it checks for per-operation user errors, but inhibits clean shutdown
in tests. Operation-fatal, but process-fatal in testing environments during shutdown.
- Like uassert in that it checks for per-operation user errors, but inhibits clean shutdown in
tests. Operation-fatal, but process-fatal in testing environments during shutdown.
- `massert`
- Checks per-operation invariants. Operation-fatal.
- `fassert`
- Checks fatal process invariants. Process-fatal. Use to detect unexpected situations (such
as a system function returning an unexpected error status).
- Checks fatal process invariants. Process-fatal. Use to detect unexpected situations (such as a
system function returning an unexpected error status).
- `invariant`
- Checks process invariant. Process-fatal. Use to detect code logic errors ("pointer should
never be null", "we should always be locked").
- Checks process invariant. Process-fatal. Use to detect code logic errors ("pointer should never
be null", "we should always be locked").
**Note**: Calling C function `assert` is not allowed. Use one of the above instead.
@ -50,8 +50,8 @@ Some assertions will increment an assertion counter. The `serverStatus` command
- `tripwire`
- Incremented by `tassert`.
- `rollovers`
- When any counter reaches a value of `1 << 30`, all of the counters are reset and
the "rollovers" counter is incremented.
- When any counter reaches a value of `1 << 30`, all of the counters are reset and the "rollovers"
counter is incremented.
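
The rollover behavior can be modeled in a few lines. This is a simplified sketch, not the server's
implementation; the class and the counter names passed to it are only examples.

```python
ROLLOVER_AT = 1 << 30  # threshold quoted above


class AssertionCounters:
    """Toy model: when any counter reaches the threshold, reset all and count a rollover."""

    def __init__(self, names):
        self.counts = {name: 0 for name in names}
        self.rollovers = 0

    def increment(self, name: str) -> None:
        self.counts[name] += 1
        if self.counts[name] >= ROLLOVER_AT:
            # Reset every counter, not just the one that hit the threshold.
            for key in self.counts:
                self.counts[key] = 0
            self.rollovers += 1
```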
## Considerations
@ -61,52 +61,53 @@ terminate the current operation, not the whole process. Be careful not to corrup
mistakenly using these assertions midway through mutating process state.
`fassert` failures will terminate the entire process; this is used for low-level checks where
continuing might lead to corrupt data or loss of data on disk. Additionally, `fassert` will log
a generic assertion message with fatal severity and add a breakpoint before terminating.
continuing might lead to corrupt data or loss of data on disk. Additionally, `fassert` will log a
generic assertion message with fatal severity and add a breakpoint before terminating.
To log a custom assertion message and terminate the server, use `LOGV2_FATAL`.
To avoid printing a stacktrace on failure use `fassertNoTrace` or `LOGV2_FATAL_NO_TRACE`.
Consider using them if there is only one way to reach this fatal point in code.
To log a custom assertion message and terminate the server, use `LOGV2_FATAL`. To avoid printing a
stacktrace on failure use `fassertNoTrace` or `LOGV2_FATAL_NO_TRACE`. Consider using them if there
is only one way to reach this fatal point in code.
`tassert` will fail the operation like `uassert`, but also triggers a "deferred-fatality tripwire
flag". In testing environments, if the tripwire flag is set during shutdown, the process will
invoke the tripwire fatal assertion. In non-testing environments, there will only be a warning
during shutdown that tripwire assertions have failed.
flag". In testing environments, if the tripwire flag is set during shutdown, the process will invoke
the tripwire fatal assertion. In non-testing environments, there will only be a warning during
shutdown that tripwire assertions have failed.
`tassert` presents more diagnostics than `uassert`. `tassert` will log the assertion as an error,
log scoped debug info (for more info, see ScopedDebugInfoStack defined in
[mongo/util/assert_util.h][assert_util_h]), print the stack trace, and add a breakpoint.
The purpose of `tassert` is to ensure that operation failures will cause a test suite to fail
without resorting to different behavior during testing. `tassert` should only be used to check
for unexpected values produced by defined behavior.
[mongo/util/assert_util.h][assert_util_h]), print the stack trace, and add a breakpoint. The purpose
of `tassert` is to ensure that operation failures will cause a test suite to fail without resorting
to different behavior during testing. `tassert` should only be used to check for unexpected values
produced by defined behavior.
Both `massert` and `uassert` take error codes, so that all assertions have codes associated with
them. Currently, programmers are free to provide the error code by either [using a unique location
number](#choosing-a-unique-location-number) or choosing a named code from `ErrorCodes`. Unique location
numbers have no meaning other than a way to associate a log message with a line of code.
them. Currently, programmers are free to provide the error code by either
[using a unique location number](#choosing-a-unique-location-number) or choosing a named code from
`ErrorCodes`. Unique location numbers have no meaning other than a way to associate a log message
with a line of code.
`massert` will log the assertion message as an error, while `uassert` will log the message with
debug level of 1 (for more info about log debug level, see [docs/logging.md][logging_md]).
`iassert` provides similar functionality to `uassert`, but it logs at a debug level of 3 and
does not increment user assertion counters. We should always choose `iassert` over `uassert`
when we expect a failure, a failure might be recoverable, or failure accounting is not interesting.
`iassert` provides similar functionality to `uassert`, but it logs at a debug level of 3 and does
not increment user assertion counters. We should always choose `iassert` over `uassert` when we
expect a failure, a failure might be recoverable, or failure accounting is not interesting.
### Choosing a unique location number
The current convention for choosing a unique location number is to use the 5 or 6 digit SERVER ticket number
for the ticket being addressed when the assertion is added, followed by a two digit counter to distinguish
between codes added as part of the same ticket. For example, if you're working on SERVER-12345, the first
error code would be 1234500, the second would be 1234501, etc. This convention can also be used for LOGV2
logging id numbers.
The current convention for choosing a unique location number is to use the 5 or 6 digit SERVER
ticket number for the ticket being addressed when the assertion is added, followed by a two digit
counter to distinguish between codes added as part of the same ticket. For example, if you're
working on SERVER-12345, the first error code would be 1234500, the second would be 1234501, etc.
This convention can also be used for LOGV2 logging id numbers.
The only real constraint for unique location numbers is that they must be unique across the codebase. This is
verified at compile time with a [python script][errorcodes_py].
The only real constraint for unique location numbers is that they must be unique across the
codebase. This is verified at compile time with a [python script][errorcodes_py].
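The numbering convention above amounts to simple arithmetic. The following is a minimal sketch only; `locationNumber` is a hypothetical helper written for illustration and is not part of the server codebase, where the number is written out as a literal.

```cpp
#include <cstdint>

// Hypothetical helper (an assumption for illustration): appends a two-digit
// counter to a SERVER ticket number, following the convention described above.
constexpr std::int64_t locationNumber(std::int64_t ticket, int counter) {
    return ticket * 100 + counter;
}

// The example from the text: SERVER-12345 yields 1234500, then 1234501.
static_assert(locationNumber(12345, 0) == 1234500);
static_assert(locationNumber(12345, 1) == 1234501);
```

Because the counter occupies two digits, this scheme leaves room for 100 codes per ticket.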
## Exception
A failed operation-fatal assertion throws an `AssertionException` or a child of that.
The inheritance hierarchy resembles:
A failed operation-fatal assertion throws an `AssertionException` or a child of that. The
inheritance hierarchy resembles:
- `std::exception`
- `mongo::DBException`
@ -123,14 +124,14 @@ upwards harmlessly. The code should also expect, and properly handle, `UserExcep
## ErrorCodes and Status
MongoDB uses `ErrorCodes` both internally and externally: a subset of error codes (e.g.,
`BadValue`) are used externally to pass errors over the wire and to clients. These error codes are
the means for MongoDB processes (e.g., _mongod_ and _mongo_) to communicate errors, and are visible
to client applications. Other error codes are used internally to indicate the underlying reason for
a failed operation. For instance, `PeriodicJobIsStopped` is an internal error code that is passed
to callback functions running inside a [`PeriodicRunner`][periodic_runner_h] once the runner is
stopped. The internal error codes are for internal use only and must never be returned to clients
(i.e., in a network response).
MongoDB uses `ErrorCodes` both internally and externally: a subset of error codes (e.g., `BadValue`)
are used externally to pass errors over the wire and to clients. These error codes are the means for
MongoDB processes (e.g., _mongod_ and _mongo_) to communicate errors, and are visible to client
applications. Other error codes are used internally to indicate the underlying reason for a failed
operation. For instance, `PeriodicJobIsStopped` is an internal error code that is passed to callback
functions running inside a [`PeriodicRunner`][periodic_runner_h] once the runner is stopped. The
internal error codes are for internal use only and must never be returned to clients (i.e., in a
network response).
Zero or more error categories can be assigned to `ErrorCodes`, which allows a single handler to
serve a group of `ErrorCodes`. `RetriableError`, for instance, is an `ErrorCategory` that includes
@ -140,10 +141,10 @@ operation that fails with any error code in this category can be safely retried.
we can use `ErrorCodes::is${category}(${error})` to check error categories. Both methods provide
similar functionality.
To represent the status of an executed operation (e.g., a command or a function invocation), we
use `Status` objects, which represent an error state or the absence thereof. A `Status` uses the
standardized `ErrorCodes` to determine the underlying cause of an error. It also allows assigning
a textual description, as well as code-specific extra info, to the error code for further
To represent the status of an executed operation (e.g., a command or a function invocation), we use
`Status` objects, which represent an error state or the absence thereof. A `Status` uses the
standardized `ErrorCodes` to determine the underlying cause of an error. It also allows assigning a
textual description, as well as code-specific extra info, to the error code for further
clarification. The extra info is a subclass of `ErrorExtraInfo` and specific to `ErrorCodes`. Look
for `extra` in [here][error_codes_yml] for reference.
@ -153,28 +154,26 @@ functions with multiple out parameters. We can either pass an error code or an a
`StatusWith` object, indicating failure or success of the operation. For examples of the proper
usage of `StatusWith`, see [mongo/base/status_with.h][status_with_h] and
[mongo/base/status_with_test.cpp][status_with_test_cpp]. It is highly recommended to use `uassert`
or `iassert` over `StatusWith`, and catch exceptions instead of checking `Status` objects
returned from functions. Using `StatusWith` to indicate exceptions, instead of throwing via
`uassert` and `iassert`, makes it very difficult to identify that an error has occurred, and
could lead to the wrong error being propagated.
or `iassert` over `StatusWith`, and catch exceptions instead of checking `Status` objects returned
from functions. Using `StatusWith` to indicate exceptions, instead of throwing via `uassert` and
`iassert`, makes it very difficult to identify that an error has occurred, and could lead to the
wrong error being propagated.
## Using noexcept
Server code should generally be written to be exception safe. Historically,
we've had bugs due to code being overzealously marked `noexcept`. In such
contexts, throwing an exception crashes the server, which can compromise
availability. However, _just_ removing `noexcept` from such code is not a viable
solution \- exception unsafe code may _need_ to crash in order to avoid causing
an even worse failure. We want to work towards ensuring that functions that
ought to be are in fact exception safe, and remove `noexcept` usage where it's
not warranted. Here, we outline guidelines for doing so.
Server code should generally be written to be exception safe. Historically, we've had bugs due to
code being overzealously marked `noexcept`. In such contexts, throwing an exception crashes the
server, which can compromise availability. However, _just_ removing `noexcept` from such code is not
a viable solution \- exception unsafe code may _need_ to crash in order to avoid causing an even
worse failure. We want to work towards ensuring that functions that ought to be are in fact
exception safe, and remove `noexcept` usage where it's not warranted. Here, we outline guidelines
for doing so.
Noexcept is a runtime check that terminates the process rather than allowing
the function to exit because of a throw. Noexcept may be used when it can be
thought of as a bug for any uncaught exception to be thrown. There is no
compile-time check that exceptions will not be thrown within a `noexcept`
function. Instead, putting `noexcept` on a function may be thought of as similar
to using invariant in the following way:
Noexcept is a runtime check that terminates the process rather than allowing the function to exit
because of a throw. Noexcept may be used when it can be thought of as a bug for any uncaught
exception to be thrown. There is no compile-time check that exceptions will not be thrown within a
`noexcept` function. Instead, putting `noexcept` on a function may be thought of as similar to using
invariant in the following way:
```c
// Example noexcept code.
@ -190,92 +189,80 @@ void func() try {
}
```
**As with invariant, be very careful when putting `noexcept` on a function that
interacts with untrusted input.** This has been the root cause of serious past
bugs.
**As with invariant, be very careful when putting `noexcept` on a function that interacts with
untrusted input.** This has been the root cause of serious past bugs.
### Adding or Removing noexcept
When considering removing `noexcept` from a function, the author of that change
must ensure that the function's implementation and its callsites are not
relying on the function not throwing for correctness. Because of this, **be
careful putting `noexcept` on a function** if there's a chance it may need to be
removed later. `noexcept` generally **should not be used** solely for reasons of
performance optimization. Aside from the cases listed in the next section, it
should not be assumed to improve performance without solid evidence.
When considering removing `noexcept` from a function, the author of that change must ensure that the
function's implementation and its callsites are not relying on the function not throwing for
correctness. Because of this, **be careful putting `noexcept` on a function** if there's a chance it
may need to be removed later. `noexcept` generally **should not be used** solely for reasons of
performance optimization. Aside from the cases listed in the next section, it should not be assumed
to improve performance without solid evidence.
If a part of the implementation would benefit from relying on not throwing, but
`noexcept` is not meant to be a part of the function's contract, it is acceptable
to use a try/catch/invariant construction similar to the example above or an
internal `noexcept` helper function.
If a part of the implementation would benefit from relying on not throwing, but `noexcept` is not
meant to be a part of the function's contract, it is acceptable to use a try/catch/invariant
construction similar to the example above or an internal `noexcept` helper function.
When adding or removing `noexcept`, also consider what types of exceptions are
possible in that context and in our codebase. Refer to the “Where Exceptions
are Possible” section for more details.
When adding or removing `noexcept`, also consider what types of exceptions are possible in that
context and in our codebase. Refer to the “Where Exceptions are Possible” section for more details.
If you are uncertain about adding or removing `noexcept` in a given situation,
reach out to \#server-programmability on slack.
If you are uncertain about adding or removing `noexcept` in a given situation, reach out to
\#server-programmability on slack.
### Cases Where noexcept is Encouraged
This list is not exhaustive and there are cases not enumerated here that are
valid uses of `noexcept`.
This list is not exhaustive and there are cases not enumerated here that are valid uses of
`noexcept`.
#### Move operations
Using `noexcept` with move operations allows operations to skip generating
exception handling code. If a type's move operation will not throw exceptions,
it is strictly worse not to use `noexcept`. For instance, std::vector\<T\> can
use optimized versions of certain operations when T has `noexcept` move
operations. In these cases, **`noexcept` can be considered a requirement**. Of
course, if a move operation genuinely needs to throw exceptions, then don't
mark it `noexcept`. This should be very rare \- moves should be non-throwing in
almost all cases.
Using `noexcept` with move operations allows operations to skip generating exception handling code.
If a type's move operation will not throw exceptions, it is strictly worse not to use `noexcept`.
For instance, std::vector\<T\> can use optimized versions of certain operations when T has
`noexcept` move operations. In these cases, **`noexcept` can be considered a requirement**. Of
course, if a move operation genuinely needs to throw exceptions, then don't mark it `noexcept`. This
should be very rare \- moves should be non-throwing in almost all cases.
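To make the `std::vector` point concrete, here is a small standalone sketch (assuming C++17; the types `NoexceptMove` and `ThrowingMove` and the helper `growOnce` are hypothetical names invented for this illustration). It counts how a single element is relocated when the vector reallocates.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical type whose move constructor is marked noexcept.
struct NoexceptMove {
    static inline int copies = 0;
    static inline int moves = 0;
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++moves; }
};

// Hypothetical type whose move constructor is NOT marked noexcept.
struct ThrowingMove {
    static inline int copies = 0;
    static inline int moves = 0;
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) { ++copies; }
    ThrowingMove(ThrowingMove&&) { ++moves; }
};

static_assert(std::is_nothrow_move_constructible_v<NoexceptMove>);
static_assert(!std::is_nothrow_move_constructible_v<ThrowingMove>);

// Stores one element, then forces a reallocation so that the element
// has to be relocated into the new buffer.
template <typename T>
void growOnce() {
    std::vector<T> v;
    v.reserve(1);
    v.emplace_back();             // constructed in place: no copy or move
    v.reserve(v.capacity() + 1);  // reallocates and relocates the element
}
```

After `growOnce<NoexceptMove>()` the element has been moved once. After `growOnce<ThrowingMove>()` it has been copied instead, because the vector must preserve its strong exception guarantee and cannot risk a throwing move partway through reallocation.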
#### Swap operations
Allows callers to optimize for an exception-free pathway. **Swap operations
should follow the same `noexcept` guidelines as move operations**.
Allows callers to optimize for an exception-free pathway. **Swap operations should follow the same
`noexcept` guidelines as move operations**.
#### Hash functions
Allows some hashing library types to optimize for an exception-free pathway.
This can even affect the behavior, performance, and even layout of certain
container types (such as libstdc++'s
[unordered_map](https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_associative.html)).
**Hash functions should follow the same `noexcept` guidelines as move operations.**
Allows some hashing library types to optimize for an exception-free pathway. This can even affect
the behavior, performance, and even layout of certain container types (such as libstdc++'s
[unordered_map](https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_associative.html)). **Hash
functions should follow the same `noexcept` guidelines as move operations.**
#### Destructors and “Destructor-Safe” Functions
Destructors are generally implicitly `noexcept`, and are encouraged to remain
implicitly `noexcept` \- that is, by not marking them with `noexcept(false)`.
Functions where “destructor safety” is a core part of their functionality **may
be marked `noexcept`**. This is not a requirement \- destructors are allowed to
call potentially-throwing functions. It is also not a blanket recommendation to
consider `noexcept` for all functions called from destructors. When calling a
potentially-throwing function from a destructor, think about whether or not it
can indeed throw in that context, and if exceptions need to be handled. If it
can indeed throw in that context, exceptions almost certainly need to be
handled \- otherwise the server will crash.
Destructors are generally implicitly `noexcept`, and are encouraged to remain implicitly `noexcept`
\- that is, by not marking them with `noexcept(false)`. Functions where “destructor safety” is a
core part of their functionality **may be marked `noexcept`**. This is not a requirement \-
destructors are allowed to call potentially-throwing functions. It is also not a blanket
recommendation to consider `noexcept` for all functions called from destructors. When calling a
potentially-throwing function from a destructor, think about whether or not it can indeed throw in
that context, and if exceptions need to be handled. If it can indeed throw in that context,
exceptions almost certainly need to be handled \- otherwise the server will crash.
The lambda passed to `ON_BLOCK_EXIT()` and `ScopeGuard()` should be treated
similarly to destructors: it is executed in a `noexcept` context (a destructor)
and marking it as such is discouraged as being noisy. But code intended to be
called from them can be.
The lambda passed to `ON_BLOCK_EXIT()` and `ScopeGuard()` should be treated similarly to
destructors: it is executed in a `noexcept` context (a destructor) and marking it as such is
discouraged as being noisy. But code intended to be called from them can be.
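As a rough illustration of the destructor guidance, the sketch below (assuming C++17) uses only the standard library. `Resource`, `flushMayThrow`, and `cleanupFailures` are hypothetical names, not server code. The destructor stays implicitly `noexcept` and handles the exception locally instead of letting it escape.

```cpp
#include <stdexcept>
#include <type_traits>

// Hypothetical counter standing in for "log the failure somewhere".
int cleanupFailures = 0;

// Hypothetical cleanup step that may throw.
void flushMayThrow(bool fail) {
    if (fail) {
        throw std::runtime_error("flush failed");
    }
}

class Resource {
public:
    explicit Resource(bool failOnFlush) : _failOnFlush(failOnFlush) {}

    // Implicitly noexcept: an exception escaping from here would call
    // std::terminate, so the throw is caught and handled locally.
    ~Resource() {
        try {
            flushMayThrow(_failOnFlush);
        } catch (const std::exception&) {
            ++cleanupFailures;
        }
    }

private:
    bool _failOnFlush;
};

// Confirms the destructor is noexcept without being marked as such.
static_assert(std::is_nothrow_destructible_v<Resource>);
```

Whether swallowing, logging, or deliberately crashing is the right response depends on the context. The sketch only shows where the handling has to happen.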
### Where Exceptions are Possible
In our codebase, generally DBException is the only type of exception that
should be crossing API boundaries. If an exception other than a DBException
does cross an API boundary, it should be considered a bug. Whichever component
throws the exception should handle it locally, even if only by translating it
to a DBException. Generally any caller you would consider to be an external
caller should be able to rely on DBException being the only exception type your
function will throw.
In our codebase, generally DBException is the only type of exception that should be crossing API
boundaries. If an exception other than a DBException does cross an API boundary, it should be
considered a bug. Whichever component throws the exception should handle it locally, even if only by
translating it to a DBException. Generally any caller you would consider to be an external caller
should be able to rely on DBException being the only exception type your function will throw.
Allocations using the global new allocator or std::allocator in our codebase do
not throw, instead terminating the process directly when OOM conditions are
encountered. As such, there is no need to handle exceptions from these sources.
Allocations using the global new allocator or std::allocator in our codebase do not throw, instead
terminating the process directly when OOM conditions are encountered. As such, there is no need to
handle exceptions from these sources.
## Gotchas
@ -284,10 +271,10 @@ Gotchas to watch out for:
- Generally, do not throw an `AssertionException` directly. Functions like `uasserted()` do work
beyond just that. In particular, it makes sure that the `getLastError` structures are set up
properly.
- Think about the location of your asserts in constructors, as the destructor would not be
  called. But at a minimum, use `wassert` a lot therein; we want to know if something is wrong.
- Do **not** throw in destructors or allow exceptions to leak out (if you call a function that
may throw).
- Think about the location of your asserts in constructors, as the destructor would not be called.
  But at a minimum, use `wassert` a lot therein; we want to know if something is wrong.
- Do **not** throw in destructors or allow exceptions to leak out (if you call a function that may
throw).
[raii]: https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization
[error_codes_yml]: ../src/mongo/base/error_codes.yml


@ -6,18 +6,17 @@ branches, enhance diagnostics, or achieve any number of other aims. Fail points
configured, and disabled via command request to a remote process or via an API within the same
process.
For more on what test-only means and how to enable the `configureFailPoint` command, see [test_commands][test_only].
For more on what test-only means and how to enable the `configureFailPoint` command, see
[test_commands][test_only].
## Using Fail Points
A fail point must first be defined using `MONGO_FAIL_POINT_DEFINE(myFailPoint)`. This statement
adds the fail point to a registry and allows it to be evaluated in code. There are three common
patterns for evaluating a fail point:
A fail point must first be defined using `MONGO_FAIL_POINT_DEFINE(myFailPoint)`. This statement adds
the fail point to a registry and allows it to be evaluated in code. There are three common patterns
for evaluating a fail point:
- Exercise a rarely used branch:
`if (whenPigsFly || myFailPoint.shouldFail()) { ... }`
- Block until the fail point is unset:
`myFailPoint.pauseWhileSet();`
- Exercise a rarely used branch: `if (whenPigsFly || myFailPoint.shouldFail()) { ... }`
- Block until the fail point is unset: `myFailPoint.pauseWhileSet();`
- Use the fail point's payload to perform custom behavior:
  `myFailPoint.execute([](const BSONObj& data) { useMyPayload(data); });`
@ -30,9 +29,9 @@ Fail point configuration involves choosing a "mode" for activation (e.g., "alway
providing additional data in the form of a BSON object. For the vast majority of cases, this is done
by issuing a `configureFailPoint` command request. This is made easier in JavaScript using the
`configureFailPoint` helper from [fail_point_util.js][fail_point_util]. Fail points can also be
useful in C++ unit tests and integration tests. To configure fail points on the local process, use
a `FailPointEnableBlock` to enable and configure the fail point for a given block scope. Finally,
a fail point can also be set via setParameter by its name prefixed with "failpoint." (e.g.,
useful in C++ unit tests and integration tests. To configure fail points on the local process, use a
`FailPointEnableBlock` to enable and configure the fail point for a given block scope. Finally, a
fail point can also be set via setParameter by its name prefixed with "failpoint." (e.g.,
"failpoint.myFailPoint").
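The block-scoped enabling described above can be pictured with a toy model. This is a sketch only: `ToyFailPoint`, `ToyEnableBlock`, `toyFailPoint`, and `compute` are hypothetical stand-ins written for this illustration and do not reproduce the real `FailPoint` or `FailPointEnableBlock` interfaces, which also support modes and BSON payloads.

```cpp
#include <atomic>

// Toy stand-in for a fail point: just an on/off switch.
class ToyFailPoint {
public:
    bool shouldFail() const { return _enabled.load(); }
    void setEnabled(bool on) { _enabled.store(on); }

private:
    std::atomic<bool> _enabled{false};
};

// Toy stand-in for a scoped enable block: turns the fail point on for the
// lifetime of the object and off again when the scope exits.
class ToyEnableBlock {
public:
    explicit ToyEnableBlock(ToyFailPoint& fp) : _fp(fp) { _fp.setEnabled(true); }
    ~ToyEnableBlock() { _fp.setEnabled(false); }
    ToyEnableBlock(const ToyEnableBlock&) = delete;
    ToyEnableBlock& operator=(const ToyEnableBlock&) = delete;

private:
    ToyFailPoint& _fp;
};

ToyFailPoint toyFailPoint;

// Code under test: the error branch is normally reached only for negative
// input, but the fail point lets a test force it.
int compute(int x) {
    if (x < 0 || toyFailPoint.shouldFail()) {
        return -1;
    }
    return x * 2;
}
```

In a unit test, constructing a `ToyEnableBlock` inside a nested scope forces `compute` down its error branch, and normal behavior resumes once the scope ends.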
Users can also wait until a fail point has been evaluated a certain number of times **_over its
@ -50,8 +49,8 @@ command implementations, see [here][fail_point_commands].
The `failCommand` fail point is a special fail point used to mock arbitrary response behaviors to
requests filtered by command, appName, etc. It is most often used to simulate specific conditions
between nodes like invalid replica set configurations. For examples of use, see the
[failCommand JavaScript tests][fail_command_javascript_test].
between nodes like invalid replica set configurations. For examples of use, see the [failCommand
JavaScript tests][fail_command_javascript_test].
[fail_point]: ../src/mongo/util/fail_point.h
[fail_point_test]: ../src/mongo/util/fail_point_test.cpp


@ -68,11 +68,11 @@ Future<Message> call(Message& toSend) {
First, notice that our calls to `TransportSession::sourceMessage` and
`TransportSession::sinkMessage` have been replaced with calls to asynchronous versions of those
functions. These asynchronous versions are future-returning; they don't block, but also don't return
a result right away. Instead, they return a future that we can chain continuations onto; `then`,
`onError` and `onCompletion` are all member functions of `Future<T>` that take a callable as argument
and invoke that callable when the chained-to future is ready. Unsurprisingly, continuations chained
with `.then` are run when the future is readied successfully with a `T`, and therefore callables
chained with `.then` should take a `T` as argument. Mirroring this behavior, `.onError`
a result right away. Instead, they return a future that we can chain continuations onto;
`then`, `onError` and `onCompletion` are all member functions of `Future<T>` that take a callable as
argument and invoke that callable when the chained-to future is ready. Unsurprisingly, continuations
chained with `.then` are run when the future is readied successfully with a `T`, and therefore
callables chained with `.then` should take a `T` as argument. Mirroring this behavior, `.onError`
continuations are run only when the future is readied with an error, and continuations chained this
way take a `Status` as argument which they can inspect to discover the error explaining why a `T`
could not be delivered. Continuations chained with `.onCompletion` are run when the future resolves,
@ -107,18 +107,17 @@ associated Futures exactly one time, and must do so before being destroyed (othe
will be set with the `ErrorCodes::BrokenPromise` error, which is considered a programmer error and
may crash debug builds of the server in the future).
To create a `Promise` that has a Future, you may use the [`PromiseAndFuture<T>`][pf]
utility type. Upon construction, it contains a created `Promise<T>` and its
corresponding `Future<T>`. The perhaps-familiar `makePromiseFuture<T>` factory
function now simply returns `PromiseAndFuture<T>{}`.
To create a `Promise` that has a Future, you may use the [`PromiseAndFuture<T>`][pf] utility type.
Upon construction, it contains a created `Promise<T>` and its corresponding `Future<T>`. The
perhaps-familiar `makePromiseFuture<T>` factory function now simply returns `PromiseAndFuture<T>{}`.
As was previously alluded to, it's
also possible to make a "ready future" - one that has no associated promise and is already filled
with a value or error. These might be useful in cases where the code that produces values in a way
that's normally asynchronous happens to have one available already when a request comes in, and
would like to return it right away. To create such a ready future, use `Future<T>::makeReady()`, or
the helper function [makeReadyFutureWith(Func&& func)][mrfw] which will call the specified `func`
and create a ready `Future` from its returned value.
As was previously alluded to, it's also possible to make a "ready future" - one that has no
associated promise and is already filled with a value or error. These might be useful in cases where
the code that produces values in a way that's normally asynchronous happens to have one available
already when a request comes in, and would like to return it right away. To create such a ready
future, use `Future<T>::makeReady()`, or the helper function [makeReadyFutureWith(Func&&
func)][mrfw] which will call the specified `func` and create a ready `Future` from its returned
value.
Lastly, there might be occasions when multiple futures should be fulfilled with the same value, at
the same time. This use case is best served by `SharedPromise` and the associated `SharedSemiFuture`
@ -144,8 +143,8 @@ calling threads, and return `Future<T>`s to those threads that will be readied o
available. The service may have its own internal threads it uses to produce `T`s, and doesn't want
to lend out its internal threads to do the work chained via continuations to the `Future<T>`s it's
given to calling threads. Instead, it needs to insist that continuations are not chained onto the
futures it gives out, or that the caller receiving the future
arranges for some _other_ thread to run continuations.
futures it gives out, or that the caller receiving the future arranges for some _other_ thread to
run continuations.
Fortunately, the service can enforce these guarantees using two types closely related to
`Future<T>`: the types `SemiFuture<T>` and `ExecutorFuture<T>`.
@ -270,33 +269,32 @@ will traverse the remaining continuation chain, and find the continuation chaine
is run.
Note that all of the continuation-chaining functions we've discussed, like `.then()`, return future-
like types themselves (i.e. `Future<T>`, `SemiFuture<T>`, and the like). When we chain
continuations in the manner we've been discussing here, subsequent continuations run when the future
returned by the previous continuation is ready, and the future-like type is "unwrapped" such that
the type wrapped by the future (or, in the case of failure, the error) is passed directly to the
subsequent continuation. For more detail on this topic, see the block comment above the
continuation-chaining member functions in [future.h][future], starting above the definition for
`then()`.
like types themselves (i.e. `Future<T>`, `SemiFuture<T>`, and the like). When we chain continuations
in the manner we've been discussing here, subsequent continuations run when the future returned by
the previous continuation is ready, and the future-like type is "unwrapped" such that the type
wrapped by the future (or, in the case of failure, the error) is passed directly to the subsequent
continuation. For more detail on this topic, see the block comment above the continuation-chaining
member functions in [future.h][future], starting above the definition for `then()`.
At some point, we may have no more continuations to add to a future chain, and will want to either
synchronously extract the value or error held in the last future of the chain, or add a callback to
asynchronously consume this value. The `.get()` and `.getAsync()` members of future-like types
provide these facilities for terminating a future chain by extracting or asynchronously
consuming the result of the chain. The `.getAsync()` function works much like `.onCompletion()`,
taking a `Status` or `StatusWith<T>` and running regardless of whether or not the previous link in
the chain resolved with error or success, and running asynchronously when the previous results are
ready (to determine what thread `.getAsync()` will run on, follow the rules laid out in the previous
"Where Do Continuations Run?" section.) Conversely, `.get()` takes no arguments, and blocks when it
is called until the entirety of the continuation chain is resolved, with the final result given back
to the blocking caller. Note that if the final result of the chain was an error that can be
converted to a MongoDB `Status` type (i.e. either a `Status`-family type or `DBException`), it will
be re-thrown as a `DBException` at the site where `.get()` is called when it is available. If the
code calling `.get()` is not capable of handling an exception, use `.getNoThrow()` instead to
extract the same error in the form of a `Status`. In the case of `.getAsync()`, all errors are
converted to `Status`, and crucially, callables chained as continuations via `.getAsync()` cannot
throw any exceptions, as there is no appropriate context with which to handle an asynchronous
exception. If an exception is thrown from a continuation chained via `.getAsync()`, the entire
process will be terminated (i.e. the program will crash).
provide these facilities for terminating a future chain by extracting or asynchronously consuming
the result of the chain. The `.getAsync()` function works much like `.onCompletion()`, taking a
`Status` or `StatusWith<T>` and running regardless of whether or not the previous link in the chain
resolved with error or success, and running asynchronously when the previous results are ready (to
determine what thread `.getAsync()` will run on, follow the rules laid out in the previous "Where Do
Continuations Run?" section.) Conversely, `.get()` takes no arguments, and blocks when it is called
until the entirety of the continuation chain is resolved, with the final result given back to the
blocking caller. Note that if the final result of the chain was an error that can be converted to a
MongoDB `Status` type (i.e. either a `Status`-family type or `DBException`), it will be re-thrown as
a `DBException` at the site where `.get()` is called when it is available. If the code calling
`.get()` is not capable of handling an exception, use `.getNoThrow()` instead to extract the same
error in the form of a `Status`. In the case of `.getAsync()`, all errors are converted to `Status`,
and crucially, callables chained as continuations via `.getAsync()` cannot throw any exceptions, as
there is no appropriate context with which to handle an asynchronous exception. If an exception is
thrown from a continuation chained via `.getAsync()`, the entire process will be terminated (i.e.
the program will crash).
## Notes and Links


@ -2,31 +2,27 @@
title: FuzzTest
---
FuzzTest is a coverage-guided fuzzing framework for C++ that integrates
directly with GoogleTest. FuzzTest lets you write _property-based tests_: you
describe the shape of your inputs using typed _domains_, and the framework
generates and mutates values that satisfy those constraints. FuzzTest
uses Centipede as its fuzzing engine and AUBSAN to surface undefined
behavior.
FuzzTest is a coverage-guided fuzzing framework for C++ that integrates directly with GoogleTest.
FuzzTest lets you write _property-based tests_: you describe the shape of your inputs using typed
_domains_, and the framework generates and mutates values that satisfy those constraints. FuzzTest
uses Centipede as its fuzzing engine and AUBSAN to surface undefined behavior.
# When to use FuzzTest
- Your function under test accepts structured inputs (integers, strings, custom types, BSON objects,
etc.) rather than an opaque byte blob.
- You want to express correctness properties beyond "does not crash", such as API invariants,
differential equivalence, or roundtrip symmetry.
- You want a fuzz test that also runs cleanly as a unit test in normal CI, without needing a special
fuzzer build variant.
# How to use FuzzTest
## The property function and FUZZ_TEST macro
A FuzzTest consists of a _property function_ and a registration macro. The property function is a
plain C++ function whose parameters define the inputs to fuzz. The framework calls it repeatedly
with generated values, looking for any call that triggers an assertion failure or sanitizer error.
```cpp
#include "fuzztest/fuzztest.h"
@ -38,14 +34,16 @@ void MyFunctionFuzzer(const std::string& input) {
FUZZ_TEST(MyTestSuite, MyFunctionFuzzer);
```
When no `.WithDomains()` clause is provided, each parameter defaults to `fuzztest::Arbitrary<T>()`,
which covers most standard library types.
## Specifying input domains
Use `.WithDomains()` to constrain the generated inputs:
> ⚠️ **Warning:** Never initialize input domains with global objects initialized in other
> compilation units. For more information see
> [Fuzz_Test Macro](https://github.com/google/fuzztest/blob/main/doc/fuzz-test-macro.md)
```cpp
void ProcessRequestFuzzer(int opcode, const std::string& payload) {
@ -56,14 +54,18 @@ FUZZ_TEST(MyTestSuite, ProcessRequestFuzzer)
/*payload=*/fuzztest::Arbitrary<std::string>());
```
FuzzTest ships with a rich set of built-in domains. A complete list of default types implemented in
fuzztest can be found in the
[Fuzztest Domain Reference](https://github.com/google/fuzztest/blob/main/doc/domains-reference.md).
Also see [BSON Fuzzing](#fuzzing-bson).
## Providing seeds
Seed values give the fuzzer a head start by providing known-interesting inputs to mutate:
> ⚠️ **Warning:** Never initialize seeds with global objects initialized in other compilation units.
> For more information see
> [Fuzz_Test Macro](https://github.com/google/fuzztest/blob/main/doc/fuzz-test-macro.md)
```cpp
FUZZ_TEST(MyTestSuite, ProcessRequestFuzzer)
@ -82,11 +84,9 @@ FUZZ_TEST(MyTestSuite, ProcessRequestFuzzer)
## Common correctness patterns
Beyond "does not crash", FuzzTest makes it easy to assert higher-level properties.
**Roundtrip**: verify that encode→decode (or serialize→parse) is the identity:
```cpp
void SerializeRoundtrips(const MyMessage& msg) {
@ -97,8 +97,7 @@ void SerializeRoundtrips(const MyMessage& msg) {
FUZZ_TEST(MyTestSuite, SerializeRoundtrips);
```
**Differential fuzzing**: compare two implementations of the same operation:
```cpp
void ImplementationsAgree(const std::string& input) {
@ -109,10 +108,11 @@ FUZZ_TEST(MyTestSuite, ImplementationsAgree);
## Using fixtures
If your test requires expensive one-time setup (e.g. starting a service), use a fixture with
`FUZZ_TEST_F`. Any default-constructible class can be a fixture; the constructor and destructor run
once for the whole fuzz test, not once per iteration. When using fixtures, care should be taken to
ensure that only the initial fixture state is retained. Program state created during a test
_**must**_ not affect or be affected by subsequent iterations.
```cpp
class MyServiceFuzzTest {
@ -132,10 +132,10 @@ FUZZ_TEST_F(MyServiceFuzzTest, RequestFuzzer);
## Fuzzing BSON
MongoDB provides a custom FuzzTest domain for generating valid BSON objects:
`mongo::bson_mutator::BSONObjImpl`. It is registered as the `Arbitrary<ConstSharedBuffer>`
specialization, so any fuzz test that accepts a `ConstSharedBuffer` will automatically receive
well-formed BSON.
```cpp
#include "mongo/bson/bson_mutator/bson_mutator.h"
@ -147,8 +147,7 @@ void MyCommandFuzzer(ConstSharedBuffer input) {
FUZZ_TEST(MyCommandFuzzTest, MyCommandFuzzer);
```
To constrain which fields are present and their types, use the `.With<Type>()` builders:
```cpp
FUZZ_TEST(MyCommandFuzzTest, MyCommandFuzzer)
@ -158,8 +157,8 @@ FUZZ_TEST(MyCommandFuzzTest, MyCommandFuzzer)
.WithLong("limit", fuzztest::InRange(0LL, 1000LL)));
```
Fields added via `.With<Type>()` are not guaranteed to appear in every generated object, which
exercises missing-field error handling as well.
Use `.WithVariant()` when a field may legally hold more than one type:
@ -171,8 +170,7 @@ fuzztest::Arbitrary<mongo::ConstSharedBuffer>()
});
```
Use `.WithAny()` when a key should be present but its type is unconstrained:
```cpp
fuzztest::Arbitrary<mongo::ConstSharedBuffer>().WithAny("filter");
@ -180,8 +178,8 @@ fuzztest::Arbitrary<mongo::ConstSharedBuffer>().WithAny("filter");
## Bazel target
Use `mongo_cc_fuzztest` (from `//bazel:mongo_src_rules.bzl`) to declare a fuzz test target. It links
in FuzzTest and GoogleTest automatically:
```python
mongo_cc_fuzztest(
@ -198,8 +196,8 @@ mongo_cc_fuzztest(
## Unit test mode
Every `FUZZ_TEST` is also a regular GoogleTest test. In unit test mode, the property function is
called a small number of times with minimal inputs. This lets fuzz tests run in ordinary CI
alongside unit tests:
```
@ -208,10 +206,9 @@ bazel test --compiler_type=clang --config=fuzztest --fsan --opt=debug --allocato
## Fuzzing mode
Fuzzing mode enables sanitizer and coverage instrumentation and runs the test indefinitely (or until
a crash is found). It requires the `fsan` build configuration. Check our Evergreen configuration for
the current bazel arguments, or run:
```
bazel run --compiler_type=clang --config=fuzztest --fsan --opt=debug --allocator=system +my_command_fuzztest -- \
@ -226,7 +223,9 @@ bazel run --compiler_type=clang --config=fuzztest --fsan --opt=debug --allocator
## Evergreen
Fuzz tests defined in bazel using `mongo_cc_fuzztest` will periodically run on the master branch in
evergreen. The compiled tests and their associated corpus are saved to S3 and can be downloaded for
debugging issues. The corpus is reused between evergreen runs in order to increase fuzzing coverage.
## Useful flags

@ -33,24 +33,24 @@ outputs.
code changes.
- Multiple test variations MAY be bundled into a single test. Recommended when testing same feature
with different inputs. This helps reviewing the outputs by grouping similar tests together, and
also reduces the number of output files.
- Changes to test fixtures or test code that affect a non-trivial number of test outputs MUST BE
done in a separate pull request from production code changes:
- Pull request for test code only changes can be easily reviewed, even if large number of test
outputs are modified. While such changes can still introduce merge conflicts, they don't
introduce risk of regression (if outputs were valid
- Pull requests with mixed production
- Tests in the same suite SHOULD share the fixtures when appropriate. This reduces cost of adding
new tests to the suite. Changes to the fixture may only affect expected outputs from that
fixture, and those outputs can be updated in bulk.
- Tests in different suites SHOULD NOT reuse/share fixtures. Changes to the fixture can affect large
number of expected outputs. There are exceptions to that rule, and tests in different suites MAY
reuse/share fixtures if:
- Test fixture is considered stable and changes rarely.
- Test suites are related, either by sharing tests, or testing similar components.
@ -59,9 +59,8 @@ outputs.
- Tests SHOULD print both inputs and outputs of the tested code. This makes it easy for reviewers to
verify that the expected outputs are indeed correct by having both input and output next to each
other. Otherwise finding the input used to produce the new output may not be practical, and might
not even be included in the diff.
- When resolving merge conflicts on the expected output files, one of the approaches below SHOULD be
used:
@ -71,8 +70,8 @@ outputs.
hanges done by local branch.
- "Accept yours", rerun the tests and verify the new outputs. This approach requires knowledge of
production/test code changes in "theirs" branch. However, if such changes resulted in
straightforward and repetitive output changes, like due to printing code change or fixture
change, it may be easier to verify than reinspecting local changes.
- Expected test outputs SHOULD be reused across tightly-coupled test suites. The suites are
tightly-coupled if:
@ -92,8 +91,8 @@ outputs.
- Versioned tests, where expected behavior is the same for majority of test inputs/scenarios.
- AVOID manually modifying expected output files. Those files are considered to be auto generated.
Instead, run the tests and then copy the generated output as a new expected output file. See "How
to diff and accept new test outputs" section for instructions.
# How to write Golden Data tests?
@ -121,9 +120,10 @@ outputs. Verifies the output with the expected output that is in the source repo
See: [golden_test.h](../src/mongo/unittest/golden_test.h)
Before running `bazel test`, set up the golden test framework as described in the `Setup` section
below. This will ensure that the C++ test outputs are written to a location where
`buildscripts/golden_test.py` can find them so that the `diff` and `accept` functions work as
expected.
**Example:**
@ -160,8 +160,7 @@ TEST_F(MySuiteFixture, MyFeatureBTest) {
}
```
Also see self-test: [golden_test_test.cpp](../src/mongo/unittest/golden_test_test.cpp)
# How to diff and accept new test outputs on a workstation
@ -177,13 +176,15 @@ buildscripts/golden_test.py requires a one-time workstation setup.
Note: this setup is only required to use buildscripts/golden_test.py itself. It is NOT required to
just run the Golden Data tests when not using buildscripts/golden_test.py.
1. Create a yaml config file, as described by
[Appendix - Config file reference](#appendix---config-file-reference).
2. Set the GOLDEN_TEST_CONFIG_PATH environment variable to the config file location, so that it is
available when running tests and when running the buildscripts/golden_test.py tool.
### Automatic Setup
Use buildscripts/golden_test.py builtin setup to initialize default config for your current
platform.
**Instructions for Linux**
@ -195,8 +196,8 @@ buildscripts/golden_test.py setup
**Instructions for Windows**
Run buildscripts/golden_test.py setup utility. You may be asked for a password, when not running in
"Run as administrator" shell.
```cmd
c:\python\python310\python.exe buildscripts/golden_test.py setup
@ -295,7 +296,8 @@ $> buildscripts/golden_test.py --help
### Update multiple expected files at once
Some tests will run in multiple passthroughs or build variants, so they have multiple expected
files.
Whenever the test is updated, all the expected files should be updated together as well.
@ -306,8 +308,8 @@ buildscripts/golden_test.py --verbose clean-run-accept jstests/query_golden/NAME
This option uses `resmoke.py find-suites` to determine the passthrough suites a test belongs to and
runs them.
If the test is found to only belong to the `query_golden_classic` passthrough, it is assumed that it
can have multiple expected results due to being run under multiple build variants with different
`internalQueryFrameworkControl` settings. So the test will be run with various values for
`internalQueryFrameworkControl`.
@ -348,22 +350,21 @@ outputRootPattern:
type: String
optional: true
description:
Root path pattern that will be used to write expected and actual test outputs for all tests in
the test run. If not specified a temporary folder location will be used. Path pattern string may
use '%' characters in the last part of the path. '%' characters in the last part of the path
will be replaced with random lowercase hexadecimal digits.
examples: /var/tmp/test_output/out-%%%%-%%%%-%%%%-%%%%
          /var/tmp/test_output
diffCmd:
type: String
optional: true
description:
Shell command to diff a single golden test run output. {{expected}} and {{actual}} variables
should be used and will be replaced with expected and actual output folder paths respectively.
This property is not used to decide whether the test passes or fails; it is only used to display
differences once we've decided that a test failed.
examples: git diff --no-index "{{expected}}" "{{actual}}"
          diff -ruN --unidirectional-new-file --color=always "{{expected}}" "{{actual}}"
```
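
The `outputRootPattern` substitution rule described above can be sketched as follows. This is a
minimal illustration, not the framework's actual implementation: the function name
`expandOutputRootPattern` is hypothetical, and `/` is assumed to be the path separator.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Illustrative only: replaces each '%' in the last path component with a
// random lowercase hexadecimal digit. Earlier components are left untouched.
inline std::string expandOutputRootPattern(const std::string& pattern) {
    static const char kHexDigits[] = "0123456789abcdef";
    std::random_device seed;
    std::mt19937 gen(seed());
    std::uniform_int_distribution<int> pick(0, 15);

    // Everything up to and including the last '/' is copied unchanged.
    const std::size_t lastSlash = pattern.find_last_of('/');
    const std::size_t start = (lastSlash == std::string::npos) ? 0 : lastSlash + 1;

    std::string result = pattern;
    for (std::size_t i = start; i < result.size(); ++i) {
        if (result[i] == '%') {
            result[i] = kHexDigits[pick(gen)];
        }
    }
    return result;
}

// Usage: a pattern without '%' characters is returned unchanged.
inline void usageExample() {
    assert(expandOutputRootPattern("/var/tmp/test_output") == "/var/tmp/test_output");
}
```

Because the digits are random, repeated test runs that share one pattern would normally write to
different folders.
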

@ -142,8 +142,8 @@ mongo_idl_library(
```
Bazel knows how to invoke the IDL compiler and generate files in the build directory with the C++
code. This code can also be generated by `--build_tag_filters=gen_source` tag in bazel which is
useful for code navigation.
The generated IDL code looks something like the simplified code below.
@ -206,17 +206,17 @@ fields on the `commands` object.
The special features/requirements of commands:
1. First element must match the name of the command, and the parsing rules of this element can be
customized via the `namespace` field.
2. In `OP_MSG`, `$db` must be present or defaults to `admin`
3. Commands may have a `struct` as a reply
4. Commands may be a part of API Version 1
5. Any structs marked with `is_generic_cmd_list: "arg"` that are in imported IDL files will
automatically be chained to all commands. The IDL compiler imports
[`generic_argument.idl`](generic_argument.idl) by default, so any generic argument struct defined
in that file will be chained to all commands by default.
6. Command replies ignore the generic arguments fields like `$clusterTime`, `ok`, etc during
parsing. The list of these fields is in [`generic_argument.idl`](generic_argument.idl).
Example Command:
@ -388,7 +388,8 @@ void idlDeserialize(StringEnumEnum& en, ::mongo::StringData value, const IDLPars
constexpr ::mongo::StringData idlGetDefaultParserFieldName(StringEnumEnum) { return "StringEnumEnum"; }
```
These ADL hooks are not intended to be used directly by user code. See
[Serialization/Deserialization API](#serializationdeserialization-api).
### Integer Enums
@ -420,7 +421,8 @@ std::int32_t idlSerialize(IntEnum value);
constexpr ::mongo::StringData idlGetDefaultParserFieldName(IntEnum) { return "IntEnum"; }
```
These ADL hooks are not intended to be used directly by user code. See
[Serialization/Deserialization API](#serializationdeserialization-api).
### Serialization/Deserialization API
@ -432,9 +434,9 @@ The public API to serialize and deserialize IDL-generated enums is defined in
auto parsedEnum = idl::deserialize<IdlEnum>(value);
```
The definitions of `idl::serialize()` and `idl::deserialize()` rely on the autogenerated ADL hooks
to find the serializer/deserializer implementations for each enum. User code should use this public
API and not the ADL hooks directly.
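
The way a public API can find per-enum hooks through argument-dependent lookup (ADL) can be
sketched as below. All names here (`example`, `sketch_idl`, `sketchSerializeHook`,
`sketchDeserializeHook`, `Color`) are hypothetical and chosen for illustration; the real generated
hooks and the real `idl::serialize()`/`idl::deserialize()` signatures differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

namespace example {  // hypothetical namespace standing in for generated code

enum class Color : std::int32_t { kRed = 0, kGreen = 1 };

// Hooks of the kind a generator might emit next to the enum. Because they live
// in the enum's namespace, argument-dependent lookup can find them.
inline std::string sketchSerializeHook(Color value) {
    return value == Color::kRed ? "red" : "green";
}

// Simplification: any string other than "red" maps to kGreen.
inline void sketchDeserializeHook(Color& out, const std::string& value) {
    out = (value == "red") ? Color::kRed : Color::kGreen;
}

}  // namespace example

namespace sketch_idl {  // hypothetical stand-in for the public API namespace

// The unqualified calls below are resolved via ADL at instantiation time, so
// callers never name the per-enum hook directly.
template <typename Enum>
auto serialize(Enum value) {
    return sketchSerializeHook(value);
}

template <typename Enum>
Enum deserialize(const std::string& value) {
    Enum result{};
    sketchDeserializeHook(result, value);
    return result;
}

}  // namespace sketch_idl

// Usage: user code goes through the public entry points only.
inline void usageExample() {
    auto color = sketch_idl::deserialize<example::Color>("red");
    assert(sketch_idl::serialize(color) == "red");
}
```

Keeping the hooks as an implementation detail lets the generator change them without touching
call sites that use only the public entry points.
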
### Reference
@ -482,8 +484,8 @@ types allow users to customize IDL parsing for their own unique needs.
A field in a struct or command can be defined as a type but a field can also be an array, enum,
struct or variant. Declaring a field as something other than a type is preferred to using types since
it allows more type information to be represented in IDL over C++. See `type` in the
[field reference](#struct-fields-attribute-reference) for more information.
Type supports builtin BSON types like int32, int64, and string. These are types built into
`BSONElement`/`BSONObjBuilder`. It also supports custom types to give the code full control of
@ -529,11 +531,11 @@ The five key things to note in this example:
`BSONElement` as a parameter. The IDL generator has custom rules for `BSONElement`.
- `serializer` - omitted in this example because `BSONObjBuilder` has builtin support for
`std::string`
- `is_view` - indicates whether the type is a view or not. If the type is a view, then it's possible
that objects of the type will not own all of its members. If the type is not a view, then objects
of the type are guaranteed to own all of its members. This field is optional and defaults to True.
To reduce the size of the C++ representation of structs including this type, you can specify this
field as False if the type is not a view type.
### Custom Types
@ -590,22 +592,29 @@ IDLAnyType:
- `std::vector<_>` - When using `std::vector<->`, the getters/setters use
`mongo::ConstDataRange` instead
- `deserializer` - string - a method name to call to deserialize the type. Typically this is a
  function that takes `BSONElement` as a parameter. The IDL generator has custom rules for
  `BSONElement`.
  - By default, IDL assumes it is an instance method of `cpp_type`.
  - If prefixed with `::`, assumes the function is a global static function.
  - By default, the deserializer's function signature is `<function_name>(<cpp_type>)`.
  - For `object` types, the deserializer's function signature is
    `<function_name>(const BSONObj& obj)`.
  - For `any` types, the deserializer's function signature is
    `<function_name>(BSONElement element)`.
- `serializer` - string - a method name to call to serialize the type.
  - By default, IDL assumes it is an instance method of `cpp_type`.
  - If prefixed with `::`, assumes the function is a global static function.
  - By default, the serializer's function signature is
    `<type_append> <function_name>(const <cpp_type>&)` where `type_append` is a type
    `BSONObjBuilder` understands.
  - For `object` types, the deserializer's function signature is
    `<function_name>(const BSONObj& obj)`.
  - For `any` types that are not in an array, the serializer's function signature is
    `<function_name>(StringData fieldName, BSONObjBuilder* builder)`.
  - For `any` types that are in an array, the serializer's function signature is
    `<function_name>(BSONArrayBuilder* builder)`.
- `deserialize_with_tenant` - bool - if set, adds `TenantId` as the first parameter to
`deserializer`
- `internal_only` - bool - undocumented, DO NOT USE
- `default` - string - default value for a type. A field in a struct inherits this value if a field
does not set a default. See struct's `default` rules for more information.
- `is_view` - indicates whether the type is a view or not. If the type is a view, then it's possible
that objects of the type will not own all of its members. If the type is not a view, then objects
of the type are guaranteed to own all of its members.
## Structs
@ -638,9 +647,8 @@ exampleStruct:
optional: true
defaultedField:
description: >-
Most callers should rely on 42 as it is the answer to the question of life the universe and
everything.
type: long
validator:
gt: 0
@ -762,8 +770,8 @@ multi level chained structs.
- `is_command_reply` - bool - if true, marks the struct as a command reply. A struct marked as
`is_command_reply` generates a parser that ignores known generic or common fields across all
replies when parsing replies (i.e. `ok`, `errmsg`, etc)
- `is_generic_cmd_list` - string - choice [`arg`, `reply`], if set, generates functions
`bool hasField(StringData)` and `bool shouldForwardToShards(StringData)` for each field in the
struct. If set to `arg`, the struct will automatically be chained to every `command`.
- `query_shape_component` - bool - true indicates that special serialization code will be generated
to serialize this struct as a query shape
@ -784,10 +792,10 @@ hasField(StringData)` and `bool shouldForwardToShards(StringData)` for each fiel
have a variant of strings and structs.
- Variant string support differentiates the type to choose based on the BSON type.
- Variant struct support differentiates the type to choose based on the _first_ field of the
struct. The first field must be unique in each struct across the structs. When parsing a BSON
object as a variant of multiple structs, the parser assumes that the first field declared in
the IDL struct is always the first field in its BSON representation. See `bulkWrite` for an
example.
- `ignore` - bool - true means the field generates no code but is ignored by the generated
deserializer. Used to deprecate fields that no longer have an effect but allow strict parsers to
ignore them.
- `optional` - bool - true means the field is optional. Generated C++ type is
@ -819,8 +827,9 @@ Comparisons are generated with C++ operators for these comparisons
- `lt` - string - Validates field is less than `string`
- `gte` - string - Validates field is greater than or equal to `string`
- `lte` - string - Validates field is less than or equal to `string`
- `callback` - string - A static function to call of the shape
`Status <function_name>(const <cpp_type> value)`. For non-simple types, `value` is passed by
const-reference.
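
The `callback` shape described above can be sketched as follows. This is an illustration under
stated assumptions: `CallbackStatus` is a hypothetical stand-in for MongoDB's `Status`, and
`validateLimit` and `validateName` are made-up callbacks, not functions from the codebase.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a Status type; MongoDB's real Status differs.
struct CallbackStatus {
    bool ok;
    std::string reason;
    static CallbackStatus OK() {
        return {true, ""};
    }
};

// A callback for a simple type takes its argument by value.
inline CallbackStatus validateLimit(const long long value) {
    if (value % 2 != 0) {
        return {false, "limit must be even"};
    }
    return CallbackStatus::OK();
}

// A callback for a non-simple type takes its argument by const reference.
inline CallbackStatus validateName(const std::string& value) {
    if (value.empty()) {
        return {false, "name must not be empty"};
    }
    return CallbackStatus::OK();
}

// Usage: a failed validation reports its reason through the returned status.
inline void usageExample() {
    assert(!validateLimit(3).ok);
    assert(validateName("orders").ok);
}
```

A callback is useful for rules that the comparison validators cannot express, such as the parity
check in `validateLimit`.
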
## Commands
@ -830,24 +839,24 @@ the `command` object when compared to `struct`.
The special features:
1. First element must match the name of the command, and the parsing rules of this element can be
customized via the `namespace` field.
2. In `OP_MSG`, `$db` must be present or defaults to `admin`
3. Commands may have a `struct` as a reply
4. Commands may be a part of API Version 1
5. Any structs marked with `is_generic_cmd_list: "arg"` that are in imported IDL files will
automatically be chained to all commands. The IDL compiler imports
[`generic_argument.idl`](generic_argument.idl) by default, so any generic argument struct defined
in that file will be chained to all commands by default.
6. Command replies ignore the generic arguments fields like `$clusterTime`, `ok`, etc during
parsing. The list of these fields is in [`generic_argument.idl`](generic_argument.idl).
The `namespace` field is the field that describes one kind of parameter a command takes.
1. `concatenate_with_db` - takes a collection name. Generates a method `const NamespaceString
getNamespace()`. Examples: `insert`, `update`, `delete`
2. `concatenate_with_db_or_uuid` - takes a collection name. Generates a method `const
NamespaceStringOrUUID& getNamespaceOrUUID()`. Examples: `find`, `count`
1. `concatenate_with_db` - takes a collection name. Generates a method
`const NamespaceString getNamespace()`. Examples: `insert`, `update`, `delete`
2. `concatenate_with_db_or_uuid` - takes a collection name. Generates a method
`const NamespaceStringOrUUID& getNamespaceOrUUID()`. Examples: `find`, `count`
3. `ignored` - ignores the first argument entirely. Examples: `hello`, `setParameter`, `ping`
4. `type` - takes a struct as the first argument. Examples: `getLog`, `clearLog`, `renameCollection`
@ -866,15 +875,16 @@ Commands can also specify their replies that they return. Replies are regular `s
- `immutable` - [see structs](#struct-reference)
- `non_const_getter` - [see structs](#struct-reference)
- `namespace` - string - choice of a string [`concatenate_with_db`, `concatenate_with_db_or_uuid`,
`ignored`, `type`]. Instructs how the value of command field should be parsed
  - `concatenate_with_db` - Indicates the command field is a string and should be treated as a
    collection name. Typically used by commands that deal with collections. Automatically
    concatenated with `$db` by the IDL parser. Adds a method `const NamespaceString getNamespace()`
    to the generated class.
  - `concatenate_with_db_or_uuid` - Indicates the command field is a string or uuid, and should be
    treated as a collection name. Typically used by commands that deal with collections.
    Automatically concatenated with `$db` by the IDL parser. Adds a method
    `const NamespaceStringOrUUID& getNamespaceOrUUID()` to the generated class.
  - `ignored` - Ignores the value of the command field. Used by commands that ignore their command
    argument entirely
  - `type` - Indicates the command takes a custom type for the first field. `type` field must be
    set.
`ignored`, `type`]. Instructs how the value of command field should be parsed
  - `concatenate_with_db` - Indicates the command field is a string and should be treated as a
    collection name. Typically used by commands that deal with collections. Automatically
    concatenated with `$db` by the IDL parser. Adds a method `const NamespaceString getNamespace()`
    to the generated class.
  - `concatenate_with_db_or_uuid` - Indicates the command field is a string or uuid, and should be
    treated as a collection name. Typically used by commands that deal with collections.
    Automatically concatenated with `$db` by the IDL parser. Adds a method
    `const NamespaceStringOrUUID& getNamespaceOrUUID()` to the generated class.
  - `ignored` - Ignores the value of the command field. Used by commands that ignore their command
    argument entirely
  - `type` - Indicates the command takes a custom type for the first field. `type` field must be
    set.
- `type` - string - name of IDL type or struct to parse the command field as
- `command_name` - string - IDL generated parser expects the command to be named the name of YAML
map. This can be overwritten with `command_name`. Commands should be `camelCase`
@ -893,8 +903,8 @@ NamespaceStringOrUUID& getNamespaceOrUUID()` to the generated class. - `ignored`
### Access Check Reference
A list of privileges the command checks. Only applicable for commands that are a part of
API Version 1. Checked at runtime when test commands are enabled.
A list of privileges the command checks. Only applicable for commands that are a part of API
Version 1. Checked at runtime when test commands are enabled.
- `none` - bool - No privileges required
- `simple` - mapping - single [check or privilege](#check-or-privilege)
@ -1002,28 +1012,29 @@ unit tests exercise all features and combinations IDL can handle.
#### BSONObj Anchor
The parsing method a struct is initialized with indicates what type of ownership the constructed
object has on the `BSONObj` parameter. An internal `BSONObj` anchor ensures that the lifetime of
the `BSONObj` matches the lifetime of the object in the cases that the `BSONObj` parameter is
owned or shared.
object has on the `BSONObj` parameter. An internal `BSONObj` anchor ensures that the lifetime of the
`BSONObj` matches the lifetime of the object in the cases that the `BSONObj` parameter is owned or
shared.
#### View Types
If the struct is a view, then it's possible that objects of the type will not own all of its
members. If the struct is not a view, then objects of the type are guaranteed to own all of its
members. This is determined by recursively checking the fields of a struct. This info is used
during generation to determine whether or not a struct will need a `BSONObj` anchor.
members. This is determined by recursively checking the fields of a struct. This info is used during
generation to determine whether or not a struct will need a `BSONObj` anchor.
## Best Practices
IDL has been in use since 2017. In that time, here are a few best practices:
1. strict or non-strict parsers - Structs that are persisted to disk should set `strict: false`.
It's better for upgrade/downgrade. Commands should set `strict: true` or omit it as
`strict: true` is the default.
   1. For persistence: For upgrade/downgrade, if a persisted document with a strict parser has a
      field added in new version N+1 and then the user downgrades to old version N, the strict
      parser will throw an exception and reject the document. If this document was part of the
      storage catalog for instance, the server would fail to start.
   2. For commands: Strict parsers give the server the ability to add fields without the risk of
      clients accidentally sending fields with the same name that had been ignored.
It's better for upgrade/downgrade. Commands should set `strict: true` or omit it as
`strict: true` is the default.
   1. For persistence: For upgrade/downgrade, if a persisted document with a strict parser has a
      field added in new version N+1 and then the user downgrades to old version N, the strict
      parser will throw an exception and reject the document. If this document was part of the
      storage catalog for instance, the server would fail to start.
   2. For commands: Strict parsers give the server the ability to add fields without the risk of
      clients accidentally sending fields with the same name that had been ignored.
2. Extending existing structs/commands - all new fields in a struct/command must be marked optional
to support backwards compatibility. For new structs/commands, there should be some required
fields. It does not matter if the struct is not persisted, non-optional fields break backwards

@ -2,28 +2,26 @@
title: LibFuzzer
---
> **!!NOTE!!**: LibFuzzer is deprecated and should not be used for new fuzz tests. See [FuzzTest](fuzztest.md) for new fuzzing implementations
> **!!NOTE!!**: LibFuzzer is deprecated and should not be used for new fuzz tests. See
> [FuzzTest](fuzztest.md) for new fuzzing implementations
LibFuzzer is a tool for performing coverage guided fuzzing of C/C++
code. LibFuzzer will try to trigger AUBSAN failures in a function you
provide, by repeatedly calling it with a carefully crafted byte array as
input. Each input will be assigned a "score". Byte arrays which exercise
new or more regions of code will score better. LibFuzzer will merge and
mutate high scoring inputs in order to gradually cover more and more
possible behavior.
LibFuzzer is a tool for performing coverage guided fuzzing of C/C++ code. LibFuzzer will try to
trigger AUBSAN failures in a function you provide, by repeatedly calling it with a carefully crafted
byte array as input. Each input will be assigned a "score". Byte arrays which exercise new or more
regions of code will score better. LibFuzzer will merge and mutate high scoring inputs in order to
gradually cover more and more possible behavior.
# When to use LibFuzzer
> **!!NOTE!!**: LibFuzzer is deprecated and should not be used for new fuzz tests. See [FuzzTest](fuzztest.md) for new fuzzing implementations
> **!!NOTE!!**: LibFuzzer is deprecated and should not be used for new fuzz tests. See
> [FuzzTest](fuzztest.md) for new fuzzing implementations
LibFuzzer is great for testing functions which accept an opaque blob of
untrusted user-provided data.
LibFuzzer is great for testing functions which accept an opaque blob of untrusted user-provided data.
# How to use LibFuzzer
LibFuzzer implements `int main`, and expects to be linked with an object
file which provides the function under test. You will achieve this by
writing a cpp file which implements
LibFuzzer implements `int main`, and expects to be linked with an object file which provides the
function under test. You will achieve this by writing a cpp file which implements
```cpp
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
@ -31,26 +29,22 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
}
```
`LLVMFuzzerTestOneInput` will be called repeatedly, with fuzzer
generated bytes in `Data`. `Size` will always truthfully tell your
implementation how many bytes are in `Data`. If your function crashes or
induces an AUBSAN fault, LibFuzzer will consider that to be a finding
worth reporting.
`LLVMFuzzerTestOneInput` will be called repeatedly, with fuzzer generated bytes in `Data`. `Size`
will always truthfully tell your implementation how many bytes are in `Data`. If your function
crashes or induces an AUBSAN fault, LibFuzzer will consider that to be a finding worth reporting.
Keep in mind that your function will often "just" be adapting `Data` to
whatever format our internal C++ functions require. However, you have a
lot of freedom in exactly what you choose to do. Just make sure your
function crashes or fails an invariant when something interesting
happens! As just a few ideas:
Keep in mind that your function will often "just" be adapting `Data` to whatever format our internal
C++ functions require. However, you have a lot of freedom in exactly what you choose to do. Just
make sure your function crashes or fails an invariant when something interesting happens! As just
a few ideas:
- You might choose to call multiple implementations of a single
operation, and validate that they produce the same output when
presented the same input.
- You could tease out individual bytes from `Data` and provide them as
different arguments to the function under test.
- You might choose to call multiple implementations of a single operation, and validate that they
produce the same output when presented the same input.
- You could tease out individual bytes from `Data` and provide them as different arguments to the
function under test.
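
The first idea above (comparing multiple implementations of one operation) can be sketched as
follows. Both checksum functions are hypothetical stand-ins invented for this example; in a real
test they would be the two server implementations being compared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <numeric>

// Hypothetical reference implementation: sums the bytes one at a time.
static uint32_t checksumSimple(const uint8_t* data, size_t size) {
    uint32_t sum = 0;
    for (size_t i = 0; i < size; ++i) {
        sum += data[i];
    }
    return sum;
}

// Hypothetical alternative implementation of the same operation.
static uint32_t checksumFast(const uint8_t* data, size_t size) {
    return std::accumulate(data, data + size, uint32_t{0});
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* Data, size_t Size) {
    // Crash on any disagreement so that the fuzzer reports the input as a finding.
    if (checksumSimple(Data, Size) != checksumFast(Data, Size)) {
        std::abort();
    }
    return 0;
}
```

Calling `std::abort()` rather than `assert` keeps the check active even in builds that define
`NDEBUG`.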
Finally, your cpp file will need a bazel target. There is a method which
defines fuzzer targets, much like how we define unittests. For example:
Finally, your cpp file will need a bazel target. There is a method which defines fuzzer targets,
much like how we define unittests. For example:
```python
mongo_cc_fuzzer_test(
@ -70,25 +64,21 @@ defines fuzzer targets, much like how we define unittests. For example:
# Running LibFuzzer
Your test's object file and **all** of its dependencies must be compiled
with the "fuzzer" sanitizer, plus a set of sanitizers which might
produce interesting runtime errors like AUBSAN. Evergreen has a build
variant, whose name will include the string "FUZZER", which will compile
and run all of the fuzzer tests.
Your test's object file and **all** of its dependencies must be compiled with the "fuzzer"
sanitizer, plus a set of sanitizers which might produce interesting runtime errors like AUBSAN.
Evergreen has a build variant, whose name will include the string "FUZZER", which will compile and
run all of the fuzzer tests.
The fuzzers can be built locally, for development and debugging. Check
our Evergreen configuration for the current bazel arguments.
The fuzzers can be built locally, for development and debugging. Check our Evergreen configuration
for the current bazel arguments.
LibFuzzer binaries will accept a path to a directory containing its
"corpus". A corpus is a list of examples known to produce interesting
outputs. LibFuzzer will start producing interesting results more quickly
if it starts off with a set of inputs which it can begin mutating. When it's
done, it will write down any new inputs it discovered into its corpus.
Re-using a corpus across executions is a good way to make LibFuzzer
return more results in less time. Our Evergreen tasks will try to
acquire and re-use a corpus from an earlier commit, if it can.
LibFuzzer binaries will accept a path to a directory containing its "corpus". A corpus is a list of
examples known to produce interesting outputs. LibFuzzer will start producing interesting results
more quickly if it starts off with a set of inputs which it can begin mutating. When it's done, it will
write down any new inputs it discovered into its corpus. Re-using a corpus across executions is a
good way to make LibFuzzer return more results in less time. Our Evergreen tasks will try to acquire
and re-use a corpus from an earlier commit, if it can.
# References
- [LibFuzzer's official
documentation](https://llvm.org/docs/LibFuzzer.html)
- [LibFuzzer's official documentation](https://llvm.org/docs/LibFuzzer.html)

@ -60,9 +60,8 @@ Ex: `bash buildscripts/yamllinters.sh`
## Python Linters
The `bazel run lint` command runs all Python linters as well as several other linters in our code base. You can
run auto-remediations via:
`bazel run lint --fix`.
The `bazel run lint` command runs all Python linters as well as several other linters in our code
base. You can run auto-remediations via: `bazel run lint --fix`.
Ex: `bazel run lint`

@ -1,18 +1,18 @@
# Proxy protocol support
`mongod` and `mongos` have built-in support for connections made via L4 load balancers using
the [proxy protocol][proxy-protocol-url] header. Placing `mongos` or `mongod` behind load balancers
`mongod` and `mongos` have built-in support for connections made via L4 load balancers using the
[proxy protocol][proxy-protocol-url] header. Placing `mongos` or `mongod` behind load balancers
requires proper configuration of the load balancers, `mongos`, and `mongod`.
# Configuring mongod
To use `mongod` with an L4 load balancer (or reverse proxy) it _must_ be configured with the
`proxyPort` config option whose value can be specified at program start in any of the ways
mentioned in the server config documentation. This config option opens a new port to which the
L4 load balancer _must_ connect.
`proxyPort` config option whose value can be specified at program start in any of the ways mentioned
in the server config documentation. This config option opens a new port to which the L4 load
balancer _must_ connect.
The L4 load balancer (or reverse proxy) _must_ emit a [proxy protocol][proxy-protocol-url] header
at the start of its connection stream. `mongod` supports both version 1 and version 2 of the proxy
The L4 load balancer (or reverse proxy) _must_ emit a [proxy protocol][proxy-protocol-url] header at
the start of its connection stream. `mongod` supports both version 1 and version 2 of the proxy
standard.
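
For orientation, a version 1 header is a single human-readable line sent before any other bytes on
the connection. The sketch below builds one for a TCP/IPv4 connection, following the layout in the
proxy protocol specification. The helper name `makeProxyV1Header` and the addresses in the usage
note are made up for illustration; in practice the load balancer emits this header, not
application code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Builds a version 1 proxy protocol header for a TCP/IPv4 connection:
//   "PROXY TCP4 <source address> <destination address> <source port> <destination port>\r\n"
// Hypothetical helper for illustration; it does not validate the addresses.
std::string makeProxyV1Header(const std::string& srcIp,
                              const std::string& dstIp,
                              uint16_t srcPort,
                              uint16_t dstPort) {
    return "PROXY TCP4 " + srcIp + " " + dstIp + " " + std::to_string(srcPort) + " " +
        std::to_string(dstPort) + "\r\n";
}
```

For example, `makeProxyV1Header("192.0.2.10", "192.0.2.20", 54321, 27018)` yields the line
`PROXY TCP4 192.0.2.10 192.0.2.20 54321 27018` followed by CRLF.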
# Reverse proxy vs load balancer
@ -20,8 +20,8 @@ standard.
Sharded clusters might be configured to work with either an L4 load balancer or a reverse proxy. In
both cases the proxy or load balancer _must_ connect to the `mongos`'s load-balancer port.
Placing `mongos` behind a reverse proxy does not hide the list of `mongos`. The driver will choose
a specific `mongos` to connect to via the reverse proxy.
Placing `mongos` behind a reverse proxy does not hide the list of `mongos`. The driver will choose a
specific `mongos` to connect to via the reverse proxy.
Placing `mongos` behind an L4 load balancer hides the list of `mongos`. The driver only sees the
load balancer, and the connections it makes are routed by the load balancer to a `mongos`. There is
@ -33,11 +33,18 @@ that connections from a driver are distributed among multiple `mongos`.
When a sharded cluster is deployed with a reverse proxy, there are two conditions that must be
fulfilled:
- `mongos` must be configured with the [MongoDB Server Parameter](https://docs.mongodb.com/manual/reference/parameters/) `loadBalancerPort` whose value can be specified at program start in any of the ways mentioned in the server parameter documentation.
This option causes `mongos` to open a second port. All connections made from reverse proxy _must_ be made over this port, and no regular connections (without HAProxy protocol header) may be made over this port.
- The reverse proxy _must_ be configured to emit a [proxy protocol][proxy-protocol-url] header
at the [start of its connection stream](https://github.com/mongodb/mongo/commit/3a18d295d22b377cc7bc4c97bd3b6884d065bb85). `mongos` [supports](https://github.com/mongodb/mongo/commit/786482da93c3e5e58b1c690cb060f00c60864f69) both version 1 and version 2 of the proxy
protocol standard.
- `mongos` must be configured with the
[MongoDB Server Parameter](https://docs.mongodb.com/manual/reference/parameters/)
`loadBalancerPort` whose value can be specified at program start in any of the ways mentioned in
the server parameter documentation. This option causes `mongos` to open a second port. All
connections made from reverse proxy _must_ be made over this port, and no regular connections
(without HAProxy protocol header) may be made over this port.
- The reverse proxy _must_ be configured to emit a [proxy protocol][proxy-protocol-url] header at
the
[start of its connection stream](https://github.com/mongodb/mongo/commit/3a18d295d22b377cc7bc4c97bd3b6884d065bb85).
`mongos`
[supports](https://github.com/mongodb/mongo/commit/786482da93c3e5e58b1c690cb060f00c60864f69) both
version 1 and version 2 of the proxy protocol standard.
The driver does not require any configuration change compared to a cluster without a reverse proxy.
@ -46,22 +53,32 @@ The driver does not require any configuration change compared to a cluster witho
When a sharded cluster is deployed with an L4 load balancer there are three conditions that must be
fulfilled:
- `mongos` must be configured with the [MongoDB Server Parameter](https://docs.mongodb.com/manual/reference/parameters/) `loadBalancerPort` whose value can be specified at program start in any of the ways mentioned in the server parameter documentation.
This option causes `mongos` to open a second port. All connections made from load
balancers _must_ be made over this port, and no regular connections (without HAProxy protocol header) may be made over this port.
- The L4 load balancer _must_ be configured to emit a [proxy protocol][proxy-protocol-url] header
at the [start of its connection stream](https://github.com/mongodb/mongo/commit/3a18d295d22b377cc7bc4c97bd3b6884d065bb85). `mongos` [supports](https://github.com/mongodb/mongo/commit/786482da93c3e5e58b1c690cb060f00c60864f69) both version 1 and version 2 of the proxy
protocol standard.
- Clients (drivers or shells) connecting to a `mongos` through the load balancer must set the `loadBalanced` option,
e.g., when connecting to a local `mongos` instance through the load balancer, if the `loadBalancerPort` server parameter was set to 20100, the
connection string must be of the form `"mongodb://localhost:20100/?loadBalanced=true"`.
- `mongos` must be configured with the
[MongoDB Server Parameter](https://docs.mongodb.com/manual/reference/parameters/)
`loadBalancerPort` whose value can be specified at program start in any of the ways mentioned in
the server parameter documentation. This option causes `mongos` to open a second port. All
connections made from load balancers _must_ be made over this port, and no regular connections
(without HAProxy protocol header) may be made over this port.
- The L4 load balancer _must_ be configured to emit a [proxy protocol][proxy-protocol-url] header at
the
[start of its connection stream](https://github.com/mongodb/mongo/commit/3a18d295d22b377cc7bc4c97bd3b6884d065bb85).
`mongos`
[supports](https://github.com/mongodb/mongo/commit/786482da93c3e5e58b1c690cb060f00c60864f69) both
version 1 and version 2 of the proxy protocol standard.
- Clients (drivers or shells) connecting to a `mongos` through the load balancer must set the
`loadBalanced` option, e.g., when connecting to a local `mongos` instance through the load
balancer, if the `loadBalancerPort` server parameter was set to 20100, the connection string must
be of the form `"mongodb://localhost:20100/?loadBalanced=true"`.
There are some subtle behavioral differences that the load balancer options enable, chief of
which is how `mongos` deals with open cursors on client disconnection. Over a normal connection,
`mongos` will keep open cursors alive for a short while after client disconnection in case the
client reconnects and continues to request more from the given cursor. Since client reconnections
aren't expected behind a load balancer (as the load balancer will likely redirect a given client
to a different `mongos` instance upon reconnection), we eagerly [close cursors](https://github.com/mongodb/mongo/commit/b429d5dda98bbe18ab0851ffd1729d3b57fc8a4e) on load balanced
client disconnects. We also [abort any in-progress transactions](https://github.com/mongodb/mongo/commit/74628ed4e314dfe0fd69d3fbae1411981a869f6b) that were initiated by the load balanced client.
There are some subtle behavioral differences that the load balancer options enable, chief of which
is how `mongos` deals with open cursors on client disconnection. Over a normal connection, `mongos`
will keep open cursors alive for a short while after client disconnection in case the client
reconnects and continues to request more from the given cursor. Since client reconnections aren't
expected behind a load balancer (as the load balancer will likely redirect a given client to a
different `mongos` instance upon reconnection), we eagerly
[close cursors](https://github.com/mongodb/mongo/commit/b429d5dda98bbe18ab0851ffd1729d3b57fc8a4e) on
load balanced client disconnects. We also
[abort any in-progress transactions](https://github.com/mongodb/mongo/commit/74628ed4e314dfe0fd69d3fbae1411981a869f6b)
that were initiated by the load balanced client.
[proxy-protocol-url]: https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt

@ -1,9 +1,9 @@
# Log System Overview
The new log system adds capability to produce structured logs in the [Relaxed
Extended JSON 2.0.0][relaxed_json_2] format. The new API requires names to be
given to variables, forming field names for the variables in structured JSON
logs. Named variables are called attributes in the log system.
The new log system adds capability to produce structured logs in the [Relaxed Extended JSON
2.0.0][relaxed_json_2] format. The new API requires names to be given to variables, forming field
names for the variables in structured JSON logs. Named variables are called attributes in the log
system.
# Style guide
@ -13,43 +13,38 @@ Log lines are composed primarily of a message (`msg`) and attributes (`attr` fie
## Philosophy
As you write log messages, keep the following in mind: A big thing that makes
JSON and BSON useful as data formats is the ability to provide rich field names.
As you write log messages, keep the following in mind: A big thing that makes JSON and BSON useful
as data formats is the ability to provide rich field names.
What makes logv2 machine readable is that we write an intact Extended BSON
format.
What makes logv2 machine readable is that we write an intact Extended BSON format.
But, what makes these lines human readable is that the `msg` provides a simple,
clear context for interpreting well-formed field names and values in the `attr`
subdocument.
But, what makes these lines human readable is that the `msg` provides a simple, clear context for
interpreting well-formed field names and values in the `attr` subdocument.
## Specific Guidance
For maximum readability, a log message additionally has the least amount of
repetition possible, and shares attribute names with other related log lines.
For maximum readability, a log message additionally has the least amount of repetition possible, and
shares attribute names with other related log lines.
### Message (the msg field)
The `msg` field predicates a reader's interpretation of the log line. It should
be crafted with care and attention.
The `msg` field predicates a reader's interpretation of the log line. It should be crafted with care
and attention.
- Concisely describe what the log line is reporting, providing enough
context necessary for interpreting attribute field names and values
- Concisely describe what the log line is reporting, providing enough context necessary for
interpreting attribute field names and values
- Capitalize the first letter, as in a sentence
- Avoid unnecessary punctuation, but punctuate between sentences if using
multiple sentences
- Avoid unnecessary punctuation, but punctuate between sentences if using multiple sentences
- Do not conclude with punctuation
- You may occasionally encounter `msg` strings containing fmt-style
`{expr}` braces. These are legacy artifacts and should be rephrased
according to these guidelines.
- You may occasionally encounter `msg` strings containing fmt-style `{expr}` braces. These are
legacy artifacts and should be rephrased according to these guidelines.
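
The mechanical parts of the `msg` guidance above can be expressed as a small check. This is only a
sketch for reasoning about the rules: `followsMsgStyle` is a hypothetical helper, not part of the
log system, and it covers only capitalization, trailing punctuation, and leftover fmt-style braces.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: returns true if `msg` passes the mechanical style rules listed above.
bool followsMsgStyle(const std::string& msg) {
    if (msg.empty()) {
        return false;
    }
    // Capitalize the first letter, as in a sentence.
    const unsigned char first = static_cast<unsigned char>(msg.front());
    if (std::isalpha(first) && !std::isupper(first)) {
        return false;
    }
    // Do not conclude with punctuation.
    if (std::string(".!?,;:").find(msg.back()) != std::string::npos) {
        return false;
    }
    // fmt-style {expr} braces are legacy artifacts and should be rephrased.
    if (msg.find('{') != std::string::npos || msg.find('}') != std::string::npos) {
        return false;
    }
    return true;
}
```

A check like this cannot judge whether a message is concise or gives enough context; those rules
still need a human reviewer.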
### Attributes (fields in the attr subdocument)
The `attr` subdocument includes important metrics/statistics about the logged
event for the purposes of debugging or performance analysis. These variables
should be named very well, as though intended for a very human-readable portion
of the codebase (like config variable declaration, abstract class definitions,
etc.)
The `attr` subdocument includes important metrics/statistics about the logged event for the purposes
of debugging or performance analysis. These variables should be named very well, as though intended
for a very human-readable portion of the codebase (like config variable declaration, abstract class
definitions, etc.)
For `attr` field names, do the following:
@ -57,40 +52,38 @@ For `attr` field names, do the following:
The bar for understanding should be:
- Someone with reasonable understanding of mongod behavior should understand
immediately what is being logged
- Someone with reasonable troubleshooting skill should be able to extract doc-
or code-searchable phrases to learn about what is being logged
- Someone with reasonable understanding of mongod behavior should understand immediately what is
being logged
- Someone with reasonable troubleshooting skill should be able to extract doc- or code-searchable
phrases to learn about what is being logged
#### Precisely describe values and units
Exception: Do not add a unit suffix when logging a Duration type. The system
automatically adds this unit.
Exception: Do not add a unit suffix when logging a Duration type. The system automatically adds this
unit.
#### When providing an execution time attribute, ensure it is named "durationMillis"
To describe the execution time of an operation using our preferred method:
Specify an `attr` name of “duration” and provide a value using the Milliseconds
Duration type. The log system will automatically append "Millis" to the
attribute name.
To describe the execution time of an operation using our preferred method: Specify an `attr` name of
“duration” and provide a value using the Milliseconds Duration type. The log system will
automatically append "Millis" to the attribute name.
Alternatively, specify an `attr` name of “durationMillis” and provide the
number of milliseconds as an integer type.
Alternatively, specify an `attr` name of “durationMillis” and provide the number of milliseconds as
an integer type.
**Importantly**: downstream analysis tools will rely on this convention, as a
replacement for the "[0-9]+ms$" format of prior logs.
**Importantly**: downstream analysis tools will rely on this convention, as a replacement for the
"[0-9]+ms$" format of prior logs.
#### Use certain specific terms whenever possible
When logging the below information, do so with these specific terms:
- **namespace** - when logging a value of the form
"\<db name\>.\<collection name\>". Do not use "collection" or abbreviate to "ns"
- **namespace** - when logging a value of the form "\<db name\>.\<collection name\>". Do not use
"collection" or abbreviate to "ns"
- **db** - instead of "database"
- **error** - when an error occurs, instead of "status". Use this for objects
of type Status and DBException
- **reason** - to provide rationale for an event/action when "error" isn't
appropriate
- **error** - when an error occurs, instead of "status". Use this for objects of type Status and
DBException
- **reason** - to provide rationale for an event/action when "error" isn't appropriate
### Examples
@ -122,11 +115,10 @@ The log system is made available with the following header:
#include "mongo/logv2/log.h"
The macro `MONGO_LOGV2_DEFAULT_COMPONENT` is expanded by all logging macros.
This configuration macro must expand at their point of use to a `LogComponent`
expression, which is implicitly attached to the emitted message. It is
conventionally defined near the top of a `.cpp` file after headers are included,
and before any logging macros are invoked. Example:
The macro `MONGO_LOGV2_DEFAULT_COMPONENT` is expanded by all logging macros. This configuration
macro must expand at their point of use to a `LogComponent` expression, which is implicitly attached
to the emitted message. It is conventionally defined near the top of a `.cpp` file after headers are
included, and before any logging macros are invoked. Example:
#define MONGO_LOGV2_DEFAULT_COMPONENT ::mongo::logv2::LogComponent::kDefault
@ -138,22 +130,19 @@ Logging is performed using function style macros:
...,
"nameN"_attr = varN);
The ID is a signed 32-bit integer in the same number space as the error code
numbers. It is used to uniquely identify a log statement. If changing existing
code, using a new ID is strongly advised to avoid any parsing ambiguity. When
selecting ID during work on JIRA ticket `SERVER-ABCDE` you can use the JIRA
ticket number to avoid ID collisions with other engineers by taking ID from the
range `ABCDE00` - `ABCDE99`.
The ID is a signed 32-bit integer in the same number space as the error code numbers. It is used to
uniquely identify a log statement. If changing existing code, using a new ID is strongly advised to
avoid any parsing ambiguity. When selecting ID during work on JIRA ticket `SERVER-ABCDE` you can use
the JIRA ticket number to avoid ID collisions with other engineers by taking ID from the range
`ABCDE00` - `ABCDE99`.
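
The ticket-based ID range is plain arithmetic: multiply the ticket number by 100 and use the
resulting block of one hundred IDs. The sketch below spells that out; `logIdRangeForTicket` is a
hypothetical helper for illustration, and ticket number 12345 is an arbitrary example.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns the inclusive [first, last] log ID range suggested for ticket SERVER-<ticketNumber>.
// Hypothetical helper; assumes the ticket number is small enough that the result fits in a
// signed 32-bit integer.
constexpr std::pair<int32_t, int32_t> logIdRangeForTicket(int32_t ticketNumber) {
    return {ticketNumber * 100, ticketNumber * 100 + 99};
}

// SERVER-12345 maps to IDs 1234500 through 1234599.
static_assert(logIdRangeForTicket(12345).first == 1234500, "first ID in the range");
static_assert(logIdRangeForTicket(12345).second == 1234599, "last ID in the range");
```

Six-digit ticket numbers produce eight-digit IDs, which still fit comfortably in a signed 32-bit
integer.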
Attributes are created with the `_attr` user-defined literal. The intermediate
object that gets instantiated provides the assignment operator `=` for
assigning a value to the attribute.
Attributes are created with the `_attr` user-defined literal. The intermediate object that gets
instantiated provides the assignment operator `=` for assigning a value to the attribute.
The message string must be a compile time constant.
This is to avoid dynamic attribute names in the log output and to be able to
add compile time verification of log statements in the future. If the string
needs to be shared with anything else (like constructing a Status object) you
can use this pattern:
The message string must be a compile time constant. This is to avoid dynamic attribute names in the
log output and to be able to add compile time verification of log statements in the future. If the
string needs to be shared with anything else (like constructing a Status object) you can use this
pattern:
static constexpr char str[] = "the string";
@ -172,13 +161,12 @@ can use this pattern:
### Log Component
To override the default component, a separate logging API can be used that
takes a `LogOptions` structure:
To override the default component, a separate logging API can be used that takes a `LogOptions`
structure:
LOGV2_OPTIONS(options, message-string, attr0, ...);
`LogOptions` can be constructed with a `LogComponent` to avoid verbosity in the
log statement.
`LogOptions` can be constructed with a `LogComponent` to avoid verbosity in the log statement.
##### Example
@ -186,9 +174,8 @@ log statement.
### Log Severity
`LOGV2` is the logging macro for the default informational (0) severity. To log
to different severities there are separate logging macros to be used; they all
take parameters like `LOGV2`:
`LOGV2` is the logging macro for the default informational (0) severity. To log to different
severities there are separate logging macros to be used; they all take parameters like `LOGV2`:
- `LOGV2_WARNING`
- `LOGV2_ERROR`
@ -202,18 +189,17 @@ There is also variations that take `LogOptions` if needed:
- `LOGV2_ERROR_OPTIONS`
- `LOGV2_FATAL_OPTIONS`
Fatal level log statements using `LOGV2_FATAL` perform `fassert` after logging,
using the provided ID as assert id. `LOGV2_FATAL_NOTRACE` performs
`fassertNoTrace` and `LOGV2_FATAL_CONTINUE` does not `fassert`, allowing for
continued execution. `LOGV2_FATAL_CONTINUE` is meant to be used when a fatal
error has occurred but a different way of halting execution is desired such as
`std::terminate` or `fassertFailedWithStatus`.
Fatal level log statements using `LOGV2_FATAL` perform `fassert` after logging, using the provided
ID as assert id. `LOGV2_FATAL_NOTRACE` performs `fassertNoTrace` and `LOGV2_FATAL_CONTINUE` does
not `fassert`, allowing for continued execution. `LOGV2_FATAL_CONTINUE` is meant to be used when a
fatal error has occurred but a different way of halting execution is desired such as
`std::terminate` or `fassertFailedWithStatus`.
`LOGV2_FATAL_OPTIONS` performs `fassert` by default like `LOGV2_FATAL` but this
can be changed by setting the `FatalMode` on the `LogOptions`.
`LOGV2_FATAL_OPTIONS` performs `fassert` by default like `LOGV2_FATAL` but this can be changed by
setting the `FatalMode` on the `LogOptions`.
Debug-level logging is slightly different: an additional integer parameter is
required to indicate the desired debug level:
Debug-level logging is slightly different: an additional integer parameter is required to indicate
the desired debug level:
LOGV2_DEBUG(ID, debug-level, message-string, attr0, ...);
@ -224,17 +210,15 @@ integer) required to indicate the desired debug level:
message-string,
attr0, ...);
`LOGV2_PROD_ONLY` logs like a default `LOGV2` log in production, but debug-1 log
in internal testing. It accepts the same arguments as `LOGV2`. This log level is
for log lines that may be spammy in testing but are more rare in production. As
such, they may be useful in investigations. This level also preserves backwards
compatibility for logs that are no longer as useful as when they were introduced.
To determine whether to log, this macro uses the `LogSeverity::ProdOnly()`
level, which returns level `LogSeverity::Debug(1)` when in a testing environment
and `LogSeverity::Log()` otherwise. Whether the server is in a testing
environment is determined using the `enableTestCommands` server parameter.
It is preferred to use other macros over this one as it introduces a difference
between testing and production. There is also the `LOGV2_PROD_ONLY_OPTIONS`
`LOGV2_PROD_ONLY` logs like a default `LOGV2` log in production, but debug-1 log in internal
testing. It accepts the same arguments as `LOGV2`. This log level is for log lines that may be
spammy in testing but are more rare in production. As such, they may be useful in investigations.
This level also preserves backwards compatibility for logs that are no longer as useful as when they
were introduced. To determine whether to log, this macro uses the `LogSeverity::ProdOnly()` level,
which returns level `LogSeverity::Debug(1)` when in a testing environment and `LogSeverity::Log()`
otherwise. Whether the server is in a testing environment is determined using the
`enableTestCommands` server parameter. It is preferred to use other macros over this one as it
introduces a difference between testing and production. There is also the `LOGV2_PROD_ONLY_OPTIONS`
variation that takes `LogOptions`.
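The severity selection described for `LOGV2_PROD_ONLY` can be sketched as follows. This is a minimal model, not the server's `LogSeverity` class: the `Severity` struct, `prodOnly`, and `shouldLog` are hypothetical names, and the boolean argument stands in for the `enableTestCommands` server parameter.

```cpp
namespace sketch {

// Stand-in for a log severity: 0 is the default informational level,
// N > 0 is debug level N. Illustration only.
struct Severity {
    int debugLevel;
};

constexpr Severity log() {
    return Severity{0};
}

constexpr Severity debug(int level) {
    return Severity{level};
}

// Debug(1) in a testing environment, the default level otherwise.
constexpr Severity prodOnly(bool testCommandsEnabled) {
    return testCommandsEnabled ? debug(1) : log();
}

// A statement is emitted when its debug level does not exceed the
// verbosity configured for its component.
constexpr bool shouldLog(Severity statement, int configuredVerbosity) {
    return statement.debugLevel <= configuredVerbosity;
}

}  // namespace sketch
```

With the default verbosity of 0, a statement at this level would be emitted in production but suppressed in testing unless the verbosity is raised to at least 1.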
##### Example
@ -248,15 +232,13 @@ variation that takes `LogOptions`.
### Log Tags
Log tags are replacing the Tee from the old log system as the way to indicate
that the log should also be written to a `RamLog` (accessible with the `getLog`
command).
Log tags are replacing the Tee from the old log system as the way to indicate that the log should
also be written to a `RamLog` (accessible with the `getLog` command).
Tags are added to a log statement with the options API similarly to how
non-default components are specified by constructing a `LogOptions`.
Tags are added to a log statement with the options API similarly to how non-default components are
specified by constructing a `LogOptions`.
Multiple tags can be attached to a log statement using the bitwise or operator
`|`.
Multiple tags can be attached to a log statement using the bitwise or operator `|`.
##### Example
@ -267,19 +249,18 @@ Multiple tags can be attached to a log statement using the bitwise or operator
### Dynamic attributes
Sometimes there is a need to add attributes depending on runtime conditionals.
To support this there is the `DynamicAttributes` class that has an `add` method
to add named attributes one by one. This class is meant to be used when you
have this specific requirement and is not the general logging API.
Sometimes there is a need to add attributes depending on runtime conditionals. To support this there
is the `DynamicAttributes` class that has an `add` method to add named attributes one by one. This
class is meant to be used when you have this specific requirement and is not the general logging
API.
When finished, it is logged using the regular logging API but the
`DynamicAttributes` instance is passed as the first attribute parameter. Mixing
`_attr` literals with the `DynamicAttributes` is not supported.
When finished, it is logged using the regular logging API but the `DynamicAttributes` instance is
passed as the first attribute parameter. Mixing `_attr` literals with the `DynamicAttributes` is not
supported.
When using the `DynamicAttributes` you need to be careful about parameter
lifetimes. The `DynamicAttributes` binds attributes _by reference_ and the
reference must be valid when passing the `DynamicAttributes` to the log
statement.
When using the `DynamicAttributes` you need to be careful about parameter lifetimes. The
`DynamicAttributes` binds attributes _by reference_ and the reference must be valid when passing the
`DynamicAttributes` to the log statement.
##### Example
@ -321,11 +302,11 @@ Many basic types have built in support:
### User-defined types
To make a user-defined type loggable it needs a serialization member function
that the log system can bind to.
To make a user-defined type loggable it needs a serialization member function that the log system
can bind to.
The system binds and uses serialization functions by looking for functions in
the following priority order:
The system binds and uses serialization functions by looking for functions in the following priority
order:
- Structured serialization functions
- `void x.serialize(BSONObjBuilder*) const` (member)
@ -338,19 +319,18 @@ the following priority order:
- `x.toString() ` (member)
- `toString(x)` (non-member)
Enums cannot have member functions, but they will still try to bind to the
`toStringForLogging(e)` or `toString(e)` non-members. If neither is available,
the enum value will be logged as its underlying integral type.
Enums cannot have member functions, but they will still try to bind to the `toStringForLogging(e)`
or `toString(e)` non-members. If neither is available, the enum value will be logged as its
underlying integral type.
In order to offer structured serialization and output, a type would need to
supply a structured serialization function. Otherwise, if only stringification
is provided, the output will be an escaped string.
In order to offer structured serialization and output, a type would need to supply a structured
serialization function. Otherwise, if only stringification is provided, the output will be an
escaped string.
The `toStringForLogging` non-member is an ADL customization hook used to
override `toString` for very rare cases where `toString` is inappropriate for
logging perhaps because it's needed for other non-logging formatting. Usually a
`toString` (member or nonmember) is a sufficient customization point and should
be preferred as a canonical stringification of the object.
The `toStringForLogging` non-member is an ADL customization hook used to override `toString` for
very rare cases where `toString` is inappropriate for logging perhaps because it's needed for other
non-logging formatting. Usually a `toString` (member or nonmember) is a sufficient customization
point and should be preferred as a canonical stringification of the object.
_NOTE: No `operator<<` overload is used even if available_
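The stringification part of that lookup can be illustrated with a small standalone sketch. It models only `toStringForLogging(x)`, `x.toString()`, `toString(x)` and the enum fallback, in that assumed order; the structured serialization functions are left out. Everything in namespace `sketch` is hypothetical and is not how the log system is actually implemented.

```cpp
#include <string>
#include <type_traits>
#include <utility>

namespace sketch {

// Example types (hypothetical).
struct WithBoth {
    std::string toString() const { return "canonical"; }
};
inline std::string toStringForLogging(const WithBoth&) { return "for-logging"; }

struct OnlyMember {
    std::string toString() const { return "member"; }
};

enum class Named { kX };
inline std::string toString(Named) { return "kX"; }

enum class Plain { kA = 7 };

// Detection helpers using the C++17 void_t idiom.
template <typename T, typename = void>
struct HasToStringForLogging : std::false_type {};
template <typename T>
struct HasToStringForLogging<T,
                             std::void_t<decltype(toStringForLogging(std::declval<const T&>()))>>
    : std::true_type {};

template <typename T, typename = void>
struct HasMemberToString : std::false_type {};
template <typename T>
struct HasMemberToString<T, std::void_t<decltype(std::declval<const T&>().toString())>>
    : std::true_type {};

template <typename T, typename = void>
struct HasNonMemberToString : std::false_type {};
template <typename T>
struct HasNonMemberToString<T, std::void_t<decltype(toString(std::declval<const T&>()))>>
    : std::true_type {};

// Picks the first available stringification in priority order.
template <typename T>
std::string stringify(const T& x) {
    if constexpr (HasToStringForLogging<T>::value) {
        return toStringForLogging(x);  // logging-specific override wins
    } else if constexpr (HasMemberToString<T>::value) {
        return x.toString();
    } else if constexpr (HasNonMemberToString<T>::value) {
        return toString(x);
    } else if constexpr (std::is_enum_v<T>) {
        // Enums with no hook fall back to their underlying integral value.
        return std::to_string(static_cast<std::underlying_type_t<T>>(x));
    } else {
        static_assert(sizeof(T) == 0, "type is not loggable in this sketch");
    }
}

}  // namespace sketch
```

Here `stringify(WithBoth{})` uses the `toStringForLogging` hook even though a `toString` member exists, and `stringify(Plain::kA)` falls back to the integer value.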
@ -370,20 +350,19 @@ _NOTE: No `operator<<` overload is used even if available_
### Container support
STL containers and data structures that have STL like interfaces are loggable
as long as they contain loggable elements (built-in, user-defined or other
containers).
STL containers and data structures that have STL like interfaces are loggable as long as they
contain loggable elements (built-in, user-defined or other containers).
#### Sequential containers
Sequential containers like `std::vector`, `std::deque` and `std::list` are
loggable and the elements get formatted as JSON array in structured output.
Sequential containers like `std::vector`, `std::deque` and `std::list` are loggable and the elements
get formatted as JSON array in structured output.
#### Associative containers
Associative containers such as `std::map` and `stdx::unordered_map` are loggable
with the requirement that the key is of a string type. The structured format
is a JSON object where the field names are the keys.
Associative containers such as `std::map` and `stdx::unordered_map` are loggable with the
requirement that the key is of a string type. The structured format is a JSON object where the field
names are the keys.
#### Ranges
@ -392,11 +371,10 @@ Ranges is loggable via helpers to indicate what type of range it is
- `seqLog(begin, end)`
- `mapLog(begin, end)`
seqLog indicates that it is a sequential range where the iterators point to
loggable value directly.
seqLog indicates that it is a sequential range where the iterators point to loggable value directly.
mapLog indicates that it is a range coming from an associative container where
the iterators point to a key-value pair.
mapLog indicates that it is a range coming from an associative container where the iterators point
to a key-value pair.
##### Examples
@ -425,10 +403,9 @@ the iterators point to a key-value pair.
#### Containers and `uint64_t`
Logging of containers uses `BSONObj` as an internal representation and
`uint64_t` is not a supported type with `BSONObjBuilder::append()`. As a user
you can use `boost::transform_iterator` to cast the `uint64_t` to a supported
type.
Logging of containers uses `BSONObj` as an internal representation and `uint64_t` is not a supported
type with `BSONObjBuilder::append()`. As a user you can use `boost::transform_iterator` to cast the
`uint64_t` to a supported type.
##### Example
@ -448,17 +425,14 @@ type.
### Duration types
Duration types have special formatting to match existing practices in the
server code base. Their resulting format depends on the context in which they
are logged.
Duration types have special formatting to match existing practices in the server code base. Their
resulting format depends on the context in which they are logged.
When durations are formatted as JSON or BSON a unit suffix is added to the
attribute name when building the field name. The value will be count of the
duration as a number.
When durations are formatted as JSON or BSON a unit suffix is added to the attribute name when
building the field name. The value will be count of the duration as a number.
When logging containers with durations there is no attribute per duration
instance that can have the suffix added. In this case durations are instead
formatted as a BSON object.
When logging containers with durations there is no attribute per duration instance that can have the
suffix added. In this case durations are instead formatted as a BSON object.
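The field-naming rule for a single duration attribute can be sketched with `std::chrono`. This is an illustration only: `UnitSuffix` and `durationField` are hypothetical names, and the suffix spellings `Millis` and `Seconds` are assumptions made for this sketch rather than a statement of what the log system emits.

```cpp
#include <chrono>
#include <ratio>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical mapping from a duration's period to a field-name suffix.
template <typename Period>
struct UnitSuffix;
template <>
struct UnitSuffix<std::milli> {
    static constexpr const char* value = "Millis";
};
template <>
struct UnitSuffix<std::ratio<1>> {
    static constexpr const char* value = "Seconds";
};

// Builds the {field name, numeric value} pair for a duration attribute:
// the unit suffix is appended to the attribute name and the value is the
// plain count of the duration.
template <typename Rep, typename Period>
std::pair<std::string, long long> durationField(const std::string& attrName,
                                                std::chrono::duration<Rep, Period> d) {
    return {attrName + UnitSuffix<Period>::value, static_cast<long long>(d.count())};
}

}  // namespace sketch
```

Under these assumptions an attribute named `elapsed` holding 10 milliseconds becomes the field `elapsedMillis` with the value 10.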
##### Examples
@ -485,9 +459,9 @@ formatted as a BSON object.
# Attribute naming abstraction
The style guide contains recommendations for attribute naming in certain cases.
To make abstraction of attribute naming possible a `logAttrs` function can be
implemented as a friend function in a class with the following signature:
The style guide contains recommendations for attribute naming in certain cases. To make abstraction
of attribute naming possible a `logAttrs` function can be implemented as a friend function in a
class with the following signature:
class AnyUserType {
public:
@ -505,15 +479,13 @@ implemented as a friend function in a class with the following signature:
## Multiple attributes
In some cases a loggable type might be composed as a hierarchy in the C++ type
system which would lead to a very verbose structured log output as every level
in the hierarchy needs a name when outputted as JSON. The attribute naming
abstraction system can also be used to collapse such hierarchies. Instead of
making a type loggable it can instead return one or more attributes from its
In some cases a loggable type might be composed as a hierarchy in the C++ type system which would
lead to a very verbose structured log output as every level in the hierarchy needs a name when
outputted as JSON. The attribute naming abstraction system can also be used to collapse such
hierarchies. Instead of making a type loggable it can instead return one or more attributes from its
members by using `multipleAttrs` in `logAttrs` functions.
`multipleAttrs(...)` accepts attributes or instances of types with `logAttrs`
functions implemented.
`multipleAttrs(...)` accepts attributes or instances of types with `logAttrs` functions implemented.
##### Examples
@ -535,12 +507,11 @@ functions implemented.
## Handling temporary lifetime with multiple attributes
To avoid lifetime issues (log attributes bind their values by reference) it is
recommended to **not** create attributes when using `multipleAttrs` unless
attributes are created for members directly. If `logAttrs` or `""_attr=` is
used inside a `logAttrs` function on the return of a function returning by
value it will result in a dangling reference. The following example illustrates
the problem:
To avoid lifetime issues (log attributes bind their values by reference) it is recommended to
**not** create attributes when using `multipleAttrs` unless attributes are created for members
directly. If `logAttrs` or `""_attr=` is used inside a `logAttrs` function on the return of a
function returning by value it will result in a dangling reference. The following example
illustrates the problem:
class SomeSubType {
public:
@ -566,10 +537,9 @@ the problem:
std::string name_;
};
The better implementation would be to let the log system control the
lifetime by passing the instance to `multipleAttrs` without creating the
attribute. The log system will detect that it is not an attribute and will
attempt to create attributes by calling `logAttrs`:
The better implementation would be to let the log system control the lifetime by passing the
instance to `multipleAttrs` without creating the attribute. The log system will detect that it is
not an attribute and will attempt to create attributes by calling `logAttrs`:
friend auto logAttrs(const SomeType& type) {
return logv2::multipleAttrs("name"_attr=type.name(), type.sub());
@ -579,11 +549,10 @@ attempt to create attributes by calling `logAttrs`:
## Combining uassert with log statement
Code that emits a high severity log statement may also need to emit a `uassert`
after the log. There is the `UserAssertAfterLog` logging option that allows you
to re-use the log statement to do the formatting required for the `uassert`.
The assertion id can be either the logging ID by passing `UserAssertAfterLog`
with no arguments or the assertion id can be set by constructing
Code that emits a high severity log statement may also need to emit a `uassert` after the log. There
is the `UserAssertAfterLog` logging option that allows you to re-use the log statement to do the
formatting required for the `uassert`. The assertion id can be either the logging ID by passing
`UserAssertAfterLog` with no arguments or the assertion id can be set by constructing
`UserAssertAfterLog` with an `ErrorCodes::Error`.
The assertion reason string will be a plain text log and can be provided with additional attribute
@ -614,26 +583,23 @@ Would emit a `uassert` after performing the log that is equivalent to:
## Unstructured logging for local development
To make it easier to use the log system for tracing in local development, there
is a special API that does not use IDs or attribute names:
To make it easier to use the log system for tracing in local development, there is a special API
that does not use IDs or attribute names:
logd(format-string, value0, ..., valueN);
It formats the string using libfmt similarly to what
`fmt::format(format-string, value0, ..., valueN)` would produce but using the
regular log system type support on how types are made loggable. The formatted
string is logged as the `msg` field in the JSON output, with no `attr`
subobject.
`fmt::format(format-string, value0, ..., valueN)` would produce but using the regular log system
type support on how types are made loggable. The formatted string is logged as the `msg` field in
the JSON output, with no `attr` subobject.
When using `logd` the log will be emitted with standard severity and the default
component.
When using `logd` the log will be emitted with standard severity and the default component.

Unlike regular logging, `logd` is allowed to be used in header files by
including `logv2/log_debug.h`.
Unlike regular logging, `logd` is allowed to be used in header files by including
`logv2/log_debug.h`.
Unstructured logging is not allowed to be used in code committed to master;
there is a lint check to validate this. It is however allowed to be used in
Evergreen patch builds.
Unstructured logging is not allowed to be used in code committed to master; there is a lint check
to validate this. It is however allowed to be used in Evergreen patch builds.
##### Examples
@ -642,8 +608,8 @@ Evergreen patch builds.
## Rate limiting
Rate limiting logs is useful to reduce the impact of logging on database throughput. At high
rate and concurrency, logging can be expensive and reduce performance. Attention should be paid
Rate limiting logs is useful to reduce the impact of logging on database throughput. At high rate
and concurrency, logging can be expensive and reduce performance. Attention should be paid
specifically to logs that can occur on every operation, whether they fail or succeed.
The rate limiting feature is implemented by `SeveritySuppressor` (see
@ -653,8 +619,8 @@ severity; subsequent logs within that interval are emitted at a "quiet" severity
level). This ensures logs are not always written unless the logging level is increased for the
component.
`SeveritySuppressor` is typically used with `StaticImmortal` for static storage. The interval can
be configured with a server parameter when constructing SeveritySuppressor.
`SeveritySuppressor` is typically used with `StaticImmortal` for static storage. The interval can be
configured with a server parameter when constructing SeveritySuppressor.
##### Example
@ -666,18 +632,17 @@ be configured with a server parameter when constructing SeveritySuppressor.
"Slow network response send time",
"elapsed"_attr = bob.obj());
In this example, the first log within each gSlowNetworkLogRate-second window is emitted at Info level;
subsequent logs within that window are emitted at Debug(2), which requires increasing the component's
log level to be visible.
In this example, the first log within each gSlowNetworkLogRate-second window is emitted at Info
level; subsequent logs within that window are emitted at Debug(2), which requires increasing the
component's log level to be visible.
For per-key rate limiting (e.g., one log per key per interval), use `KeyedSeveritySuppressor`
instead.
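The windowing behaviour described above can be modelled in a few lines. This is a simplified, single-threaded sketch and not the real `SeveritySuppressor`: the class name `IntervalSuppressor`, the `Severity` enum, and passing the current time explicitly are all assumptions made to keep the example self-contained and testable.

```cpp
#include <chrono>
#include <optional>

namespace sketch {

enum class Severity { kInfo, kDebug2 };

// Returns the "normal" severity for the first call in each interval and the
// "quiet" severity for every other call inside that interval.
class IntervalSuppressor {
public:
    using Clock = std::chrono::steady_clock;

    IntervalSuppressor(std::chrono::seconds interval, Severity normal, Severity quiet)
        : _interval(interval), _normal(normal), _quiet(quiet) {}

    Severity next(Clock::time_point now) {
        if (!_windowStart || now - *_windowStart >= _interval) {
            _windowStart = now;  // open a new window
            return _normal;
        }
        return _quiet;
    }

private:
    std::chrono::seconds _interval;
    Severity _normal;
    Severity _quiet;
    std::optional<Clock::time_point> _windowStart;
};

}  // namespace sketch
```

A caller would ask the suppressor for a severity on every occurrence and log at whatever it returns, so the statement stays in the code path but only costs a full log line once per interval at default verbosity.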
# JSON output format
Produces structured logs of the [Relaxed Extended JSON 2.0.0][relaxed_json_2]
format. Below is an example of a log statement in C++ and a pretty-printed JSON
output:
Produces structured logs of the [Relaxed Extended JSON 2.0.0][relaxed_json_2] format. Below is an
example of a log statement in C++ and a pretty-printed JSON output:
C++ statement:
@ -717,5 +682,7 @@ Output:
---
[relaxed_json_2]: https://github.com/mongodb/specifications/blob/master/source/extended-json.rst
[_lastOplogEntryFetcherCallbackForStopTimestamp]: https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/initial_syncer.cpp#L1500-L1512
[_summarizeRollback]: https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/rollback_impl.cpp#L1263-L1305
[_lastOplogEntryFetcherCallbackForStopTimestamp]:
https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/initial_syncer.cpp#L1500-L1512
[_summarizeRollback]:
https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/rollback_impl.cpp#L1263-L1305

View File

@ -2,5 +2,5 @@
- Avoid using bare pointers for dynamically allocated objects. Prefer `std::unique_ptr`,
`std::shared_ptr`, or another RAII class such as `BSONObj`.
- If you assign the output of `new/malloc()` directly to a bare pointer you should document where
it gets deleted/freed, who owns it along the way, and how exception safety is ensured.
- If you assign the output of `new/malloc()` directly to a bare pointer you should document where it
gets deleted/freed, who owns it along the way, and how exception safety is ensured.

View File

@ -15,86 +15,87 @@ TODO
## Why are we doing this?
Having a clear delineation between public and private APIs for each module will improve the
maintainability and velocity of our codebase. Teams will have more freedom to evolve their
internal implementation details without affecting consumers. Consumers will benefit from
knowing what APIs are intended for their consumption.
maintainability and velocity of our codebase. Teams will have more freedom to evolve their internal
implementation details without affecting consumers. Consumers will benefit from knowing what APIs
are intended for their consumption.
## Assigning files to modules
The file `modules_poc/modules.yaml` contains a list of modules, each containing
a list of files. Each file must be contained in only one module. Note that
module assignment is not required to map neatly to team ownership.
The file `modules_poc/modules.yaml` contains a list of modules, each containing a list of files.
Each file must be contained in only one module. Note that module assignment is not required to map
neatly to team ownership.
In cases where multiple globs match a file, the current rule is that the
longest glob wins. This is used as a simpler-to-implement version of
most-specific glob wins, which we may switch to in the future.
In cases where multiple globs match a file, the current rule is that the longest glob wins. This is
used as a simpler-to-implement version of most-specific glob wins, which we may switch to in the
future.
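The longest-glob-wins rule can be sketched as below. This is not the project's tooling: `globMatch` is a deliberately simplified matcher in which `*` matches any run of characters (including `/`), which may differ from the glob dialect used in `modules.yaml`, and the module names in the usage note are made up.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

namespace sketch {

// Simplified wildcard match: '*' matches any sequence of characters.
inline bool globMatch(std::string_view pattern, std::string_view path) {
    std::size_t p = 0, i = 0, mark = 0;
    std::size_t star = std::string_view::npos;
    while (i < path.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;  // remember the star and try matching zero characters
            mark = i;
        } else if (p < pattern.size() && pattern[p] == path[i]) {
            ++p;
            ++i;
        } else if (star != std::string_view::npos) {
            p = star + 1;  // backtrack: let the last star absorb one more character
            i = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') {
        ++p;
    }
    return p == pattern.size();
}

using Rule = std::pair<std::string, std::string>;  // {glob, module name}

// Among all globs that match the path, the longest glob string wins.
// Ties keep the earlier rule; that tie-break is a choice made for this sketch.
inline std::optional<std::string> moduleFor(const std::vector<Rule>& rules,
                                            std::string_view path) {
    const Rule* best = nullptr;
    for (const auto& rule : rules) {
        if (globMatch(rule.first, path) && (!best || rule.first.size() > best->first.size())) {
            best = &rule;
        }
    }
    if (!best) {
        return std::nullopt;
    }
    return best->second;
}

}  // namespace sketch
```

With the rules `src/mongo/db/*` mapped to `core` and `src/mongo/db/repl/*` mapped to `replication`, the path `src/mongo/db/repl/oplog.h` matches both globs and is assigned to `replication` because that glob is longer.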
## How do I mark API visibility?
This section will just describe the basic process. Later sections will cover the tooling
available to help, along with caveats to be aware of.
This section will just describe the basic process. Later sections will cover the tooling available
to help, along with caveats to be aware of.
First read the documentation in [src/mongo/util/modules.h](https://github.com/mongodb/mongo/blob/master/src/mongo/util/modules.h)
for the canonical list and description of visibility levels. As a brief overview of the main
levels from least to most restrictive:
First read the documentation in
[src/mongo/util/modules.h](https://github.com/mongodb/mongo/blob/master/src/mongo/util/modules.h) for
the canonical list and description of visibility levels. As a brief overview of the main levels from
least to most restrictive:
- `OPEN`: This is available for usage _and inheritance_ from anywhere in the codebase
- `PUBLIC`: This is available for usage from anywhere in the codebase. For types, subclasses may
only be defined in the same module.
- `NEEDS_REPLACEMENT` and `USE_REPLACEMENT(...)`: These are collectively considered
"unfortunately public" and are available for use, but should be avoided
- `NEEDS_REPLACEMENT` and `USE_REPLACEMENT(...)`: These are collectively considered "unfortunately
public" and are available for use, but should be avoided
- `PARENT_PRIVATE`: This is similar to `PRIVATE`, but allows usage from any file in the parent
module, including other submodules
- `PRIVATE`: This may only be used from the current module or one of its submodules
- `FILE_PRIVATE`: This may only be used from the current "file family" (roughly, header \+ cpp
\+ tests). It may not be used by other files, even from the same module.
- `FILE_PRIVATE`: This may only be used from the current "file family" (roughly, header \+ cpp \+
tests). It may not be used by other files, even from the same module.
You can think of public vs private similarly to how you would the sections of a `class`: they
indicate whether something is intended to be part of the API or an implementation detail. The
difference is that they apply at a wider granularity of code than a single class, with
implementation details available to either the full module (and its submodules) for `PRIVATE`
or the file family for `FILE_PRIVATE`.
implementation details available to either the full module (and its submodules) for `PRIVATE` or the
file family for `FILE_PRIVATE`.
The macros in that header file are attached to declarations and set the visibility level for
that declaration and all of its "semantic children"[^1]. The macros are C++ attributes which
means that they need to go in specific places that differ based on what is being marked (for
templates, the location does not change and is always somewhere after the `template <...>` part):
The macros in that header file are attached to declarations and set the visibility level for that
declaration and all of its "semantic children"[^1]. The macros are C++ attributes which means that
they need to go in specific places that differ based on what is being marked (for templates, the
location does not change and is always somewhere after the `template <...>` part):
- `MONGO_MOD_PUBLIC;` by itself as the first line after includes in a header sets the default
for that header (only `PUBLIC`, `PARENT_PRIVATE`, and `FILE_PRIVATE` are allowed here)
- `namespace MONGO_MOD mongo {` (this does not work with nested namespaces in a single
declaration like `namespace mongo::repl`)
- `MONGO_MOD_PUBLIC;` by itself as the first line after includes in a header sets the default for
that header (only `PUBLIC`, `PARENT_PRIVATE`, and `FILE_PRIVATE` are allowed here)
- `namespace MONGO_MOD mongo {` (this does not work with nested namespaces in a single declaration
like `namespace mongo::repl`)
- `class MONGO_MOD Foo {` (Ditto for `enum`, `struct`, and `union`)
- `MONGO_MOD void func(...);`
- `MONGO_MOD int var;`
- `concept isFooable MONGO_MOD {`
For the cases where it goes at the beginning of the line, if clang-format chooses an unfortunate
place to break the line, it usually helps to undo the formatting then put the macro on its own
line above the declaration.
place to break the line, it usually helps to undo the formatting then put the macro on its own line
above the declaration.
APIs are marked one header at a time, by including `"mongo/util/modules.h"` in the header.
This causes the header to be treated as "modularized" which has the following effects:
APIs are marked one header at a time, by including `"mongo/util/modules.h"` in the header. This
causes the header to be treated as "modularized" which has the following effects:
- All declarations in that header (not transitive includes) default to `PRIVATE`, meaning that
the public API is what must be marked.
- Members in `private:` sections in classes default to `PRIVATE`, regardless of the visibility
of the class. The only way the language would allow them to be used from outside of the module
is if you have cross-module friendships, which should generally be avoided. If needed
temporarily, favor `NEEDS_REPLACEMENT` over `PUBLIC` for these declarations.
- Declarations ending in `_forTest` default to `FILE_PRIVATE` to support the common case where
they are only intended for testing that class. If they are actually intended to support testing
of consumers, not just the type they are defined on, they can be explicitly given `PUBLIC` or
- All declarations in that header (not transitive includes) default to `PRIVATE`, meaning that the
public API is what must be marked.
- Members in `private:` sections in classes default to `PRIVATE`, regardless of the visibility of
the class. The only way the language would allow them to be used from outside of the module is if
you have cross-module friendships, which should generally be avoided. If needed temporarily, favor
`NEEDS_REPLACEMENT` over `PUBLIC` for these declarations.
- Declarations ending in `_forTest` default to `FILE_PRIVATE` to support the common case where they
are only intended for testing that class. If they are actually intended to support testing of
consumers, not just the type they are defined on, they can be explicitly given `PUBLIC` or
`PRIVATE` visibility.
- Internal and detail namespaces default to `PRIVATE` and cannot be made less restricted, but
can still be marked as `FILE_PRIVATE`. Individual declarations within the namespace can be
exposed as necessary, but they cannot be exposed in bulk without changing the name of the
namespace to something that doesn't imply private.
- Internal and detail namespaces default to `PRIVATE` and cannot be made less restricted, but can
still be marked as `FILE_PRIVATE`. Individual declarations within the namespace can be exposed as
necessary, but they cannot be exposed in bulk without changing the name of the namespace to
something that doesn't imply private.
For internal headers of a module which do not contribute to its public API, simply including
`modules.h` is sufficient. There is a [tool](#the-private-header-marker) to automate this
process. You may additionally want to consider whether any APIs should be marked `FILE_PRIVATE`,
but that is optional.
`modules.h` is sufficient. There is a [tool](#the-private-header-marker) to automate this process.
You may additionally want to consider whether any APIs should be marked `FILE_PRIVATE`, but that is
optional.
For IDL files, you mark visibility of whole types (`struct`, `enum`, and `command`) with the
`mod_visibility` option. The value should be the same as one of the `MONGO_MOD` macros, but
@ -105,17 +106,17 @@ compelling use case for this.
## What tooling exists to help me?
Note that all tooling should be run from within a properly set-up python virtual environment.
This includes running `buildscripts/poetry_sync.sh` to ensure you have the correct dependencies.
Note that all tooling should be run from within a properly set-up python virtual environment. This
includes running `buildscripts/poetry_sync.sh` to ensure you have the correct dependencies.
### The scanner and merger
The merger generates a cross reference of all first-party usages of first-party code and stores
it in `merged_decls.json`, which is used by the rest of our tooling. It is also where we validate
that there are no disallowed accesses. It will be invoked for you by the browser when you ask it
to rescan, or you can also manually run it as `modules_poc/merge_decls.py`. If you are interested
in analyzing that file, [`jq`](https://jqlang.org/) is a powerful tool, or you can just write
some python.
The merger generates a cross reference of all first-party usages of first-party code and stores it
in `merged_decls.json`, which is used by the rest of our tooling. It is also where we validate that
there are no disallowed accesses. It will be invoked for you by the browser when you ask it to
rescan, or you can also manually run it as `modules_poc/merge_decls.py`. If you are interested in
analyzing that file, [`jq`](https://jqlang.org/) is a powerful tool, or you can just write some
python.
As a rather extreme example of what you can do with `jq`, here is how the progress reports are
generated:
@ -129,43 +130,43 @@ generated:
jq 'map(., .mod = "TOTAL") | group_by(.mod)[] | group_by(.loc | split(":")[0]) | {mod: .[0].[0].mod, total: length, marked: map(select(any(.visibility == "UNKNOWN") | not)) | length} | .done = (1000 * .marked / .total | round) / 10 | "\(.mod): \(" " * (.mod | 40-length)) \(.done)% (\(.marked) / \(.total))"' -r merged_decls.json
```
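
For readers who prefer Python to `jq`, the same per-module progress report can be sketched as below. This is a minimal sketch, not the script the project uses: it assumes `merged_decls.json` is a flat list of objects with `mod`, `loc` (`path:line:col`), and `visibility` fields, which is inferred only from the `jq` command above. The helper names `progress_by_module`, `format_report`, and `load_decls` are hypothetical.

```python
import json
from collections import defaultdict


def progress_by_module(decls):
    """Map each module (plus "TOTAL") to (marked_files, total_files).

    A file counts as marked when none of its declarations is "UNKNOWN".
    """
    # files[mod][path] stays True while no declaration in that file is UNKNOWN.
    files = defaultdict(dict)
    for decl in decls:
        path = decl["loc"].split(":")[0]  # assumed "path:line:col" layout
        is_marked = decl["visibility"] != "UNKNOWN"
        for mod in (decl["mod"], "TOTAL"):
            files[mod][path] = files[mod].get(path, True) and is_marked
    return {mod: (sum(by_file.values()), len(by_file)) for mod, by_file in files.items()}


def format_report(progress):
    """Render one line per module: name, percent done, and raw counts."""
    lines = []
    for mod in sorted(progress):
        marked, total = progress[mod]
        lines.append(f"{mod:<40} {100 * marked / total:.1f}% ({marked} / {total})")
    return lines


def load_decls(path="merged_decls.json"):
    """Read the merger output; the default path is the file name from the text."""
    with open(path) as f:
        return json.load(f)
```

Calling `print("\n".join(format_report(progress_by_module(load_decls()))))` from the directory containing `merged_decls.json` prints the report.
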
Internally, the merger invokes `bazel build --config=mod-scanner //src/mongo/...`
to run the scanner over the whole codebase (or the parts that have changed since the last scan),
taking advantage of bazel remote execution to achieve very high levels of parallelism.
Internally, the merger invokes `bazel build --config=mod-scanner //src/mongo/...` to
run the scanner over the whole codebase (or the parts that have changed since the last scan), taking
advantage of bazel remote execution to achieve very high levels of parallelism.
### The browser
The main piece of tooling to run is the browser, which is launched by running
`modules_poc/browse.py`. If you haven't scanned the codebase recently, it will offer to run it
for you which will take a few minutes. After modifying the source code, you can rescan at any
time by pressing `r`. It will only rescan files that have been modified or that transitively
include modified headers.
`modules_poc/browse.py`. If you haven't scanned the codebase recently, it will offer to run it for
you which will take a few minutes. After modifying the source code, you can rescan at any time by
pressing `r`. It will only rescan files that have been modified or that transitively include
modified headers.
The browser is primarily intended to assist in labeling public APIs, so the files are sorted
with the most number of unlabeled declarations ("unknowns") first. You can search for a file
by pressing `f` or press `m` to filter the files by module.
The browser is primarily intended to assist in labeling public APIs, so the files are sorted with
the most number of unlabeled declarations ("unknowns") first. You can search for a file by pressing
`f` or press `m` to filter the files by module.
The list of available key bindings is shown on the right. You can toggle that by pressing `?`.
Other keybindings of note: press `g` to go to the currently highlighted
declaration or location in your editor (only when running in the vscode or nvim terminal),
and `p` to toggle an inline preview of the location within the browser. You can press `Tab ↹`
to toggle between the tree and the code preview. The mouse is fully supported for scrolling
and expanding rows in the tree, and there are aliases for some basic vim keybinds (`hjkl/`).
The list of available key bindings is shown on the right. You can toggle that by pressing `?`. Other
keybindings of note: press `g` to go to the currently highlighted declaration or
location in your editor (only when running in the vscode or nvim terminal), and `p` to toggle an
inline preview of the location within the browser. You can press `Tab ↹` to toggle between the tree
and the code preview. The mouse is fully supported for scrolling and expanding rows in the tree, and
there are aliases for some basic vim keybinds (`hjkl/`).
### The private header marker
Once you have scanned the codebase and produced a `merged_decls.json`,
`modules_poc/private_headers.py` can be used to find all header and IDL files where there are
no currently detected external usages and automatically mark them as fully private to the
module. This does not necessarily mean that all automatically marked headers are intended to
be private. A human should review to ensure that the marked headers match intent. You can pass
flags to filter on any/all of module, owning team, or path glob. For headers matching the filter,
the script will also warn of usages of `_forTest` external to the file family that may need to
be marked `PRIVATE` to make them available to the whole module since they default to only being
available to the file family for marked headers.
`modules_poc/private_headers.py` can be used to find all header and IDL files where there are no
currently detected external usages and automatically mark them as fully private to the module. This
does not necessarily mean that all automatically marked headers are intended to be private. A human
should review to ensure that the marked headers match intent. You can pass flags to filter on
any/all of module, owning team, or path glob. For headers matching the filter, the script will also
warn of usages of `_forTest` external to the file family that may need to be marked `PRIVATE` to
make them available to the whole module since they default to only being available to the file
family for marked headers.
Make sure to run `buildscripts/clang_format.py format-my` or `bazel run format` after using it
to modify any C++ files.
Make sure to run `buildscripts/clang_format.py format-my` or `bazel run format` after using it to
modify any C++ files.
Example usage:
@ -178,13 +179,12 @@ Example usage:
### The PR comment generator
You can run `modules_poc/mod_diff.py` to output a brief summary of all of the API (including
visibility levels and usage counts) for each file modified in your branch. When putting up a PR
to mark API visibility, you should add a comment with its output to the PR as an aid to
reviewers. The output is intended to be close enough to C++ that you should put it in a
` ```cpp ` block when making your PR comment to make it more readable. You can also
pipe it through `bat -lcpp` to make it colorful locally. Note that it will use the last
scan output, so if you've modified any headers, you should run a rescan prior to running this
tool.
visibility levels and usage counts) for each file modified in your branch. When putting up a PR to
mark API visibility, you should add a comment with its output to the PR as an aid to reviewers. The
output is intended to be close enough to C++ that you should put it in a ` ```cpp ` block when
making your PR comment to make it more readable. You can also pipe it through `bat -lcpp` to make it
colorful locally. Note that it will use the last scan output, so if you've modified any headers, you
should run a rescan prior to running this tool.
## Workflow
@ -198,24 +198,23 @@ The general workflow for each PR will generally be the same:
5. Run [the pr comment generator](#the-pr-comment-generator) to show the APIs that you have marked
- Look through this to ensure that everything is as you expect.
6. Put up a PR and include the generated comment in a ` ```cpp ` block
- I suggest keeping PRs small (say, no more than 10 files at a time) so that they are
manageable by reviewers. As an exception it seems reasonable to auto-mark many headers as
private in a single PR, as long as those PRs are separate from those containing any manual
marking.
- I suggest keeping PRs small (say, no more than 10 files at a time) so that they are manageable
by reviewers. As an exception it seems reasonable to auto-mark many headers as private in a
single PR, as long as those PRs are separate from those containing any manual marking.
When first starting to mark a module, I suggest running the [`modules_poc/private_headers.py`](#the-private-header-marker)
script with `--dry-run` (or `-n`) and `--module=YOUR_MODULE`. For larger modules (in particular,
the `query` mega module) you may want to pass a `--glob` so that you can focus on a smaller
subset of the code initially. That will give you an overview of the files that are used from
outside your module (which contain defacto public APIs today) and those that do not (which can
automatically be marked as private implementation details).
When first starting to mark a module, I suggest running the
[`modules_poc/private_headers.py`](#the-private-header-marker) script with `--dry-run` (or `-n`) and
`--module=YOUR_MODULE`. For larger modules (in particular, the `query` mega module) you may want to
pass a `--glob` so that you can focus on a smaller subset of the code initially. That will give you
an overview of the files that are used from outside your module (which contain defacto public APIs
today) and those that do not (which can automatically be marked as private implementation details).
If all of the defacto private headers seem like they should be private, you can remove the
dry-run flag to have it automatically mark them as private. Be sure to validate that their
contents are actually intended to be private. Remember that the point of having a human doing
the marking is to ensure that we correctly capture intent. You can optionally mark implementation
details within each header as `FILE_PRIVATE`, if you would like to prevent them from being used
elsewhere even within the module.
If all of the defacto private headers seem like they should be private, you can remove the dry-run
flag to have it automatically mark them as private. Be sure to validate that their contents are
actually intended to be private. Remember that the point of having a human doing the marking is to
ensure that we correctly capture intent. You can optionally mark implementation details within each
header as `FILE_PRIVATE`, if you would like to prevent them from being used elsewhere even within
the module.
You can then open [the browser](#the-browser) (`modules_poc/browse.py`) to look at the remaining
headers. It will show you what is used and from where. It will be particularly useful for things
@ -229,137 +228,136 @@ that seem like they should be private, but are being used externally.
`modules_poc/modules.yaml` to move them.
2. If there is already a public API that callers should use instead, mark it as
`USE_REPLACEMENT(better_api)`. The argument accepts any C++ tokens, but the intent is where
possible to use the name of the replacement. This will generate a ticket for all teams using
that code.
possible to use the name of the replacement. This will generate a ticket for all teams using that
code.
1. If there are very few users, consider just cleaning them up.
3. Reconsider making this API public if other modules need its functionality, and this is
the only way to get it.
4. Otherwise, if there is no public API that fulfills the needs of the callers, but you
don't want the current API to remain public long-term, use `NEEDS_REPLACEMENT`. This will
generate a ticket for the team that owns that code.
1. If the API was "obviously" intended to be private (eg it is in a `details` namespace)
and callers would be reasonably able to implement the functionality themselves, possibly
by writing their own version, it seems acceptable to use
3. Reconsider making this API public if other modules need its functionality, and this is the only
way to get it.
4. Otherwise, if there is no public API that fulfills the needs of the callers, but you don't want
the current API to remain public long-term, use `NEEDS_REPLACEMENT`. This will generate a ticket
for the team that owns that code.
1. If the API was "obviously" intended to be private (eg it is in a `details` namespace) and
callers would be reasonably able to implement the functionality themselves, possibly by
writing their own version, it seems acceptable to use
`USE_REPLACEMENT(do not use internal details)`
## Caveats and Limitations
**OVERARCHING GUIDELINE**: Always try to mark declarations correctly according to intent,
even if it will not be enforced by the current tooling. This is both to provide the correct
information to human readers, as well as to avoid issues if we improve the tooling in the
future to eliminate these limitations.
**OVERARCHING GUIDELINE**: Always try to mark declarations correctly according to intent, even if it
will not be enforced by the current tooling. This is both to provide the correct information to
human readers, as well as to avoid issues if we improve the tooling in the future to eliminate these
limitations.
The rest of this section is fairly technical and probably not necessary for most readers unless
they notice something "weird" going on and want to dive into why. Most of these limitations are
more likely to affect the core modules since most of the rest of our code does not expose APIs
via macros and templates or have APIs only consumed by templates, and those are where most of
these issues come up.
The rest of this section is fairly technical and probably not necessary for most readers unless they
notice something "weird" going on and want to dive into why. Most of these limitations are more
likely to affect the core modules since most of the rest of our code does not expose APIs via macros
and templates or have APIs only consumed by templates, and those are where most of these issues come
up.
- We do not track usages of namespaces at all, only the declarations within namespaces. When
a namespace is marked with a visibility, it does not affect the visibility of the namespace
itself (since it doesn't have one), it sets the default visibility for all declarations within
**that namespace block**. Each time a namespace is reopened it is a separate block and the
visibility markers on other blocks of the same namespace do not apply.
- The scanner only knows about declarations that it sees being used. For implementation reasons,
it only discovers declarations by seeing what every usage is using. This can either cause or be
- We do not track usages of namespaces at all, only the declarations within namespaces. When a
namespace is marked with a visibility, it does not affect the visibility of the namespace itself
(since it doesn't have one), it sets the default visibility for all declarations within **that
namespace block**. Each time a namespace is reopened it is a separate block and the visibility
markers on other blocks of the same namespace do not apply.
- The scanner only knows about declarations that it sees being used. For implementation reasons, it
only discovers declarations by seeing what every usage is using. This can either cause or be
caused by other limitations.
- Usages in templates may not be seen. This is especially the case for "dependent types and
values" which are things that are not known by the compiler before the template is instantiated.
- This is a problem for functions where any arguments are dependent if it can't figure out
which overload will be selected. It is even worse for free-functions called unqualified
(`f(blah)` rather than `ns::f(blah)` or `x.f(blah)`) since due to ADL, overload resolution
is _always_ delayed for them.
- Everything that results from a macro expansion is treated as-if it was written at the point
of expansion. This applies to both declarations and usages. If you have an API that should
only be used via the defined macros, mark it as `MOD_PUBLIC_FOR_TECHNICAL_REASONS` to signal
to readers that they should avoid direct usage, even if the tooling won't prevent it. We may
improve this in the future.
- Template variables are completely ignored due to some unfortunate clang bugs. Still, try
to mark them correctly since we may change this in the future.
- Usages in templates may not be seen. This is especially the case for "dependent types and values"
which are things that are not known by the compiler before the template is instantiated.
- This is a problem for functions where any arguments are dependent if it can't figure out which
overload will be selected. It is even worse for free-functions called unqualified (`f(blah)`
rather than `ns::f(blah)` or `x.f(blah)`) since due to ADL, overload resolution is _always_
delayed for them.
- Everything that results from a macro expansion is treated as-if it was written at the point of
expansion. This applies to both declarations and usages. If you have an API that should only be
used via the defined macros, mark it as `MOD_PUBLIC_FOR_TECHNICAL_REASONS` to signal to readers
that they should avoid direct usage, even if the tooling won't prevent it. We may improve this in
the future.
- Template variables are completely ignored due to some unfortunate clang bugs. Still, try to mark
them correctly since we may change this in the future.
- Method calls are assigned to the static type at the call site. This has two important effects:
- A subclass's overridden method may seem unused if it is only used via calls through a base
class pointer/reference
- Calls through a base class pointer/reference count as calls of that class's method, not of
the interface's
- Defaulted members (methods, ctors, dtors) are treated as usages of the class itself,
regardless of whether they implicitly or explicitly defaulted. This is because clang does not
provide an API to distinguish between those cases.
- Template normalization woes: we try really hard to report declarations as the template
`foo<T>` rather than separate instantiations like `foo<int>`, `foo<string>`, etc, **unless**
they are explicitly specialized, meaning that the instantiation has its own definition different
from the main template. Unfortunately, clang does a bad job at this and we have a number of
kludgy workarounds. The most important effects:
- Explicit specializations of function and variable templates are ignored and always converted
to the primary template.
- A subclass's overridden method may seem unused if it is only used via calls through a base class
pointer/reference
- Calls through a base class pointer/reference count as calls of that class's method, not of the
interface's
- Defaulted members (methods, ctors, dtors) are treated as usages of the class itself, regardless of
whether they implicitly or explicitly defaulted. This is because clang does not provide an API to
distinguish between those cases.
- Template normalization woes: we try really hard to report declarations as the template `foo<T>`
rather than separate instantiations like `foo<int>`, `foo<string>`, etc, **unless** they are
explicitly specialized, meaning that the instantiation has its own definition different from the
main template. Unfortunately, clang does a bad job at this and we have a number of kludgy
workarounds. The most important effects:
- Explicit specializations of function and variable templates are ignored and always converted to
the primary template.
- We do treat explicit specializations of types as separate (using the heuristic of having a
separate location than the main template), because they can have a different shape and API than
the main template. In general they should probably have the same visibility though, unless the
instantiation is using a private type which should be unavailable to consumers anyway.
- Clang assigns many locations to the site of explicit template instantiations and extern
template declarations, even when there is a better location that it can see. Luckily these
are fairly rare.
- Clang assigns many locations to the site of explicit template instantiations and extern template
declarations, even when there is a better location that it can see. Luckily these are fairly
rare.
- Sometimes clang reports the resolved destination of `using` declarations and type alias, but
usually it reports the `using` declaration itself. A few notable cases (these are trends and
may not be absolute\!)
usually it reports the `using` declaration itself. A few notable cases (these are trends and may
not be absolute\!)
- `using Base::foo;` to expose a member of a base class is resolved as a usage of `Base::foo`
rather than `Derived::foo`. This is especially notable when the `Base` class is intended to be
a private implementation detail. You will need to mark all exposed methods as public.
- `using Base::Base;` to pull in the base constructors is the opposite and is recorded as a
usage of `Derived::Base(args)`, which is odd because such a declaration doesn't actually exist.
rather than `Derived::foo`. This is especially notable when the `Base` class is intended to be a
private implementation detail. You will need to mark all exposed methods as public.
- `using Base::Base;` to pull in the base constructors is the opposite and is recorded as a usage
of `Derived::Base(args)`, which is odd because such a declaration doesn't actually exist.
- Internal/details namespaces (currently defined as matching the regex `(detail|internal)s?$`)
have an implicit default visibility of private if `modules.h` is included. It is not
possible to give the namespace a public visibility, but you can restrict it further with
`FILE_PRIVATE`. If you want declarations inside it to be usable from outside your module you
must mark children of the namespace explicitly, or rename it to not use a name that implies
that it is for internal usage only. A somewhat common case will be marking internal declarations
that are only intended to be used via macros with `PUBLIC_FOR_TECHNICAL_REASONS`.
- Be very careful with forward declarations. Try to avoid them wherever possible (unless there
is a significant benefit). Especially avoid forward declaring anything from another module\!
Where forward declarations must be used, make sure that they have the same visibility as the
real definition. As an exception, if every TU that sees the forward declaration will also see
the definition it is OK to omit marking the forward declaration. This may happen when they are
both in the same header, or the forward declaration is in a private implementation detail header
which is included by the defining header. Be aware of the implicit visibility marking which also
applies to forward declarations, if they are the only declaration seen in the TU.
- Never forward declare functions to avoid including a header. They are much more problematic
than types, both in general in C++ and specifically for this tooling.
- We try to use the definition location for types defined in headers, but the "canonical"
location (clang's term for the first declaration seen in the current TU) for everything else.
If the type is defined in a .cpp, we use the canonical location.
`FILE_PRIVATE`. If you want declarations inside it to be usable from outside your module you must
mark children of the namespace explicitly, or rename it to not use a name that implies that it is
for internal usage only. A somewhat common case will be marking internal declarations that are
only intended to be used via macros with `PUBLIC_FOR_TECHNICAL_REASONS`.
- Be very careful with forward declarations. Try to avoid them wherever possible (unless there is a
significant benefit). Especially avoid forward declaring anything from another module\! Where
forward declarations must be used, make sure that they have the same visibility as the real
definition. As an exception, if every TU that sees the forward declaration will also see the
definition it is OK to omit marking the forward declaration. This may happen when they are both in
the same header, or the forward declaration is in a private implementation detail header which is
included by the defining header. Be aware of the implicit visibility marking which also applies to
forward declarations, if they are the only declaration seen in the TU.
- Never forward declare functions to avoid including a header. They are much more problematic than
types, both in general in C++ and specifically for this tooling.
- We try to use the definition location for types defined in headers, but the "canonical" location
(clang's term for the first declaration seen in the current TU) for everything else. If the type
is defined in a .cpp, we use the canonical location.
- We only consider declarations in headers, never in .cpp files.
- Be mindful of `_forTest` functions. They default to `FILE_PRIVATE` since they are typically
intended only for use when testing the type they are defined on, not when testing consumers.
In the cases where they _are_ intended as part of the API for testing consumers, you can
explicitly mark them `PUBLIC` or `PRIVATE` depending on whether they should be usable from
outside your module or not.
- Things used implicitly (eg implicit conversion operators) are still counted as usages even
if they are not specifically named at the call site
- When merging information from multiple TUs, definitions always replace the metadata gathered
from TUs that only saw a declaration.
- Note that we aren't guaranteed to see every definition, in particular for functions that
are not called from the TU that they are defined in. So this cannot be used to find places
where we deleted the definition but forgot to delete the declaration (we wouldn't see them
anyway, since we only track things that are used, and undefined things can't really be used,
except trivially, without breaking the build).
- `private` members of classes are implicitly `PRIVATE`, and must be explicitly marked otherwise
if desired. They should probably never be made `PUBLIC` since that implies cross-module
friendship. In the few places where we have that today, they have been made one of the flavors
of unfortunately public: `NEEDS_REPLACEMENT` or `USE_INSTEAD`.
- `public` members of `private` types do not inherit the implicit `PRIVATE` and follow the
normal rule of looking for their nearest semantic parent with an explicit marker. That means
that they may be `PUBLIC`. However, the language rules still apply and as long as an
instance of the type is never handed to consumers they will have no way of accessing those
members.
intended only for use when testing the type they are defined on, not when testing consumers. In
the cases where they _are_ intended as part of the API for testing consumers, you can explicitly
mark them `PUBLIC` or `PRIVATE` depending on whether they should be usable from outside your
module or not.
- Things used implicitly (eg implicit conversion operators) are still counted as usages even if they
are not specifically named at the call site
- When merging information from multiple TUs, definitions always replace the metadata gathered from
TUs that only saw a declaration.
- Note that we aren't guaranteed to see every definition, in particular for functions that are not
called from the TU that they are defined in. So this cannot be used to find places where we
deleted the definition but forgot to delete the declaration (we wouldn't see them anyway, since
we only track things that are used, and undefined things can't really be used, except trivially,
without breaking the build).
- `private` members of classes are implicitly `PRIVATE`, and must be explicitly marked otherwise if
desired. They should probably never be made `PUBLIC` since that implies cross-module friendship.
In the few places where we have that today, they have been made one of the flavors of
unfortunately public: `NEEDS_REPLACEMENT` or `USE_INSTEAD`.
- `public` members of `private` types do not inherit the implicit `PRIVATE` and follow the normal
rule of looking for their nearest semantic parent with an explicit marker. That means that they
may be `PUBLIC`. However, the language rules still apply and as long as an instance of the type
is never handed to consumers they will have no way of accessing those members.
- `protected` members do not default to `PRIVATE`, but because we only allow subclassing from
`OPEN` classes, the language visibility rules will disallow access from outside the module
unless you choose to allow it by using `OPEN` classes or `friend`s. Note that making any
subclass `OPEN` exposes all `protected` members of parents unless they are marked `PRIVATE`.
- `friend` declarations are mostly ignored, except when they are a definition. So the
definitions using the "hidden friend" pattern are tracked, but we ignore it if the definition
is in a cpp file.
unless you choose to allow it by using `OPEN` classes or `friend`s. Note that making any
subclass `OPEN` exposes all `protected` members of parents unless they are marked `PRIVATE`.
- `friend` declarations are mostly ignored, except when they are a definition. So the definitions
using the "hidden friend" pattern are tracked, but we ignore it if the definition is in a cpp
file.
[^1]:
Clang distinguishes between "semantic" and "lexical" parents. The primary differences
are that members of classes (including member types) are semantic children of the class even
when defined out of line, and conversely `friend` declarations are not, and instead are
considered semantic children of the nearest namespace.
Clang distinguishes between "semantic" and "lexical" parents. The primary differences are that
members of classes (including member types) are semantic children of the class even when defined
out of line, and conversely `friend` declarations are not, and instead are considered semantic
children of the nearest namespace.
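
The internal-namespace rule from the list above can be illustrated with a short Python sketch. The regex is the one quoted in the text; applying it with `re.search` to a namespace's own name, and the helper names `is_internal_namespace` and `can_mark_namespace`, are assumptions made for illustration rather than a description of how the scanner is implemented.

```python
import re

# Pattern quoted in the text. Using re.search on the namespace's own name
# is an assumption about how the tooling applies it.
INTERNAL_NS = re.compile(r"(detail|internal)s?$")


def is_internal_namespace(name: str) -> bool:
    """True if the namespace name implies internal-only usage."""
    return INTERNAL_NS.search(name) is not None


def can_mark_namespace(name: str, visibility: str) -> bool:
    """True if `visibility` is an acceptable marker for namespace `name`.

    Internal namespaces default to PRIVATE and may only be restricted
    further (FILE_PRIVATE), never made more visible.
    """
    if not is_internal_namespace(name):
        return True
    return visibility in ("FILE_PRIVATE", "PRIVATE")
```

Because the pattern is anchored only at the end, a name such as `my_detail` also matches, while `detailed` and `internal_utils` do not.
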


@ -2,15 +2,20 @@
## ALLOWED_UNOWNED_FILES.yml File Format
This file is for repos that require all files be owned. Some files may be listed here as an exception and will be added to the end of the CODEOWNERS.
This file is for repos that require all files be owned. Some files may be listed here as an
exception and will be added to the end of the CODEOWNERS.
`version` is the current version of the `ALLOWED_UNOWNED_FILES.yml` file format. The only version is `1.0.0`.
`version` is the current version of the `ALLOWED_UNOWNED_FILES.yml` file format. The only version is
`1.0.0`.
`filters` are a list of filters that each have a `filter` and `justification` field.
`filter` is a file path. This file path must start with a `/` and is relative to the root repo directory. Directories or globs are not supported at the moment to ensure careful selection of files allowed to be unowned. This can be reconsidered if proper usecases appear.
`filter` is a file path. This file path must start with a `/` and is relative to the root repo
directory. Directories or globs are not supported at the moment to ensure careful selection of files
allowed to be unowned. This can be reconsidered if proper usecases appear.
`justification` is the reason why this file should be unowned. A common case is that this is a generated file that has checks in CI to ensure it is in the correct format.
`justification` is the reason why this file should be unowned. A common case is that this is a
generated file that has checks in CI to ensure it is in the correct format.
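
The field rules above can be checked mechanically. The following Python sketch validates an already-parsed `ALLOWED_UNOWNED_FILES.yml` (a plain `dict`, so no YAML library is needed here). It is not the checker shipped with `bazel_rules_mongo`; the function name `validate_allowed_unowned`, the set of characters treated as glob syntax, and the requirement that `justification` be non-empty are all assumptions.

```python
# Characters treated as glob syntax in this sketch (an assumption).
GLOB_CHARS = set("*?[]")


def validate_allowed_unowned(config: dict) -> list[str]:
    """Return a list of problems found in a parsed ALLOWED_UNOWNED_FILES.yml."""
    problems = []
    if config.get("version") != "1.0.0":
        problems.append(f"unsupported version: {config.get('version')!r}")
    for i, entry in enumerate(config.get("filters", [])):
        path = entry.get("filter", "")
        # Paths are relative to the repo root and must start with "/".
        if not path.startswith("/"):
            problems.append(f"filters[{i}]: filter must start with '/': {path!r}")
        # Directories (trailing slash) and globs are not supported.
        if path.endswith("/") or GLOB_CHARS & set(path):
            problems.append(f"filters[{i}]: directories and globs are not supported: {path!r}")
        if not entry.get("justification"):
            problems.append(f"filters[{i}]: missing justification")
    return problems
```

An empty return value means the file passed every check in this sketch; it cannot tell whether a path without a trailing slash is actually a directory on disk.
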
### Example file
@ -23,7 +28,8 @@ filters: # List of all filters
### Configuration
This can be configured in any repo with `bazel_rules_mongo` by putting the following lines in your `.bazelrc` file:
This can be configured in any repo with `bazel_rules_mongo` by putting the following lines in your
`.bazelrc` file:
```
common --define codeowners_have_allowed_unowned_files=True

@ -15,7 +15,8 @@ Banned owners should be separated by newlines. Empty lines and lines starting wi
### Configuration
This can be configured in any repo with `bazel_rules_mongo` by putting the following lines in your `.bazelrc` file:
This can be configured in any repo with `bazel_rules_mongo` by putting the following lines in your
`.bazelrc` file:
```
common --define codeowners_have_banned_codeowners=True

@ -1,23 +1,40 @@
# Code Owners
After modifying any OWNERS files, the overall ownership database (`.github/CODEOWNERS`) must be rebuilt.
This is done by running `bazel run codeowners`.
After modifying any OWNERS files, the overall ownership database (`.github/CODEOWNERS`) must be
rebuilt. This is done by running `bazel run codeowners`.
## OWNERS.yml File Format
This is loosely based on [kubernetes](https://www.kubernetes.dev/docs/guide/owners/) and [chromium](https://chromium.googlesource.com/chromium/src/+/HEAD/docs/code_reviews.md) OWNERS files.
This is loosely based on [kubernetes](https://www.kubernetes.dev/docs/guide/owners/) and
[chromium](https://chromium.googlesource.com/chromium/src/+/HEAD/docs/code_reviews.md) OWNERS files.
`version` is the current version of the `OWNERS.yml` file format. The latest version is `2.0.0`. For previous versions, see the [changelog](#owners-changelog).
`version` is the current version of the `OWNERS.yml` file format. The latest version is `2.0.0`. For
previous versions, see the [changelog](#owners-changelog).
`aliases` point to YAML files that list aliases that can be used in this OWNERS.yml file.
`filters` are a list of globs that match [gitignore syntax](https://git-scm.com/docs/gitignore#_pattern_format). The filter must match at least one file and be unique to the file. Each filter must have a list of `approvers`. An approval from any single approver will allow the code to be merged. `NOOWNER` can be specified to mark a filter as unowned. Each filter can optionally have a `metadata` tag. Inside that tag a user can put whatever tags they want. We have reserved two meaningful tags `emeritus_approvers` and `owning_team`. This is not an exhaustive list and more documented and undocumented options can be added later. There is no linting done on the metadata tag.
`filters` are a list of globs that match
[gitignore syntax](https://git-scm.com/docs/gitignore#_pattern_format). The filter must match at
least one file and be unique to the file. Each filter must have a list of `approvers`. An approval
from any single approver will allow the code to be merged. `NOOWNER` can be specified to mark a
filter as unowned. Each filter can optionally have a `metadata` tag. Inside that tag a user can put
whatever tags they want. We have reserved two meaningful tags `emeritus_approvers` and
`owning_team`. This is not an exhaustive list and more documented and undocumented options can be
added later. There is no linting done on the metadata tag.
`emeritus_approvers` are folks that used to be approvers that no longer have approver privileges. This allows us to keep track of folks who built up a knowledge base of this code that might need to be consulted in a critical situation. Both `approvers` and `emeritus_approvers` should be either github usernames, emails, or aliases.
`emeritus_approvers` are folks that used to be approvers that no longer have approver privileges.
This allows us to keep track of folks who built up a knowledge base of this code that might need to
be consulted in a critical situation. Both `approvers` and `emeritus_approvers` should be either
github usernames, emails, or aliases.
`owning_team` is a team that owns the files, however this team does not have approval privileges. Instead this team should be looked to for asking questions. This metadata can also be used programmatically to, for example, generate a report of all the files owned by a particular team, even though that team has nominated specific engineers as approvers.
`owning_team` is a team that owns the files, however this team does not have approval privileges.
Instead this team should be looked to for asking questions. This metadata can also be used
programmatically to, for example, generate a report of all the files owned by a particular team,
even though that team has nominated specific engineers as approvers.
`options` are not required and are various options about how to use this OWNERS.yml file. Currently there is only a single option `no_parent_owners` which is defaulted to false. If this option is set to true it will stop upwards OWNERS resolution.
`options` are not required and are various options about how to use this OWNERS.yml file. Currently
there is only a single option `no_parent_owners` which is defaulted to false. If this option is set
to true it will stop upwards OWNERS resolution.
### Example file
@ -70,7 +87,8 @@ options: # All options for this file
`version` is the current version of the aliases file format. This should always be `1.0.0`.
`aliases` are a list of group names. Each group name must have one or more reviewers. Reviewers should be github usernames.
`aliases` are a list of group names. Each group name must have one or more reviewers. Reviewers
should be github usernames.
## Example File
@ -133,18 +151,26 @@ filters:
### Example 1
If someone changes `a/b/c/file.py` the owner resolution will select teamC since the first file searched is `a/b/c/OWNERS.yml`. First we compare if `file.py` matches `*.md`. It does not so we now check if `file.py` matches `*`. It does match so teamC is selected for review.
If someone changes `a/b/c/file.py` the owner resolution will select teamC since the first file
searched is `a/b/c/OWNERS.yml`. First we compare if `file.py` matches `*.md`. It does not so we now
check if `file.py` matches `*`. It does match so teamC is selected for review.
### Example 2
If someone changes `a/b/c/file.yaml` the owner resolution will not find a team. The first file searched is `a/b/c/OWNERS.yml`. No filters match file.yaml. Next we search in `a/b/OWNERS.yml`. No filters match there either. We stop searching up because `no_parent_owners` is set to true.
If someone changes `a/b/c/file.yaml` the owner resolution will not find a team. The first file
searched is `a/b/c/OWNERS.yml`. No filters match file.yaml. Next we search in `a/b/OWNERS.yml`. No
filters match there either. We stop searching up because `no_parent_owners` is set to true.
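The upward search described in these examples can be sketched roughly as follows. This is a simplified model, not the real codeowners tool: `resolve_owners` and the in-memory `owners_files` layout are hypothetical, and `fnmatchcase` stands in for gitignore-style matching, which it only approximates.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def resolve_owners(changed_file, owners_files):
    """Walk upward from the changed file's directory and return the approvers
    of the first matching filter, or None when no filter matches."""
    path = PurePosixPath(changed_file)
    for directory in path.parents:  # nearest directory first, then its parents
        owners = owners_files.get(str(directory))
        if owners is None:
            continue  # no OWNERS.yml in this directory
        relative = str(path.relative_to(directory))
        for flt in owners.get("filters", []):
            # Filters are checked in order; the first match wins.
            if fnmatchcase(relative, flt["pattern"]):
                return flt["approvers"]
        if owners.get("options", {}).get("no_parent_owners", False):
            return None  # stop upward OWNERS resolution here
    return None


# Made-up directory tree, keyed by the directory that holds each OWNERS.yml.
owners_files = {
    "a": {"filters": [{"pattern": "*", "approvers": ["teamA"]}]},
    "a/b": {
        "filters": [{"pattern": "*.md", "approvers": ["teamB"]}],
        "options": {"no_parent_owners": True},
    },
    "a/b/c": {"filters": [{"pattern": "*.py", "approvers": ["teamC"]}]},
}
```

In this made-up tree, `a/b/c/file.yaml` matches nothing in `a/b/c` or `a/b`, and the search never reaches `a` because `a/b` sets `no_parent_owners`.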
## OWNERS Changelog
### v2.0.0
See the [previous version](https://github.com/mongodb/mongo/blob/79590effe86c471cc15d91c6785599ec2085d7c0/docs/owners/owners_format.md) of this documentation for details on v1.0.0.
See the
[previous version](https://github.com/mongodb/mongo/blob/79590effe86c471cc15d91c6785599ec2085d7c0/docs/owners/owners_format.md)
of this documentation for details on v1.0.0.
Patterns without a slash are no longer prepended with `**/` to make them apply recursively. If you want your pattern to apply recursively you must add the `**/` yourself now.
Patterns without a slash are no longer prepended with `**/` to make them apply recursively. If you
want your pattern to apply recursively you must add the `**/` yourself now.
The `*` pattern is now resolved as the directory name to ensure it applies recursively by default. You can use the `/*` pattern to only match inside the current directory.
The `*` pattern is now resolved as the directory name to ensure it applies recursively by default.
You can use the `/*` pattern to only match inside the current directory.

@ -12,16 +12,16 @@ To find the correct binary for a specific log you need to:
curl -O http://s3.amazonaws.com/downloads.mongodb.org/linux/mongodb-linux-x86_64-debugsymbols-1.x.x.tgz
```
You can also get the debugsymbols archive for official builds through [the Downloads page][1]. In the
Archived Releases section, click on the appropriate platform link to view the available archives.
Select the appropriate debug symbols archive.
You can also get the debugsymbols archive for official builds through [the Downloads page][1]. In
the Archived Releases section, click on the appropriate platform link to view the available
archives. Select the appropriate debug symbols archive.
## Using mongosymb.py to get file and line numbers
Stacktraces are logged on a line with `msg` `BACKTRACE`. The full backtrace contents are available in
an attribute named `bt`. To convert this into a list of source locations with file and line numbers,
copy the contents of the `bt` JSON blob into a file, then direct the contents of that file into
the standard input of `buildscripts/mongosymb.py`:
Stacktraces are logged on a line with `msg` `BACKTRACE`. The full backtrace contents are available
in an attribute named `bt`. To convert this into a list of source locations with file and line
numbers, copy the contents of the `bt` JSON blob into a file, then direct the contents of that file
into the standard input of `buildscripts/mongosymb.py`:
```
cat bt | buildscripts/mongosymb.py --debug-file-resolver=path path/to/debug/symbols/file
@ -55,8 +55,8 @@ $ cat bt | buildscripts/mongosymb.py --debug-file-resolver=path bazel-bin/instal
## Stack Trace Schema
Stack traces are typically logged as log message 31380, having a `bt` attribute
that holds a JSON object value:
Stack traces are typically logged as log message 31380, having a `bt` attribute that holds a JSON
object value:
```json
"bt": {
@ -86,10 +86,9 @@ that holds a JSON object value:
}
```
The "processInfo" subobject has other information about the process, but
the most important thing for the stack trace is the "somap", which is an
array of all dynamically linked ELF files, including the main executable,
and where in memory they were loaded.
The "processInfo" subobject has other information about the process, but the most important thing
for the stack trace is the "somap", which is an array of all dynamically linked ELF files, including
the main executable, and where in memory they were loaded.
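As a rough sketch, the `bt` object and its `somap` can be pulled out of a log line like this. It assumes each log line is a single JSON object whose attributes live under a top-level `attr` key; the helper names `extract_backtrace` and `loaded_objects` are hypothetical.

```python
import json


def extract_backtrace(log_line: str) -> dict:
    """Return the `bt` object from one structured (JSON) log line."""
    entry = json.loads(log_line)
    if entry.get("msg") != "BACKTRACE":
        raise ValueError("not a BACKTRACE log line")
    # Assumption: log attributes are nested under a top-level "attr" key.
    return entry["attr"]["bt"]


def loaded_objects(bt: dict) -> list:
    """Return the `somap` entries, one per ELF file loaded into the process."""
    return bt.get("processInfo", {}).get("somap", [])
```

The object returned by `extract_backtrace` could then be written to a file with `json.dump` and piped into `buildscripts/mongosymb.py`.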
Partial example showing a few typical frames:

@ -2,27 +2,55 @@
## Project Impetus
We frequently encounter Python errors that are caused by a python dependency author updating their package that is backward breaking. The following tickets are a few examples of this happening:
[SERVER-79126](https://jira.mongodb.org/browse/SERVER-79126), [SERVER-79798](https://jira.mongodb.org/browse/SERVER-79798), [SERVER-53348](https://jira.mongodb.org/browse/SERVER-53348), [SERVER-57036](https://jira.mongodb.org/browse/SERVER-57036), [SERVER-44579](https://jira.mongodb.org/browse/SERVER-44579), [SERVER-70845](https://jira.mongodb.org/browse/SERVER-70845), [SERVER-63974](https://jira.mongodb.org/browse/SERVER-63974), [SERVER-61791](https://jira.mongodb.org/browse/SERVER-61791), and [SERVER-60950](https://jira.mongodb.org/browse/SERVER-60950). We have always known this was a problem and have known there was a way to fix it. We finally had the bandwidth to tackle this problem.
We frequently encounter Python errors that are caused by a python dependency author updating their
package that is backward breaking. The following tickets are a few examples of this happening:
[SERVER-79126](https://jira.mongodb.org/browse/SERVER-79126),
[SERVER-79798](https://jira.mongodb.org/browse/SERVER-79798),
[SERVER-53348](https://jira.mongodb.org/browse/SERVER-53348),
[SERVER-57036](https://jira.mongodb.org/browse/SERVER-57036),
[SERVER-44579](https://jira.mongodb.org/browse/SERVER-44579),
[SERVER-70845](https://jira.mongodb.org/browse/SERVER-70845),
[SERVER-63974](https://jira.mongodb.org/browse/SERVER-63974),
[SERVER-61791](https://jira.mongodb.org/browse/SERVER-61791), and
[SERVER-60950](https://jira.mongodb.org/browse/SERVER-60950). We have always known this was a
problem and have known there was a way to fix it. We finally had the bandwidth to tackle this
problem.
## Project Prework
First, we wanted to test out using poetry so we converted mongo-container project to use poetry [SERVER-76974](https://jira.mongodb.org/browse/SERVER-76974). This showed promise and we considered this a green light to move forward on converting the server python to use poetry.
First, we wanted to test out using poetry so we converted mongo-container project to use poetry
[SERVER-76974](https://jira.mongodb.org/browse/SERVER-76974). This showed promise and we considered
this a green light to move forward on converting the server python to use poetry.
Before we could start the project we had to upgrade python to a version that was not EoL. This work is captured in [SERVER-72262](https://jira.mongodb.org/browse/SERVER-72262). We upgraded python to 3.10 on every system except windows. Windows could not be upgraded due to a test problem relating to some cipher suites [SERVER-79172](https://jira.mongodb.org/browse/SERVER-79172).
Before we could start the project we had to upgrade python to a version that was not EoL. This work
is captured in [SERVER-72262](https://jira.mongodb.org/browse/SERVER-72262). We upgraded python to
3.10 on every system except windows. Windows could not be upgraded due to a test problem relating to
some cipher suites [SERVER-79172](https://jira.mongodb.org/browse/SERVER-79172).
## Conversion to Poetry
After the prework was done we wrote, tested, and merged [SERVER-76751](https://jira.mongodb.org/browse/SERVER-76751), which converted the mongo python dependencies to poetry. This ticket had an absurd number of dependencies and required a significant number of patch builds. The total number of changes was pretty small but it affected a lot of different projects.
After the prework was done we wrote, tested, and merged
[SERVER-76751](https://jira.mongodb.org/browse/SERVER-76751), which converted the mongo python
dependencies to poetry. This ticket had an absurd number of dependencies and required a significant
number of patch builds. The total number of changes was pretty small but it affected a lot of
different projects.
Knowing there was a lot this touched we expected to see some bugs and were quick to try to fix them. Some of these were caught before merge and some were caught after.
Knowing there was a lot this touched we expected to see some bugs and were quick to try to fix them.
Some of these were caught before merge and some were caught after.
[BUILD-17860](https://jira.mongodb.org/browse/BUILD-17860) required the build team to rebuild python on macosx arm. This was caught before merging.
[BUILD-17860](https://jira.mongodb.org/browse/BUILD-17860) required the build team to rebuild python
on macosx arm. This was caught before merging.
[SERVER-81122](https://jira.mongodb.org/browse/SERVER-81122) found that poetry broke the spawnhost script. This was caught after merge.
[SERVER-81122](https://jira.mongodb.org/browse/SERVER-81122) found that poetry broke the spawnhost
script. This was caught after merge.
[SERVER-81061](https://jira.mongodb.org/browse/SERVER-81061) and [BF-29909](https://jira.mongodb.org/browse/BF-29909) were found by sys-perf since they run their own build and do not use the standard build process. Therefore it was very hard to test for this one. This was caught post merge.
[SERVER-81061](https://jira.mongodb.org/browse/SERVER-81061) and
[BF-29909](https://jira.mongodb.org/browse/BF-29909) were found by sys-perf since they run their own
build and do not use the standard build process. Therefore it was very hard to test for this one.
This was caught post merge.
[SERVER-80799](https://jira.mongodb.org/browse/SERVER-80799) found that poetry broke mongo tooling metrics collection (not OTel). This was only found since an engineer on the team saw this bug in the code. This was caught post merge.
[SERVER-80799](https://jira.mongodb.org/browse/SERVER-80799) found that poetry broke mongo tooling
metrics collection (not OTel). This was only found since an engineer on the team saw this bug in the
code. This was caught post merge.
Overall, when changing something so foundational it is inevitable that some things will break.

@ -1,10 +1,10 @@
# PrimaryOnlyService
The PrimaryOnlyService machinery provides a way to register tasks that should run only when current
node is Primary, and should be driven to completion across replica set failovers on the new
Primary. It is intended to be used by tasks that can be modeled as a state machine with a single
MongoDB document containing the current state, which newly-elected Primaries can use to rebuild the
state of the task after failover and pick up where the old Primary left off.
node is Primary, and should be driven to completion across replica set failovers on the new Primary.
It is intended to be used by tasks that can be modeled as a state machine with a single MongoDB
document containing the current state, which newly-elected Primaries can use to rebuild the state of
the task after failover and pick up where the old Primary left off.
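The "state machine driven by a single state document" idea can be illustrated with a toy model. This is only a conceptual sketch in Python, not the C++ PrimaryOnlyService API: the state names, `run_to_completion`, and the `persist` callback are all hypothetical.

```python
# Hypothetical states for an imaginary task; real services define their own.
STATES = ["started", "copying", "finalizing", "done"]


def run_to_completion(state_doc: dict, persist) -> dict:
    """Advance the task from whatever state the document records.

    A newly elected Primary would call this with the document written by the
    old Primary, so work resumes where it stopped instead of restarting.
    """
    index = STATES.index(state_doc["state"])
    for next_state in STATES[index + 1:]:
        state_doc = {**state_doc, "state": next_state}
        persist(state_doc)  # record progress so a later failover can resume
    return state_doc
```

Because progress is recorded after every transition, a task interrupted in `copying` resumes from `copying` on the next Primary.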
## Classes
@ -62,16 +62,17 @@ what state it is in and thus what work still needs to be performed, and what wor
completed by the previous Primary.
To see an example bare-bones PrimaryOnlyService implementation to use as a reference, check out the
TestService defined in this unit test: https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/primary_only_service_test.cpp
TestService defined in this unit test:
https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/primary_only_service_test.cpp
## Behavior during state transitions
At stepUp, each PrimaryOnlyService queries its state document collection, and for each document
found, creates and launches a PrimaryOnlyService::Instance initialized off of the state
document. This happens asynchronously relative to the core replication stepUp process - there is no
guarantee that when stepUp completes and the RSTL lock is dropped that the PrimaryOnlyServices have
finished rebuilding all their Instances. At stepDown all Instances are interrupted, but the threads
running their work are not joined, and the Instance objects containing their in-memory state are not
found, creates and launches a PrimaryOnlyService::Instance initialized off of the state document.
This happens asynchronously relative to the core replication stepUp process - there is no guarantee
that when stepUp completes and the RSTL lock is dropped that the PrimaryOnlyServices have finished
rebuilding all their Instances. At stepDown all Instances are interrupted, but the threads running
their work are not joined, and the Instance objects containing their in-memory state are not
released, until the next stepUp. This is done to reduce the likelihood of blocking within the state
transition process and delaying it for the entire node. This behavior does, however, guarantee that
there will never be two Instances of the same PrimaryOnlyService with the same InstanceID running at

@ -1,11 +1,14 @@
# Priority port support
`mongod` and `mongos` support a dedicated **priority port** intended for **internal, high-priority operations** such as automation monitoring, MongoTune, and critical intra-cluster replication traffic.
`mongod` and `mongos` support a dedicated **priority port** intended for **internal, high-priority
operations** such as automation monitoring, MongoTune, and critical intra-cluster replication
traffic.
With a priority port configured:
- The database listens on a second TCP port in addition to the main port.
- Connections accepted on the priority port are exempt from connection limits, connection establishment rate limiting, and ingress request rate limiting.
- Connections accepted on the priority port are exempt from connection limits, connection
establishment rate limiting, and ingress request rate limiting.
- gRPC is not supported.
The feature is **disabled by default**.
@ -35,7 +38,8 @@ net:
When the transport layer starts:
- A **separate listener thread** is created for the priority port in the ASIO transport layer.
- Sessions created from the priority port are tagged so downstream code can distinguish them from main-port sessions (similar to the load balancer port implementation).
- Sessions created from the priority port are tagged so downstream code can distinguish them from
main-port sessions (similar to the load balancer port implementation).
---
@ -47,27 +51,33 @@ Priority-port connections differ from normal connections in several ways.
When a new connection is accepted:
- Connections from the priority port are treated as **limit-exempt** in the session manager, reusing the existing exemption machinery used for CIDR-based exemptions.
- Connections from the priority port are treated as **limit-exempt** in the session manager, reusing
the existing exemption machinery used for CIDR-based exemptions.
- These connections can continue to be created even when the normal connection limit is reached.
Metrics:
- `serverStatus.connections.priority` counts current connections on the priority port only.
- These connections are also included in `connections.limitExempt` (along with CIDR-based exemptions).
- These connections are also included in `connections.limitExempt` (along with CIDR-based
exemptions).
## Rate limiters
Two ingress-side rate limiters recognize priority-port exemptions:
- [**SessionEstablishmentRateLimiter**](../src/mongo/db/admission/README.md#session-establishment-rate-limiter) (connection establishment)
- [**IngressRequestRateLimiter**](../src/mongo/db/admission/README.md#ingress-request-rate-limiting) (request rate limiting)
- [**SessionEstablishmentRateLimiter**](../src/mongo/db/admission/README.md#session-establishment-rate-limiter)
(connection establishment)
- [**IngressRequestRateLimiter**](../src/mongo/db/admission/README.md#ingress-request-rate-limiting)
(request rate limiting)
## Logging and profiling
For observability and debugging, the server records whether an operation came through the priority port:
For observability and debugging, the server records whether an operation came through the priority
port:
- `CurOp` / currentOp output includes a flag indicating the connection is from the priority port.
- Slow query log and profiler entries include whether the operation was executed via a priority-port connection.
- Slow query log and profiler entries include whether the operation was executed via a priority-port
connection.
- Client summary reports also distinguish clients on the main vs priority port.
---
@ -79,7 +89,8 @@ For observability and debugging, the server records whether an operation came th
To connect to a replica set via the priority port, a user must:
- Use a connection string that points directly at a specific host and priority port.
- Set `directConnection=true` to disable SDAM and prevent the driver from using hello-based host discovery, which currently does not advertise the priority port.
- Set `directConnection=true` to disable SDAM and prevent the driver from using hello-based host
discovery, which currently does not advertise the priority port.
Example:
@ -92,11 +103,14 @@ mongodb://hostA:27018/?directConnection=true
For `mongos`:
- You may connect directly to the `mongos` priority port.
- `directConnection=true` is **not required** for `mongos` connections, since SDAM is not used in the same way.
- `directConnection=true` is **not required** for `mongos` connections, since SDAM is not used in
the same way.
Important limitation:
- **Priority does not automatically propagate**:
- If a client connects to a `mongos` via the priority port and `mongos` forwards a command to shards, those shard-side connections still use the main ports and do **not** inherit priority-port behavior in the current implementation.
- If a client connects to a `mongos` via the priority port and `mongos` forwards a command to
shards, those shard-side connections still use the main ports and do **not** inherit
priority-port behavior in the current implementation.
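The connection rules above can be captured in a small helper that builds the connection string. This is a sketch under stated assumptions: `priority_port_uri` is a hypothetical name, and the host names and port numbers are placeholders.

```python
from urllib.parse import urlencode


def priority_port_uri(host: str, priority_port: int, is_mongos: bool = False) -> str:
    """Build a connection string that targets one host's priority port."""
    # Replica set members need directConnection=true so the driver does not
    # fall back to hello-based discovery, which does not advertise the
    # priority port. For mongos the option is not required.
    options = {} if is_mongos else {"directConnection": "true"}
    query = f"?{urlencode(options)}" if options else ""
    return f"mongodb://{host}:{priority_port}/{query}"
```

For example, `priority_port_uri("hostA", 27018)` yields `mongodb://hostA:27018/?directConnection=true`.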
---

@ -37,9 +37,9 @@ Users can set or modify a server parameter at startup and/or runtime, depending
specified for `set_at`. For instance, `logLevel` may be set at both startup and runtime, as
indicated by `set_at` (see the above code snippet).
At startup, server parameters may be set using the `--setParameter` command line option.
At runtime, the `setParameter` command may be used to modify server parameters.
See the [`setParameter` documentation][set-parameter] for details.
At startup, server parameters may be set using the `--setParameter` command line option. At runtime,
the `setParameter` command may be used to modify server parameters. See the [`setParameter`
documentation][set-parameter] for details.
## How to get the value provided for a parameter
@ -99,27 +99,28 @@ must be unique across the server instance. More information on the specific fiel
- `set_at` (required): Must contain the value `startup`, `runtime`, [`startup`, `runtime`], or
`cluster`. If `runtime` is specified along with `cpp_varname`, then `decltype(cpp_varname)` must
refer to a thread-safe storage type, specifically: `Atomic<T>`, `std::atomic<T>`,
or `boost::synchronized<T>`. Parameters declared as `cluster` can only be set at runtime and exhibit
refer to a thread-safe storage type, specifically: `Atomic<T>`, `std::atomic<T>`, or
`boost::synchronized<T>`. Parameters declared as `cluster` can only be set at runtime and exhibit
numerous differences. See [Cluster Server Parameters](#cluster-server-parameters) below.
- `description` (required): Free-form text field currently used only for commenting the generated C++
code. Future uses may preserve this value for a possible `{listSetParameters:1}` command or other
programmatic and potentially user-facing purposes.
- `description` (required): Free-form text field currently used only for commenting the generated
C++ code. Future uses may preserve this value for a possible `{listSetParameters:1}` command or
other programmatic and potentially user-facing purposes.
- `cpp_vartype`: Declares the full storage type. If `cpp_vartype` is not defined, it may be inferred
from the C++ variable referenced by `cpp_varname`.
- `cpp_varname`: Declares the underlying variable or C++ `struct` member to use when setting or reading the
server parameter. If defined together with `cpp_vartype`, the storage will be declared as a global
variable, and externed in the generated header file. If defined alone, a variable of this name will
be assumed to have been declared and defined by the implementer, and its type will be automatically
inferred at compile time. If `cpp_varname` is not defined, then `cpp_class` must be specified.
- `cpp_varname`: Declares the underlying variable or C++ `struct` member to use when setting or
reading the server parameter. If defined together with `cpp_vartype`, the storage will be declared
as a global variable, and externed in the generated header file. If defined alone, a variable of
this name will be assumed to have been declared and defined by the implementer, and its type will
be automatically inferred at compile time. If `cpp_varname` is not defined, then `cpp_class` must
be specified.
- `cpp_class`: Declares a custom `ServerParameter` class in the generated header using the provided
string, or the name field in the associated map. The declared class will require an implementation
of `setFromString()`, and optionally `set()`, `append()`, and a constructor.
See [Specialized Server Parameters](#specialized-server-parameters) below.
of `setFromString()`, and optionally `set()`, `append()`, and a constructor. See
[Specialized Server Parameters](#specialized-server-parameters) below.
- `default`: String or expression map representation of the initial value.
@ -127,10 +128,10 @@ must be unique across the server instance. More information on the specific fiel
This is a required field and must be explicitly set to `false` to disable redaction.
- `omit_in_ftdc`: Only applies to cluster parameters. If set to `true`, then the cluster parameter
will be omitted when `getClusterParameter` is invoked with `omitInFTDC: true`.
In practice, FTDC runs `getClusterParameter` with this option periodically to
collect configuration metadata about the server and setting this flag to true
for a cluster parameter ensures that its value(s) will not be exposed in FTDC.
will be omitted when `getClusterParameter` is invoked with `omitInFTDC: true`. In practice, FTDC
runs `getClusterParameter` with this option periodically to collect configuration metadata about
the server and setting this flag to true for a cluster parameter ensures that its value(s) will
not be exposed in FTDC.
- `test_only`: Set to `true` to disable this set parameter if `enableTestCommands` is not specified.
@ -141,26 +142,27 @@ must be unique across the server instance. More information on the specific fiel
new value has been stored. Prototype: `Status(const cpp_vartype&);`
- `condition`: Up to five conditional rules for deciding whether or not to apply this server
parameter. `preprocessor` will be evaluated first, followed by `constexpr`, then finally `expr`. If
no provided setting evaluates to `false`, the server parameter will be registered. `feature_flag` and
`min_fcv` are evaluated after the parameter is registered, and instead affect whether the parameter
is enabled. `min_fcv` is a string of the form `X.Y`, representing the minimum FCV version for which
this parameter should be enabled. `feature_flag` is the name of a feature flag variable upon which
this server parameter depends -- if the feature flag is disabled, this parameter will be disabled.
`feature_flag` should be removed when all other instances of that feature flag are deleted, which
typically is done after the next LTS version of the server is branched. `min_fcv` should be removed
after it is no longer possible to downgrade to a FCV lower than that version - this occurs when the
next LTS version of the server is branched.
parameter. `preprocessor` will be evaluated first, followed by `constexpr`, then finally `expr`.
If no provided setting evaluates to `false`, the server parameter will be registered.
`feature_flag` and `min_fcv` are evaluated after the parameter is registered, and instead affect
whether the parameter is enabled. `min_fcv` is a string of the form `X.Y`, representing the
minimum FCV version for which this parameter should be enabled. `feature_flag` is the name of a
feature flag variable upon which this server parameter depends -- if the feature flag is disabled,
this parameter will be disabled. `feature_flag` should be removed when all other instances of that
feature flag are deleted, which typically is done after the next LTS version of the server is
branched. `min_fcv` should be removed after it is no longer possible to downgrade to a FCV lower
than that version - this occurs when the next LTS version of the server is branched.
- `validator`: Zero or many validation rules to impose on the setting. All specified rules must pass
to consider the new setting valid. `lt`, `gt`, `lte`, `gte` fields provide for simple numeric limits
or expression maps which evaluate to numeric values. For all other validation cases, specify
callback as a C++ function or static method. Note that validation rules (including callback) may run
in any order. To perform an action after all validation rules have completed, `on_update` should be
preferred instead. Callback prototype: `Status(const cpp_vartype&, const boost::optional<TenantId>&);`
to consider the new setting valid. `lt`, `gt`, `lte`, `gte` fields provide for simple numeric
limits or expression maps which evaluate to numeric values. For all other validation cases,
specify callback as a C++ function or static method. Note that validation rules (including
callback) may run in any order. To perform an action after all validation rules have completed,
`on_update` should be preferred instead. Callback prototype:
`Status(const cpp_vartype&, const boost::optional<TenantId>&);`
- `is_deprecated`: Mark the server parameter as deprecated. Warns users if the server parameter
is ever used. Defaults to false.
- `is_deprecated`: Mark the server parameter as deprecated. Warns users if the server parameter is
ever used. Defaults to false.
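As a rough illustration of the rule that every specified bound must pass, the sketch below models the `lt`, `gt`, `lte`, and `gte` fields as a plain JavaScript object. It is not the generated C++ code; the `checkBounds` helper and the sample bounds are hypothetical and exist only for this example.

```javascript
// Hypothetical model of a `validator` block: each key is one numeric bound.
// The real checks are generated C++; this only mirrors the "all rules must pass" logic.
const comparators = {
    lt: (value, bound) => value < bound,
    gt: (value, bound) => value > bound,
    lte: (value, bound) => value <= bound,
    gte: (value, bound) => value >= bound,
};

// Returns the names of the rules that `value` violates (an empty array means valid).
// Rules are independent of each other, so the result does not depend on evaluation order.
function checkBounds(validator, value) {
    return Object.keys(validator)
        .filter((rule) => !comparators[rule](value, validator[rule]))
        .sort();
}

const exampleValidator = {gte: 1, lt: 100};  // illustrative bounds only
console.log(checkBounds(exampleValidator, 50));   // no rule is violated
console.log(checkBounds(exampleValidator, 100));  // the `lt` rule is violated
```

Because no rule depends on another, the same set of violations is reported whichever rule runs first, which matches the note that validation rules may run in any order.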
Any symbols such as global variables or callbacks used by a server parameter must be imported using
the usual IDL machinery via `globals.cpp_includes`. Similarly, all generated code will be nested
@ -240,9 +242,8 @@ to any other work, this custom constructor must invoke its parent's constructor.
Status {name}::set(const BSONElement& val, const boost::optional<TenantId>& tenantId);
```
Otherwise the base class implementation `ServerParameter::set` is used. It
invokes `setFromString` using a string representation of `val`, if the `val` is
holding one of the supported types.
Otherwise the base class implementation `ServerParameter::set` is used. It invokes `setFromString`
using a string representation of `val`, if the `val` is holding one of the supported types.
`override_validate`: If `true`, the implementer must provide a `validate` member function as:
@ -261,8 +262,8 @@ must be provided with the following signature:
Status {name}::append(OperationContext*, BSONObjBuilder*, StringData, const boost::optional<TenantId>& tenantId);
```
`override_warn_if_deprecated`: If `true`, allows a custom warnIfDeprecated() method to be defined, defaults
to `false`.
`override_warn_if_deprecated`: If `true`, allows a custom warnIfDeprecated() method to be defined,
defaults to `false`.
Lastly, a `setFromString` method must always be provided with the following signature:
@ -318,17 +319,17 @@ preferred to implementing custom parameter propagation whenever possible.
`setClusterParameter` persists the new value of the indicated cluster server parameter onto a
majority of nodes on non-sharded replica sets. On sharded clusters, it majority-writes the new value
onto every shard and the config server. This ensures that every **mongod** in the cluster will be able
to recover the most recently written value for all cluster server parameters on restart.
onto every shard and the config server. This ensures that every **mongod** in the cluster will be
able to recover the most recently written value for all cluster server parameters on restart.
Additionally, `setClusterParameter` blocks until the majority write succeeds in a replica set
deployment, which guarantees that the parameter value will not be rolled back after being set.
In a sharded cluster deployment, the new value has to be majority-committed on the config shard and
deployment, which guarantees that the parameter value will not be rolled back after being set. In a
sharded cluster deployment, the new value has to be majority-committed on the config shard and
locally-committed on all other shards.
The cluster parameters are persisted in the `config.clusterParameters` collections and cached in
memory on every **mongod**. The cache updates are done by the `ClusterServerParameterOpObserver` class.
Every **mongos** also maintains an in-memory cache by polling the config server for updated cluster
server parameter values every `clusterServerParameterRefreshIntervalSecs` using the
memory on every **mongod**. The cache updates are done by the `ClusterServerParameterOpObserver`
class. Every **mongos** also maintains an in-memory cache by polling the config server for updated
cluster server parameter values every `clusterServerParameterRefreshIntervalSecs` using the
`ClusterParameterRefresher` periodic job.
`getClusterParameter` returns the cached value of the requested cluster server parameter on the node
@ -347,10 +348,10 @@ following members to the resulting type:
was updated; used by runtime audit configuration, and to prevent concurrent and redundant cluster
parameter updates.
It is highly recommended to specify validation rules or a callback function via the `param.validator`
field. These validators are called before the new value of the cluster server parameter is written
to disk during `setClusterParameter`.
See [server_parameter_with_storage_test.idl][cluster-server-param-with-storage-test] and
It is highly recommended to specify validation rules or a callback function via the
`param.validator` field. These validators are called before the new value of the cluster server
parameter is written to disk during `setClusterParameter`. See
[server_parameter_with_storage_test.idl][cluster-server-param-with-storage-test] and
[server_parameter_with_storage_test_structs.idl][cluster-server-param-with-storage-test-structs] for
examples.
@ -394,21 +395,21 @@ The `reset()` method must be implemented and should update the cluster server pa
default value.
All cluster server parameters are tenant-aware, meaning that on serverless clusters, each tenant has
an isolated set of parameters. The `setClusterParameter` and `getClusterParameter` commands will pass
the `tenantId` on the command request to the `ServerParameter`'s methods. On dedicated
an isolated set of parameters. The `setClusterParameter` and `getClusterParameter` commands will
pass the `tenantId` on the command request to the `ServerParameter`'s methods. On dedicated
(non-serverless) clusters, `boost::none` will be passed. IDL-defined cluster server parameters will
handle the passed-in `tenantId` automatically and store separate parameter values per-tenant.
Specialized server parameters will have to take care to correctly handle the passed-in `tenantId` and
to enforce tenant isolation.
Specialized server parameters will have to take care to correctly handle the passed-in `tenantId`
and to enforce tenant isolation.
Like normal server parameters, cluster server parameters can be defined to be dependent on a minimum
FCV version or a specific feature flag using the `condition.min_fcv` and `condition.feature_flag` syntax discussed
above. During FCV downgrade, the cluster parameter value stored on disk will be deleted if either:
(1) The downgraded FCV is lower than the cluster parameter's `min_fcv`, or (2) The cluster
parameter's `feature_flag` is disabled on the downgraded FCV. While a cluster server parameter is
disabled due to either of these conditions, `setClusterParameter` on it will always fail, and
`getClusterParameter` will fail on **mongod**, and return the default value on **mongos** -- this
difference in behavior is due to **mongos** being unaware of the current FCV.
FCV version or a specific feature flag using the `condition.min_fcv` and `condition.feature_flag`
syntax discussed above. During FCV downgrade, the cluster parameter value stored on disk will be
deleted if either: (1) The downgraded FCV is lower than the cluster parameter's `min_fcv`, or (2)
The cluster parameter's `feature_flag` is disabled on the downgraded FCV. While a cluster server
parameter is disabled due to either of these conditions, `setClusterParameter` on it will always
fail, and `getClusterParameter` will fail on **mongod**, and return the default value on **mongos**
-- this difference in behavior is due to **mongos** being unaware of the current FCV.
See [server_parameter_specialized_test.idl][specialized-cluster-server-param-test-idl] and
[server_parameter_specialized_test.h][specialized-cluster-server-param-test-data] for examples.
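The FCV and feature flag gating described above can be modelled as a small decision function. This is only a sketch of the documented outcomes, not the server's implementation: `compareFcv`, `isEnabled`, `getClusterParameterOutcome`, the `minFcv`/`featureFlag` field names, and the flag name are all hypothetical.

```javascript
// Compare two FCV strings of the form "X.Y" numerically (so "10.0" sorts after "9.0").
function compareFcv(a, b) {
    const [aMajor, aMinor] = a.split(".").map(Number);
    const [bMajor, bMinor] = b.split(".").map(Number);
    return aMajor !== bMajor ? aMajor - bMajor : aMinor - bMinor;
}

// A parameter is enabled only if the FCV satisfies `minFcv` (when set) and its
// feature flag (when set) is enabled. `enabledFlags` is a hypothetical Set of flag names.
function isEnabled(param, currentFcv, enabledFlags) {
    if (param.minFcv !== undefined && compareFcv(currentFcv, param.minFcv) < 0) {
        return false;
    }
    if (param.featureFlag !== undefined && !enabledFlags.has(param.featureFlag)) {
        return false;
    }
    return true;
}

// Outcome table for getClusterParameter, as described in the text above.
function getClusterParameterOutcome(nodeType, param, currentFcv, enabledFlags) {
    if (isEnabled(param, currentFcv, enabledFlags)) {
        return "returns cached value";
    }
    // mongos is unaware of the FCV, so it reports the default instead of failing.
    return nodeType === "mongos" ? "returns default value" : "fails";
}

const exampleParam = {minFcv: "8.0", featureFlag: "featureFlagExample"};
const flags = new Set(["featureFlagExample"]);
console.log(getClusterParameterOutcome("mongod", exampleParam, "7.0", flags));
console.log(getClusterParameterOutcome("mongos", exampleParam, "7.0", flags));
```

Comparing the major and minor components as numbers, rather than comparing the strings directly, avoids ordering "10.0" before "9.0".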
@ -582,9 +583,11 @@ classDiagram
[parameters.idl]: ../src/mongo/db/commands/parameters.idl
[set-parameter]: https://docs.mongodb.com/manual/reference/parameters/#synopsis
[get-parameter]: https://docs.mongodb.com/manual/reference/command/getParameter/#getparameter
[quiet-param]: https://github.com/mongodb/mongo/search?q=serverGlobalParams+quiet+extension:idl&type=code
[quiet-param]:
https://github.com/mongodb/mongo/search?q=serverGlobalParams+quiet+extension:idl&type=code
[ftdc-file-size-param]: ../src/mongo/db/ftdc/ftdc_server.idl
[cluster-server-param-with-storage-test]: ../src/mongo/idl/server_parameter_with_storage_test.idl
[cluster-server-param-with-storage-test-structs]: ../src/mongo/idl/server_parameter_with_storage_test_structs.idl
[cluster-server-param-with-storage-test-structs]:
../src/mongo/idl/server_parameter_with_storage_test_structs.idl
[specialized-cluster-server-param-test-idl]: ../src/mongo/idl/server_parameter_specialized_test.idl
[specialized-cluster-server-param-test-data]: ../src/mongo/idl/server_parameter_specialized_test.h


@ -1,7 +1,7 @@
# Test Commands
All test commands are denoted with the `.testOnly()` modifier to the `MONGO_REGISTER_COMMAND` invocation.
For example:
All test commands are denoted with the `.testOnly()` modifier to the `MONGO_REGISTER_COMMAND`
invocation. For example:
```c++
MONGO_REGISTER_COMMAND(EchoCommand).testOnly();
@ -9,9 +9,9 @@ MONGO_REGISTER_COMMAND(EchoCommand).testOnly();
## How to enable
To be able to run these commands, the server must be started with the `enableTestCommands=1`
server parameter (e.g. `--setParameter enableTestCommands=1`). Resmoke.py often sets this server
parameter for testing.
To be able to run these commands, the server must be started with the `enableTestCommands=1` server
parameter (e.g. `--setParameter enableTestCommands=1`). Resmoke.py often sets this server parameter
for testing.
## Examples


@ -1,7 +1,7 @@
# Testing
Most tests for MongoDB are run through resmoke, our test runner and orchestration tool.
The entry point for resmoke can be found at `buildscripts/resmoke.py`.
Most tests for MongoDB are run through resmoke, our test runner and orchestration tool. The entry
point for resmoke can be found at `buildscripts/resmoke.py`.
## Concepts
@ -9,9 +9,12 @@ Learn more about related topics using their own targeted documentation:
- [resmoke](../../buildscripts/resmokelib/README.md), the test runner
- [suites](../../buildscripts/resmokeconfig/suites/README.md), how tests are grouped and configured
- [fixtures](../../buildscripts/resmokelib/testing/fixtures/README.md), specify the server topology that tests run against
- [hooks](../../buildscripts/resmokelib/testing/hooks/README.md), logic to run before, after and/or between individual tests
- [testcases](../../buildscripts/resmokelib/testing/testcases/README.md), Python-based unittest interfaces that resmoke can run as different "kinds" of tests.
- [fixtures](../../buildscripts/resmokelib/testing/fixtures/README.md), specify the server topology
that tests run against
- [hooks](../../buildscripts/resmokelib/testing/hooks/README.md), logic to run before, after and/or
between individual tests
- [testcases](../../buildscripts/resmokelib/testing/testcases/README.md), Python-based unittest
interfaces that resmoke can run as different "kinds" of tests.
## Basic Example
@ -35,4 +38,7 @@ Now, **run the test content** from one test file:
buildscripts/resmoke.py run --suites=no_passthrough jstests/noPassthrough/shell/js/string.js
```
The suite defined in [buildscripts/resmokeconfig/suites/no_passthrough.yml](../../buildscripts/resmokeconfig/suites/no_passthrough.yml) includes that `string.js` file via glob selections, specifies no fixtures, no hooks, and a minimal config for the executor.
The suite defined in
[buildscripts/resmokeconfig/suites/no_passthrough.yml](../../buildscripts/resmokeconfig/suites/no_passthrough.yml)
includes that `string.js` file via glob selections, specifies no fixtures, no hooks, and a minimal
config for the executor.


@ -2,80 +2,69 @@
## Overview
The FSM tests are meant to exercise concurrency within MongoDB. The suite
consists of workloads, which define discrete units of work as states in a FSM,
and runners, which define which tests to run and how they should be run. Each
workload defines states, which are JS functions that perform some meaningful
series of tasks and assertions, and transitions, which define how to move
between those states. A single workload begins by executing its setup function,
which is called once during the runner's thread of execution. Next, the runner
generates the number of threads specified by the workload, and each spawned
thread executes the start state (typically named "init") defined by the
workload. From this point on, each worker thread executes its own independent
copy of the FSM, and will randomly move between states (after executing the
function) based on the probabilities defined in the workload's transition table.
Each worker thread continues doing so until the number of transitions it makes
has reached the number of iterations defined by the workload. Once all the
worker threads have finished, the runner executes the workload's teardown
function.
The FSM tests are meant to exercise concurrency within MongoDB. The suite consists of workloads,
which define discrete units of work as states in a FSM, and runners, which define which tests to run
and how they should be run. Each workload defines states, which are JS functions that perform some
meaningful series of tasks and assertions, and transitions, which define how to move between those
states. A single workload begins by executing its setup function, which is called once during the
runner's thread of execution. Next, the runner generates the number of threads specified by the
workload, and each spawned thread executes the start state (typically named "init") defined by the
workload. From this point on, each worker thread executes its own independent copy of the FSM, and
will randomly move between states (after executing the function) based on the probabilities defined
in the workload's transition table. Each worker thread continues doing so until the number of
transitions it makes has reached the number of iterations defined by the workload. Once all the
worker threads have finished, the runner executes the workload's teardown function.
![fsm.png](../images/testing/fsm.png)
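The lifecycle described above (setup once, one FSM walk per thread, teardown once) can be sketched in plain JavaScript. This is a simplified model under stated assumptions, not the real runner: threads are simulated one after another, states receive a shared `log` array and a thread ID instead of a db object and collection name, and `pickNextState` and `runWorkload` are hypothetical names.

```javascript
// Pick the next state from a {stateName: weight} map. Weights are divided by
// their sum, so they act as relative probabilities. `rand` returns a number in [0, 1).
function pickNextState(weights, rand) {
    const entries = Object.entries(weights);
    const total = entries.reduce((sum, [, weight]) => sum + weight, 0);
    let threshold = rand() * total;
    for (const [state, weight] of entries) {
        if (threshold < weight) {
            return state;
        }
        threshold -= weight;
    }
    return entries[entries.length - 1][0];  // guard against floating-point drift
}

// Run one workload: setup once, then one FSM walk per "thread", then teardown once.
// Threads are simulated sequentially here; the real runner spawns actual threads.
function runWorkload(config, rand = Math.random) {
    const log = [];
    config.setup(log);
    for (let tid = 0; tid < config.threadCount; tid++) {
        let state = config.startState || "init";
        for (let i = 0; i < config.iterations; i++) {
            config.states[state](log, tid);
            state = pickNextState(config.transitions[state], rand);
        }
    }
    config.teardown(log);
    return log;
}

const exampleConfig = {
    threadCount: 2,
    iterations: 3,
    states: {
        init: (log, tid) => log.push(`t${tid}:init`),
        scan: (log, tid) => log.push(`t${tid}:scan`),
    },
    transitions: {init: {scan: 1}, scan: {scan: 0.5, init: 0.5}},
    setup: (log) => log.push("setup"),
    teardown: (log) => log.push("teardown"),
};
console.log(runWorkload(exampleConfig));
```

Passing the random source as a parameter keeps the walk reproducible in tests, while the default of `Math.random` gives the randomized behavior the text describes.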
The runner provides two modes of execution for workloads: serial and parallel.
Serial mode runs the provided workloads one after the other,
waiting for all threads of a workload to complete before moving on to the next
workload. Parallel mode runs subsets of the provided workloads in separate
The runner provides two modes of execution for workloads: serial and parallel. Serial mode runs the
provided workloads one after the other, waiting for all threads of a workload to complete before
moving on to the next workload. Parallel mode runs subsets of the provided workloads in separate
threads simultaneously.
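To make the difference between the two modes concrete, the sketch below lists the order of phases for each mode. It is an illustrative model only: each workload is reduced to a name, "run" stands in for all of that workload's threads finishing, and `serialOrder` and `parallelOrder` are hypothetical helpers.

```javascript
// Serial: one workload at a time. Its threads must all finish (and its teardown run)
// before the next workload's setup starts.
function serialOrder(workloads) {
    return workloads.flatMap((name) => [`setup ${name}`, `run ${name}`, `teardown ${name}`]);
}

// Parallel: every setup runs first, then all workloads' threads run at the same time,
// then every teardown runs. The order of the "run" entries is not meaningful here,
// because those phases overlap in a real run.
function parallelOrder(workloads) {
    return [
        ...workloads.map((name) => `setup ${name}`),
        ...workloads.map((name) => `run ${name}`),
        ...workloads.map((name) => `teardown ${name}`),
    ];
}

console.log(serialOrder(["workloadA", "workloadB"]));
console.log(parallelOrder(["workloadA", "workloadB"]));
```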
New methods were added to allow for finer-grained assertions under different
situations. For example, a test that inserts a document into a collection, and
wants to assert its existence will fail if another test removes that document.
One option would have been to disable all assertions when running a mixture of
different workloads together, but doing so would make the system incapable of
detecting anything other than server crashes. Another option would have been to
design the workloads to be conflict-free (e.g. writing to separate collections,
using commutative operators), but this would leave large gaps in the achievable
test coverage. Neither of those options was found to be very appealing.
Instead, we chose to introduce the concept of an "assertion level" that acts as
a precondition for when an assertion is evaluated. This allows us to still make
some assertions, even when running a mixture of different workloads together.
There are three assertion levels: `ALWAYS`, `OWN_COLL`, and `OWN_DB`. They can
be thought of as follows:
New methods were added to allow for finer-grained assertions under different situations. For
example, a test that inserts a document into a collection, and wants to assert its existence will
fail if another test removes that document. One option would have been to disable all assertions
when running a mixture of different workloads together, but doing so would make the system incapable
of detecting anything other than server crashes. Another option would have been to design the
workloads to be conflict-free (e.g. writing to separate collections, using commutative operators),
but this would leave large gaps in the achievable test coverage. Neither of those options was found
to be very appealing. Instead, we chose to introduce the concept of an "assertion level" that acts
as a precondition for when an assertion is evaluated. This allows us to still make some assertions,
even when running a mixture of different workloads together. There are three assertion levels:
`ALWAYS`, `OWN_COLL`, and `OWN_DB`. They can be thought of as follows:
- `ALWAYS`: A statement that remains unequivocally true, regardless of what
another workload might be doing to the collection I was given (hint: think
defensively). Examples include "1 = 1" or inserting a document into a
collection (disregarding any unique indices).
- `ALWAYS`: A statement that remains unequivocally true, regardless of what another workload might
be doing to the collection I was given (hint: think defensively). Examples include "1 = 1" or
inserting a document into a collection (disregarding any unique indices).
- `OWN_COLL`: A statement that is true only if I am the only workload operating
on the collection I was given. Examples include counting the number of
documents in a collection or updating a previously inserted document.
- `OWN_COLL`: A statement that is true only if I am the only workload operating on the collection I
was given. Examples include counting the number of documents in a collection or updating a
previously inserted document.
- `OWN_DB`: A statement that is true only if I am the only workload operating on
the database I was given. Examples include renaming a collection or verifying
that a collection is capped. The workload typically relies on the use of
another collection aside from the one given.
- `OWN_DB`: A statement that is true only if I am the only workload operating on the database I was
given. Examples include renaming a collection or verifying that a collection is capped. The
workload typically relies on the use of another collection aside from the one given.
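One way to picture an assertion level acting as a precondition is the sketch below. It assumes a numeric ordering of the three levels and uses hypothetical names (`AssertLevel`, `makeLevelAssert`, `assertAt`); the helpers in the actual test library may be shaped differently.

```javascript
// Hypothetical ordering: a higher number means the assertion needs more isolation.
const AssertLevel = Object.freeze({ALWAYS: 0, OWN_COLL: 1, OWN_DB: 2});

// `runLevel` describes how isolated the workload is in the current run.
// The returned function evaluates `check` only when the assertion's precondition
// (`requiredLevel`) is satisfied; otherwise the assertion is skipped.
function makeLevelAssert(runLevel) {
    return function assertAt(requiredLevel, check, message) {
        if (requiredLevel > runLevel) {
            return "skipped";
        }
        if (!check()) {
            throw new Error(`assertion failed: ${message}`);
        }
        return "checked";
    };
}

// Workloads share a collection in this run, so only ALWAYS assertions are evaluated.
const assertAt = makeLevelAssert(AssertLevel.ALWAYS);
console.log(assertAt(AssertLevel.ALWAYS, () => 1 === 1, "1 = 1"));
console.log(assertAt(AssertLevel.OWN_COLL, () => false, "document count"));
```

The second call never evaluates its check, which is how a mixed run can still make some assertions without failing on state that another workload may have changed.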
## Creating your own workload
All workloads are stored in `jstests/concurrency/fsm_workloads` and as specific
examples you can refer to
All workloads are stored in `jstests/concurrency/fsm_workloads` and as specific examples you can
refer to
1. `jstests/concurrency/fsm_example.js`
1. `jstests/concurrency/fsm_example_inheritance.js`
for writing new workloads. Every workload is loaded in as inline JavaScript
using the "load" function, which is a lot more like a `#include` than
`require.js`. This means that whatever variables are declared in the global
scope of the file will become part of the scope where load is called. The runner
will be looking for a variable called `$config` which will store the
for writing new workloads. Every workload is loaded in as inline JavaScript using the "load"
function, which is a lot more like a `#include` than `require.js`. This means that whatever
variables are declared in the global scope of the file will become part of the scope where load is
called. The runner will be looking for a variable called `$config` which will store the
configuration of your workload.
### The $config object
There should be exactly one `$config` per workload. For style consistency as
well as safety, be sure to wrap the value of `$config` in an anonymous function.
This will create a JS closure and a new scope:
There should be exactly one `$config` per workload. For style consistency as well as safety, be sure
to wrap the value of `$config` in an anonymous function. This will create a JS closure and a new
scope:
```javascript
$config = (function() {
@ -93,19 +82,17 @@ $config = (function() {
)();
```
When finished executing, `$config` must return an object containing the properties
above (some of which are optional, see below).
When finished executing, `$config` must return an object containing the properties above (some of
which are optional, see below).
### Defining states
It's best to also declare states within their own closure so as not to interfere
with the scope of $config. Each state takes two arguments, the db object and the
collection name. For later, note that this db and collection are the only ones
that you can be guaranteed to "own" when asserting. Try to make each state a
discrete unit of work that can stand alone without the other states.
Additionally, try to define each function that makes up a state
with a name as opposed to anonymously - this makes backtraces easier to read
when things go wrong.
It's best to also declare states within their own closure so as not to interfere with the scope of
$config. Each state takes two arguments, the db object and the collection name. For later, note that
this db and collection are the only ones that you can be guaranteed to "own" when asserting. Try to
make each state a discrete unit of work that can stand alone without the other states. Additionally,
try to define each function that makes up a state with a name as opposed to anonymously - this makes
backtraces easier to read when things go wrong.
```javascript
$config = (function () {
@ -146,14 +133,12 @@ $config = (function () {
### Defining transitions
The transitions object defines the probabilities of moving from one state to a
different state. When a state's function is finished executing, the FSM randomly
chooses the next state using the probabilities provided in the transitions
object. The probabilities of the transitions object do not necessarily need to
sum to 1.0, since the mechanism for choosing the next state uses normalized
random values. Here it is not necessary to use a separate closure. In the
example below, we're denoting an equal probability of moving to either of the
scan states from the init state:
The transitions object defines the probabilities of moving from one state to a different state. When
a state's function is finished executing, the FSM randomly chooses the next state using the
probabilities provided in the transitions object. The probabilities of the transitions object do not
necessarily need to sum to 1.0, since the mechanism for choosing the next state uses normalized
random values. Here it is not necessary to use a separate closure. In the example below, we're
denoting an equal probability of moving to either of the scan states from the init state:
```javascript
$config = (function () {
@ -174,15 +159,13 @@ $config = (function () {
### Setup and teardown functions
The setup and teardown functions are special in that they'll only be executed in
one thread. See the Runners section for more information about when they're run
relative to other workloads in various modes. The setup and teardown functions
take three arguments: db, coll, and cluster. The setup function (and
corresponding teardown) should perform most of the initialization your workload
needs, for example setting parameters on the server, adding seed data, or
setting up indexes. Note that rather than executing adminCommands (and others)
against the provided `db` you should use the provided
`cluster.executeOnMongodNodes` and `cluster.executeOnMongosNodes` functionality.
The setup and teardown functions are special in that they'll only be executed in one thread. See the
Runners section for more information about when they're run relative to other workloads in various
modes. The setup and teardown functions take three arguments: db, coll, and cluster. The setup
function (and corresponding teardown) should perform most of the initialization your workload needs,
for example setting parameters on the server, adding seed data, or setting up indexes. Note that
rather than executing adminCommands (and others) against the provided `db` you should use the
provided `cluster.executeOnMongodNodes` and `cluster.executeOnMongosNodes` functionality.
```javascript
$config = (function () {
@ -224,18 +207,16 @@ $config = (function () {
### The `data` object
The `data` object preserves information between different states of an FSM within
an individual thread. Within a single state, the data object becomes the 'this'
context in which the state executes. Additionally, a tid attribute is added to
data by the runner to allow each thread to access a unique ID. Data is usually
defined above states inside the config, but left below it in the returned
object. Data is also available as the 'this' context in setup and teardown
functions. Note that once the FSM begins, the context data that was passed to
the setup function is copied into each thread - meaning each thread has its own
copy of the data and modifications to data will not be passed back to the
teardown function outside of what was changed in setup. Additionally, in
composition, each workload has its own data, meaning you don't have to worry
about properties being overridden by workloads other than the current one.
The `data` object preserves information between different states of an FSM within an individual
thread. Within a single state, the data object becomes the 'this' context in which the state
executes. Additionally, a tid attribute is added to data by the runner to allow each thread to
access a unique ID. Data is usually defined above states inside the config, but left below it in the
returned object. Data is also available as the 'this' context in setup and teardown functions. Note
that once the FSM begins, the context data that was passed to the setup function is copied into each
thread - meaning each thread has its own copy of the data and modifications to data will not be
passed back to the teardown function outside of what was changed in setup. Additionally, in
composition, each workload has its own data, meaning you don't have to worry about properties being
overridden by workloads other than the current one.
```javascript
$config = (function () {
@ -255,57 +236,50 @@ $config = (function () {
#### `threadCount`
threadCount is the number of threads that will be used to run your workload in
Serial and Parallel modes. In both modes, the number of threads you provide will
execute the FSM simultaneously, cycling through different states of the
workload. Note that in serial mode, no other threads will be running outside of
those pertaining to this workload, and in parallel mode, other workloads will
also be given threads to execute their FSM. In some cases in parallel mode, this
number will be scaled down to make sure that all workloads can fit within the
number of threads available due to system or performance constraints.
threadCount is the number of threads that will be used to run your workload in Serial and Parallel
modes. In both modes, the number of threads you provide will execute the FSM simultaneously, cycling
through different states of the workload. Note that in serial mode, no other threads will be running
outside of those pertaining to this workload, and in parallel mode, other workloads will also be
given threads to execute their FSM. In some cases in parallel mode, this number will be scaled down
to make sure that all workloads can fit within the number of threads available due to system or
performance constraints.
#### `iterations`
This is just the number of states the FSM will go through before exiting. NOTE:
it is _not_ the number of times each state will be executed.
This is just the number of states the FSM will go through before exiting. NOTE: it is _not_ the
number of times each state will be executed.
#### `startState` (optional)
Default value is 'init'. If your workload does not have an init state then you
must specify in which state to begin.
Default value is 'init'. If your workload does not have an init state then you must specify in which
state to begin.
### Workload helpers
`jstests/concurrency/fsm_workload_helpers` contains a few files that you can
include using 'load' at the top of a workload. These provide auxiliary
functionality that might be necessary for some workloads. The most important of
these is probably server_types.js.
`jstests/concurrency/fsm_workload_helpers` contains a few files that you can include using 'load' at
the top of a workload. These provide auxiliary functionality that might be necessary for some
workloads. The most important of these is probably server_types.js.
#### server_types.js
This helper file contains four functions: isMongos, isMongod, isMMAPv1, and
isWiredTiger. These can be used to restrict operations on different
functionality available in sharded environments, as well as based on storage
engine, and work as you would expect. One thing to note is that before calling
either isMMAPv1 or isWiredTiger, first verify isMongod. When special casing
functionality for sharded environments or storage engines, try to special case a
test for the exceptionality while still leaving in place assertions for either
case.
This helper file contains four functions: isMongos, isMongod, isMMAPv1, and isWiredTiger. These can
be used to restrict operations on different functionality available in sharded environments, as well
as based on storage engine, and work as you would expect. One thing to note is that before calling
either isMMAPv1 or isWiredTiger, first verify isMongod. When special casing functionality for
sharded environments or storage engines, try to special case a test for the exceptionality while
still leaving in place assertions for either case.
#### indexed_noindex.js
This helper can be used along with inheritance to create a workload that is
exactly the same as an existing workload, but with the index created during
setup removed. To use it, replace the function you provide to the
extendWorkload function with indexedNoindex. Additionally, ensure that the
workload you are extending has a function in its data object called
"getIndexSpec" that returns the spec for the index to be removed.
This helper can be used along with inheritance to create a workload that is exactly the same as an
existing workload, but with the index created during setup removed. To use it, replace the
function you provide to the extendWorkload function with indexedNoindex. Additionally, ensure that
the workload you are extending has a function in its data object called "getIndexSpec" that returns
the spec for the index to be removed.
```javascript
import {extendWorkload} from "jstests/concurrency/fsm_libs/extend_workload.js";
load(
"jstests/concurrency/fsm_workload_modifiers/collection_write_path/indexed_noindex.js",
); // for indexedNoindex
load("jstests/concurrency/fsm_workload_modifiers/collection_write_path/indexed_noindex.js"); // for indexedNoindex
import {$config as $baseConfig} from "jstests/concurrency/fsm_workloads/workload_with_index.js";
export const $config = extendWorkload($baseConfig, indexedNoindex);
@ -313,90 +287,80 @@ export const $config = extendWorkload($baseConfig, indexedNoindex);
#### drop_utils.js
These helpers provide safe methods for dropping collections, databases, roles,
and users created during a workload's execution. The methods take a regular
expression that the collection, database, role, or user name must match for it
to be dropped. Prefixing the items in any of these categories you create with a
prefix defined by your workload name is a good idea since the workload file name
can be assumed unique and will allow you to only affect your workload in these
cases.
These helpers provide safe methods for dropping collections, databases, roles, and users created
during a workload's execution. The methods take a regular expression that the collection, database,
role, or user name must match for it to be dropped. Prefixing the items in any of these categories
you create with a prefix defined by your workload name is a good idea since the workload file name
can be assumed unique and will allow you to only affect your workload in these cases.
## Test runners
By default, all runners below are allowed to open a maximum of
`maxAllowedConnections` (= 100 by default) explicit connections. In replicated
and sharded environments, implicit connections are created to the original
mongod provided to the mongo shell executing the runner (one for each thread).
This behavior cannot be controlled, but it highlights the importance of always
using the db object provided in the FSM states rather than the global db which
will always correspond to the mongod the mongo shell initially connected to.
By default, all runners below are allowed to open a maximum of `maxAllowedConnections` (= 100 by
default) explicit connections. In replicated and sharded environments, implicit connections are
created to the original mongod provided to the mongo shell executing the runner (one for each
thread). This behavior cannot be controlled, but it highlights the importance of always using the db
object provided in the FSM states rather than the global db which will always correspond to the
mongod the mongo shell initially connected to.
### Execution modes
#### Serial
Serial is the simplest of all three modes and basically works as explained
above. Setup is run single threaded, data is copied into multiple threads where
the states are executed, and once all the threads have finished a teardown
function is run and the runner moves onto the next workload.
Serial is the simplest of all three modes and basically works as explained above. Setup is run
single threaded, data is copied into multiple threads where the states are executed, and once all
the threads have finished a teardown function is run and the runner moves onto the next workload.
![fsm_serial_example.png](../images/testing/fsm_serial_example.png)
#### Parallel (Simultaneous)
In parallel or simultaneous mode (the naming convention has been slightly
inconsistent), the ordering becomes a little different. All workloads have their
setup functions run, then threads are spawned for each workload, and once they
all complete, all threads have their teardown function run.
In parallel or simultaneous mode (the naming convention has been slightly inconsistent), the
ordering becomes a little different. All workloads have their setup functions run, then threads are
spawned for each workload, and once they all complete, all threads have their teardown function run.
![fsm_simultaneous_example.png](../images/testing/fsm_simultaneous_example.png)
### Existing runners
The existing runners all use `jstests/concurrency/fsm_libs/runner.js` to
actually execute the workloads. Most information about arguments and available
runWorkloads methods can be found by inspecting the source. Below you can find
the existing runners explained. The first argument to the three runWorkloads
methods (each corresponding to a different run mode), is an array of workload
files to run. clusterOptions, the second argument to the runWorkloads functions,
is explained in the other components section below. Execution options for
runWorkloads functions, the third argument, can contain the following options
(some depend on the run mode):
The existing runners all use `jstests/concurrency/fsm_libs/runner.js` to actually execute the
workloads. Most information about arguments and available runWorkloads methods can be found by
inspecting the source. Below you can find the existing runners explained. The first argument to the
three runWorkloads methods (each corresponding to a different run mode), is an array of workload
files to run. clusterOptions, the second argument to the runWorkloads functions, is explained in the
other components section below. Execution options for runWorkloads functions, the third argument,
can contain the following options (some depend on the run mode):
- `numSubsets` - Not available in serial mode, determines how many subsets of
workloads to execute in parallel mode
- `subsetSize` - Not available in serial mode, determines how large each subset of
workloads executed is
- `numSubsets` - Not available in serial mode, determines how many subsets of workloads to execute
in parallel mode
- `subsetSize` - Not available in serial mode, determines how large each subset of workloads
executed is
#### fsm_all.js
Runs all workloads serially. For each workload, `$config.threadCount` threads
are spawned and each thread runs for exactly `$config.iterations` steps starting
at `$config.startState` and transitioning to other states based on the
transition probabilities defined in $config.transitions.
Runs all workloads serially. For each workload, `$config.threadCount` threads are spawned and each
thread runs for exactly `$config.iterations` steps starting at `$config.startState` and
transitioning to other states based on the transition probabilities defined in $config.transitions.
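
The per-thread walk described above can be sketched roughly as follows. This is a simplified, self-contained model rather than the runner's implementation: `pickNextState` and `runThread` are hypothetical names, the random source is passed in so the walk is reproducible, and counting the start state as the first step is an assumption.

```javascript
// Simplified model of one FSM thread; not the code in fsm_libs.

// Choose the next state using the weights in `transitions`, e.g. {insert: 0.5, find: 0.5}.
// `rand` is a function returning a number in [0, 1).
function pickNextState(transitions, rand) {
    const entries = Object.entries(transitions);
    const total = entries.reduce((sum, [, weight]) => sum + weight, 0);
    let r = rand() * total;
    for (const [state, weight] of entries) {
        if (r < weight) {
            return state;
        }
        r -= weight;
    }
    // Guard against floating-point rounding: fall back to the last state.
    return entries[entries.length - 1][0];
}

// Visit exactly `config.iterations` states, beginning at `config.startState`
// (assumption: the start state counts as the first step).
function runThread(config, rand) {
    const visited = [];
    let current = config.startState;
    for (let i = 0; i < config.iterations; i++) {
        visited.push(current);
        current = pickNextState(config.transitions[current], rand);
    }
    return visited;
}

const exampleConfig = {
    startState: "init",
    iterations: 5,
    transitions: {
        init: {insert: 1},
        insert: {insert: 0.5, find: 0.5},
        find: {insert: 1},
    },
};

console.log(runThread(exampleConfig, Math.random));
```

Passing a fixed function such as `() => 0` instead of `Math.random` makes the walk deterministic, which is handy when reasoning about a single transition table.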
#### fsm_all_simultaneous.js
options: numSubsets, subsetSize
Runs numSubsets subsets of size subsetSize of all workloads. The workloads in
each subset are started in parallel and each workload is run according to
settings in `$config`.
Runs numSubsets subsets of size subsetSize of all workloads. The workloads in each subset are
started in parallel and each workload is run according to settings in `$config`.
#### fsm_all_replication.js
Sets up a replica set (with 3 mongods by default) and runs workloads serially or
in parallel. For example,
Sets up a replica set (with 3 mongods by default) and runs workloads serially or in parallel. For
example,
`runWorkloadsSerially([<workload1>, <workload2>, ...], { replication: true } )`
creates a replica set with 3 members and runs some workloads serially on the
primary.
creates a replica set with 3 members and runs some workloads serially on the primary.
#### fsm_all_sharded.js
Sets up a sharded cluster (with 2 shards and 1 mongos by default) and runs
workloads serially or in parallel. For example,
Sets up a sharded cluster (with 2 shards and 1 mongos by default) and runs workloads serially or in
parallel. For example,
`runWorkloadsInParallel([<workload1>, <workload2>, ...], { sharded: true } )`
@ -404,36 +368,33 @@ creates a sharded cluster and runs workloads in parallel.
#### fsm_all_sharded_replication.js
Sets up a sharded cluster (with 2 shards, each having 3 replica set members, and
1 mongos by default) and runs workloads serially or in parallel.
Sets up a sharded cluster (with 2 shards, each having 3 replica set members, and 1 mongos by
default) and runs workloads serially or in parallel.
### Excluding a workload
If any workloads fail because of known bugs in MongoDB, persistent MCI failures
or timeouts, the troublesome workload can be excluded from running by placing it
in the exclusion array in the corresponding runner. Please remember to place a
comment next to the excluded workload name identifying the reason a workload is
being excluded. For example,
If any workloads fail because of known bugs in MongoDB, persistent MCI failures or timeouts, the
troublesome workload can be excluded from running by placing it in the exclusion array in the
corresponding runner. Please remember to place a comment next to the excluded workload name
identifying the reason a workload is being excluded. For example,
`'agg_sort_external.js', // SERVER-16700 Deadlock on WiredTiger LSM`
Each file should also have two predefined sections - one for known bugs and one
for restrictions. The one above would be considered a known bug. However,
excluding a compact workload from sharded runners would be a restriction because
compact can only be run against individual mongods.
Each file should also have two predefined sections - one for known bugs and one for restrictions.
The one above would be considered a known bug. However, excluding a compact workload from sharded
runners would be a restriction because compact can only be run against individual mongods.
## Other components of the FSM library
Most of these components live in jstests/concurrency/fsm_libs and provide the
functionality used by the runner.
Most of these components live in jstests/concurrency/fsm_libs and provide the functionality used by
the runner.
### ThreadManager
Responsible for spawning and joining worker threads. Each spawned thread is
wrapped in a try/finally block to ensure that the database connection implicitly
created during the thread's execution is eventually closed explicitly. The
ThreadManager sets a random seed (`randInt(1e13)`, i.e. a value in `[0, 1e13)`,
which is the range of `new Date().getTime()`) before executing each workload.
Responsible for spawning and joining worker threads. Each spawned thread is wrapped in a try/finally
block to ensure that the database connection implicitly created during the thread's execution is
eventually closed explicitly. The ThreadManager sets a random seed (`randInt(1e13)`, i.e. a value in
`[0, 1e13)`, which is the range of `new Date().getTime()`) before executing each workload.
### Worker Thread
@ -441,36 +402,30 @@ Thread spawned by ThreadManager and used to run a Finite State Machine.
### Cluster
cluster.js is responsible for providing the cluster object that is passed to
setup and teardown functions, and the initial connection to a db to be used by
runner to pass to the workloads. For anything except for standalone, it makes
use of the shell's built-in cluster test helpers like `ShardingTest` and
`ReplSetTest`. clusterOptions are passed to cluster.js for initialization.
cluster.js is responsible for providing the cluster object that is passed to setup and teardown
functions, and the initial connection to a db to be used by runner to pass to the workloads. For
anything except for standalone, it makes use of the shell's built-in cluster test helpers like
`ShardingTest` and `ReplSetTest`. clusterOptions are passed to cluster.js for initialization.
clusterOptions include:
- `replication`: boolean, whether or not to use replication in the cluster
- `sameCollection`: boolean, whether or not all workloads are passed the same
collection
- `sameCollection`: boolean, whether or not all workloads are passed the same collection
- `sameDB`: boolean, whether or not all workloads are passed the same DB
- `setupFunctions`: object, containing at most two functions under the keys
'mongod' and 'mongos'. This allows you to run a function against all mongod or
mongos nodes in the cluster as part of the cluster initialization. Each
function takes a single argument, the db object against which configuration
can be run (will be set for each mongod/mongos)
- `setupFunctions`: object, containing at most two functions under the keys 'mongod' and 'mongos'.
This allows you to run a function against all mongod or mongos nodes in the cluster as part of the
cluster initialization. Each function takes a single argument, the db object against which
configuration can be run (will be set for each mongod/mongos)
- `sharded`: boolean, whether or not to use sharding in the cluster
Note that sameCollection and sameDB can increase contention for a resource, but
will also decrease the strength of the assertions by ruling out the use of OwnDB
and OwnColl assertions.
Note that sameCollection and sameDB can increase contention for a resource, but will also decrease
the strength of the assertions by ruling out the use of OwnDB and OwnColl assertions.
### Miscellaneous Execution Notes
- A `CountDownLatch` (exposed through the v8-based mongo shell, as of MongoDB 3.0)
is used as a synchronization primitive by the ThreadManager to wait until all
spawned threads have finished being spawned before starting workload
execution.
- If more than 20% of the threads fail while spawning, we abort the test. If
fewer than 20% of the threads fail while spawning we allow the non-failed
threads to continue with the test. The 20% threshold is somewhat arbitrary;
the goal is to abort if "mostly all" of the threads failed but to tolerate "a
few" threads failing.
- A `CountDownLatch` (exposed through the v8-based mongo shell, as of MongoDB 3.0) is used as a
synchronization primitive by the ThreadManager to wait until all spawned threads have finished
being spawned before starting workload execution.
- If more than 20% of the threads fail while spawning, we abort the test. If fewer than 20% of the
threads fail while spawning we allow the non-failed threads to continue with the test. The 20%
threshold is somewhat arbitrary; the goal is to abort if "mostly all" of the threads failed but to
tolerate "a few" threads failing.
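
The spawn-failure rule in the last bullet can be sketched as a single predicate. This is an illustration, not the ThreadManager's code: `shouldAbortAfterSpawn` is a hypothetical name, and because the text does not say what happens at exactly 20%, the sketch assumes the test continues in that case.

```javascript
// Hypothetical helper for illustration; the real check lives in the FSM library.
// Returns true when strictly more than `threshold` of the threads failed to spawn.
// Assumption: a failure rate of exactly 20% does not abort the test.
function shouldAbortAfterSpawn(numThreads, numFailed, threshold = 0.2) {
    if (numThreads <= 0) {
        throw new Error("numThreads must be positive");
    }
    return numFailed / numThreads > threshold;
}

console.log(shouldAbortAfterSpawn(10, 3)); // true: 30% of threads failed
console.log(shouldAbortAfterSpawn(10, 1)); // false: 10% of threads failed
```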

@ -1,37 +1,34 @@
# Hang Analyzer
The hang analyzer is a tool to collect cores and other information from processes
that are suspected to have hung. Any task which exceeds its timeout in Evergreen
will automatically be hang-analyzed, with information being written compressed
and uploaded to S3.
The hang analyzer is a tool to collect cores and other information from processes that are suspected
to have hung. Any task which exceeds its timeout in Evergreen will automatically be hang-analyzed,
with information being written compressed and uploaded to S3.
The hang analyzer can also be invoked locally at any time. For all non-Jepsen
tasks, the invocation is `buildscripts/resmoke.py hang-analyzer -o file -o stdout -m exact -p python`. You may need to substitute `python` with the name of the python binary
you are using, which may be one of `python`, `python3`, or on Windows: `Python`,
`Python3`.
The hang analyzer can also be invoked locally at any time. For all non-Jepsen tasks, the invocation
is `buildscripts/resmoke.py hang-analyzer -o file -o stdout -m exact -p python`. You may need to
substitute `python` with the name of the python binary you are using, which may be one of `python`,
`python3`, or on Windows: `Python`, `Python3`.
For jepsen tasks, the invocation is `buildscripts/resmoke.py hang-analyzer -o file -o stdout -p dbtest,java,mongo,mongod,mongos,python,_test`.
For jepsen tasks, the invocation is
`buildscripts/resmoke.py hang-analyzer -o file -o stdout -p dbtest,java,mongo,mongod,mongos,python,_test`.
## Interesting Processes
The hang analyzer detects and runs against processes which are considered
interesting.
The hang analyzer detects and runs against processes which are considered interesting.
Tasks whose name contains "jepsen": any process whose name exactly matches one
of `dbtest,java,mongo,mongod,mongos,python,_test`.
Tasks whose name contains "jepsen": any process whose name exactly matches one of
`dbtest,java,mongo,mongod,mongos,python,_test`.
In all other scenarios, including local use of the hang-analyzer, an interesting
process is any of:
In all other scenarios, including local use of the hang-analyzer, an interesting process is any of:
- process that starts with `python` or `live-record`
- one which has been spawned as a child process of resmoke.
The resmoke subcommand `hang-analyzer` will send SIGUSR1/use SetEvent to signal
resmoke to:
The resmoke subcommand `hang-analyzer` will send SIGUSR1/use SetEvent to signal resmoke to:
- Print stack traces for all python threads
- Collect core dumps and other information for any non-python child
processes, see `Data Collection` below
- Collect core dumps and other information for any non-python child processes, see `Data Collection`
below
- Re-signal any python child processes to do the same
## Data Collection
@ -41,8 +38,8 @@ Data collection occurs in the following sequence:
- Pause all non-python processes
- Grab debug symbols on non-Sanitizer builds
- Signal python Processes
- Dump cores of as many processes as possible, until the disk quota is exceeded.
The default quota is 90% of total volume space.
- Dump cores of as many processes as possible, until the disk quota is exceeded. The default quota
is 90% of total volume space.
- Collect additional, non-core data. Ideally:
- Print C++ Stack traces
@ -54,13 +51,12 @@ Data collection occurs in the following sequence:
- Dump java processes (Jepsen tests) with jstack
- SIGABRT (Unix)/terminate (Windows) go processes
Note that the list of non-core data collected is only accurate on Linux. Other
platforms only perform a subset of these operations.
Note that the list of non-core data collected is only accurate on Linux. Other platforms only
perform a subset of these operations.
Additionally, note that the hang analyzer is subject to Evergreen post task
timeouts, and may not have enough time to collect all information before
being terminated by the Evergreen agent. When running locally there is no
timeout, and the hang analyzer may ironically hang indefinitely.
Additionally, note that the hang analyzer is subject to Evergreen post task timeouts, and may not
have enough time to collect all information before being terminated by the Evergreen agent. When
running locally there is no timeout, and the hang analyzer may ironically hang indefinitely.
### Implementations

@ -2,11 +2,23 @@
## Overview
[Mongobridge](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L1) is a network fault injection testing tool that allows test authors to intentionally simulate network issues such as connection failures, message delays, or packet loss during communication to any node in a cluster. It acts as a transparent proxy between MongoDB processes and their clients, enabling controlled network fault injection for testing distributed system behavior.
[Mongobridge](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L1)
is a network fault injection testing tool that allows test authors to intentionally simulate network
issues such as connection failures, message delays, or packet loss during communication to any node
in a cluster. It acts as a transparent proxy between MongoDB processes and their clients, enabling
controlled network fault injection for testing distributed system behavior.
## How It Works
When `ReplSetTest` or `ShardingTest` are instructed to use `mongobridge`, they will [set up a mongobridge process](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/jstests/libs/replsettest.js#L2962) for each node that [creates a ProxiedConnection](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L323-L324) between the node and any clients (including other nodes in the cluster) attempting to communicate with it. When test authors send a command to a node, mongobridge [intercepts the command and applies any configured actions](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L395-L430) onto the commands before forwarding the command along to the node itself. This allows simple fault injection from the test author's perspective.
When `ReplSetTest` or `ShardingTest` are instructed to use `mongobridge`, they will
[set up a mongobridge process](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/jstests/libs/replsettest.js#L2962)
for each node that
[creates a ProxiedConnection](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L323-L324)
between the node and any clients (including other nodes in the cluster) attempting to communicate
with it. When test authors send a command to a node, mongobridge
[intercepts the command and applies any configured actions](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L395-L430)
onto the commands before forwarding the command along to the node itself. This allows simple fault
injection from the test author's perspective.
## Quick Start
@ -23,7 +35,8 @@ To use mongobridge in your tests:
});
```
- **Test commands must be enabled**: Mongobridge's `*From` commands require `enableTestCommands: true` (which is the default in test environments)
- **Test commands must be enabled**: Mongobridge's `*From` commands require
`enableTestCommands: true` (which is the default in test environments)
2. **Inject network faults** using bridge commands:
@ -38,11 +51,16 @@ To use mongobridge in your tests:
st.rs0.getPrimary().acceptConnectionsFrom(st.rs0.getSecondary());
```
3. Operations that depend on communication between the affected nodes will fail or timeout as expected.
3. Operations that depend on communication between the affected nodes will fail or timeout as
expected.
## What to keep in mind
Be aware that there are consequences to injecting network faults between nodes that can cause downstream impact in (for example) heartbeats, sync source selection, and SDAM, and so after a fault has been injected the test may not be in the state you expect it to be in for future commands. It is best to keep mongobridge tests relatively short and targeted to ensure that flakiness due to these faults doesn't impact the rest of your testing.
Be aware that there are consequences to injecting network faults between nodes that can cause
downstream impact in (for example) heartbeats, sync source selection, and SDAM, and so after a fault
has been injected the test may not be in the state you expect it to be in for future commands. It is
best to keep mongobridge tests relatively short and targeted to ensure that flakiness due to these
faults doesn't impact the rest of your testing.
## Command Reference
@ -71,7 +89,8 @@ node.acceptConnectionsFrom([node1, node2, node3]); // Multiple nodes
node.rejectConnectionsFrom(otherNode);
```
**Effect**: New connections are rejected, existing connections are closed when a new request is sent over them
**Effect**: New connections are rejected, existing connections are closed when a new request is sent
over them
**Use case**: Simulating complete network partitions
@ -183,7 +202,8 @@ primary.discardMessagesFrom(secondary, 0.3);
### Limitations
- **OP_QUERY exhaust**: Not supported for legacy exhaust queries (OP_MSG exhaust cursors are supported)
- **OP_QUERY exhaust**: Not supported for legacy exhaust queries (OP_MSG exhaust cursors are
supported)
- **Direct connections**: Only works when connections go through the bridge proxy
- **TLS support**: Mongobridge is not supported if the cluster is using TLS.

@ -11,26 +11,32 @@ Using OTel we capture the following things
3. Duration of hooks before and after test/suite
4. Resmoke archiver (when there is a failure we archive core dumps)
To see this visually navigate to the [resmoke dataset](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/resmoke/home) and view a recent trace.
To see this visually navigate to the
[resmoke dataset](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/resmoke/home)
and view a recent trace.
## A look at source code
### Configuration
The bulk of configuration is done in the
`_set_up_tracing(...)` method in [configure_resmoke.py#L164](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/configure_resmoke.py#L164). This method includes documentation on how it works.
The bulk of configuration is done in the `_set_up_tracing(...)` method in
[configure_resmoke.py#L164](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/configure_resmoke.py#L164).
This method includes documentation on how it works.
## BatchedBaggageSpanProcessor
See documentation [batched_baggage_span_processor.py#L8](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/batched_baggage_span_processor.py#L8)
See documentation
[batched_baggage_span_processor.py#L8](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/batched_baggage_span_processor.py#L8)
## FileSpanExporter
See documentation [file_span_exporter.py#L16](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/file_span_exporter.py#L16)
See documentation
[file_span_exporter.py#L16](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/file_span_exporter.py#L16)
## Capturing Data
We mostly capture data by using a decorator on methods. Example taken from [job.py#L200](https://github.com/mongodb/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L200)
We mostly capture data by using a decorator on methods. Example taken from
[job.py#L200](https://github.com/mongodb/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L200)
```
TRACER = trace.get_tracer("resmoke")
@ -41,7 +47,11 @@ def func_name(...):
span.set_attribute("attr1", True)
```
This system is nice because the decorator captures exceptions and other failures and a user can never forget to close a span. On occasion we will also start a span using the `with` clause in python. However, the decorator method is preferred since the method below makes more of a readability impact on the code. This example is taken from [job.py#L215](https://github.com/mongodb/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L215)
This system is nice because the decorator captures exceptions and other failures and a user can
never forget to close a span. On occasion we will also start a span using the `with` clause in
python. However, the decorator method is preferred since the method below makes more of a
readability impact on the code. This example is taken from
[job.py#L215](https://github.com/mongodb/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L215)
```
with TRACER.start_as_current_span("func_name", attributes={}):
@ -51,4 +61,9 @@ with TRACER.start_as_current_span("func_name", attributes={}):
## Insights We Have Made (so far)
Using [this dashboard](https://ui.honeycomb.io/mongodb-4b/environments/production/board/3bATQLb38bh/Server-CI) and [this query](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/resmoke/result/GFa2YJ6d4vU/a/7EYuMJtH8KX/Slowest-Resmoke-Tests) we can see the most expensive single js tests. We plan to make tickets for teams to fix these long running tests for cloud savings as well as developer time savings.
Using
[this dashboard](https://ui.honeycomb.io/mongodb-4b/environments/production/board/3bATQLb38bh/Server-CI)
and
[this query](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/resmoke/result/GFa2YJ6d4vU/a/7EYuMJtH8KX/Slowest-Resmoke-Tests)
we can see the most expensive single js tests. We plan to make tickets for teams to fix these long
running tests for cloud savings as well as developer time savings.

@ -1,10 +1,14 @@
# Resmoke Module Configuration
This configuration allows additional modules to be added to Resmoke, providing more context about their associated directories. Modules can specify directories for fixtures, hooks, suites, and JavaScript tests, which Resmoke incorporates during its testing process.
This configuration allows additional modules to be added to Resmoke, providing more context about
their associated directories. Modules can specify directories for fixtures, hooks, suites, and
JavaScript tests, which Resmoke incorporates during its testing process.
## Adding a New Module
To add a new module to Resmoke, define the module name and specify its `fixture_dirs`, `hook_dirs`, `suite_dirs`, and `jstest_dirs` in the YAML configuration. Each field should be a list of directory paths.
To add a new module to Resmoke, define the module name and specify its `fixture_dirs`, `hook_dirs`,
`suite_dirs`, and `jstest_dirs` in the YAML configuration. Each field should be a list of directory
paths.
### Example YAML Configuration
@ -25,9 +29,12 @@ my_new_module:
- **`fixture_dirs`**: Directories containing fixtures associated with the module.
- **`hook_dirs`**: Directories containing hooks associated with the module.
- **`suite_dirs`**: Directories containing suites with test configurations.
- **`jstest_dirs`**: Directories containing JavaScript tests specific to the module. This ensures module-specific tests are excluded from other suite configurations when the module is disabled.
- **`jstest_dirs`**: Directories containing JavaScript tests specific to the module. This ensures
module-specific tests are excluded from other suite configurations when the module is disabled.
## Notes
- Any suite can use jstests from any directory. When the module is enabled, the configured `jstest_dirs` have no effect. Only when the module is disabled are the tests in those directories filtered out of suites, including suites that belong to a different module.
- Any suite can use jstests from any directory. When the module is enabled, the configured
`jstest_dirs` have no effect. Only when the module is disabled are the tests in those directories
filtered out of suites, including suites that belong to a different module.
- Fields can be omitted or empty lists

@ -1,55 +1,48 @@
# Thread Pools
A thread pool ([Wikipedia][thread_pools_wikipedia]) accepts and executes
lightweight work items called "tasks", using a carefully managed group
of dedicated long-running worker threads. The worker threads perform
the work items in parallel without forcing each work item to assume the
burden of starting and destroying a dedicated thread.
A thread pool ([Wikipedia][thread_pools_wikipedia]) accepts and executes lightweight work items
called "tasks", using a carefully managed group of dedicated long-running worker threads. The worker
threads perform the work items in parallel without forcing each work item to assume the burden of
starting and destroying a dedicated thread.
## Classes
### `ThreadPoolInterface`
The [`ThreadPoolInterface`][thread_pool_interface.h] abstract interface is
an extension of the `OutOfLineExecutor` (see [the executors architecture
guide][executors]) abstract interface, adding `startup`, `shutdown`, and
`join` virtual member functions. It is the base class for our thread
pool classes.
The [`ThreadPoolInterface`][thread_pool_interface.h] abstract interface is an extension of the
`OutOfLineExecutor` (see [the executors architecture guide][executors]) abstract interface, adding
`startup`, `shutdown`, and `join` virtual member functions. It is the base class for our thread pool
classes.
### `ThreadPool`
[`ThreadPool`][thread_pool.h] is the most basic concrete thread pool. The
number of worker threads is adaptive, but configurable with a min/max
range. Idle worker threads are reaped (down to the configured min), while
new worker threads can be created when needed (up to the configured max).
[`ThreadPool`][thread_pool.h] is the most basic concrete thread pool. The number of worker threads
is adaptive, but configurable with a min/max range. Idle worker threads are reaped (down to the
configured min), while new worker threads can be created when needed (up to the configured max).
### `ThreadPoolTaskExecutor`
[`ThreadPoolTaskExecutor`][thread_pool_task_executor.h] is not a thread
pool, but rather a `TaskExecutor` that uses a `ThreadPoolInterface` and
a `NetworkInterface` to execute scheduled tasks. It's configured with a
`ThreadPoolInterface` over which it _takes_ ownership, and a
`NetworkInterface`, of which it _shares_ ownership. With these resources
it implements the elaborate `TaskExecutor` interface (see [executors]).
[`ThreadPoolTaskExecutor`][thread_pool_task_executor.h] is not a thread pool, but rather a
`TaskExecutor` that uses a `ThreadPoolInterface` and a `NetworkInterface` to execute scheduled
tasks. It's configured with a `ThreadPoolInterface` over which it _takes_ ownership, and a
`NetworkInterface`, of which it _shares_ ownership. With these resources it implements the elaborate
`TaskExecutor` interface (see [executors]).
### `NetworkInterfaceThreadPool`
[`NetworkInterfaceThreadPool`][network_interface_thread_pool.h] is a
thread pool implementation that doesn't actually own any worker threads.
It runs its tasks on the background thread of a
[`NetworkInterfaceThreadPool`][network_interface_thread_pool.h] is a thread pool implementation that
doesn't actually own any worker threads. It runs its tasks on the background thread of a
[`NetworkInterface`][network_interface.h].
Incoming tasks that are scheduled from the `NetworkInterface`'s thread
are run immediately. Otherwise they are queued to be run by the
`NetworkInterface` thread when it is available.
Incoming tasks that are scheduled from the `NetworkInterface`'s thread are run immediately.
Otherwise they are queued to be run by the `NetworkInterface` thread when it is available.
### `ThreadPoolMock`
[`ThreadPoolMock`][thread_pool_mock.h] is a `ThreadPoolInterface`. It is not
a mock of a `ThreadPool`. It has no configurable stored responses. It has
one worker thread and a pointer to a `NetworkInterfaceMock`, and with these
resources it simulates a thread pool well enough to be used by a
`ThreadPoolTaskExecutor` in unit tests.
[`ThreadPoolMock`][thread_pool_mock.h] is a `ThreadPoolInterface`. It is not a mock of a
`ThreadPool`. It has no configurable stored responses. It has one worker thread and a pointer to a
`NetworkInterfaceMock`, and with these resources it simulates a thread pool well enough to be used
by a `ThreadPoolTaskExecutor` in unit tests.
[thread_pools_wikipedia]: https://en.wikipedia.org/wiki/Thread_pool
[executors]: ../src/mongo/executor/README.md

@ -1,13 +1,14 @@
Note: this doc is being continuously updated while changes are being made to the unit test framework.
Note: this doc is being continuously updated while changes are being made to the unit test
framework.
# Overview
# Features
The MongoDB unit test framework is a thin layer built atop GoogleTest, so most GoogleTest features
(see [Google Test documentation][google_test_docs]) are available for use aside from anything
listed out in [Banned Features](#banned-features). The unit testing framework also includes
enhanced reporting of test output (see
(see [Google Test documentation][google_test_docs]) are available for use aside from anything listed
out in [Banned Features](#banned-features). The unit testing framework also includes enhanced
reporting of test output (see
[Enhanced Reporting of Test Output](#enhanced-reporting-of-test-output)).
The core unittest features can be accessed by including the `mongo/unittest/unittest.h` header and
@ -18,8 +19,8 @@ using the `mongo_cc_unit_test` bazel rule.
### Parameterized tests
Parameterized tests are a GoogleTest feature that allows the same test logic to be run with
different values or types (see GoogleTest docs on
[Value-Parameterized Tests][value_parameterized_tests] and [Typed Tests][typed_tests]).
different values or types (see GoogleTest docs on [Value-Parameterized
Tests][value_parameterized_tests] and [Typed Tests][typed_tests]).
```cpp
class TestFixture :
@ -41,8 +42,8 @@ TEST_P(TestFixture, MongoTest) {
### GoogleMock
GoogleMock can be used by including the `mongo/unittest/unittest.h` header. You should never
directly include `<gmock/gmock.h>`. There are matchers for common mongo types such as `BSONObj`
in `mongo/unittest/matcher.h`.
directly include `<gmock/gmock.h>`. There are matchers for common mongo types such as `BSONObj` in
`mongo/unittest/matcher.h`.
## Banned Features
@ -63,9 +64,9 @@ GoogleTest fatal assertions, such as no fatal assertions allowed in non-void hel
## Enhanced Reporting of Test Output
The Enhanced Reporter improves test reporting by colorizing and formatting output, maintaining
a progress indicator, printing enhanced failure information, and suppressing log output on
passing tests.
The Enhanced Reporter improves test reporting by colorizing and formatting output, maintaining a
progress indicator, printing enhanced failure information, and suppressing log output on passing
tests.
These command line flags may be used to configure the Enhanced Reporter:
@ -74,9 +75,9 @@ These command line flags may be used to configure the Enhanced Reporter:
## Death Tests
The MongoDB unit testing framework uses `DEATH_TEST` (with `DEATH_TEST_F`, `DEATH_TEST_REGEX`,
and `DEATH_TEST_REGEX_F` variants) to test code that is expected to cause the process to
terminate. This should replace all uses of the `ASSERT_DEATH` macro from GoogleTest (see
The MongoDB unit testing framework uses `DEATH_TEST` (with `DEATH_TEST_F`, `DEATH_TEST_REGEX`, and
`DEATH_TEST_REGEX_F` variants) to test code that is expected to cause the process to terminate. This
should replace all uses of the `ASSERT_DEATH` macro from GoogleTest (see
[unittest/death_test.h][death_test_h] for more details).
Similar to GoogleTest, `DEATH_TEST` test suite names should be suffixed with `DeathTest`. For
@ -98,8 +99,10 @@ DEATH_TEST_F(FixtureNameDeathTest, TestName) {
}
```
[death_test_naming]: https://github.com/google/googletest/blob/main/docs/advanced.md#death-test-naming
[death_test_naming]:
https://github.com/google/googletest/blob/main/docs/advanced.md#death-test-naming
[death_test_h]: ../src/mongo/unittest/death_test.h
[google_test_docs]: https://github.com/google/googletest/blob/main/docs/primer.md
[value_parameterized_tests]: https://github.com/google/googletest/blob/main/docs/advanced.md#value-parameterized-tests
[value_parameterized_tests]:
https://github.com/google/googletest/blob/main/docs/advanced.md#value-parameterized-tests
[typed_tests]: https://github.com/google/googletest/blob/main/docs/advanced.md#typed-tests

@ -56,9 +56,10 @@ Contact for more Information: https://www.mongodb.com/contact
### Note to 1194.22
The Board interprets paragraphs (a) through (k) of this section as consistent with the following
priority 1 Checkpoints of the Web Content Accessibility Guidelines 1.0 (WCAG 1.0) (May 5 1999) published by the Web
Accessibility Initiative of the World Wide Web Consortium: Paragraph (a) - 1.1, (b) - 1.4, (c\) - 2.1, (d) - 6.1,
(e) - 1.2, (f) - 9.1, (g) - 5.1, (h) - 5.2, (i) - 12.1, (j) - 7.1, (k) - 11.4.
priority 1 Checkpoints of the Web Content Accessibility Guidelines 1.0 (WCAG 1.0) (May 5 1999)
published by the Web Accessibility Initiative of the World Wide Web Consortium: Paragraph (a) - 1.1,
(b) - 1.4, (c\) - 2.1, (d) - 6.1, (e) - 1.2, (f) - 9.1, (g) - 5.1, (h) - 5.2, (i) - 12.1, (j) - 7.1,
(k) - 11.4.
## Section 1194.23 Telecommunications Products Detail

@ -1,84 +1,160 @@
# Javascript Test Guide
At MongoDB we write integration tests in JavaScript. These are tests written to exercise some behavior of a running MongoDB server, replica set, or sharded cluster. This guide aims to provide some general guidelines and best practices on how to write good tests.
At MongoDB we write integration tests in JavaScript. These are tests written to exercise some
behavior of a running MongoDB server, replica set, or sharded cluster. This guide aims to provide
some general guidelines and best practices on how to write good tests.
## Principles
### Minimize the test case as much as possible while still exercising and testing the desired behavior.
- For example, if you are testing that document deletion works correctly, it may be entirely sufficient to insert just a single document and then delete that document. Inserting multiple documents would be unnecessary. A guiding principle on this is to ask yourself how easy it would be for a new person coming to this test to quickly understand it. If there are multiple documents being inserted into a collection, in a test that only tests document deletion, a newcomer might ask the question: “is it important that the test uses multiple documents, or incidental?”. It is best if you can remove these kinds of questions from a person's mind, by keeping only the absolute essential parts of a test.
- We should always strive for unit testing when possible, so if the functionality you want to test can be covered by a unit test, we should write a unit test instead.
- For example, if you are testing that document deletion works correctly, it may be entirely
sufficient to insert just a single document and then delete that document. Inserting multiple
documents would be unnecessary. A guiding principle on this is to ask yourself how easy it would
be for a new person coming to this test to quickly understand it. If there are multiple documents
being inserted into a collection, in a test that only tests document deletion, a newcomer might
ask the question: “is it important that the test uses multiple documents, or incidental?”. It is
best if you can remove these kinds of questions from a person's mind, by keeping only the absolute
essential parts of a test.
- We should always strive for unit testing when possible, so if the functionality you want to test
can be covered by a unit test, we should write a unit test instead.
### Add a block comment at the top of the JavaScript test file giving a clear and concise overview of what a test is trying to verify.
- For tests that are more complicated, a brief description of the test steps might be useful as well.
- For tests that are more complicated, a brief description of the test steps might be useful as
well.
### Keep debuggability in mind.
- Assertion error messages should contain all information relevant to debugging the test. This means the server's response from the failed command should almost always be included in the assertion error message. It can also be helpful to include parameters that vary during the test to avoid requiring the investigator to use the logs/backtrace to determine what the test was attempting to do.
- Think about how easy it would be to debug your test if something failed and a newcomer only had the logs of the test to look at. This can help guide your decision on what log messages to include and to what level of detail. The jsTestLog function is useful for this, as it is good at visually demarcating different phases of a test. As a tip, run your test a few times and just study the log messages, imagining you are an engineer debugging the test with only these logs to look at. Think about how understandable the logs would be to a newcomer. It is easy to add log messages to a test but then forget to see how they would actually appear.
- Never insert identical documents unless necessary. It is very useful in debugging to be able to figure out where a given piece of data came from.
- If a test does the same thing multiple times, consider factoring it out into a library. Shorter running tests are easier to debug and code duplication is always bad.
- Assertion error messages should contain all information relevant to debugging the test. This means
the server's response from the failed command should almost always be included in the assertion
error message. It can also be helpful to include parameters that vary during the test to avoid
requiring the investigator to use the logs/backtrace to determine what the test was attempting to
do.
- Think about how easy it would be to debug your test if something failed and a newcomer only had
the logs of the test to look at. This can help guide your decision on what log messages to include
and to what level of detail. The jsTestLog function is useful for this, as it is good at visually
demarcating different phases of a test. As a tip, run your test a few times and just study the log
messages, imagining you are an engineer debugging the test with only these logs to look at. Think
about how understandable the logs would be to a newcomer. It is easy to add log messages to a test
but then forget to see how they would actually appear.
- Never insert identical documents unless necessary. It is very useful in debugging to be able to
figure out where a given piece of data came from.
- If a test does the same thing multiple times, consider factoring it out into a library. Shorter
running tests are easier to debug and code duplication is always bad.
### Do not hardcode collection or database names, especially if they are used multiple times throughout a test.
It is best to use variable names that attempt to describe what a value is used for. For example, naming a variable that stores a collection named `collectionToDrop` is much better than just naming the variable `collName`.
It is best to use variable names that attempt to describe what a value is used for. For example,
naming a variable that stores a collection named `collectionToDrop` is much better than just naming
the variable `collName`.
### Make every effort to make your test as deterministic as possible.
- Non-deterministic tests add noise to our build system and, in general, make it harder for yourself and other engineers to determine if the system really is working correctly or not. Flaky integration tests should be considered bugs, and we should not allow them to be committed to the server codebase. One way to make jstests more deterministic is to use failpoints to force events to happen in the expected order. However, if we have to use failpoints to make this test deterministic, we should consider writing a unit test instead.
- Note that our fuzzer and concurrency test suites are often an exception to this rule. In those cases we sometimes give up some level of determinism in order to trigger a wider class of rare edge cases. For targeted JavaScript integration tests, however, highly deterministic tests should be the goal.
- Non-deterministic tests add noise to our build system and, in general, make it harder for yourself
and other engineers to determine if the system really is working correctly or not. Flaky
integration tests should be considered bugs, and we should not allow them to be committed to the
server codebase. One way to make jstests more deterministic is to use failpoints to force events
to happen in the expected order. However, if we have to use failpoints to make this test
deterministic, we should consider writing a unit test instead.
- Note that our fuzzer and concurrency test suites are often an exception to this rule. In those
cases we sometimes give up some level of determinism in order to trigger a wider class of rare
edge cases. For targeted JavaScript integration tests, however, highly deterministic tests should
be the goal.
### Think hard about all the assumptions that the test relies on.
- For example, if a certain phase of the test ran much slower or much faster, would it cause your test to fail for the wrong reason?
- If your test includes hard-coded timeouts, make sure they are set appropriately. If a test is waiting for a certain condition to be true, and the test should not proceed until that condition is met, it is often correct to just wait “indefinitely”, instead of adding some arbitrary timeout value, like 30 seconds. In practice this usually means setting some reasonable upper limit, for example, 10 minutes.
- Also, for replication tests, make sure data exists on the right nodes at the right time. For example, if you do a write and don't explicitly wait for it to replicate, it might not reach a secondary node before you try to do the next step of the test.
- Does your test require data to be stored persistently? Remember that we have test variants that run on in-memory/ephemeral storage engines.
- There are timeouts in the test suites and we aim to make all tests in the same suite finish before the timeout. This means we should always keep each test's running time short.
- For example, if a certain phase of the test ran much slower or much faster, would it cause your
test to fail for the wrong reason?
- If your test includes hard-coded timeouts, make sure they are set appropriately. If a test is
waiting for a certain condition to be true, and the test should not proceed until that condition
is met, it is often correct to just wait “indefinitely”, instead of adding some arbitrary timeout
value, like 30 seconds. In practice this usually means setting some reasonable upper limit, for
example, 10 minutes.
- Also, for replication tests, make sure data exists on the right nodes at the right time. For
example, if you do a write and don't explicitly wait for it to replicate, it might not reach a
secondary node before you try to do the next step of the test.
- Does your test require data to be stored persistently? Remember that we have test variants that
run on in-memory/ephemeral storage engines.
- There are timeouts in the test suites and we aim to make all tests in the same suite finish before
the timeout. This means we should always keep each test's running time short.
### Make tests fail as early as possible.
- If something goes wrong early in the test, it's much harder to diagnose when that error becomes visible much later.
- Wrap every command in assert.commandWorked, or assert.commandFailedWithCode. There is also assert.commandFailed that won't check the return error code, but we should always try to use assert.commandFailedWithCode to make sure the test won't pass on an unexpected error.
- If something goes wrong early in the test, it's much harder to diagnose when that error becomes
visible much later.
- Wrap every command in assert.commandWorked, or assert.commandFailedWithCode. There is also
assert.commandFailed that won't check the return error code, but we should always try to use
assert.commandFailedWithCode to make sure the test won't pass on an unexpected error.
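As a rough illustration of why the code-checking variant is preferred, the sketch below applies simplified stand-ins for the two helpers to a made-up command response. The stand-ins are not the shell's implementations, and the names `commandFailedStandIn` and `commandFailedWithCodeStandIn` are hypothetical.

```js
// Simplified stand-ins for illustration only; the real helpers live in the shell's assert library.
function commandFailedStandIn(res) {
    if (res.ok !== 0) {
        throw new Error("command unexpectedly succeeded: " + JSON.stringify(res));
    }
    return res;
}

function commandFailedWithCodeStandIn(res, expectedCode) {
    commandFailedStandIn(res);
    if (res.code !== expectedCode) {
        throw new Error("expected error code " + expectedCode + " but got: " + JSON.stringify(res));
    }
    return res;
}

// Hypothetical response: the test expects a DuplicateKey error (code 11000), but the command
// actually failed because the namespace does not exist (code 26).
const unexpectedFailure = {ok: 0, code: 26, codeName: "NamespaceNotFound"};
const expectedDuplicateKeyCode = 11000;

// The loose check passes, hiding the fact that the command failed for the wrong reason.
commandFailedStandIn(unexpectedFailure);

// The strict check surfaces the problem immediately.
let strictCheckError = null;
try {
    commandFailedWithCodeStandIn(unexpectedFailure, expectedDuplicateKeyCode);
} catch (e) {
    strictCheckError = e;
}
```

Under these assumptions, only the strict check stops the test at the point where the unexpected error occurred.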
### Be aware of all the configurations and variants that your test might run under.
- Make sure that your test still works correctly if it is run in a different configuration or on a different platform than the one you might have tested on.
- Varying storage engines and suites can often affect a test's behavior. For example, maybe your test fails unexpectedly if it runs with authentication turned on with an in-memory storage engine. You don't have to run a new test on every possible platform before committing it, but you should be confident that your test doesn't break in an unexpected configuration.
- Make sure that your test still works correctly if it is run in a different configuration or on a
different platform than the one you might have tested on.
- Varying storage engines and suites can often affect a test's behavior. For example, maybe your
test fails unexpectedly if it runs with authentication turned on with an in-memory storage engine.
You don't have to run a new test on every possible platform before committing it, but you should
be confident that your test doesn't break in an unexpected configuration.
### Avoid assertions that verify properties indirectly.
All assertions in a test should attempt to verify the most specific property possible. For example, if you are trying to test that a certain collection exists, it is better to assert that the collection's exact name exists in the list of collections, as opposed to verifying that the collection count is equal to 1. The desired collection's existence is sufficient for the collection count to be 1, but not necessary (a different collection could exist in its place). Be wary of adding these kinds of indirect assertions in a test.
All assertions in a test should attempt to verify the most specific property possible. For example,
if you are trying to test that a certain collection exists, it is better to assert that the
collection's exact name exists in the list of collections, as opposed to verifying that the
collection count is equal to 1. The desired collection's existence is sufficient for the collection
count to be 1, but not necessary (a different collection could exist in its place). Be wary of
adding these kinds of indirect assertions in a test.
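The difference can be sketched with plain arrays. The collection names below are made up, and the hard-coded list is a hypothetical stand-in for whatever the server would return in a real test (for example, from `db.getCollectionNames()`).

```js
// Hypothetical state: exactly one collection exists, but it is not the one the test cares about.
const collectionNames = ["unrelated_coll"];
const collectionToCheck = "orders";

// Indirect check: "there is exactly one collection". This holds even though the wrong
// collection exists, so an assertion built on it would pass for the wrong reason.
const indirectCheckPasses = collectionNames.length === 1;

// Direct check: "the exact collection name is present". This fails here, as it should.
const directCheckPasses = collectionNames.includes(collectionToCheck);
```

In this made-up state the indirect check holds while the direct check does not.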
### Test Isolation
Your JS test will likely be running with many other files before and after it. It's important to start from a known state, and to restore that state (to a reasonable extent) at the end of your test content.
Your JS test will likely be running with many other files before and after it. It's important to
start from a known state, and to restore that state (to a reasonable extent) at the end of your test
content.
- **Before**: If there are critical assumptions about the environment that your test needs, assert for it explicitly before proceeding to the real test content (instead of debugging side effects of that not being the case)
- If you have a precondition on the _environment_, use [`@tags`](./tags.md) instead of just an early-return. This will avoid the test being scheduled in the first place if the environment is not supported.
- **After**: If you are modifying the fixture, do everything possible to safely restore those changes at the end of your test content, even after a test failure. Resmoke's `--continueOnFailure` flag is used in CI, so the fixture is shared across many test files, and is only torn down at the end.
- Note, a fixture _can_ immediately "abort" after a test failure, only if [archiving](../../../../buildscripts/resmokeconfig/suites/README.md#executorarchive) is configured, but that shouldn't be assumed because that is a per-suite configuration (and your test can run in many passthrough suite combinations).
- One easy approach to restoring your state is to use the [Mocha-style](#use-mocha-style-constructs) `after` hooks in your test content.
- **Before**: If there are critical assumptions about the environment that your test needs, assert
for it explicitly before proceeding to the real test content (instead of debugging side effects of
that not being the case)
- If you have a precondition on the _environment_, use [`@tags`](./tags.md) instead of just an
early-return. This will avoid the test being scheduled in the first place if the environment is
not supported.
- **After**: If you are modifying the fixture, do everything possible to safely restore those
changes at the end of your test content, even after a test failure. Resmoke's
`--continueOnFailure` flag is used in CI, so the fixture is shared across many test files, and is
only torn down at the end.
- Note, a fixture _can_ immediately "abort" after a test failure, only if
[archiving](../../../../buildscripts/resmokeconfig/suites/README.md#executorarchive) is
configured, but that shouldn't be assumed because that is a per-suite configuration (and your
test can run in many passthrough suite combinations).
- One easy approach to restoring your state is to use the
[Mocha-style](#use-mocha-style-constructs) `after` hooks in your test content.
## Modern JS: Modules in Practice
We have fully migrated to the modularized JavaScript world, so any new test should use modules and adopt the new style.
We have fully migrated to the modularized JavaScript world, so any new test should use modules and
adopt the new style.
### Only import/export what you need.
It's always important to keep the test context clean so we should only import/export what we need.
- The unused import is against [no-unused-vars](https://eslint.org/docs/latest/rules/no-unused-vars) rule in ESLint though we haven't enforced it.
- We don't have a linter to check export since it's hard to tell the necessity, but we should only export the modules that are imported by other tests or will be needed in the future.
- The unused import is against [no-unused-vars](https://eslint.org/docs/latest/rules/no-unused-vars)
rule in ESLint though we haven't enforced it.
- We don't have a linter to check export since it's hard to tell the necessity, but we should only
export the modules that are imported by other tests or will be needed in the future.
### Declare variables in proper scope.
In the past, we have seen tests referring to some "undeclared" or "redeclared" variables, which are actually introduced through `load()`. Now with modules, the scope is clearer. We can use global variables properly to set up the test and don't need to worry about polluting other tests.
In the past, we have seen tests referring to some "undeclared" or "redeclared" variables, which are
actually introduced through `load()`. Now with modules, the scope is clearer. We can use global
variables properly to set up the test and don't need to worry about polluting other tests.
### Name variables properly when exporting.
To avoid naming conflicts, we should not make the names of exported variables too general, since they could easily conflict with another variable from the test which imports your module. For example, in the following case, the module exports a variable named `alphabet`, which leads to a re-declaration error.
To avoid naming conflicts, we should not make the names of exported variables too general, since
they could easily conflict with another variable from the test which imports your module. For
example, in the following case, the module exports a variable named `alphabet`, which leads to a
re-declaration error.
```
import {alphabet} from "/matts/module.js";
@ -87,7 +163,9 @@ const alphabet = "xyz"; // ERROR
### Prefer let/const over var
`let/const` should be preferred over `var` since these can help detect double declaration in the first place. For instance, in the naming conflict example, if the second line used `var`, it could easily cause problems without throwing an error.
`let/const` should be preferred over `var` since these can help detect double declaration in the
first place. For instance, in the naming conflict example, if the second line used `var`, it could
easily cause problems without throwing an error.
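A minimal sketch of the difference in plain script code follows. The variable names are illustrative, and `new Function` is used only so that the parse-time error can be caught and inspected.

```js
// With `var`, a second declaration of the same name is silently accepted and overwrites the value.
var alphabet = "abc";
var alphabet = "xyz";  // no error

// With `const` (or `let`), the same mistake is rejected when the code is parsed.
let redeclarationError = null;
try {
    new Function('const alphabet = "abc"; const alphabet = "xyz";');
} catch (e) {
    redeclarationError = e;
}
```

Here `alphabet` quietly ends up holding the second value, while the `const` version is rejected with a `SyntaxError` before any of it runs.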
### Export in ES6 style
@ -116,7 +194,8 @@ This can help the language server to discover the methods and provide code navig
### Use Mocha-style Constructs
The [mochalite.js](../jstests/libs/mochalite.js) library ports over a subset of [MochaJS](https://mochajs.org/) functionality for the shell, including:
The [mochalite.js](../jstests/libs/mochalite.js) library ports over a subset of
[MochaJS](https://mochajs.org/) functionality for the shell, including:
- `it` test construction
- `describe` suite structures
@ -125,19 +204,13 @@ The [mochalite.js](../jstests/libs/mochalite.js) library ports over a subset of
- `before` and `after` hooks, to run _once_ around _all_ `it` tests
- `beforeEach` and `afterEach` hooks, to run around _each_ `it` test
- The above (excluding `describe` variants) also support `async` functions
- Resmoke test filtering using the `--mochagrep` flag, which mirrors the [`grep`](https://mochajs.org/#-grep-regexp-g-regexp) flag from MochaJS
- Resmoke test filtering using the `--mochagrep` flag, which mirrors the
[`grep`](https://mochajs.org/#-grep-regexp-g-regexp) flag from MochaJS
Example using several APIs:
```js
import {
after,
afterEach,
before,
beforeEach,
describe,
it,
} from "jstests/libs/mochalite.js";
import {after, afterEach, before, beforeEach, describe, it} from "jstests/libs/mochalite.js";
describe("simple inserts and finds", () => {
before(() => {
@ -157,9 +230,7 @@ describe("simple inserts and finds", () => {
assert.eq(this.fixtureDB.find({name: "test"}).count(), 1);
});
it("should error on invalid data", () => {
const e = assert.throws(() =>
this.fixtureDB.insert({notafield: undefined}),
);
const e = assert.throws(() => this.fixtureDB.insert({notafield: undefined}));
assert.eq(e.message, "Field 'notafield' not found");
});
});
@ -182,7 +253,9 @@ buildscripts/resmoke.py run --suites=no_passthrough --mochagrep "do something" j
## Test Tags
JS Test files can leverage "tags" that suites can key off of to include and/or exclude as necessary. Not scheduling a test to run is much faster than the test doing an early-return when preconditions are not met.
JS Test files can leverage "tags" that suites can key off of to include and/or exclude as necessary.
Not scheduling a test to run is much faster than the test doing an early-return when preconditions
are not met.
The simplest use case is having something like the following at the top of your js test file:

@ -4,19 +4,31 @@ For a short introduction to property-based testing or fast-check, see [Appendix]
## Core PBT Design
The 'Core PBTs' are a subset of our property-based tests that use a shared schema and models. Their purpose is to provide basic coverage of our query language that may not be tested by the rest of our jstests. This means only simple stages such as $project, $match, $sort, etc are covered. More complicated stages such as $lookup or $facet are not tested. PBTs outside of the core set may test these more complex features.
The 'Core PBTs' are a subset of our property-based tests that use a shared schema and models. Their
purpose is to provide basic coverage of our query language that may not be tested by the rest of our
jstests. This means only simple stages such as $project, $match, $sort, etc are covered. More
complicated stages such as $lookup or $facet are not tested. PBTs outside of the core set may test
these more complex features.
These tests have been highly effective at finding bugs. As of writing they have caught 24 bugs in 8 months. See [SERVER-89308](https://jira.mongodb.org/browse/SERVER-89308) for a full list of issues.
These tests have been highly effective at finding bugs. As of writing they have caught 24 bugs in 8
months. See [SERVER-89308](https://jira.mongodb.org/browse/SERVER-89308) for a full list of issues.
The Core PBT design is built off of a few key principles about randomized testing:
### Properties Dictate the Models
In our fuzzer, we have grammar for most of MQL. While this provides more coverage, it means the property we assert is weaker. We can add as much as we'd like to the model, because the property comes second to the model. We're willing to add exceptions to the property to make it work.
In our fuzzer, we have grammar for most of MQL. While this provides more coverage, it means the
property we assert is weaker. We can add as much as we'd like to the model, because the property
comes second to the model. We're willing to add exceptions to the property to make it work.
However, the "model dictates the property" design also backfired, because in addition to exceptions in the property, we need to post-process the generated queries. Adding $sort to several places throughout an aggregation pipeline means we are no longer testing MQL, but rather an artificial subset of MQL that a user would never write.
However, the "model dictates the property" design also backfired, because in addition to exceptions
in the property, we need to post-process the generated queries. Adding $sort to several places
throughout an aggregation pipeline means we are no longer testing MQL, but rather an artificial
subset of MQL that a user would never write.
For this reason, the properties come first in our Core PBTs, and have few exceptions. They dictate what model we use so no postprocessing is needed. The PBT models are significantly smaller than the fuzzer models.
For this reason, the properties come first in our Core PBTs, and have few exceptions. They dictate
what model we use so no postprocessing is needed. The PBT models are significantly smaller than the
fuzzer models.
### Small Schema
@ -24,19 +36,32 @@ For this reason, the properties come first in our Core PBTs, and have few except
A small number of fields in our schema allows us to find interesting interactions more easily.
An example of an interaction could be query optimizations. Let's say an optimization on `[{$match: {*field*: 5}}, {$sort: {*field*: 1}}]` only kicks in when the two fields are the same. In a PBT where there are one thousand possible fields (`a`, `b`, `c`, but also `a.b.c`, `a.a.a` and all combinations), the probability of finding this optimization is `1/1000`. With six fields, it's increased to `1/6`.
An example of an interaction could be query optimizations. Let's say an optimization on
`[{$match: {*field*: 5}}, {$sort: {*field*: 1}}]` only kicks in when the two fields are the same. In
a PBT where there are one thousand possible fields (`a`, `b`, `c`, but also `a.b.c`, `a.a.a` and all
combinations), the probability of finding this optimization is `1/1000`. With six fields, it's
increased to `1/6`.
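The arithmetic above can be checked by enumeration. This sketch assumes the `$match` and `$sort` fields are each drawn independently and uniformly from the schema, which is a simplification of how a real generator may behave; `sameFieldFraction` is a hypothetical helper name.

```js
// Fraction of (match field, sort field) pairs, drawn independently and uniformly from a schema of
// `numFields` fields, in which both stages reference the same field.
function sameFieldFraction(numFields) {
    let samePairs = 0;
    let totalPairs = 0;
    for (let matchField = 0; matchField < numFields; matchField++) {
        for (let sortField = 0; sortField < numFields; sortField++) {
            totalPairs++;
            if (matchField === sortField) {
                samePairs++;
            }
        }
    }
    return samePairs / totalPairs;
}

const largeSchemaFraction = sameFieldFraction(1000);  // 1000 matching pairs out of 1,000,000
const smallSchemaFraction = sameFieldFraction(6);     // 6 matching pairs out of 36
```

Under that assumption, a generated pipeline hits the same-field case roughly once in every `numFields` attempts.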
Another interaction is between queries and indexes. Queries and indexes generated from a small schema make the indexes more likely to be used.
Another interaction is between queries and indexes. Queries and indexes generated from a small
schema make the indexes more likely to be used.
Bugs tend to come from interactions and special cases. A query that has no optimizations applied and does not use an index requires much less complicated logic, which is correlated with fewer bugs.
Bugs tend to come from interactions and special cases. A query that has no optimizations applied and
does not use an index requires much less complicated logic, which is correlated with fewer bugs.
#### Simple Values to Avoid MQL Inconsistencies
Related to [Properties Dictate the Models](#properties-dictate-the-models), a simpler document model also allows for stronger properties.
Related to [Properties Dictate the Models](#properties-dictate-the-models), a simpler document model
also allows for stronger properties.
There are inconsistencies in our query language that are accepted behavior, but cause issues in property-based testing. We can work around them by being careful about the values we allow in documents.
There are inconsistencies in our query language that are accepted behavior, but cause issues in
property-based testing. We can work around them by being careful about the values we allow in
documents.
[SERVER-12869](https://jira.mongodb.org/browse/SERVER-12869) is an issue that stems from null and missing being encoded the same way in our index format. This means a covering plan (a plan with no `FETCH` node) cannot distinguish between null and missing. This inconsistency is the cause of lots of noise from our fuzzer, since one differing value in a query result can propagate. In our Core PBTs, we do not allow missing fields. This means:
[SERVER-12869](https://jira.mongodb.org/browse/SERVER-12869) is an issue that stems from null and
missing being encoded the same way in our index format. This means a covering plan (a plan with no
`FETCH` node) cannot distinguish between null and missing. This inconsistency is the cause of lots
of noise from our fuzzer, since one differing value in a query result can propagate. In our Core
PBTs, we do not allow missing fields. This means:
- Documents must have all fields in the schema
- We can only index fields in the schema
@ -44,7 +69,9 @@ There are inconsistencies in our query language that are accepted behavior, but
`null` is allowed.
Floating point values are another area the PBTs avoid. Results can differ depending on the order of floating point operations. These differences can propagate. For this reason the only number values allowed are integers.
Floating point values are another area the PBTs avoid. Results can differ depending on the order of
floating point operations. These differences can propagate. For this reason the only number values
allowed are integers.
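
As a rough illustration of these value rules, the sketch below checks a single document against
them. The helper `findSimpleValueViolations` is hypothetical and is not part of the PBT helpers; it
only looks at top-level fields and assumes the schema is given as a list of field names.

```js
// Hypothetical checker for the rules described above:
//   - every schema field must be present (null is fine, missing is not)
//   - any number must be an integer (no floating point values)
function findSimpleValueViolations(doc, schemaFields) {
    const problems = [];
    for (const field of schemaFields) {
        if (!Object.prototype.hasOwnProperty.call(doc, field)) {
            problems.push(`${field} is missing`);
        }
    }
    for (const [field, value] of Object.entries(doc)) {
        if (typeof value === "number" && !Number.isInteger(value)) {
            problems.push(`${field} holds a non-integer number`);
        }
    }
    return problems;
}

const okDoc = {a: 1, b: null};
const badDoc = {a: 1.5};
const okProblems = findSimpleValueViolations(okDoc, ["a", "b"]);
const badProblems = findSimpleValueViolations(badDoc, ["a", "b"]);
```

Here `okDoc` passes because `null` is an allowed value, while `badDoc` is flagged twice: `b` is
missing and `a` is a floating point number.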
## Modeling Workloads
@ -62,8 +89,9 @@ A workload consists of a collection model and an aggregation model, in the follo
}
```
Using one workload model instead of separate (and independent) collection models and agg models allows them to be interrelated.
For example, if we want to model a PBT to test partial indexes where every query should satisfy the partial index filter, we can write:
Using one workload model instead of separate (and independent) collection models and agg models
allows them to be interrelated. For example, if we want to model a PBT to test partial indexes where
every query should satisfy the partial index filter, we can write:
```
fc.record({
@ -78,7 +106,8 @@ fc.record({
});
```
and this is a valid workload model. If the collection and aggregation models are passed separately, they would be independent and unable to coordinate with shared arbitraries (like `partialFilter`).
and this is a valid workload model. If the collection and aggregation models are passed separately,
they would be independent and unable to coordinate with shared arbitraries (like `partialFilter`).
### Schema
@ -95,11 +124,13 @@ The Core PBT schema is:
}
```
For now, this is also a valid model for a document in a time-series collection (where `t` is the time field and `m` is the meta field), but the models may diverge.
For now, this is also a valid model for a document in a time-series collection (where `t` is the
time field and `m` is the meta field), but the models may diverge.
### Query Generation
These models cover a limited number of aggregation stages, located in `jstests/libs/property_test_helpers/models`. The supported stages are:
These models cover a limited number of aggregation stages, located in
`jstests/libs/property_test_helpers/models`. The supported stages are:
- $project
- $addFields
@ -112,7 +143,8 @@ These models cover a limited number of aggregation stages, located in `jstests/l
#### Query Families
Rather than generating single, standalone queries, our query model generates a "family" of queries.
At its leaves, a query family contains multiple values that the leaf could take on. For example instead of generating a single query with a concrete value `1` at the leaf:
At its leaves, a query family contains multiple values that the leaf could take on. For example
instead of generating a single query with a concrete value `1` at the leaf:
```
[{$match: {a: 1}}, {$project: {b: 0}}]
@ -133,7 +165,8 @@ Then we extract several queries that have the same shape.
```
This allows us to write properties that use the plan cache more often rather than relying on chance.
Properties can use the `getQuery` interface to ask for queries with different shapes, or the same shape with different leaf values plugged in.
Properties can use the `getQuery` interface to ask for queries with different shapes, or the same
shape with different leaf values plugged in.
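
To make the idea concrete, here is a minimal sketch of turning a query family into concrete queries
of the same shape. It assumes leaves are marked with an object of the form
`{concreteValues: [...]}`; that marker and the `concretize` helper are assumptions made for this
illustration and are not the actual `getQuery` implementation.

```js
// Assumption: a leaf of the family is an object holding the candidate values.
function isLeafFamily(node) {
    return node !== null && typeof node === "object" && !Array.isArray(node) &&
        Array.isArray(node.concreteValues);
}

// Hypothetical helper: replace every leaf with its leafIndex-th candidate value,
// leaving the shape of the query untouched.
function concretize(node, leafIndex) {
    if (isLeafFamily(node)) {
        return node.concreteValues[leafIndex % node.concreteValues.length];
    }
    if (Array.isArray(node)) {
        return node.map((child) => concretize(child, leafIndex));
    }
    if (node !== null && typeof node === "object") {
        const out = {};
        for (const [key, value] of Object.entries(node)) {
            out[key] = concretize(value, leafIndex);
        }
        return out;
    }
    return node;
}

const family = [{$match: {a: {concreteValues: [1, 2, 3]}}}, {$project: {b: 0}}];
const firstQuery = concretize(family, 0);
const secondQuery = concretize(family, 1);
```

Both extracted queries share one shape and differ only in the value compared against `a`, which is
the situation in which a cached plan is likely to be reused.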
## Core PBTs
@ -143,15 +176,15 @@ Details are provided at the top of each file.
## Debugging a PBT Failure
Currently, all PBTs have a fixed seed.
This means that as long as the bug it found is deterministic on the server's side, the PBT will consistently run into the issue.
If the bug is not deterministic, the PBT may or may not fail.
Currently, all PBTs have a fixed seed. This means that as long as the bug it found is deterministic
on the server's side, the PBT will consistently run into the issue. If the bug is not deterministic,
the PBT may or may not fail.
### Shrinking (Minimizing)
Once a counterexample (a failing case) to the property is found, fast-check tests will automatically attempt to shrink the issue.
Shrinking often does not reach the global minimum counterexample, since fast-check cannot make certain jumps.
For example it has no way of knowing that
Once a counterexample (a failing case) to the property is found, fast-check tests will automatically
attempt to shrink the issue. Shrinking often does not reach the global minimum counterexample, since
fast-check cannot make certain jumps. For example it has no way of knowing that
`{$and: [{a: {$eq: 1}}]}`
@ -163,30 +196,39 @@ or even
`{a: 1}`
This could be solved if fast-check had domain-specific knowledge about MQL or if it fuzzed counterexamples during shrinking.
However, the counterexamples are usually small enough that there isn't much left to shrink.
This could be solved if fast-check had domain-specific knowledge about MQL or if it fuzzed
counterexamples during shrinking. However, the counterexamples are usually small enough that there
isn't much left to shrink.
For non-deterministic issues, fast-check's shrinking is not as effective because it receives mixed signals from the property on whether the shrunk counterexamples fail or not.
For non-deterministic issues, fast-check's shrinking is not as effective because it receives mixed
signals from the property on whether the shrunk counterexamples fail or not.
### Failure Output
After a failure is minimized, the counterexample is printed out.
This includes debug data such as the counterexample that fast-check found and the error it ran into.
The counterexample will be a workload (see [Modeling Workloads](#modeling-workloads)), containing all information about the collection and queries run against it.
After a failure is minimized, the counterexample is printed out. This includes debug data such as
the counterexample that fast-check found and the error it ran into. The counterexample will be a
workload (see [Modeling Workloads](#modeling-workloads)), containing all information about the
collection and queries run against it.
To reproduce the issue, the workload can be copied and pasted into the failing property-based test, specifically by passing it in as the `examples` argument to `testProperty`.
fast-check will take these hand-written examples and run them before trying randomized examples.
See `partial_index_pbt.js` (which references `pbt_resolved_bugs.js`) for an example of this.
`partial_index_pbt.js` uses the `examples` argument to ensure workloads that previously would fail are run.
It can be used in the same way to repro existing bugs from BFs.
To reproduce the issue, the workload can be copied and pasted into the failing property-based test,
specifically by passing it in as the `examples` argument to `testProperty`. fast-check will take
these hand-written examples and run them before trying randomized examples. See
`partial_index_pbt.js` (which references `pbt_resolved_bugs.js`) for an example of this.
`partial_index_pbt.js` uses the `examples` argument to ensure workloads that previously would fail
are run. It can be used in the same way to repro existing bugs from BFs.
# Appendix
## Property-Based Testing (PBT)
Property-based testing is a testing method that asserts properties hold over many example inputs. In our use of PBT, it involves two components, a "model" and a "property function". The model is a description of the object we are testing. It is used to generate examples of what the object looks like. These examples are routed into the property function, which asserts that the object has the characteristics we expect it to have.
Property-based testing is a testing method that asserts properties hold over many example inputs. In
our use of PBT, it involves two components, a "model" and a "property function". The model is a
description of the object we are testing. It is used to generate examples of what the object looks
like. These examples are routed into the property function, which asserts that the object has the
characteristics we expect it to have.
Let's say we wrote a new integer addition function `add` that we'd like to test. We could calculate the correct answer to different addition problems, and assert that `add` behaves correctly.
Let's say we wrote a new integer addition function `add` that we'd like to test. We could calculate
the correct answer to different addition problems, and assert that `add` behaves correctly.
```
assert.eq(add(1, 2), 3);
@ -194,7 +236,9 @@ assert.eq(add(-1, 1), 0);
...
```
In addition to tests written with concrete values, we could also write a PBT to test for characteristics we expect `add` to have. Addition is commutative for example, meaning `add(a, b)` should always equal `add(b, a)`. We can write a function for this:
In addition to tests written with concrete values, we could also write a PBT to test for
characteristics we expect `add` to have. Addition is commutative for example, meaning `add(a, b)`
should always equal `add(b, a)`. We can write a function for this:
```
function testAdd(a, b){
@ -202,12 +246,20 @@ function testAdd(a, b){
}
```
The input to `testAdd` could use the builtin Javascript `Random` package, or a PBT library such as fast-check.
The input to `testAdd` could use the builtin Javascript `Random` package, or a PBT library such as
fast-check.
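
As a minimal sketch of feeding random inputs into such a property without a PBT library, the
example below uses plain `Math.random`. The names `add`, `randomInt` and `checkCommutativity` are
placeholders written for this illustration; unlike fast-check, this loop has no seed control and
does no shrinking.

```js
// Stand-in for the function under test.
function add(a, b) {
    return a + b;
}

// Uniform random integer in [min, max], built on Math.random.
function randomInt(min, max) {
    return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Run the commutativity property numRuns times.
// Returns the first counterexample found, or null if every run passed.
function checkCommutativity(numRuns) {
    for (let i = 0; i < numRuns; i++) {
        const a = randomInt(-1000, 1000);
        const b = randomInt(-1000, 1000);
        if (add(a, b) !== add(b, a)) {
            return {a, b};
        }
    }
    return null;
}

const counterexample = checkCommutativity(100);
```

A `null` result means no counterexample was found in 100 runs, which is evidence for the property
rather than a proof of it.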
The way the query team uses PBT tends to be more complex, and almost always involves modeling a subset of our query language, documents, and indexes. Our fuzzer is a form of property-based testing, since we generate random queries and assert correctness against different controls (an older mongo version, a collection without indexes, etc.).
The way the query team uses PBT tends to be more complex, and almost always involves modeling a
subset of our query language, documents, and indexes. Our fuzzer is a form of property-based
testing, since we generate random queries and assert correctness against different controls (an
older mongo version, a collection without indexes, etc.).
## fast-check
fast-check (located in jstests/third_party/fast_check/fc-3.1.0.js) is a property-based testing framework for JavaScript/TypeScript. It provides building-block components to use for larger models, and has functionality to test properties against these models. It also has built-in logic for shrinking (minimizing) counterexamples to properties.
fast-check (located in jstests/third_party/fast_check/fc-3.1.0.js) is a property-based testing
framework for JavaScript/TypeScript. It provides building-block components to use for larger models,
and has functionality to test properties against these models. It also has built-in logic for
shrinking (minimizing) counterexamples to properties.
For an example of how to use fast-check to write a property-based test, see [project_coalescing.js](../../aggregation/sources/project/project_coalescing.js)
For an example of how to use fast-check to write a property-based test, see
[project_coalescing.js](../../aggregation/sources/project/project_coalescing.js)

View File

@ -4,5 +4,7 @@ These tests test upgrade/downgrade behavior expected between different versions
Those that begin failing upon branching should be assessed by the owner teams:
- Is the test only applicable to specific versions during specific development cycles? If so, delete it from irrelevant branches and master.
- Does the test add value for "last" (dynamic) version features? If so, modify the test to be more robust. These should always pass regardless of MongoDB version.
- Is the test only applicable to specific versions during specific development cycles? If so, delete
it from irrelevant branches and master.
- Does the test add value for "last" (dynamic) version features? If so, modify the test to be more
robust. These should always pass regardless of MongoDB version.

View File

@ -1,3 +1,4 @@
# FCV / setFCV core infrastructure
This folder contains tests for the core FCV and setFCV upgrade/downgrade infrastructure. It does not contain tests linked to any other particular feature.
This folder contains tests for the core FCV and setFCV upgrade/downgrade infrastructure. It does not
contain tests linked to any other particular feature.

View File

@ -1,6 +1,8 @@
# Introduction
The plan_stability tests record the current winning plan for a set of ~ 1K queries produced by SPM-3816. If those plans ever change, the test is expected to fail at which point a human would decide if the changed plans are for the better or for the worse.
The plan_stability tests record the current winning plan for a set of ~ 1K queries produced by
SPM-3816. If those plans ever change, the test is expected to fail at which point a human would
decide if the changed plans are for the better or for the worse.
# Running
@ -13,7 +15,8 @@ $ buildscripts/resmoke.py run \
jstests/query_golden/plan_stability.js
```
There are several resmoke suites predefined for different plan ranking modes, for which it is not needed to add mongod parameters:
There are several resmoke suites predefined for different plan ranking modes, for which it is not
needed to add mongod parameters:
```bash
query_golden_cbr_automatic
@ -42,7 +45,9 @@ To obtain a diff that contains an individual diff fragment for each changed plan
2. Edit the `~/.golden_test_config.yml` to use a customized diff command:
```yml
diffCmd: 'git -c diff.plan_stability.xfuncname=">>>pipeline" diff --unified=0 --function-context --no-index "{{expected}}" "{{actual}}"'
diffCmd:
'git -c diff.plan_stability.xfuncname=">>>pipeline" diff --unified=0 --function-context --no-index
"{{expected}}" "{{actual}}"'
```
3. You can now run `buildscripts/golden_test.py diff` as usual and the output will look like this:
@ -68,15 +73,20 @@ This provides the plan that changed, the pipeline it belonged to, and the execut
## Using the summarization scripts
The `feature-extractor` internal repository contains a summarization script that can be used to obtain a summary of the failed test as well as information on the individual regressions that should be looked into. Please see `scripts/cbr/README.md` in that repository for more information.
The `feature-extractor` internal repository contains a summarization script that can be used to
obtain a summary of the failed test as well as information on the individual regressions that should
be looked into. Please see `scripts/cbr/README.md` in that repository for more information.
# Debugging failures
## Which pipeline is the problematic one?
In Evergreen, the diff will most likely show a pipeline **below** the counters. This is however the following pipeline in the test, not the one you are looking for. The problematic pipeline is the one that comes **before** it in the `expected_output` file.
In Evergreen, the diff will most likely show a pipeline **below** the counters. This is however the
following pipeline in the test, not the one you are looking for. The problematic pipeline is the one
that comes **before** it in the `expected_output` file.
In local execution, if your environment is configured as described above, the diff will show the actual pipeline of interest, **above** the counters.
In local execution, if your environment is configured as described above, the diff will show the
actual pipeline of interest, **above** the counters.
## Running the offending pipelines manually
@ -98,7 +108,8 @@ and wait until the script has advanced to the following log line:
[js_test:plan_stability] [jsTest] ----
```
2. Connect to `mongodb://127.0.0.1:20000` and run the offending pipeline against the `db.plan_stability` collection.
2. Connect to `mongodb://127.0.0.1:20000` and run the offending pipeline against the
`db.plan_stability` collection.
```bash
mongosh mongodb://127.0.0.1:20000
@ -113,7 +124,10 @@ db.plan_stability.aggregate(pipeline).explain().queryPlanner.rejectedPlans.sort(
## Converting the pipeline to JavaScript
The pipelines in the diff are **EJSON**-ish, while the mongosh shell expects **JavaScript**. EJSON-ish and JavaScript are identical when it comes to basic types, such as strings and integers, but if the pipeline contains timestamps and decimals, the JSON needs to be converted to JavaScript using `EJSON.parse()`:
The pipelines in the diff are **EJSON**-ish, while the mongosh shell expects **JavaScript**.
EJSON-ish and JavaScript are identical when it comes to basic types, such as strings and integers,
but if the pipeline contains timestamps and decimals, the JSON needs to be converted to JavaScript
using `EJSON.parse()`:
```js
> pipelineStr = '[{"$match":{"field20_Timestamp_idx":{"$gt":{"$timestamp":{"t":1760551205,"i":0}}}},"field12_Decimal128_idx":{"$lte":{"$numberDecimal":"35.1"}}}]';
@ -130,23 +144,26 @@ The pipelines in the diff are **EJSON**-ish, while the mongosh shell expects **J
db.plan_stability2.aggregate(pipeline);
```
Note that **ISO Timestamps** need to be handled separately. JSON will store those as strings, resulting in loss of typing information that `EJSON.parse()` cannot recover. This will result in a semantic change in the query unless manually converted to an `ISODate` object:
Note that **ISO Timestamps** need to be handled separately. JSON will store those as strings,
resulting in loss of typing information that `EJSON.parse()` cannot recover. This will result in a
semantic change in the query unless manually converted to an `ISODate` object:
```js
// Manually convert
// [{"$match":{"field19_datetime_idx":{"$gte":"2024-01-27T00:00:00.000Z"}}}]
// to the correct JavaScript
pipeline = [
{$match: {field19_datetime_idx: {$gte: ISODate("2024-01-27T00:00:00.000Z")}}},
];
pipeline = [{$match: {field19_datetime_idx: {$gte: ISODate("2024-01-27T00:00:00.000Z")}}}];
```
## Is the new plan better or worse?
For the majority of the plans, it will be obvious if the new plan is better or worse because all the execution counters would have moved in the same direction without any ambiguity.
For the majority of the plans, it will be obvious if the new plan is better or worse because all the
execution counters would have moved in the same direction without any ambiguity.
Some plans, such as those involving `$sort` or `$limit` will sometimes change in a way that makes some counters better while others become worse. For those queries, consider running them manually multiple times to compare their wallclock execution times:
Some plans, such as those involving `$sort` or `$limit` will sometimes change in a way that makes
some counters better while others become worse. For those queries, consider running them manually
multiple times to compare their wallclock execution times:
```javascript
pipeline = [...];
@ -162,11 +179,15 @@ You can also modify `collSize` in `plan_stability.js` to temporarily use a large
If you want to run a comparison between estimation methods `X` and `Y`:
1. If method `X` is not multi-planning, place the `jstests/query_golden/expected_files/X` for estimation method `X` in the root of `expected_files`, so that they are used as the base for the comparison;
1. If method `X` is not multi-planning, place the `jstests/query_golden/expected_files/X` for
estimation method `X` in the root of `expected_files`, so that they are used as the base for the
comparison;
2. Temporarily remove the expected files for method `Y` from `expected_files/query_golden/expected_files/Y` so that they are not considered;
2. Temporarily remove the expected files for method `Y` from
`expected_files/query_golden/expected_files/Y` so that they are not considered;
3. Run the test as described above, specifying `featureFlagCostBasedRanker`/`internalQueryCBRCEMethod`;
3. Run the test as described above, specifying
`featureFlagCostBasedRanker`/`internalQueryCBRCEMethod`;
4. Use the summarization script as described above to produce a report.
@ -179,5 +200,5 @@ To accept the new plans, use `buildscripts/golden_test.py accept`, as with any o
## Removing individual pipelines
If a given pipeline proves flaky, that is, is flipping between one plan and another for no reason,
you can comment it out from the test with a note. Re-run the test and then run `buildscripts/golden_test.py accept`
to persist the change.
you can comment it out from the test with a note. Re-run the test and then run
`buildscripts/golden_test.py accept` to persist the change.

View File

@ -1,21 +1,26 @@
# Introduction
The plan stability tests for join optimization are golden tests that execute a number of joins against the TPC-H dataset.
The plan stability tests for join optimization are golden tests that execute a number of joins
against the TPC-H dataset.
For each pipeline we persist the following in the golden test output:
- the MQL command, including the base table and the pipeline
- a concise representation of the winning plan for the query
- execution counters that quantify the effort it took to execute the query in terms of docs and keys examined
- execution counters that quantify the effort it took to execute the query in terms of docs and keys
examined
- data about the resultset, such as the number of rows returned
## Prerequisites
This test requires the following:
- The `mongorestore` tool, accessible on the $PATH. This tool is part of the [MongoDB Database Tools](https://www.mongodb.com/try/download/database-tools) package.
- The `mongorestore` tool, accessible on the $PATH. This tool is part of the
[MongoDB Database Tools](https://www.mongodb.com/try/download/database-tools) package.
- The TPC-H dataset, located in a directory named `tpc-h` that is on the same level as the mongodb repository. The dataset is available from the `query-benchmark-data` S3 bucket. You can retrieve it as follows:
- The TPC-H dataset, located in a directory named `tpc-h` that is on the same level as the mongodb
repository. The dataset is available from the `query-benchmark-data` S3 bucket. You can retrieve
it as follows:
```bash
mkdir ~/tpc-h
@ -26,7 +31,8 @@ aws sso login
aws s3 cp s3://query-benchmark-data/tpc-h/tpch-0.1-normalized.archive.gz tpc-h/tpch-0.1-normalized.archive.gz --region us-east-1
```
In evergreen, tasks such as `query_golden_join_optimization_plan_stability` make sure the prerequisites are already in place.
In evergreen, tasks such as `query_golden_join_optimization_plan_stability` make sure the
prerequisites are already in place.
- The golden test framework configured with a custom diff rule
@ -77,13 +83,16 @@ The report contains the following information:
- the most-improved queries, useful for obtaining examples for presentation purposes;
- all individual failures, categorized and pretty-printed.
The report has one section per jstest -- if you are running multiple tests, each one will be processed and reported separately.
The report has one section per jstest -- if you are running multiple tests, each one will be
processed and reported separately.
## Debugging
> [!WARNING] > **_WARNING:_** The queries dumped by this test, the diff tooling or the summary report may contain EJSON constructs, such as $numberDecimal
> that are not properly processed by `coll.aggregate()` unless converted using `EJSON.parse()`. Typing information around ISO dates may have also been lost, so manually recreate those as `ISODate(...)`.
> See the "A note on the queries" section below for more information.
> [!WARNING] > **_WARNING:_** The queries dumped by this test, the diff tooling or the summary
> report may contain EJSON constructs, such as $numberDecimal that are not properly processed by
> `coll.aggregate()` unless converted using `EJSON.parse()`. Typing information around ISO dates may
> have also been lost, so manually recreate those as `ISODate(...)`. See the "A note on the queries"
> section below for more information.
### Determining the offending query
@ -91,7 +100,9 @@ Each query has an `idx` key that can be used to track it across files and report
### Starting a populated MongoDB instance
To obtain a running, populated MongoDB instance, run `resmoke.py run` with the `--pauseAfterPopulate` option. This will start mongod, load the data and then pause resmoke at the following line:
To obtain a running, populated MongoDB instance, run `resmoke.py run` with the
`--pauseAfterPopulate` option. This will start mongod, load the data and then pause resmoke at the
following line:
```
[js_test:plan_stability_join_opt_tpch] [jsTest] TestData.pauseAfterPopulate is set. Pausing indefinitely ...
@ -124,15 +135,18 @@ The collections will be restored to the `tpch` database.
## A note on the queries
The queries you see in files, diffs, bug reports may be in various formats, depending on whether they were dumped using JavaScript, python, or some other method.
The queries you see in files, diffs, bug reports may be in various formats, depending on whether
they were dumped using JavaScript, python, or some other method.
Therefore, it is important to obtain the query plan of the query and make sure that what you are seeing locally matches the plan from the bug report.
Therefore, it is important to obtain the query plan of the query and make sure that what you are
seeing locally matches the plan from the bug report.
The following caveats are currently known:
### Typing information for timestamps
Typing information for timestamps is frequently lost, so a query may contain ISO timestamps as strings:
Typing information for timestamps is frequently lost, so a query may contain ISO timestamps as
strings:
```json
{"l_commitdate": {"$lt": "1993-03-17T00:00:00"}}
@ -146,7 +160,8 @@ You will need to manually convert this into a timestamp:
{'l_commitdate': {'$lt': new ISODate('1993-03-17T00:00:00')}}
```
Since the typing information has been lost somewhere along the pipeline, no existing library is available to restore it for you.
Since the typing information has been lost somewhere along the pipeline, no existing library is
available to restore it for you.
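
If many queries need this treatment, a small helper can do the conversion, provided you tell it
which fields hold dates. The sketch below is an assumption-laden illustration: `reviveDates` is a
hypothetical name, the list of date fields must be supplied by hand, and the `toDate` parameter
defaults to the plain JavaScript `Date` constructor so the code runs outside mongosh.

```js
// Hypothetical helper: convert string leaves found under the named date fields
// into date objects, leaving everything else untouched.
function reviveDates(node, dateFields, toDate = (s) => new Date(s), insideDateField = false) {
    if (typeof node === "string") {
        return insideDateField ? toDate(node) : node;
    }
    if (node instanceof Date) {
        return node;
    }
    if (Array.isArray(node)) {
        return node.map((child) => reviveDates(child, dateFields, toDate, insideDateField));
    }
    if (node !== null && typeof node === "object") {
        const out = {};
        for (const [key, value] of Object.entries(node)) {
            const nowInside = insideDateField || dateFields.includes(key);
            out[key] = reviveDates(value, dateFields, toDate, nowInside);
        }
        return out;
    }
    return node;
}

// An explicit "Z" suffix is used here so the plain Date constructor reads the value as UTC.
const stringQuery = {l_commitdate: {$lt: "1993-03-17T00:00:00Z"}, l_returnflag: "R"};
const typedQuery = reviveDates(stringQuery, ["l_commitdate"]);
```

In mongosh, passing `ISODate` as the third argument should be the closer match to the manual
conversion shown above. Note that the plain `Date` constructor interprets a date-time string without
a timezone suffix as local time.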
### EJSON output
@ -158,6 +173,8 @@ Sometimes the query will be provided in EJSON, so you will see:
in the output.
In mongosh, `aggregate()` does not support EJSON directly, so passing EJSON to it will succeed but will not produce the expected results.
In mongosh, `aggregate()` does not support EJSON directly, so passing EJSON to it will succeed but
will not produce the expected results.
Either pass this output as `EJSON.parse()` (if your input is a string), `EJSON.deserialize()` (if your input is parsed already) or manually convert it to standard MQL.
Either pass this output as `EJSON.parse()` (if your input is a string), `EJSON.deserialize()` (if
your input is parsed already) or manually convert it to standard MQL.

View File

@ -2,15 +2,18 @@
Bazel test targets for resmoke suites.
For documentation of the `resmoke_suite_test` rule, see [bazel/resmoke/README.md](bazel/resmoke/README.md).
For documentation of the `resmoke_suite_test` rule, see
[bazel/resmoke/README.md](bazel/resmoke/README.md).
## Configuring
In addition to attributes for `resmoke_suite_test`, the following are options for configuring test targets.
In addition to attributes for `resmoke_suite_test`, the following are options for configuring test
targets.
### tags
Arbitrary tags may also be added to group test targets for batch execution. For example, a custom tag lets you run all matching suites at once:
Arbitrary tags may also be added to group test targets for batch execution. For example, a custom
tag lets you run all matching suites at once:
```
bazel test //jstests/suites/... --test_tag_filters=my_tag
@ -26,7 +29,8 @@ The following tags have special meaning:
### target_compatible_with
Configure platforms/build options that the test is compatible with. Use this to exclude the test suite from platforms in CI.
Configure platforms/build options that the test is compatible with. Use this to exclude the test
suite from platforms in CI.
Example — exclude the test on PPC/S390x, MacOS, and TSAN builds:

View File

@ -1,6 +1,8 @@
# JS Test Tags
JS Test files can leverage "tags" that suites can key off of to include and/or exclude as necessary. Not scheduling a test to run is much faster than the test doing an early-return when preconditions are not met.
JS Test files can leverage "tags" that suites can key off of to include and/or exclude as necessary.
Not scheduling a test to run is much faster than the test doing an early-return when preconditions
are not met.
The simplest use case is having something like the following at the top of your js test file:
@ -38,7 +40,10 @@ and can also include (meta) comments:
*/
```
The tags are meant to be used in suite configurations, to [`include_with_any_tags`](../buildscripts/resmokeconfig/suites/README.md#selectorinclude_with_any_tags) and/or [`exclude_with_any_tags`](../buildscripts/resmokeconfig/suites/README.md#selectorexclude_with_any_tags):
The tags are meant to be used in suite configurations, to
[`include_with_any_tags`](../buildscripts/resmokeconfig/suites/README.md#selectorinclude_with_any_tags)
and/or
[`exclude_with_any_tags`](../buildscripts/resmokeconfig/suites/README.md#selectorexclude_with_any_tags):
```bash
test_kind: js_test
@ -50,7 +55,8 @@ selector:
- disabled_for_fcv_6_1_upgrade
```
Build variants can also use tags via the `test_flags` expansion, which facilitates tag-exclusions _across suites_ that run with the variant:
Build variants can also use tags via the `test_flags` expansion, which facilitates tag-exclusions
_across suites_ that run with the variant:
```
expansions:
@ -60,6 +66,9 @@ Build variants can also use tags via the `test_flags` expansion, which facilitat
## Available Tags
There is no current exhaustive list, since tags are arbitrary labels and do not need to be "registered". However, tags are always "global", and many are reused. Names should communicate clear intent and be reused/consolidated when appropriate.
There is no current exhaustive list, since tags are arbitrary labels and do not need to be
"registered". However, tags are always "global", and many are reused. Names should communicate clear
intent and be reused/consolidated when appropriate.
> Use `buildscripts/resmoke.py list-tags` to find which tags are actively referenced by suite configs, although there may be more in JS files and Build Variant expansions.
> Use `buildscripts/resmoke.py list-tags` to find which tags are actively referenced by suite
> configs, although there may be more in JS files and Build Variant expansions.
